
After reboot, pressure data is missing for several seconds #28

Open
kedder opened this issue Nov 2, 2020 · 24 comments

@kedder
Member

kedder commented Nov 2, 2020

Apparently #25 introduced a regression: after the boot, sensord does not emit any actual pressure or vario data to the NMEA stream for several seconds. Sometimes the period is quite long - I've seen 20 or 30 seconds.

The NMEA stream consists of repeated sentences like:

$POV,E,+0.00*39
$POV,P,+994.33,Q,+0.00*4C

This is evident from the vario needle in XCSoar: it is frozen at 0 m/s for several seconds after boot. Reverting 6fa4005 fixes the issue: the pressure/vario data starts coming immediately.
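The `$POV` sentences above carry a standard NMEA 0183 checksum: the two hex digits after `*` are the XOR of every character between `$` and `*` (for `$POV,E,+0.00` that works out to `0x39`, matching the sample). When capturing the stream to chase a startup stall, checking the checksum first rules out a corrupted or truncated feed. A minimal C sketch; the function names are hypothetical and not taken from sensord:

```c
#include <stdio.h>
#include <string.h>

/* XOR of every character between the leading '$' and the '*' (both
 * excluded), as defined for NMEA 0183 sentences. */
static unsigned char nmea_checksum(const char *sentence)
{
    unsigned char sum = 0;
    const char *p = sentence;

    if (*p == '$')
        p++;                          /* '$' is not part of the checksum */
    for (; *p != '\0' && *p != '*'; p++)
        sum ^= (unsigned char)*p;
    return sum;
}

/* Returns 1 if the two hex digits after '*' match the computed checksum,
 * 0 if they differ or the sentence has no checksum field. */
static int nmea_checksum_ok(const char *sentence)
{
    const char *star = strchr(sentence, '*');
    unsigned int expected;

    if (star == NULL || sscanf(star + 1, "%2x", &expected) != 1)
        return 0;
    return expected == (unsigned int)nmea_checksum(sentence);
}
```

For example, `nmea_checksum_ok("$POV,P,+994.33,Q,+0.00*4C")` returns 1.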

This is also reported in Openvario/meta-openvario#107

@hpmax, can you have a look?

@hpmax
Contributor

hpmax commented Nov 2, 2020

@kedder Sure... Can you or @Blaubart provide more detail:

  1. Does the delay always happen?
  2. How long is the delay? Is it consistent or does it vary? Min/Max/Average?
  3. Are those sentences repeated every time? Or is E always 0 and P some value that stays constant during the delay period? Is Q value constantly 0?
  4. Does it always go back to normal?

Any additional helpful details? How reproducible?

Honestly, I have a new version that I'm getting very close to releasing. I can probably release it tomorrow. I'm not really interested in debugging the current version at this point. So, I might prefer to make the new release and then if the problem still exists try to debug it.

@kedder
Member Author

kedder commented Nov 2, 2020

  1. Yes, it happens consistently, every time
  2. I just did 3 reboots and delay was 33, 27 and 40 seconds.
  3. $POV,P differs slightly between reboots, but while the delay happens, the value is always the same. $POV,E is always 0.00.
  4. For me it has always started working normally after the initial delay after the reboot

It is easy to reproduce:

  1. build the latest image,
  2. boot and observe that the XCSoar vario needle stays stuck despite input to the sensors (I'm pressing on the Ptek hose nozzle with my finger).

With 6fa4005 reverted, sensors become responsive immediately.
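The delay can also be timed from a capture of the NMEA stream instead of by watching the needle: while the regression is active the `$POV,P` value repeats verbatim, so the stall ends at the first sentence whose P value differs from the first one. A sketch, assuming each sentence has already been paired with an arrival time in seconds since boot (how the capture is taken is left out); the helper names are hypothetical:

```c
#include <stdio.h>
#include <stddef.h>

/* Extracts the P (pressure) field from a "$POV,P,<value>,..." sentence.
 * Returns 1 on success, 0 if the sentence is not a $POV,P sentence. */
static int pov_parse_pressure(const char *sentence, double *p)
{
    return sscanf(sentence, "$POV,P,%lf", p) == 1;
}

/* Given pressure samples and their arrival times, returns the time of the
 * first sample that differs from the first one, i.e. the end of the
 * "frozen" period. Returns -1.0 if the value never changes.
 * Exact comparison is deliberate: a frozen value is the same text parsed
 * the same way, so it compares equal bit for bit. */
static double frozen_period_end(const double *t_s, const double *p, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (p[i] != p[0])
            return t_s[i];
    return -1.0;
}
```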

@hpmax
Contributor

hpmax commented Nov 2, 2020

Okay... If for some reason the pressure readings were constant, that might explain why the vario is zero. I may have increased the startup time, but I wouldn't expect it to be outputting anything during that time.

What about the pitot/airspeed reading? Is that 0 the whole time? (Did you try blowing into the pitot to force it to read something?)

@hpmax
Contributor

hpmax commented Nov 2, 2020

I think I may know the cause and think it's been corrected in the new version. There's an "if (glitch<2)" in the pressure_measurement_handler(). When I was looking at the code during the latest (not yet released) update, I couldn't figure out why it was there and removed it. It could definitely suppress the output on start up.

@hpmax
Contributor

hpmax commented Nov 3, 2020

I have pushed a new version into hpmax/sensord. Please try it and see if it fixes the problem and if everything seems to be working. Also, please try "compdata". Usage is: "compdata -s -c /opt/conf/sensord.conf". It'll ask how many thousands of data points to collect. It takes about one minute per thousand data points, so start off with a few to see how it goes. I now suspect the compensation has a temperature and/or pressure dependence. I'm going to experiment with that to try to improve things in the next week or two.

@kedder
Member Author

kedder commented Nov 3, 2020

@hpmax the version in master branch of https://github.com/hpmax/sensord does not build cleanly:

# make
mkdir -p obj
cc -DVERSION_GIT=\"0.3.3-30-g9677f99\" -c -o obj/ms5611.o ms5611.c -g -Wall
mkdir -p obj
cc -DVERSION_GIT=\"0.3.3-30-g9677f99\" -c -o obj/main.o main.c -g -Wall
main.c: In function 'main':
main.c:428:6: warning: unused variable 'j' [-Wunused-variable]
  int j=0;
      ^
mkdir -p obj
cc -DVERSION_GIT=\"0.3.3-30-g9677f99\" -c -o obj/nmea.o nmea.c -g -Wall
mkdir -p obj
cc -DVERSION_GIT=\"0.3.3-30-g9677f99\" -c -o obj/cmdline_parser.o cmdline_parser.c -g -Wall
mkdir -p obj
cc -DVERSION_GIT=\"0.3.3-30-g9677f99\" -c -o obj/configfile_parser.o configfile_parser.c -g -Wall
cc -g -o sensord obj/ms5611.o obj/ams5915.o obj/ads1110.o obj/main.o obj/nmea.o obj/timer.o obj/KalmanFilter1d.o obj/cmdline_parser.o obj/configfile_parser.o obj/vario.o obj/AirDensity.o obj/24c16.o -lrt -lm
make: *** No rule to make target 'cal', needed by 'all'.  Stop.

@kedder
Member Author

kedder commented Nov 3, 2020

The sensord binary gets built, though. With it, XCSoar shows vario movement immediately after boot; however, the vario needle makes several sudden jumps between -1 and +1 in the first 30 seconds or so. After that it behaves normally (FWICT).

@hpmax do you have a way to test your version of sensord?

@hpmax
Copy link
Contributor

hpmax commented Nov 3, 2020

The complaint about int j=0; doesn't bother me much -- just an extraneous variable declaration.

I'm not sure what to make of the "No rule to make target 'cal'" error (I presume that's sensorcal).

I don't know what to say about that. When analyzing data, I typically ignore the first 1000 samples (which should roughly correlate to 20 seconds). There is a lot of ugliness when the thing first starts running related to the glitch behavior. I.e. it has to "warm up" and it takes some time after you establish a constant polling rate before the ms5611s stabilize. Also, for reasons that I still don't understand, running it in the foreground does give better results. The standard deviation of the data is lower. Not a lot lower, but it is consistently lower.
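The warm-up discard described above can be sketched as a statistics pass that skips the first `skip` samples. By the comment's own numbers (1000 samples in roughly 20 seconds) that is about 50 samples per second, an inference rather than a documented rate. This is an illustrative sketch, not hpmax's analysis code; it returns the variance (take `sqrt` for the standard deviation) and uses Welford's running update:

```c
#include <stddef.h>

/* Mean and (population) variance of samples[skip .. n-1]; the first `skip`
 * samples are treated as warm-up and ignored. Welford's running update
 * avoids the cancellation of the naive sum-of-squares formula.
 * Returns 0 if no samples remain after the warm-up, 1 otherwise. */
static int stats_after_warmup(const double *samples, size_t n, size_t skip,
                              double *mean, double *variance)
{
    double m = 0.0;   /* running mean */
    double m2 = 0.0;  /* running sum of squared deviations from the mean */
    size_t k = 0;     /* samples accumulated so far */

    for (size_t i = skip; i < n; i++) {
        double x = samples[i];
        double delta = x - m;
        k++;
        m += delta / (double)k;
        m2 += delta * (x - m);
    }
    if (k == 0)
        return 0;
    *mean = m;
    *variance = m2 / (double)k;
    return 1;
}
```

With the rule of thumb above, a call would pass `skip = 1000`.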

It is possible that, without a stabilized value, the anti-glitch compensation actually makes things worse during startup. The glitch compensation itself is pretty straightforward, but knowing when to turn it on and off is a big deal: when it's turned on, the D2 filter shuts off, which means that if D2 drifts too far from the filtered value, the compensation may never shut off. The new code (the one you just tried) will shut off automatically after some period of time even if it doesn't think the glitch is over. There's really no perfect solution here, but I can try to improve the startup behavior.
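The on/off problem can be sketched as a small state machine: while compensation is active the D2 (temperature) filter is frozen, and compensation ends either when the reading settles back near the frozen reference or when a hard timeout expires, mirroring the automatic shut-off described for the new code. The structure, field names, `settle_band`, and `max_samples` are all hypothetical; sensord's actual criteria are not shown in this thread:

```c
#include <stdbool.h>

/* Hypothetical per-sensor glitch-compensation state. */
struct glitch_state {
    bool   active;    /* compensation currently applied */
    int    samples;   /* samples since compensation was engaged */
    double d2_filt;   /* low-pass filtered temperature reading (D2) */
};

/* Called once per D2 sample.
 *   timing_glitch: caller detected an irregular polling interval
 *   alpha:         low-pass coefficient for the D2 filter (0 < alpha <= 1)
 *   settle_band:   |d2 - d2_filt| below which the glitch counts as over
 *   max_samples:   hard timeout so compensation cannot stay on forever */
static void glitch_update(struct glitch_state *s, double d2, bool timing_glitch,
                          double alpha, double settle_band, int max_samples)
{
    if (timing_glitch && !s->active) {
        s->active = true;
        s->samples = 0;
    }

    if (s->active) {
        double dev = d2 - s->d2_filt;
        if (dev < 0.0)
            dev = -dev;
        s->samples++;
        /* The filter stays frozen while compensating, so the glitch-free
         * reference is not polluted. Exit once the reading has settled
         * (and no new glitch arrived), or when the timeout expires. */
        if ((!timing_glitch && dev < settle_band) || s->samples >= max_samples)
            s->active = false;
        return;
    }

    /* Normal operation: update the D2 low-pass filter. */
    s->d2_filt += alpha * (d2 - s->d2_filt);
}
```

The timeout is what prevents the failure mode described above: if D2 drifts away from the frozen reference, `dev` never drops below `settle_band`, and without `max_samples` compensation would never end.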

Right now, my audio amplifier broke... again. So I'm going to have to pull it out of the glider to fix it, but I'm working on a test fixture that I can run it in my house.

@hpmax
Contributor

hpmax commented Nov 3, 2020

I pulled my OV out of the glider and have it wired up to the power supply, so I can now operate it indoors.

With OV running XCSoar, I logged into the device, killed sensord and restarted 5 times, and each time I saw the behavior:

  1. Sat there for a few seconds.
  2. Dropped to -1 m/s over a period of maybe 1-2 seconds.
  3. Went back to 0 over a period of maybe 3-4 seconds.
  4. Jumped between 0 and -0 repeatedly, with occasional blips to 0.1 m/s.

Pretty sure I'm using the code I gave you.

Am I missing something?

@Blaubart

Sorry, I'm not a programmer and not able to understand everything you are talking about. Is the problem solved?

Thanks!
Dirk

@hpmax
Contributor

hpmax commented Nov 10, 2020

As stated, I am unable to replicate the problem in hpmax/sensord. I am working on a new version to try to deal with the fact that glitch compensation is pressure dependent -- once complete, I will do a PR and hopefully get everything caught up.

@Blaubart

thanks

@kedder
Copy link
Member Author

kedder commented Nov 11, 2020

@hpmax, can you replicate the problem with sensord/master (e1ecf57) though?

@Blaubart

I have now compiled it including commit d85a7a6, and sensord works perfectly.

@hpmax
Contributor

hpmax commented Dec 31, 2020

Sorry, I basically dropped off here. I've been trying to deal with other issues, mainly the audio. I no longer have any confidence in Stefan's audio amplifier. I've built my own, which I think will be more reliable. The new board being laid out by DanD222 should take advantage of what I've learned and be pretty solid, particularly if an 8 ohm speaker is used.

What is the current status of the hpmax/sensord code? Are we still having the problem or do I need to look into this more?

@Blaubart

Blaubart commented Dec 31, 2020

The error still exists. Sensord doesn't send data during the first couple of seconds. Sometimes it takes more than 2 minutes.

@Blaubart

Stefan used the AS1701. I've had good experiences with the PAM8302; I use that one for the FreeVario.

@hpmax
Contributor

hpmax commented Dec 31, 2020

Stefan does not use the AS1701; he uses the MAX9718 (he may have previously used the AS1701, they are pin compatible), and I wouldn't trust either the AS1701 or the MAX9718 at this point. I've killed two MAX9718s and an AS1701 so far. The new design uses the TPA6211. From a thermal perspective, the PAM8302's DFN package looks decent; the others seem more questionable. Unfortunately, the DFN package is obsolete according to Digikey.

I can't find EMI information in the datasheet. I could have sworn I had seen some information showing it was pretty bad. Stefan told me that the Adafruit PAM8302 was a drop in replacement for his audio amp, but that it had EMI issues in some installations.

Personally, I wish there was a MAX98304 with a DFN package, I'd buy that in a second.

@hpmax
Contributor

hpmax commented Jan 1, 2021

Okay, I just committed a new release to hpmax/sensord. I restarted at least 10 times, and each time I got data out within 2-3 seconds. It's worth pointing out that, because of the hardware design, it will necessarily take some time for the sensor readings to be valid. Once data comes out, it can take another 4-5 seconds before the Kalman filter stabilizes.

This version has a switch that can be used to deliberately inject timing glitches, and it provides more data in the debug output when you engage it. I've also added quadratic pressure compensation, since the glitch compensation is pressure dependent and I didn't realize that in earlier versions.

I still need to do more research on the pressure-based compensation and compdata. It's quite possible there will be more changes. But frankly, I'm getting sick of this... Hopefully the new hardware will be available soon.

Please try it and let me know. Once I've gotten good reports, and I verify compdata.c, etc I'll go for another pull request.

@hpmax
Contributor

hpmax commented Jan 1, 2021

Looks like everything is in better shape than I remembered it being. All the hooks for quadratic pressure compensation are in it.

Let me give a more thorough explanation of what's going on here in case you haven't been paying attention to everything up to this point:

If the static/TE sensors are not polled at a constant rate, glitches will occur (this is a hardware issue with the MS5611 sensor). The glitches are ugly and will be seen in the vario. They can be mitigated by running sensord (and variod and pulseaudio) from the command line or (I think) in forking mode, but not completely eliminated.

The MS5611 has both a pressure and a temperature (die, not air) sensor in it. Under normal use, we alternate between taking a pressure and a temperature reading, and the temperature readings are used in accordance with the MS5611 datasheet to compensate the pressure readings for temperature. At any given time, we are sampling the pressure on one sensor and the temperature on the other. It has been observed that a timing glitch results in a change of deltaT on one sensor and a change of deltaP on the other sensor, where deltaP ~ deltaT * K. If we filter pressure (and temperature), we can determine deltaP = (Filtered Pressure - Current Pressure) and deltaT = (Filtered Temp - Current Temp). So we take Corrected Pressure = Current Pressure + (Filtered Temp - Current Temp) * K. Note that the Pressure and Temp in this case come from alternate sensors, and there are two different K's (one for dTE_temp and one for dStatic_temp). In the actual implementation, K has three coefficients K0, K1, K2, such that we have dt^2 * K2 + dt * K1 + K0, allowing for a quadratic term, although it's not a major contributor in all but the worst cases. These terms are stored in the sensord.conf file as "static_comp" and "tek_comp" (see the file).
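The correction can be sketched as follows. This reads the quadratic K together with the later delta formulas, i.e. the amount added to the pressure is the polynomial `K2*dt^2 + K1*dt + K0` itself, with `dt` taken from the *other* sensor's temperature. The function names and argument order are hypothetical, and the sign convention follows the prose above; the correction is only meant to be applied while a glitch is flagged:

```c
/* Correction polynomial K(dt) = k2*dt^2 + k1*dt + k0, in Horner form. */
static double glitch_correction(double k2, double k1, double k0, double dt)
{
    return (k2 * dt + k1) * dt + k0;
}

/* Corrected pressure for the sensor currently sampling pressure, using the
 * temperature deviation measured on the other sensor:
 *   dt = filtered_t_other - current_t_other */
static double corrected_pressure(double current_p, double filtered_t_other,
                                 double current_t_other,
                                 double k2, double k1, double k0)
{
    double dt = filtered_t_other - current_t_other;
    return current_p + glitch_correction(k2, k1, k0, dt);
}
```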

The correction is pretty easy to implement. Knowing when a glitch occurs is also pretty easy, since we can detect the timing glitch. But determining when the glitch is over is harder than it sounds.
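Detecting the timing glitch itself is the easy part: compare the interval between consecutive polls with the nominal period. A sketch, assuming timestamps in nanoseconds from a monotonic clock (on Linux, `clock_gettime(CLOCK_MONOTONIC, ...)`); the nominal period and tolerance passed in are placeholders, not sensord's values:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the interval between two consecutive sensor polls
 * deviates from the nominal polling period by more than tolerance_ns.
 * prev_ns and now_ns are monotonic timestamps in nanoseconds. */
static bool is_timing_glitch(int64_t prev_ns, int64_t now_ns,
                             int64_t nominal_ns, int64_t tolerance_ns)
{
    int64_t dev = (now_ns - prev_ns) - nominal_ns;
    if (dev < 0)
        dev = -dev;        /* late and early polls both count */
    return dev > tolerance_ns;
}
```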

After doing all this, I decided to hook up a vacuum pump and brought all the sensors down to 0.4 atmospheres (if you want to try this, make sure to connect ALL of the pressure connections to the vacuum source, including the pitot/dynamic), and realized the correction factor was pretty far off. My plumbing slowly leaked, so over the course of a couple of hours I had data over a wide pressure range. I was able to determine quadratic coefficients (called "Pcomp") to compensate the static_comp and tek_comp, i.e.:

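$$\mathrm{deltaTEPressure} = \left(\mathrm{TE\_Pressure}^2 \cdot \mathrm{TE\_Pcomp2} + \mathrm{TE\_Pressure} \cdot \mathrm{TE\_Pcomp1} + \mathrm{TE\_Pcomp0}\right) \cdot \left(\mathrm{static\_comp2} \cdot \mathrm{dTstatic}^2 + \mathrm{static\_comp1} \cdot \mathrm{dTstatic} + \mathrm{static\_comp0}\right)$$

$$\mathrm{deltastaticPressure} = \left(\mathrm{static\_pressure}^2 \cdot \mathrm{static\_Pcomp2} + \mathrm{static\_pressure} \cdot \mathrm{static\_Pcomp1} + \mathrm{static\_Pcomp0}\right) \cdot \left(\mathrm{tek\_comp2} \cdot \mathrm{dTte}^2 + \mathrm{tek\_comp1} \cdot \mathrm{dTte} + \mathrm{tek\_comp0}\right)$$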

Pcomp2 does very little, but is included, because... it's easy.
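Both formulas share one shape: a quadratic in pressure (the Pcomp scale factor) multiplied by a quadratic in the other sensor's temperature deviation (the comp term). A sketch of that evaluation; the struct and function names are hypothetical. For deltaTEPressure, pass the TE pressure with the TE Pcomp coefficients and dTstatic with static_comp; for deltastaticPressure, pass the static pressure with the static Pcomp coefficients and dTte with tek_comp:

```c
/* Coefficients of c2*x^2 + c1*x + c0. */
struct quad_coef {
    double c2, c1, c0;
};

/* Evaluates the quadratic at x in Horner form. */
static double quad_eval(struct quad_coef k, double x)
{
    return (k.c2 * x + k.c1) * x + k.c0;
}

/* delta = Pcomp(pressure) * comp(dT): the pressure-dependent scale factor
 * times the temperature-deviation polynomial. */
static double glitch_delta(double pressure, struct quad_coef pcomp,
                           double dT, struct quad_coef comp)
{
    return quad_eval(pcomp, pressure) * quad_eval(comp, dT);
}
```

With Pcomp coefficients {0, 0, 1} the scale factor is exactly 1 and the expression reduces to the unscaled comp polynomial.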

The comp values will vary by ~10 percent from sensor to sensor, so it's important to get correct values. Pcomp probably varies by a few percent too, but remember, this is a few percent of a few percent, so figuring out exact values is less critical. It's also a pain, because you'd need to rig up a vacuum pump and a leaky "manifold" and give it a few hours. Without that, I just use the Pcomp values that looked good on my OV.

compdata will generate a full set of static_comp and tek_comp, and if run as: "compdata -c /opt/conf/sensord.conf" will read out all the necessary values from the config file and install the newly calculated ones. A typical run looks like:

This program calculates compenation data to compensate for timing glitches on the static and tek sensors.
The pressure sensors are sensitive to air movement.
For best results, ensure the air is as still as possible when taking measurements.
How many thousand data points? 0003
Collecting 3000 data points.
0 points collected.
1000 points collected.
2000 points collected.
3000 points collected.
Total data points (the more the better): static: 3000, tek: 2973
GOOD Total RMS Error (lower is better, ideally below 80): static: 52.144119, tek: 51.931074
GOOD Mean Error (This should be very close to zero): static 0.000000, tek: -0.000000
GOOD Std Deviation of Error (should be RMS error or slightly less): static: 52.144119, tek: 51.931074
GOOD No overruns/underruns errors found during glitches.
static_comp -0.0000016212 -0.2540718204 -38.0683168928
tek_comp -0.0000004225 -0.2863214788 -24.7390261638
Empirical testing seems to show:
GOOD 2.5e-6 > quadrature term (-1.621e-06 and -4.225e-07) > -2.5e-6
BAD -0.28 > linear term (-0.25407 and -0.28632) > -0.35
BAD 0 > constant term (-38.06832 and -24.73903) > -10

The number of data points can be entered either with a regular keyboard or using the stick/rotary knobs.

Total RMS error of 50-60 is good, 80 is likely the limit of what I'd expect.
Mean Error should essentially always be 0.
Std Deviation should be the same as Total RMS error (and would only be different if mean error wasn't 0)
Overrun or underrun glitches are bad. They generally indicate that the glitches were substantially different from what was expected, and may indicate unreliable data.

If any of these are outside expectations, I'd re-run.
The latter results are about the actual static_comp and tek_comp values. If they are a little (which is deliberately vague) out of range, I wouldn't worry too much if all the other stuff is good. It's worth noting that you can do two back-to-back runs and get significantly different coefficients, and both results may still be okay.
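The acceptance criteria above can be written down as a checklist. This sketch hard-codes the thresholds quoted in this thread (RMS below 80, mean essentially zero, and the coefficient ranges from compdata's "Empirical testing seems to show" block); the tolerance used for "essentially zero" is an assumption, and the names are hypothetical:

```c
#include <stdbool.h>

/* compdata coefficients in the order it prints them. */
struct comp_coef {
    double quad, lin, cnst;
};

/* Bitmask of failed checks; 0 means everything looks good. */
enum { BAD_RMS = 1, BAD_MEAN = 2, BAD_QUAD = 4, BAD_LIN = 8, BAD_CONST = 16 };

/* Strictly between lo and hi. */
static bool in_range(double x, double lo, double hi)
{
    return x > lo && x < hi;
}

static int check_compdata(double rms, double mean, struct comp_coef c)
{
    int bad = 0;

    if (!(rms < 80.0))                      /* 50-60 good, 80 the limit */
        bad |= BAD_RMS;
    if (!in_range(mean, -1e-3, 1e-3))       /* tolerance is an assumption */
        bad |= BAD_MEAN;
    if (!in_range(c.quad, -2.5e-6, 2.5e-6))
        bad |= BAD_QUAD;
    if (!in_range(c.lin, -0.35, -0.28))
        bad |= BAD_LIN;
    if (!in_range(c.cnst, -10.0, 0.0))
        bad |= BAD_CONST;
    return bad;
}
```

Fed the static results from the run shown above (RMS 52.144119, mean 0, coefficients -0.0000016212, -0.2540718204, -38.0683168928), it flags the linear and constant terms, consistent with the two BAD lines in that output.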

And hopefully later this year, all this nightmarish nonsense will go away with the new board.

@Blaubart

Blaubart commented Jan 1, 2021

Thanks for your work!! I'd like to test hpmax/sensord, so I changed the following line in meta-openvario/recipes-apps/sensord/sensord-testing_git.bb
SRC_URI = "git://github.com/Openvario/sensord.git;protocol=git;branch=master
to
SRC_URI = "git://github.com/hpmax/sensord.git;protocol=git;branch=master \

But it is not so easy ;-)
Can you give me a short instruction how to compile?

@hpmax
Contributor

hpmax commented Jan 1, 2021

git clone https://github.com/hpmax/sensord
cp sensord/* <poky_root>/poky/build/tmp/work/armv7vet2hf-neon-ovlinux-linux-gnueabi/sensord/0.3.4-r0/git/.

cd <poky_root>
repo init -u git://github.com/Openvario/ovlinux-manifest.git -b warrior
repo sync
docker run -it --rm -v $(pwd):/workdir linuxianer99/ovbuild --workdir=/workdir
cd poky
TEMPLATECONF=meta-openvario/conf source oe-init-build-env
export MACHINE=openvario-7-CH070
bitbake sensord -c devshell
make

I can push the compiled code onto github if you'd like -- note that for some reason my executables are much larger, I suspect I am linking additional unnecessary debug stuff into the executable but it doesn't affect anything.

I also just finished a version that supports the DS18B20 temp sensor. Haven't pushed it onto github or tested it, but it compiles, and it should work.

@hpmax
Contributor

hpmax commented Jan 2, 2021

  1. I pushed the binaries into hpmax/sensord.

  2. I noticed a very minor issue with hpmax/variod (which will also be in openvario/variod) that will cause it to report incorrect deadband values if run in foreground mode -- I did a printf ("%d") instead of ("%f"), it does not affect any actual operation.

  3. I have tested my sensord variant with the temperature sensor code added. I don't have my temperature sensor attached, so it simply reports a failed temperature sensor but otherwise no other differences are apparent. I may push the variant. With the OUTPUT_POV_T disabled, there should be very low risk -- but can't guarantee it will behave correctly with the temperature sensor.
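The `%d`/`%f` mix-up in item 2 is a classic C pitfall: a `float` passed to `printf` is promoted to `double`, and reading it with `%d` is undefined behavior, which typically prints garbage. A sketch of the corrected call; `deadband` and the message text are hypothetical stand-ins for variod's actual variable and output:

```c
#include <stdio.h>

/* Formats a deadband value for the foreground debug output.
 * Wrong:   snprintf(buf, len, "deadband: %d", deadband);   (undefined behavior)
 * Correct: %f matches the double that the float is promoted to. */
static int format_deadband(char *buf, size_t len, float deadband)
{
    return snprintf(buf, len, "deadband: %f", deadband);
}
```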

@Blaubart

Blaubart commented Jan 4, 2021

I put your binaries into /opt/bin and it works perfectly!! Thanks. I think you can make a PR.
I tested the temperature sensor, but the navbox OAT has no value. Do I have to configure something?
