Implemented Kahan summation algorithm for adding process noise #601

RomanBapst · 2019-05-08T11:43:24Z

Each time there is a covariance matrix update we add process noise to the variance of each state.
For the delta angle and the delta velocity bias variances these contributions can be so small compared to the actual variance that they get lost due to numerical limitations when dealing with floating point variables.

One solution for this problem is to use the Kahan summation algorithm which relies on an accumulator variable which captures the small increments that would get lost otherwise.

For more details, see https://en.wikipedia.org/wiki/Kahan_summation_algorithm

Thanks to @priseborough for helping debug this issue and thanks to @jkflying for suggesting the algorithm.

Delta velocity z bias variance prior to this change:

Delta velocity z bias variance after this change:

I also compared the results of this PR against the previous code when the delta velocity bias noise was set very high (0.3 m/s/s/s) in order to see if both implementations lead to similar variances. The figure below shows that this was the case.

This is a replay log file from an S500 which has been used for validation:
https://logs.px4.io/plot_app?log=7ebfbbc0-d9ed-4e4f-92f0-e9ab73f0016f

CarlOlsson · 2019-05-08T11:47:45Z

Cool! Does this solve a "real" issue you experienced in flight?

RomanBapst · 2019-05-08T13:13:46Z

@CarlOlsson This all started with a log that revealed a covariance matrix instability from a quadcopter.
(accel z bias grew to the limits which triggered the reset).
After inspecting the variances we noticed that the delta velocity variance was pulled to its lower bound in a very short time and was stuck there for the remainder of the flight. I then talked to Paul who told me that he once tuned the process noise so that the variance would be pulled to the lower bound but not as hard as what we have currently (variance should be "bouncing off the limit"). Sorry for the loose formulations :)
One reason why the tuning is currently off could be the fact that we fuse baro at about 50Hz while we used to fuse it at a much lower rate. (That is related to the min_observation_interval definition)
So we started increasing the delta velocity bias process noise but noticed that it hardly had an effect on the actual state variance. In the end we figured out that it was a numerical issue and thus this PR.

We still need to go through the process noise tuning though.

jkflying

Looks good. Maybe it would also be good to add a little comment about why Kahan is used on some variables, and not others? Just to make it easier in the future if we want to add more states to the EKF or something.

EKF/covariance.cpp

EKF/ekf.h

RomanBapst · 2019-05-09T05:25:06Z

@jkflying @priseborough I addressed your comments.

EKF/covariance.cpp

priseborough · 2019-05-14T09:35:24Z

There are no issues with the code, however I am hesitant to merge this so close to a major release because it requires a change to the default parameter in ekf2 module which will need flight hours on different platforms to validate the new process noise value.

priseborough · 2019-05-23T09:41:18Z

PX4/Firmware v1.9.0-rc2 has been tagged and from now on and i have been told it will now only update with bug fixes, so we can now carry on with getting a sensible value for the noise variance, include that default with this PR and also the corresponding one on the firmware side. I will do some replay testing tomorrow.

priseborough · 2019-05-24T01:52:54Z

My testing on replay is showing that the kahanSummation function is not working as expected and the _delta_vel_bias_var_accum variable is always returning as zero.

See debug output added here: https://github.com/priseborough/ecl/tree/pr601

Replay with with EKF2_ACC_B_NOISE = 0.0006 gives:

At a larger values of EKF2_ACC_B_NOISE = 0.06 that just raise the variance off the lower limit I get the same variance before and after the PR changes are applied, eg:

Before:

After:

bkueng · 2019-05-24T06:06:39Z

@priseborough in order to ensure the compiler does not optimize the algorithm away, we'll need to disable -funsafe-math-optimizations (in particular -fassociative-math, which is set by -funsafe-math-optimizations). It's here: https://github.com/PX4/Firmware/blob/master/cmake/px4_add_common_flags.cmake#L54

jkflying · 2019-05-24T10:26:27Z

@bkueng Maybe another way would be to declare the accumulator volatile, or to add __attribute__ ((optimize("no-associative-math"))) above the Kahan function declaration?

bkueng · 2019-05-24T11:09:46Z

That's also fine with me, but I think we need to be very careful with enabling flags like -fassociative-math on a global scope.

jkflying · 2019-05-24T11:30:25Z

It was probably added to reduce flash size, I know that often things like matrix calculations get enormously reduced using these flags. But there might be other cases where this is causing an issue, so yeah, we really need to be careful.

jkflying

Didn't realise -funsafe-math-optimizations was applied, it will remove the Kahan summation (since it assumes ideal numbers, not floating point).

Either of these two changes will fix it (top one, or the two bottom ones)

EKF/ekf_helper.cpp

priseborough · 2019-05-28T09:06:37Z

I will try the method using volatiles in replay.

priseborough · 2019-05-28T11:09:41Z

I have confirmed on replay that making the t and y floats volatiles fixes the problem reported.

Tomorrow I will have another look at the tuning of the bias process noise with the Kahan summation changes.

dagar · 2019-05-28T14:16:06Z

I'd rather explicitly disable the optimization with the attribute than use a volatile.

Yet another option would be moving Ekf::kahanSummation to a dedicated compilation unit and removing the optimization flag at the cmake level .

jkflying · 2019-05-28T14:58:59Z

IMO the optimization flag itself is dangerous, but if it reduces binary size a lot and potentially increases performance I understand, even if it might be problematic here.

The attribute won't work on clang builds, and there is no equivalent way of doing it inline like that for clang. I really don't want there to be different behavior in something as fundamental as the ECL based on compiler used. It would be impossible to debug.

The CMake and different compilation unit will work, but then the kahan summation is scattered across a few different files, and it seems easy for a change to be made in the build system (CMake include orders, for example) which would silently break this. We don't have unit tests to make sure it is working, and if we did they would need to be in Release builds with all optimizations on for me to really trust them.

volatile is ugly from a "that's not what volatile is meant for" perspective, but seems to me the most likely to be robust, least effort, and won't silently break in the future.

I'm a bit torn here. If we aren't planning to support clang, then the attribute is better, but then we need to make it clear that clang is broken.

dagar · 2019-05-28T15:53:30Z

I'm not opposed to dropping -fassociative-math either across the entire ecl/EKF or PX4-wide for that matter. The flash savings are minor and so far I'm not seeing any performance differences on the bench (obviously needs significantly more testing).

Relevant GCC math optimizations

-funsafe-math-optimizations # Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math
- -fno-signed-zeros # Allow optimizations for floating-point arithmetic that ignore the signedness of zero
- -fno-trapping-math # Compile code assuming that floating-point operations cannot generate user-visible traps
- -fassociative-math # Allow re-association of operands in series of floating-point operations
- -freciprocal-math # Allow the reciprocal of a value to be used instead of dividing by the value if this enables optimizations
-fno-math-errno # Do not set errno after calling math functions that are executed with a single instruction, e.g., sqrt

LorenzMeier · 2019-05-28T15:55:08Z

Yes, sounds like we should be dropping those in general.

- includes ECL #601: Implemented Kahan summation algorithm for adding process noise - PX4/PX4-ECL#601

dagar · 2019-05-28T15:57:57Z

Here's a PX4/Firmware side PR for testing. PX4/PX4-Autopilot#12096

jkflying · 2019-05-29T11:14:31Z

Ok, so how about we use the volatile as a temporary workaround, and once we've updated the flags in the firmware we can get rid of the volatile.

dagar · 2019-05-29T13:25:05Z

Ok, so how about we use the volatile as a temporary workaround, and once we've updated the flags in the firmware we can get rid of the volatile.

That's fine as long as we don't completely move on and forget about PX4/PX4-Autopilot#12096.

Here's the cmake level solution. https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58

jkflying · 2019-05-29T13:32:24Z

Here's the cmake level solution.

https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58

I think you mean -fno-associative-math?

dagar · 2019-05-29T14:42:19Z

Here's the cmake level solution.
https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58

I think you mean -fno-associative-math?

Yep 😏. It's just an example of how additional flags can be set per source file.

…-math-optimizations) - includes ECL #601: Implemented Kahan summation algorithm for adding process noise - PX4/PX4-ECL#601

priseborough · 2019-06-04T09:13:02Z

So we wait for the firmware change, ensure there are no unexpected effects, then merge this. My limited replay testing so far has shown that the exisiting tuning parameter is OK to use, but that was one flight log from one vehicle.

angle- and delta velocity bias variance - the contribution of process noise per iteration for these states can be so small that it gets lost if using standard floating point summation Signed-off-by: Roman <bapstroman@gmail.com>

jkflying

Compiler optimization issue fixed in Firmware

RomanBapst requested review from priseborough and jkflying May 8, 2019 11:43

dagar added ekf enhancement labels May 8, 2019

jkflying suggested changes May 8, 2019

View reviewed changes

EKF/covariance.cpp Outdated Show resolved Hide resolved

priseborough suggested changes May 9, 2019

View reviewed changes

EKF/ekf.h Show resolved Hide resolved

RomanBapst force-pushed the kahanSum branch from f3211bd to ab8d938 Compare May 9, 2019 05:24

jkflying previously approved these changes May 9, 2019

View reviewed changes

CarlOlsson reviewed May 13, 2019

View reviewed changes

EKF/covariance.cpp Outdated Show resolved Hide resolved

priseborough previously approved these changes May 14, 2019

View reviewed changes

RomanBapst dismissed stale reviews from priseborough and jkflying via 0f8b0ec May 14, 2019 09:33

RomanBapst force-pushed the kahanSum branch from ab8d938 to 0f8b0ec Compare May 14, 2019 09:33

jkflying suggested changes May 24, 2019

View reviewed changes

EKF/ekf_helper.cpp Outdated Show resolved Hide resolved

EKF/ekf_helper.cpp Show resolved Hide resolved

EKF/ekf_helper.cpp Show resolved Hide resolved

dagar added a commit to PX4/PX4-Autopilot that referenced this pull request May 28, 2019

[DO NOT MERGE]: consider changing math optimizations

be45be9

- includes ECL #601: Implemented Kahan summation algorithm for adding process noise - PX4/PX4-ECL#601

dagar mentioned this pull request May 28, 2019

disable -fassociative-math (within -funsafe-math-optimizations) PX4/PX4-Autopilot#12096

Merged

dagar force-pushed the kahanSum branch from 0f8b0ec to 4303125 Compare June 4, 2019 13:30

dagar approved these changes Jun 4, 2019

View reviewed changes

jkflying approved these changes Jun 11, 2019

View reviewed changes

priseborough merged commit cef2ba5 into PX4:master Jun 11, 2019

TSC21 mentioned this pull request Mar 12, 2020

Vehicle crash due do estimator GPS failure #771

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented Kahan summation algorithm for adding process noise #601

Implemented Kahan summation algorithm for adding process noise #601

RomanBapst commented May 8, 2019 •

edited by AuterionWrikeBot

Loading

CarlOlsson commented May 8, 2019

RomanBapst commented May 8, 2019

jkflying left a comment

RomanBapst commented May 9, 2019

priseborough commented May 14, 2019

priseborough commented May 23, 2019

priseborough commented May 24, 2019

bkueng commented May 24, 2019

jkflying commented May 24, 2019 •

edited

Loading

bkueng commented May 24, 2019

jkflying commented May 24, 2019

jkflying left a comment

priseborough commented May 28, 2019

priseborough commented May 28, 2019 •

edited

Loading

dagar commented May 28, 2019

jkflying commented May 28, 2019

dagar commented May 28, 2019 •

edited

Loading

LorenzMeier commented May 28, 2019

dagar commented May 28, 2019

jkflying commented May 29, 2019

dagar commented May 29, 2019

jkflying commented May 29, 2019

dagar commented May 29, 2019

priseborough commented Jun 4, 2019

jkflying left a comment

Implemented Kahan summation algorithm for adding process noise #601

Implemented Kahan summation algorithm for adding process noise #601

Conversation

RomanBapst commented May 8, 2019 • edited by AuterionWrikeBot Loading

CarlOlsson commented May 8, 2019

RomanBapst commented May 8, 2019

jkflying left a comment

Choose a reason for hiding this comment

RomanBapst commented May 9, 2019

priseborough commented May 14, 2019

priseborough commented May 23, 2019

priseborough commented May 24, 2019

bkueng commented May 24, 2019

jkflying commented May 24, 2019 • edited Loading

bkueng commented May 24, 2019

jkflying commented May 24, 2019

jkflying left a comment

Choose a reason for hiding this comment

priseborough commented May 28, 2019

priseborough commented May 28, 2019 • edited Loading

dagar commented May 28, 2019

jkflying commented May 28, 2019

dagar commented May 28, 2019 • edited Loading

Relevant GCC math optimizations

LorenzMeier commented May 28, 2019

dagar commented May 28, 2019

jkflying commented May 29, 2019

dagar commented May 29, 2019

jkflying commented May 29, 2019

dagar commented May 29, 2019

priseborough commented Jun 4, 2019

jkflying left a comment

Choose a reason for hiding this comment

RomanBapst commented May 8, 2019 •

edited by AuterionWrikeBot

Loading

jkflying commented May 24, 2019 •

edited

Loading

priseborough commented May 28, 2019 •

edited

Loading

dagar commented May 28, 2019 •

edited

Loading