Implemented Kahan summation algorithm for adding process noise #601
Cool! Does this solve a "real" issue you experienced in flight?
@CarlOlsson This all started with a log from a quadcopter that revealed a covariance matrix instability. We still need to go through the process noise tuning though.
Looks good. Maybe it would also be good to add a little comment about why Kahan is used on some variables, and not others? Just to make it easier in the future if we want to add more states to the EKF or something.
@jkflying @priseborough I addressed your comments.
There are no issues with the code. However, I am hesitant to merge this so close to a major release, because it requires a change to a default parameter in the ekf2 module, and the new process noise value will need flight hours on different platforms to validate.
PX4/Firmware v1.9.0-rc2 has been tagged, and I have been told that from now on it will only be updated with bug fixes. We can therefore carry on with finding a sensible value for the noise variance and include that default with this PR and the corresponding one on the firmware side. I will do some replay testing tomorrow.
My replay testing shows that the kahanSummation function is not working as expected: the _delta_vel_bias_var_accum variable is always zero. See the debug output added here: https://github.com/priseborough/ecl/tree/pr601 Replay with EKF2_ACC_B_NOISE = 0.0006 gives: At a larger value of EKF2_ACC_B_NOISE = 0.06, which just raises the variance off the lower limit, I get the same variance before and after the PR changes are applied, e.g.: Before: After:
@priseborough in order to ensure the compiler does not optimize the algorithm away, we'll need to disable |
@bkueng Maybe another way would be to declare the accumulator |
That's also fine with me, but I think we need to be very careful with enabling flags like |
It was probably added to reduce flash size; I know that things like matrix calculations often get enormously reduced using these flags. But there might be other cases where this is causing an issue, so yeah, we really need to be careful.
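Didn't realise -funsafe-math-optimizations was applied. It will remove the Kahan summation, since it lets the compiler treat the arithmetic as ideal real numbers rather than floating point, and under that assumption the compensation term is always zero.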
Either of these two changes will fix it (top one, or the two bottom ones)
I will try the method using volatiles in replay.
I'd rather explicitly disable the optimization with the attribute than use a volatile. Yet another option would be moving |
IMO the optimization flag itself is dangerous, but if it reduces binary size a lot and potentially increases performance I understand, even if it might be problematic here. The attribute won't work on The CMake and different compilation unit will work, but then the kahan summation is scattered across a few different files, and it seems easy for a change to be made in the build system (CMake include orders, for example) which would silently break this. We don't have unit tests to make sure it is working, and if we did they would need to be in Release builds with all optimizations on for me to really trust them.
I'm a bit torn here. If we aren't planning to support |
I'm not opposed to dropping Relevant GCC math optimizations
Yes, sounds like we should be dropping those in general.
- includes ECL #601: Implemented Kahan summation algorithm for adding process noise - PX4/PX4-ECL#601
Here's a PX4/Firmware-side PR for testing: PX4/PX4-Autopilot#12096
Ok, so how about we use the |
That's fine as long as we don't completely move on and forget about PX4/PX4-Autopilot#12096. Here's the CMake-level solution: https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58
I think you mean |
Yep 😏. It's just an example of how additional flags can be set per source file.
…-math-optimizations) - includes ECL #601: Implemented Kahan summation algorithm for adding process noise - PX4/PX4-ECL#601
So we wait for the firmware change, ensure there are no unexpected effects, then merge this. My limited replay testing so far has shown that the existing tuning parameter is OK to use, but that was one flight log from one vehicle.
angle- and delta velocity bias variance - the contribution of process noise per iteration for these states can be so small that it gets lost if using standard floating point summation Signed-off-by: Roman <bapstroman@gmail.com>
Compiler optimization issue fixed in Firmware
Each time there is a covariance matrix update we add process noise to the variance of each state.
For the delta angle and the delta velocity bias variances these contributions can be so small compared to the actual variance that they get lost due to numerical limitations when dealing with floating point variables.
One solution to this problem is the Kahan summation algorithm, which uses an accumulator variable to capture the small increments that would otherwise be lost.
For more details, see https://en.wikipedia.org/wiki/Kahan_summation_algorithm
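As a rough illustration of the effect described above, the sketch below compares plain float accumulation with Kahan-compensated accumulation. It is not the code from this PR: the function names `accumulate_naive`, `kahan_add`, and `accumulate_kahan` are hypothetical, and the values used in the note after the block are arbitrary numbers chosen to make the loss visible, not EKF tuning values.

```cpp
// Illustrative sketch only; names and structure are assumptions, not the PR's code.

// Plain accumulation: add `increment` to `start` a total of `steps` times.
float accumulate_naive(float start, float increment, int steps)
{
    float sum = start;
    for (int i = 0; i < steps; i++) {
        // If increment is far smaller than sum, this addition rounds back to sum.
        sum += increment;
    }
    return sum;
}

// One Kahan-compensated addition.
// `compensation` holds the rounding error left over from earlier additions
// and is updated in place.
float kahan_add(float sum, float increment, float &compensation)
{
    const float y = increment - compensation; // re-inject the previously lost part
    const float t = sum + y;                  // low-order bits of y may be lost here
    compensation = (t - sum) - y;             // recover what was lost, as a correction
    return t;
}

// Compensated accumulation using kahan_add.
float accumulate_kahan(float start, float increment, int steps)
{
    float sum = start;
    float compensation = 0.0f;
    for (int i = 0; i < steps; i++) {
        sum = kahan_add(sum, increment, compensation);
    }
    return sum;
}
```

On a typical IEEE 754 single-precision target, calling both functions with a start value of 1.0f, an increment of 1e-9f, and one million steps leaves the naive sum at exactly 1.0f, while the compensated sum ends up close to 1.001. Note that this only holds if the compiler preserves floating-point semantics: in exact arithmetic `(t - sum) - y` is zero, so flags that allow algebraic reassociation can legally remove the compensation.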
Thanks to @priseborough for helping debug this issue and thanks to @jkflying for suggesting the algorithm.
Delta velocity z bias variance prior to this change:

Delta velocity z bias variance after this change:

I also compared the results of this PR against the previous code when the delta velocity bias noise was set very high (0.3 m/s/s/s) in order to see if both implementations lead to similar variances. The figure below shows that this was the case.
This is a replay log file from an S500 which has been used for validation:
https://logs.px4.io/plot_app?log=7ebfbbc0-d9ed-4e4f-92f0-e9ab73f0016f