Skip to content
This repository was archived by the owner on May 1, 2024. It is now read-only.

Implemented Kahan summation algorithm for adding process noise #601

Merged
merged 1 commit into from
Jun 11, 2019

Conversation

RomanBapst
Copy link
Contributor

@RomanBapst RomanBapst commented May 8, 2019

Each time there is a covariance matrix update we add process noise to the variance of each state.
For the delta angle and the delta velocity bias variances these contributions can be so small compared to the actual variance that they get lost due to numerical limitations when dealing with floating point variables.

One solution for this problem is to use the Kahan summation algorithm which relies on an accumulator variable which captures the small increments that would get lost otherwise.

For more details, see https://en.wikipedia.org/wiki/Kahan_summation_algorithm

Thanks to @priseborough for helping debug this issue and thanks to @jkflying for suggesting the algorithm.

Delta velocity z bias variance prior to this change:
image (4)

Delta velocity z bias variance after this change:
image (3)

I also compared the results of this PR against the previous code when the delta velocity bias noise was set very high (0.3 m/s/s/s) in order to see if both implementations lead to similar variances. The figure below shows that this was the case.

variance!UNITO-UNDERSCORE!diff

This is a replay log file from an S500 which has been used for validation:
https://logs.px4.io/plot_app?log=7ebfbbc0-d9ed-4e4f-92f0-e9ab73f0016f

@CarlOlsson
Copy link
Contributor

Cool! Does this solve a "real" issue you experienced in flight?

@RomanBapst
Copy link
Contributor Author

@CarlOlsson This all started with a log that revealed a covariance matrix instability from a quadcopter.
(accel z bias grew to the limits which triggered the reset).
After inspecting the variances we noticed that the delta velocity variance was pulled to its lower bound in a very short time and was stuck there for the remainder of the flight. I then talked to Paul who told me that he once tuned the process noise so that the variance would be pulled to the lower bound but not as hard as what we have currently (variance should be "bouncing off the limit"). Sorry for the loose formulations :)
One reason why the tuning is currently off could be the fact that we fuse baro at about 50Hz while we used to fuse it at a much lower rate. (That is related to the min_observation_interval definition)
So we started increasing the delta velocity bias process noise but noticed that it hardly had an effect on the actual state variance. In the end we figured out that it was a numerical issue and thus this PR.

We still need to go through the process noise tuning though.

Copy link
Contributor

@jkflying jkflying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Maybe it would also be good to add a little comment about why Kahan is used on some variables, and not others? Just to make it easier in the future if we want to add more states to the EKF or something.

@RomanBapst
Copy link
Contributor Author

@jkflying @priseborough I addressed your comments.

jkflying
jkflying previously approved these changes May 9, 2019
priseborough
priseborough previously approved these changes May 14, 2019
@priseborough
Copy link
Collaborator

There are no issues with the code, however I am hesitant to merge this so close to a major release because it requires a change to the default parameter in ekf2 module which will need flight hours on different platforms to validate the new process noise value.

@priseborough
Copy link
Collaborator

PX4/Firmware v1.9.0-rc2 has been tagged and from now on and i have been told it will now only update with bug fixes, so we can now carry on with getting a sensible value for the noise variance, include that default with this PR and also the corresponding one on the firmware side. I will do some replay testing tomorrow.

@priseborough
Copy link
Collaborator

My testing on replay is showing that the kahanSummation function is not working as expected and the _delta_vel_bias_var_accum variable is always returning as zero.

See debug output added here: https://github.com/priseborough/ecl/tree/pr601

Replay with with EKF2_ACC_B_NOISE = 0.0006 gives:

Screen Shot 2019-05-24 at 11 47 10 am

At a larger values of EKF2_ACC_B_NOISE = 0.06 that just raise the variance off the lower limit I get the same variance before and after the PR changes are applied, eg:

Before:

before

After:

after

@bkueng
Copy link
Member

bkueng commented May 24, 2019

@priseborough in order to ensure the compiler does not optimize the algorithm away, we'll need to disable -funsafe-math-optimizations (in particular -fassociative-math, which is set by -funsafe-math-optimizations). It's here: https://github.com/PX4/Firmware/blob/master/cmake/px4_add_common_flags.cmake#L54

@jkflying
Copy link
Contributor

jkflying commented May 24, 2019

@bkueng Maybe another way would be to declare the accumulator volatile, or to add __attribute__ ((optimize("no-associative-math"))) above the Kahan function declaration?

@bkueng
Copy link
Member

bkueng commented May 24, 2019

That's also fine with me, but I think we need to be very careful with enabling flags like -fassociative-math on a global scope.

@jkflying
Copy link
Contributor

It was probably added to reduce flash size, I know that often things like matrix calculations get enormously reduced using these flags. But there might be other cases where this is causing an issue, so yeah, we really need to be careful.

Copy link
Contributor

@jkflying jkflying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realise -funsafe-math-optimizations was applied, it will remove the Kahan summation (since it assumes ideal numbers, not floating point).

Either of these two changes will fix it (top one, or the two bottom ones)

@priseborough
Copy link
Collaborator

I will try the method using volatiles in replay.

@priseborough
Copy link
Collaborator

priseborough commented May 28, 2019

I have confirmed on replay that making the t and y floats volatiles fixes the problem reported.

Screen Shot 2019-05-28 at 9 08 09 pm

Tomorrow I will have another look at the tuning of the bias process noise with the Kahan summation changes.

@dagar
Copy link
Member

dagar commented May 28, 2019

I'd rather explicitly disable the optimization with the attribute than use a volatile.

Yet another option would be moving Ekf::kahanSummation to a dedicated compilation unit and removing the optimization flag at the cmake level .

@jkflying
Copy link
Contributor

IMO the optimization flag itself is dangerous, but if it reduces binary size a lot and potentially increases performance I understand, even if it might be problematic here.

The attribute won't work on clang builds, and there is no equivalent way of doing it inline like that for clang. I really don't want there to be different behavior in something as fundamental as the ECL based on compiler used. It would be impossible to debug.

The CMake and different compilation unit will work, but then the kahan summation is scattered across a few different files, and it seems easy for a change to be made in the build system (CMake include orders, for example) which would silently break this. We don't have unit tests to make sure it is working, and if we did they would need to be in Release builds with all optimizations on for me to really trust them.

volatile is ugly from a "that's not what volatile is meant for" perspective, but seems to me the most likely to be robust, least effort, and won't silently break in the future.

I'm a bit torn here. If we aren't planning to support clang, then the attribute is better, but then we need to make it clear that clang is broken.

@dagar
Copy link
Member

dagar commented May 28, 2019

I'm not opposed to dropping -fassociative-math either across the entire ecl/EKF or PX4-wide for that matter. The flash savings are minor and so far I'm not seeing any performance differences on the bench (obviously needs significantly more testing).

image

Relevant GCC math optimizations

  • -funsafe-math-optimizations # Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math

    • -fno-signed-zeros # Allow optimizations for floating-point arithmetic that ignore the signedness of zero
    • -fno-trapping-math # Compile code assuming that floating-point operations cannot generate user-visible traps
    • -fassociative-math # Allow re-association of operands in series of floating-point operations
    • -freciprocal-math # Allow the reciprocal of a value to be used instead of dividing by the value if this enables optimizations
  • -fno-math-errno # Do not set errno after calling math functions that are executed with a single instruction, e.g., sqrt

@LorenzMeier
Copy link
Member

Yes, sounds like we should be dropping those in general.

dagar added a commit to PX4/PX4-Autopilot that referenced this pull request May 28, 2019
 - includes ECL #601: Implemented Kahan summation algorithm for adding process noise
 - PX4/PX4-ECL#601
@dagar
Copy link
Member

dagar commented May 28, 2019

Here's a PX4/Firmware side PR for testing. PX4/PX4-Autopilot#12096

@jkflying
Copy link
Contributor

Ok, so how about we use the volatile as a temporary workaround, and once we've updated the flags in the firmware we can get rid of the volatile.

@dagar
Copy link
Member

dagar commented May 29, 2019

Ok, so how about we use the volatile as a temporary workaround, and once we've updated the flags in the firmware we can get rid of the volatile.

That's fine as long as we don't completely move on and forget about PX4/PX4-Autopilot#12096.

Here's the cmake level solution. https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58

@jkflying
Copy link
Contributor

Here's the cmake level solution.

https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58

I think you mean -fno-associative-math?

@dagar
Copy link
Member

dagar commented May 29, 2019

Here's the cmake level solution.
https://github.com/PX4/ecl/blob/e4c31c30f746205a973dfae7aaef8dc5c3429d0c/EKF/CMakeLists.txt#L58

I think you mean -fno-associative-math?

Yep 😏. It's just an example of how additional flags can be set per source file.

dagar added a commit to PX4/PX4-Autopilot that referenced this pull request May 30, 2019
…-math-optimizations)

 - includes ECL #601: Implemented Kahan summation algorithm for adding process noise
 - PX4/PX4-ECL#601
@priseborough
Copy link
Collaborator

So we wait for the firmware change, ensure there are no unexpected effects, then merge this. My limited replay testing so far has shown that the exisiting tuning parameter is OK to use, but that was one flight log from one vehicle.

angle- and delta velocity bias variance

- the contribution of process noise per iteration for these states can be so
small that it gets lost if using standard floating point summation

Signed-off-by: Roman <bapstroman@gmail.com>
Copy link
Contributor

@jkflying jkflying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler optimization issue fixed in Firmware

@priseborough priseborough merged commit cef2ba5 into PX4:master Jun 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants