fix: numerical errors in GPU recombination #874
Should this PR also resolve #852? Additionally, would it be possible to add a GPU runner to the CI tests, so that issues like this do not go silently unnoticed in the future? (That would need a new issue.)
We had thought about this previously, but decided against it primarily on cost grounds. We do run GPU tests manually just before each release. A few older GPU bugs may have gone unnoticed until now because of a configuration issue with the GPU.
I think your idea is good, and a valid approach! However, I believe we can achieve the same thing (fixing both #853 and #852) by simply correcting the following line, which is currently incorrect:
# Old (incorrect): zeroes a whole row of the basis
updated_null_space_basis = updated_null_space_basis.at[basis_index].set(0)
# New (correct): zeroes the column at the elimination index
updated_null_space_basis = updated_null_space_basis.at[:, elimination_index].set(0)
In my local tests, this change works on both CPU and GPU devices, and all tests pass on both.
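To make the difference between the two indexing forms concrete, here is a minimal sketch on a toy array. The array shape, its values, and the name `index` are assumptions chosen purely for illustration; they are not taken from the solver code.

```python
import jax.numpy as jnp

# Toy 3x4 "basis" with distinct non-zero entries (hypothetical values).
basis = jnp.arange(1.0, 13.0).reshape(3, 4)
index = 1  # hypothetical index, standing in for the one used in the solver

# `.at[index]` addresses the leading axis, so this zeroes row 1.
row_zeroed = basis.at[index].set(0)

# `.at[:, index]` addresses the second axis, so this zeroes column 1.
col_zeroed = basis.at[:, index].set(0)

# JAX arrays are immutable: `.at[...].set(...)` returns a new array, so the
# result has to be bound to a name; `basis` itself is left unchanged.
print(row_zeroed)
print(col_zeroed)
```

Both calls run without error on any device, which is why picking the wrong axis does not fail loudly: the result is simply a different array.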
The only other thing that might be worthwhile is to indicate to users (via the docstring or a warning) that enabling double precision ("jax_enable_x64") is sometimes required to obtain correct results.
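For reference, a minimal sketch of how a user might enable double precision, assuming the flag is set at program start before any arrays are created. The comparison values are illustrative only and are unrelated to the recombination solvers.

```python
import jax

# Enable 64-bit types; JAX defaults to 32-bit floats otherwise.
jax.config.update("jax_enable_x64", True)

import jax.numpy as jnp

one32 = jnp.asarray(1.0, dtype=jnp.float32)
one64 = jnp.asarray(1.0, dtype=jnp.float64)

# In single precision, a perturbation of 1e-8 is below the resolution near 1.0
# and is rounded away; in double precision it is retained.
lost_in_float32 = bool((one32 + jnp.float32(1e-8)) == one32)
kept_in_float64 = bool((one64 + 1e-8) > one64)

print(one64.dtype, lost_in_float32, kept_in_float64)
```

The same flag can also be set through the `JAX_ENABLE_X64` environment variable, which avoids touching the code.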
Ah, that does make sense - I'd just assumed the code was correct as I wasn't super familiar with the algorithm!
Very small comment on updating the CHANGELOG, but otherwise looks good to me!
PR Type
Description
Closes #852. Closes #853. Fixes an incorrect update applied in the elimination step of the recombination solvers.
How Has This Been Tested?
Existing tests pass as expected, even on a GPU machine.
Does this PR introduce a breaking change?
No.
Checklist before requesting a review