[HackerOne-2239704] Introduce an override for atomic batch operations #2296

Merged on Feb 21, 2024 (11 commits)


ljedrz
Collaborator

@ljedrz ljedrz commented Jan 10, 2024

This PR introduces a database-level override flag that, while enabled, prevents atomic operations from being executed; they are only carried out once the flag is flipped back manually.

This is a relatively simple solution to make block insertion and finalization happen as a single atomic storage operation; the reason why they can't be carried out like that "normally" is that they are disjoint in the storage hierarchy.

This should fix the storage corruption that can currently happen if the node is shut down between those two operations.
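To illustrate the mechanism described above, here is a minimal sketch of how such an override could behave. `ToyStore` and its fields are hypothetical and are not snarkVM's storage API; only the method names `flip_atomic_override` and `atomic_abort` appear in this PR's diff, and the assumption that the flip returns the new flag value is inferred from the asserts there.

```rust
use std::collections::BTreeMap;

/// Toy key-value store; a hypothetical stand-in, not snarkVM's storage layer.
#[derive(Default)]
pub struct ToyStore {
    /// Data that has been committed.
    committed: BTreeMap<String, String>,
    /// Writes queued by atomic batches that have not been committed yet.
    pending: Vec<(String, String)>,
    /// While `true`, finishing a batch does not commit it.
    atomic_override: bool,
}

impl ToyStore {
    /// Flips the override and returns its new value (an assumption).
    /// Disabling the override commits everything queued in the meantime.
    pub fn flip_atomic_override(&mut self) -> bool {
        self.atomic_override = !self.atomic_override;
        if !self.atomic_override {
            self.commit_pending();
        }
        self.atomic_override
    }

    /// Queues a write as part of the current atomic batch.
    pub fn queue(&mut self, key: &str, value: &str) {
        self.pending.push((key.to_string(), value.to_string()));
    }

    /// Finishes a batch; the commit is deferred while the override is on.
    pub fn finish_atomic(&mut self) {
        if !self.atomic_override {
            self.commit_pending();
        }
    }

    /// Discards all queued writes.
    pub fn atomic_abort(&mut self) {
        self.pending.clear();
    }

    /// Reads a committed value.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.committed.get(key).map(String::as_str)
    }

    fn commit_pending(&mut self) {
        for (key, value) in self.pending.drain(..) {
            self.committed.insert(key, value);
        }
    }
}
```

In this sketch, two separately finished batches (say, block insertion and finalization) only become visible together when the override is flipped off, and calling `atomic_abort` before that flip discards both.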

Filing it as a draft, as I'd like to further improve the related docs, but it worked when I tested it locally.

This PR should address:

@ljedrz ljedrz requested a review from howardwu January 10, 2024 18:47
@ljedrz
Collaborator Author

ljedrz commented Jan 10, 2024

I'm also going to check the failing test cases; it's likely that some of them are just using the low-level operations individually.

Edit: it was actually just due to the in-memory storage making assertions that only apply to the persistent one; fixed.

@ljedrz ljedrz force-pushed the feat/atomic_batch_override branch 2 times, most recently from 6a17280 to 64d36d7 Compare January 11, 2024 12:02
@ljedrz ljedrz marked this pull request as ready for review January 11, 2024 13:23
@ljedrz ljedrz force-pushed the feat/atomic_batch_override branch from 64d36d7 to d4eb70b Compare January 11, 2024 13:25
@ljedrz
Collaborator Author

ljedrz commented Jan 19, 2024

Rebased and addressed the review comment in a new commit.

Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
@ljedrz ljedrz force-pushed the feat/atomic_batch_override branch from 5f50c0f to 6bfa9a3 Compare January 20, 2024 10:19
@ljedrz
Collaborator Author

ljedrz commented Jan 20, 2024

Rebased and addressed the latest review comments in a new commit.

I also noticed that the unhappy path for finalization needed to be adjusted, which was done in the last commit.

@@ -335,12 +335,32 @@ impl<N: Network, C: ConsensusStorage<N>> VM<N, C> {
// Attention: The following order is crucial because if 'finalize' fails, we can rollback the block.
// If one first calls 'finalize', then calls 'insert(block)' and it fails, there is no way to rollback 'finalize'.

// Enable the atomic batch override, so that both the insertion and finalization belong to a single batch.
#[cfg(feature = "rocks")]
assert!(self.block_store().flip_atomic_override()?);
Member

Should this be debug_assert!?

Collaborator Author

@ljedrz ljedrz Feb 19, 2024

The plain assert was intentional: it ensures that the override is initially disabled. If that were not the case, it would indicate a logic bug, and other atomic operations could be affected.
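A small sketch of why the choice of macro matters here. `OverrideFlag` and `with_override` are hypothetical names, and the flip is assumed to return the new flag value, as the asserts in the diff suggest. Note also that `debug_assert!` does not evaluate its expression in builds without debug assertions, so wrapping a side-effecting flip in it would skip the flip itself in a typical release build.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the database-level override flag.
pub struct OverrideFlag(AtomicBool);

impl OverrideFlag {
    /// Creates a flag that starts out disabled.
    pub const fn new() -> Self {
        Self(AtomicBool::new(false))
    }

    /// Flips the flag and returns its new value.
    pub fn flip(&self) -> bool {
        // `fetch_xor(true, ..)` toggles the flag and returns the previous value.
        !self.0.fetch_xor(true, Ordering::SeqCst)
    }
}

/// Runs `work` with the override enabled.
/// The plain `assert!`s are checked in every build profile, so an unexpected
/// initial state is caught instead of silently inverting the override.
pub fn with_override<T>(flag: &OverrideFlag, work: impl FnOnce() -> T) -> T {
    assert!(flag.flip(), "the override was already enabled");
    let output = work();
    assert!(!flag.flip(), "the override was already disabled");
    output
}
```

If the flag were already enabled on entry, the first flip would return `false` and the assert would panic before any work is done.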

self.finalize_store().atomic_abort();
// Disable the atomic batch override.
assert!(!self.block_store().flip_atomic_override()?);
}
// Rollback the block.
self.block_store().remove_last_n(1).map_err(|removal_error| {
Member

@howardwu howardwu Feb 18, 2024

This is still incorrect.

Calling remove_last_n(1) here will 1) correctly update the Merkle tree; however, 2) it will now also delete the penultimate block from self.block_store().

This is erroneous because the pending last block is already discarded through the flip_atomic_override mechanism, so the storage deletion in remove_last_n(1) ends up removing the penultimate block instead.

Collaborator Author

Indeed. I can see 2 ways of tackling this; we can either:

  • put the storage deletion within remove_last_n behind a feature flag (so that it doesn't happen with rocks); however, this would obscure the purpose of the method and affect its use outside of typical runtime (and probably break some tests)
  • break the remove_last_n method into 2 distinct operations (isolated into new methods) - the tree update and the storage update - and only call the tree update here; I'll see if I can propose something "clean" for this

Collaborator Author

@ljedrz ljedrz Feb 19, 2024

I proposed the 2nd approach in ProvableHQ@1a27305. The scope of the write lock over the Merkle tree is different, but since the entirety of block insertion is behind the block_lock, this shouldn't affect anything (I'm assuming that only block insertion/rollback can alter it).
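The split can be pictured with a toy model. Everything below is a hypothetical sketch (the type, its fields, and the method names are illustrative and not taken from the commit): a vector stands in for the Merkle tree, another for persistent storage, and a pending block is one whose storage write is still deferred by the override.

```rust
/// Toy block store; a hypothetical model, not snarkVM's `BlockStore`.
#[derive(Default)]
pub struct ToyBlockStore {
    /// Stand-in for the in-memory Merkle tree: one leaf per block height.
    tree_leaves: Vec<u64>,
    /// Stand-in for persistent storage: heights of the stored blocks.
    stored_blocks: Vec<u64>,
}

impl ToyBlockStore {
    /// Inserts a block whose storage write has already been committed.
    pub fn insert_committed(&mut self, height: u64) {
        self.tree_leaves.push(height);
        self.stored_blocks.push(height);
    }

    /// Inserts a block while the override is on: the tree is updated
    /// immediately, but the storage write is still pending.
    pub fn insert_pending(&mut self, height: u64) {
        self.tree_leaves.push(height);
    }

    /// Rolls back only the tree update.
    pub fn remove_last_n_from_tree_only(&mut self, n: usize) -> Result<(), String> {
        let new_len = self
            .tree_leaves
            .len()
            .checked_sub(n)
            .ok_or_else(|| format!("cannot remove {n} tree leaves"))?;
        self.tree_leaves.truncate(new_len);
        Ok(())
    }

    /// Rolls back only the storage update.
    pub fn remove_last_n_from_storage(&mut self, n: usize) -> Result<(), String> {
        let new_len = self
            .stored_blocks
            .len()
            .checked_sub(n)
            .ok_or_else(|| format!("cannot remove {n} stored blocks"))?;
        self.stored_blocks.truncate(new_len);
        Ok(())
    }

    /// The combined operation: rolls back both the tree and the storage.
    pub fn remove_last_n(&mut self, n: usize) -> Result<(), String> {
        self.remove_last_n_from_tree_only(n)?;
        self.remove_last_n_from_storage(n)
    }

    pub fn tree_leaves(&self) -> &[u64] {
        &self.tree_leaves
    }

    pub fn stored_blocks(&self) -> &[u64] {
        &self.stored_blocks
    }
}
```

With blocks 0 and 1 committed and block 2 pending, the tree-only rollback leaves both the tree and the storage at blocks 0 and 1, whereas the combined `remove_last_n(1)` would also drop block 1 from storage, which is the problem raised in the review.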

@ljedrz ljedrz marked this pull request as draft February 21, 2024 11:58
@ljedrz
Collaborator Author

ljedrz commented Feb 21, 2024

Switching to draft so I can double-check the updated setup.

Done, and tested locally.

@ljedrz ljedrz force-pushed the feat/atomic_batch_override branch from 6e5e390 to f656d9f Compare February 21, 2024 12:18
@ljedrz ljedrz marked this pull request as ready for review February 21, 2024 13:15
Contributor

@raychu86 raychu86 left a comment

LGTM

@raychu86 raychu86 changed the title Introduce an override for atomic batch operations [HackerOne-2239704] Introduce an override for atomic batch operations Feb 21, 2024
@howardwu howardwu merged commit d23daa8 into ProvableHQ:testnet3 Feb 21, 2024
70 of 78 checks passed