
Introduce P3 and RocksDB fixes #624

Merged
merged 10 commits into master on Oct 4, 2024
Conversation

@shizzard (Collaborator) commented Oct 2, 2024

This PR covers several lines of work:

  • P3: fix configuration;
  • ar_kv: rework; introduce SST flushes, WAL syncs, and a proper termination sequence.

@shizzard (Collaborator Author) commented Oct 2, 2024

@ldmberman a short explanation of the ar_kv changes for you:

  • The reconnect mechanism introduced a race condition and was not used anyway, so it was removed.
  • The repair mechanism is actually inherited from leveldb and cannot repair CF databases, so it is dangerous to use: in the best case it will drop the data from the WAL; in the worst case it will apply only the default CF changes, corrupting the database. Repair was completely removed for this reason.
  • The termination sequence is now [memtable flush, WAL sync, close], which is sufficient for the data to be persisted (see the sketch below).
  • RocksDB will handle dangling corrupted records in the MANIFEST, SST files, or WAL if the process crashed mid-write, dropping the corrupted entries, so there is no need to do anything about that.
  • To make sure the CF databases are consistent, the atomic_flush option is applied to all of them: it ensures that all CFs are flushed atomically.

Worth reading, even though it is 10 years old: facebook/rocksdb#236 (comment) (the other comments are also useful).
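A minimal sketch of the termination order above, assuming the erlang-rocksdb binding API (rocksdb:flush/2, rocksdb:sync_wal/1, rocksdb:close/1); the actual ar_kv code may differ in the details:

%% Flush memtables to SST files, sync the WAL, then close the handle.
close_db(Db) ->
    %% 1. Flush the memtables to SST files and wait for completion.
    ok = rocksdb:flush(Db, [{wait, true}]),
    %% 2. Make sure whatever still sits only in the WAL reaches disk.
    ok = rocksdb:sync_wal(Db),
    %% 3. Release the handle.
    rocksdb:close(Db).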

@shizzard (Collaborator Author) commented Oct 2, 2024

Update: I found some traces of CF support being added to the repair mechanism, but I cannot make it work either way, neither with the Erlang bindings nor with the RocksDB CLI tool (ldb).
trace commit

{exception, io_lib:format("~p", [Exc])}]),
{error, Exc};
_ ->
case reconnect(Name, Ref) of
Collaborator

The new code discards the retry functionality. Is this what you were referring to in your PR comment about the reconnect functionality not working right? If the retry didn't work or, worse, introduced a race condition, then I agree it's good to remove - but I just want to double-check that we're not accidentally discarding a valuable bit of retry logic.

The reconnect mechanism introduced a race condition and was not used anyway, so it was removed.

Collaborator

Ah, I see. All methods were called with RetryCount set to 1? In that case, I completely agree - kill the retry functionality

@shizzard (Collaborator Author) commented Oct 2, 2024

Yes, but the problem was bigger than that.
The reconnect was called from functions that were executed on the caller side (not in the ar_kv process), so these calls could happen concurrently, several at once. This means that if the database is down for some reason and the db reference is outdated, several other processes will attempt a gen_server:call demanding that the database be reconnected. These calls will be serialized in the process mailbox, and the gen_server will run the close/open sequence several times. While the database is closed, other processes will find the database reference dead, will call for reconnects, and this will never end.
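A hypothetical, simplified illustration of the removed pattern (not the actual ar_kv code; the function and message names are made up): every caller that sees a dead reference issues its own reconnect call, so N concurrent callers queue N reconnect requests in the server mailbox and the gen_server runs the close/open sequence N times.

get(Name, Ref, Key) ->
    case rocksdb:get(Ref, Key, []) of
        {error, _} ->
            %% Caller-side retry: each caller independently asks the
            %% ar_kv server for a fresh reference. With many concurrent
            %% callers this queues many reconnect requests at once.
            {ok, NewRef} = gen_server:call(ar_kv, {reconnect, Name}),
            rocksdb:get(NewRef, Key, []);
        Result ->
            Result
    end.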

Collaborator

ah got it - sounds like a mess. Good call removing.

Member

These calls will be serialized in the process mailbox, and the gen_server will run the close/open sequence several times. While the database is closed, other processes will find the database reference dead, will call for reconnects, and this will never end.

This is not true: the first reconnect changes the reference and the subsequent processes simply pick it up - https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/src/ar_kv.erl#L171

Nevertheless, we probably do not need the reconnect functionality now so it makes sense to remove it.

@shizzard (Collaborator Author) commented Oct 4, 2024

@ldmberman yes, that's the race condition: several processes will hit the same gen_server:call at the same time and the ar_kv server will run the reconnect sequence several times.
If you're referring to the fact that one database is only used by one process (I'm not sure that this is true), then yes, there is no race condition, simply because the race involves only one process.

@ldmberman (Member) commented:

@ldmberman a short explanation of the ar_kv changes for you:

  • The reconnect mechanism introduced a race condition and was not used anyway, so it was removed.
  • The repair mechanism is actually inherited from leveldb and cannot repair CF databases, so it is dangerous to use: in the best case it will drop the data from the WAL; in the worst case it will apply only the default CF changes, corrupting the database. Repair was completely removed for this reason.
  • The termination sequence is now [memtable flush, WAL sync, close], which is sufficient for the data to be persisted.
  • RocksDB will handle dangling corrupted records in the MANIFEST, SST files, or WAL if the process crashed mid-write, dropping the corrupted entries, so there is no need to do anything about that.
  • To make sure the CF databases are consistent, the atomic_flush option is applied to all of them: it ensures that all CFs are flushed atomically.

Worth reading, even though it is 10 years old: facebook/rocksdb#236 (comment) (the other comments are also useful).

This is a relatively fresh discussion of the repair procedure. The key point from there:

The WAL is processed afterwards, when the insertions for new column families are ignored as they are not represented in the recovered manifest.

This is (so far) in line with what @shizzard observed locally in tests. There are no other caveats mentioned, so I think it is only about flushing the CFs once they are created (no need to flush them explicitly later on). In any case, @shizzard and I had a discussion and decided to remove the repair code for now.

Regarding the atomic_flush option, we do not need it because we do not use RocksDB transactions. I would not introduce it until we need it.
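For reference, a rough sketch of what enabling the option at open time could look like, assuming the erlang-rocksdb binding forwards an atomic_flush DB option to RocksDB (an assumption, and the column family names below are placeholders); as noted above, the PR does not end up needing it:

open_with_cfs(Path) ->
    DbOpts = [
        {create_if_missing, true},
        {create_missing_column_families, true},
        %% assumption: forwarded to RocksDB's atomic_flush DB option
        {atomic_flush, true}
    ],
    CFs = [{"default", []}, {"cf1", []}, {"cf2", []}],
    %% returns {ok, Db, CfHandles} on success
    rocksdb:open_with_cf(Path, DbOpts, CFs).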

@shizzard (Collaborator Author) commented Oct 3, 2024

@ldmberman @JamesPiechota I think it is ready to merge, unless you have any changes in mind.

@shizzard merged commit f59047b into master on Oct 4, 2024
65 checks passed