[Core] Refactor of authority server #608
Conversation
@asonnino Hey Alberto! I don't have much context on this change, though moving from a custom stack to Tokio for handling server commands sounds great. I'm just wondering what the rough timeline for this change is, as it moves authority files around, and I'm hoping to come out with a PR this week (to do with logging) in which I'll be touching most of the files in the authority core. Thanks!
I'm full time on this PR and will try to land it as soon as possible. @ade is also helping out.
@asonnino It looks like a lot of this code is verbatim from Narwhal (https://github.com/MystenLabs/narwhal/tree/main/network/src). Doesn't it make more sense to put the networking stuff in mysten-infra, where both Narwhal and Sui can depend on it?
config.config_path()
);
// Spawning the authorities is only required to create the database folder.
One of the problems here is that genesis is not supposed to start the authorities; it should simply configure/provision them. So spawning the server here causes various issues, because the authorities are started again in `SuiCommand::Start`, and this causes DB lock issues.
Basically, this is the reason for the test failures. Authorities are started here:
https://github.com/MystenLabs/fastnft/blob/bf9fcdc254d4ac271c20a85f3c72385cdd429a35/sui/src/unit_tests/cli_tests.rs#L539
and then again here:
https://github.com/MystenLabs/fastnft/blob/bf9fcdc254d4ac271c20a85f3c72385cdd429a35/sui/src/unit_tests/cli_tests.rs#L548
The second start is what causes the issues.
Why did you change the code to start authorities in genesis?
> One of the problems here is that genesis is not supposed to start the authorities. It should simply config/provision them. So spawning the server here causes various issues because they're again started in `SuiCommand::Start`. This causes DB lock issues.
Ha well that was a silly issue after all :)
I didn't change the code of the authorities to start in genesis; it was already there. The original code, however, took down the authorities immediately after spawning them (they were only there to generate the storage file).
Yes, this is the idea.
@asonnino Slow down a bit on this line of work: the concurrency story for Sui is very different from the concurrency story for Narwhal/Tusk, because we do execution as well, so we are not bounded (just) on IO. Right now we implement this through shared-memory primitives: there is a task per TCP connection, and they all share an AuthorityState but execute on all cores. The architecture you are implementing moves all work into an inner loop for the authority, which processes all messages and which, I think, runs on a single core. This is a bad idea for Sui right now (it is also a bad idea for Narwhal, but it is what it is).
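To make the contrast concrete, here is a minimal sketch of the current task-per-connection model; `AuthorityState`, `handle_message`, and `serve` are hypothetical stand-ins, not the actual sui_core API. Each connection gets its own tokio task, all sharing the state behind an `Arc`, so a multi-threaded runtime spreads execution across all cores:

```rust
use std::sync::Arc;

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;

// Hypothetical stand-in for the real sui_core type.
struct AuthorityState;

impl AuthorityState {
    async fn handle_message(&self, _bytes: &[u8]) {
        // Shared-memory model: many tasks call this concurrently; safety
        // comes from the authority store's locks table, not a single loop.
    }
}

// One tokio task per TCP connection, all sharing the same AuthorityState.
// On a multi-threaded runtime these tasks run in parallel on all cores.
async fn serve(state: Arc<AuthorityState>, listener: TcpListener) -> std::io::Result<()> {
    loop {
        let (mut socket, _addr) = listener.accept().await?;
        let state = state.clone();
        tokio::spawn(async move {
            let mut buf = vec![0u8; 65_536];
            while let Ok(n) = socket.read(&mut buf).await {
                if n == 0 {
                    break; // connection closed
                }
                state.handle_message(&buf[..n]).await;
            }
        });
    }
}
```

A single inner loop that `select!`s over all message sources, by contrast, serializes all of this work onto one task.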
Also, there is a lot of copy-pasted code from Narwhal/Tusk here, which we would have to maintain twice. Better to separate it into a crate we maintain once, like we did for DBMap.
The idea was to re-use the same network stack as Narwhal (to be moved into mysten-infra).
Yes, I wanted to move it into mysten-infra.
Also with the current design it is not clear how to receive inputs from consensus (for shared objects) |
Yeah, that is not what we want. We want to execute on all cores, with minimal locking between them, which right now is managed just by the authority store's locks table. Eventually we will even execute on separate machines, so an architecture that centralises and executes all messages moves us away from our goal.
I see, should I then drop this PR?
What do you need there? Let's discuss minimal changes; we cannot change 1000 lines to find a way to do this.
For the shared objects, we need a way to receive inputs from the consensus. The consensus may feed us its output either through a channel or over TCP.
See comments on the specific places where we are regressing in terms of the architecture we want.
@@ -1,376 +0,0 @@
// Copyright (c) 2021, Facebook, Inc. and its affiliates
This is currently the only working bench we have. We cannot just drop it.
/// as the network or the consensus) and drives the core.
async fn run(&mut self) {
    loop {
        tokio::select! {
This runs in one big single-core loop -- so we are regressing in terms of our architecture, which currently uses all cores through shared memory, and which tomorrow will use many machines that share a DB.
/// Create an `AuthorityServer` and spawn it in a new tokio task.
pub fn spawn(
    state: AuthorityState,
    rx_client_core_message: Receiver<(ClientToAuthorityCoreMessage, Replier)>,
I am no longer a great fan of losing the connection relation: we want to fairly share the time we spend between connections/IPs, and therefore we should probably keep that information and have tasks that do all the work and map to these, so that tokio can fairly schedule (or we schedule). But this moves us to an architecture where we lose this information.
Sounds great. Let's have a task that manages incoming consensus messages and has a copy (of the Arc) of the AuthorityState, to update the locks tables there that result from the consensus. It can also call handle_certificate on the AuthorityState if you like. Does that resolve the immediate issue? The key to all this is: you can have concurrent access to the AuthorityState, and indirectly the authority store, and we are engineering these to make that safe.
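A minimal sketch of that suggestion, assuming consensus output arrives on a tokio mpsc channel; `CertifiedTransaction` and the `handle_certificate` signature here are hypothetical stand-ins for the real sui_core types:

```rust
use std::sync::Arc;

use tokio::sync::mpsc;

// Hypothetical stand-ins; the real AuthorityState and certificate
// types live in sui_core.
struct CertifiedTransaction;

struct AuthorityState;

impl AuthorityState {
    async fn handle_certificate(&self, _cert: CertifiedTransaction) {
        // Updates the shared-object locks tables in the authority store;
        // designed to be safe under concurrent access.
    }
}

// A dedicated task that owns the consensus output stream and a clone of
// the shared state. It runs alongside the per-connection tasks rather
// than funnelling everything through one loop.
async fn consensus_listener(
    state: Arc<AuthorityState>,
    mut rx_consensus: mpsc::Receiver<CertifiedTransaction>,
) {
    while let Some(certificate) = rx_consensus.recv().await {
        state.handle_certificate(certificate).await;
    }
}
```

Spawned once at startup (e.g. `tokio::spawn(consensus_listener(state.clone(), rx_consensus))`), while the consensus side holds the matching `mpsc::Sender`.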
Yes, we can try to do that. Should I then close this PR and start a new one, and think about the network another day in a separate PR?
Yep, the priority is on what you suggest: making inputs from consensus work, to support shared objects. Some aspects of this PR would be lovely as separate smaller PRs (to not kill others who have to rebase onto them):
What I think needs more discussion:
I see this got closed, and there was a discussion on sharding. When we get to scaling our storage, we'll want to discuss sharding and what it means, but of course that is separate from sharding an authority.
What is done
What will be done in a next PR
- Use `Serde` and `bincode` instead of the current custom serialisation format. If I remember correctly, the current serialisation format does not provide any speed benefit with respect to Serde (Serde was not that big at the time).
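As a rough illustration of that direction (not the PR's actual code), with a hypothetical `Transfer` message type, deriving the Serde traits and letting bincode handle the wire format could look like this:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical message type; the real transaction types live in sui_core.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Transfer {
    sender: [u8; 32],
    recipient: [u8; 32],
    amount: u64,
}

fn main() -> Result<(), bincode::Error> {
    let msg = Transfer {
        sender: [0u8; 32],
        recipient: [1u8; 32],
        amount: 42,
    };
    // Encoding and decoding each become one line, replacing the
    // hand-rolled wire format.
    let bytes: Vec<u8> = bincode::serialize(&msg)?;
    let decoded: Transfer = bincode::deserialize(&bytes)?;
    assert_eq!(msg, decoded);
    Ok(())
}
```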
Questions

- … `sui_core` but do not have a lot of code in common.
- … the `Committee` structure (that currently only holds their voting rights and cryptographic keys)? We separated them at the time because we wanted to have a 'simulator' of FastPay running on a fake/mock network.