
[Core] Refactor of authority server #608

Closed · wants to merge 10 commits

Conversation

asonnino (Contributor) commented Mar 1, 2022

What is done

  • Refactor the authority server around a main reactor loop. This removes the current (rigid) structure of the server and allows feeding the core with messages in a more flexible way. This is, for instance, necessary to feed consensus outputs to the core and support shared objects: see [fastx] How to best integrate information from a consensus core to allow more general fastx smart contracts #195
  • Get rid of the current network and use the same network as Narwhal instead. Narwhal's network will eventually be updated, but it will be a lot easier to use it in Sui as well.
  • Simplify the core message structure
  • Update the client to also use the Narwhal network. I am first waiting for the following PR to land to minimise merge conflicts: Refactor ClientState API #584
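The reactor-loop idea in the first bullet can be sketched with standard-library channels (the actual PR uses Tokio; `CoreMessage` and `run_reactor` are hypothetical names for illustration, not the PR's real types). The point is that any producer holding a sender, whether a network handler or a consensus task, can feed the core through the same channel:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type: the PR's actual enums differ.
enum CoreMessage {
    Client(u64),
    Consensus(u64),
}

// The reactor drains one channel, so any producer (network handler,
// consensus, ...) can feed the core simply by holding a Sender.
fn run_reactor(rx: mpsc::Receiver<CoreMessage>) -> (u64, u64) {
    let (mut from_clients, mut from_consensus) = (0u64, 0u64);
    for msg in rx {
        match msg {
            CoreMessage::Client(_) => from_clients += 1,
            CoreMessage::Consensus(_) => from_consensus += 1,
        }
    }
    // The loop ends once every Sender has been dropped.
    (from_clients, from_consensus)
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let network = tx.clone();
    thread::spawn(move || network.send(CoreMessage::Client(1)).unwrap());
    thread::spawn(move || tx.send(CoreMessage::Consensus(7)).unwrap());
    let (clients, consensus) = run_reactor(rx);
    println!("clients={clients} consensus={consensus}");
}
```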

What will be done in a follow-up PR

  • Separate unit tests into appropriate files. We currently have gigantic unit-test files (1,700 LOC).
  • Use Serde and bincode instead of the current custom serialisation format. If I remember correctly, the current serialisation format does not provide any speed benefit with respect to Serde (Serde was not that big at the time).
  • Isolate the error messages of the Core to not have a gigantic errors enum (suggested by @huitseeker).
  • Add a mock (single-process) sequencer. We will eventually replace it with a proper consensus protocol.
  • Write better unit tests for the core (I think some corner cases are missing).
  • Rename 'mutable-objects' -> 'owned-objects' everywhere (inspired from @lxfind comments)

Questions

  • Can I get rid of the gigantic formatter test (sui.yaml)?
  • Can I make a separate crate for the authority and the client? They are currently both in sui_core but do not have a lot of code in common.
  • Can I add the network information of the authorities into the Committee structure (that currently only holds their voting rights and cryptographic keys)? We separated them at the time because we wanted to have a 'simulator' of FastPay running on a fake/mock network.

@asonnino asonnino marked this pull request as draft March 1, 2022 14:47
@asonnino asonnino self-assigned this Mar 1, 2022
@asonnino asonnino added this to the Post-GDC milestone Mar 1, 2022
velvia (Contributor) commented Mar 1, 2022

@asonnino Hey Alberto! I don't have much context on this change, though moving from a custom stack to Tokio for handling server commands sounds great. Just wondering what the rough timeline for this change is, as it moves authority files around and I'm hoping to put out a PR this week related to logging, where I'll be touching most of the files in the authority core. Thanks!

asonnino (Contributor, author) commented Mar 2, 2022 via email

oxade (Contributor) commented Mar 2, 2022

@asonnino looks like a lot of this code is verbatim from Narwhal https://github.com/MystenLabs/narwhal/tree/main/network/src

Doesn't it make more sense to put the networking stuff in mysten-infra, where both Narwhal and Sui can depend on it?

config.config_path()
);

// Spawning the authorities is only required to create the database folder.
Review comment (Contributor):

One of the problems here is that genesis is not supposed to start the authorities. It should simply configure/provision them. So spawning the server here causes various issues, because they're started again in SuiCommand::Start. This causes DB lock issues.

Review comment (Contributor):

Basically this is the reason for test failures.
Authorities are started here https://github.com/MystenLabs/fastnft/blob/bf9fcdc254d4ac271c20a85f3c72385cdd429a35/sui/src/unit_tests/cli_tests.rs#L539
And then again here in
https://github.com/MystenLabs/fastnft/blob/bf9fcdc254d4ac271c20a85f3c72385cdd429a35/sui/src/unit_tests/cli_tests.rs#L548
The second time around causes issues.

Why did you change the code to start authorities in Genesis?

Reply (Contributor, author):

One of the problems here is that genesis is not supposed to start the authorities. It should simply configure/provision them. So spawning the server here causes various issues, because they're started again in SuiCommand::Start. This causes DB lock issues.

Ha well that was a silly issue after all :)

Reply (Contributor, author):

I didn't change the code to start the authorities in genesis; it was already there. The original code, however, took down the authorities immediately after spawning them (they were only there to generate the storage file).

asonnino (Contributor, author) commented Mar 2, 2022

@asonnino looks like a lot of this code is verbatim from Narwhal https://github.com/MystenLabs/narwhal/tree/main/network/src

Doesn't it make more sense to put the networking stuff in mysten-infra, where both Narwhal and Sui can depend on it?

Yes, that is the idea.

gdanezis (Collaborator) commented Mar 2, 2022

@asonnino slow down a bit on this line of work: the concurrency story for Sui is very different from the concurrency story for narwhal/Tusk, because we do execution as well, so we are not bounded (just) on IO.

Right now we implement this through shared-memory primitives: there is a Task per TCP connection, and they all share an AuthorityState, but execute on all cores. The architecture you are implementing moves all work into an inner loop for the authority that processes all messages, which I think runs on a single core. This is a bad idea for Sui right now (it is also a bad idea for Narwhal, but it is what it is).
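For reference, the shared-memory pattern described above can be sketched as follows. This is a toy: `AuthorityState` here is just a counter behind one mutex, whereas the real state uses finer-grained per-object lock tables, and `serve_connections` is a hypothetical helper standing in for the per-connection tasks:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy stand-in for AuthorityState; the real locking is finer-grained
// (per-object lock tables), not one big mutex.
struct AuthorityState {
    processed: Mutex<u64>,
}

impl AuthorityState {
    fn handle_message(&self) {
        *self.processed.lock().unwrap() += 1;
    }
}

// One handler per "connection", all sharing the same state through an Arc,
// so work spreads across cores instead of funnelling into one inner loop.
fn serve_connections(state: &Arc<AuthorityState>, connections: u64) {
    let handles: Vec<_> = (0..connections)
        .map(|_| {
            let s = Arc::clone(state);
            thread::spawn(move || s.handle_message())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}

fn main() {
    let state = Arc::new(AuthorityState { processed: Mutex::new(0) });
    serve_connections(&state, 4);
    println!("handled {} messages", state.processed.lock().unwrap());
}
```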

gdanezis (Collaborator) commented Mar 2, 2022

Also there is a lot of copy paste code from Narwhal/Tusk here, which we will have to maintain twice. Better to separate into a crate we maintain once, like we did for DBMap.

asonnino (Contributor, author) commented Mar 2, 2022

@asonnino slow down a bit on this line of work: the concurrency story for Sui is very different from the concurrency story for narwhal/Tusk, because we do execution as well, so we are not bounded (just) on IO.

Right now we implement this through shared-memory primitives: there is a Task per TCP connection, and they all share an AuthorityState, but execute on all cores. The architecture you are implementing moves all work into an inner loop for the authority that processes all messages, which I think runs on a single core. This is a bad idea for Sui right now (it is also a bad idea for Narwhal, but it is what it is).

The idea was to reuse the same network as Narwhal (to be moved into mysten-infra) and get rid of the custom network we currently have. There is still one task per TCP connection (in the network stack) and then we execute on a single core; the idea was to use all the cores with sharding (e.g., run 16 shards per machine).
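The sharding idea can be sketched as a deterministic routing function (the helper name and hashing scheme here are illustrative, not what Sui actually uses): each message is routed by object id, so a shard's loop only ever touches its own objects and needs no cross-shard locking.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical routing helper: map an object id to one of N shards so each
// shard's inner loop only ever processes its own objects.
fn shard_for(object_id: &str, num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    object_id.hash(&mut h);
    h.finish() % num_shards
}

fn main() {
    let shards = 16;
    let s = shard_for("0x42", shards);
    assert!(s < shards);
    // Routing is deterministic: the same object always lands on the same shard.
    assert_eq!(s, shard_for("0x42", shards));
    println!("object 0x42 -> shard {s}");
}
```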

asonnino (Contributor, author) commented Mar 2, 2022

Also there is a lot of copy paste code from Narwhal/Tusk here, which we will have to maintain twice. Better to separate into a crate we maintain once, like we did for DBMap.

Yes, I wanted to move it into mysten-infra.

asonnino (Contributor, author) commented Mar 2, 2022

Also, with the current design it is not clear how to receive inputs from consensus (for shared objects).

gdanezis (Collaborator) commented Mar 2, 2022

then we execute on a single core; the idea was to use all the cores with sharding (eg. run 16 shards per machine).

Yeah, that is not what we want. We want to execute on all cores, with minimal locking between them, which is right now managed just by the authority store locks table. Eventually we will even execute on separate machines, so an architecture that centralises and executes all messages moves us away from our goal.

asonnino (Contributor, author) commented Mar 2, 2022

I see, should I then drop this PR?

gdanezis (Collaborator) commented Mar 2, 2022

Also with the current design it is not clear how to receive inputs from consensus (for shared objects)

What do you need there? Let's discuss minimal changes; we cannot change 1,000 lines to find a way to do this.

asonnino (Contributor, author) commented Mar 2, 2022

For shared objects, we need a way to receive inputs from consensus. The consensus may feed us its output either through a channel or over TCP.

gdanezis (Collaborator) left a review:

See comments on the specific places where we are regressing in terms of the architecture we want.

@@ -1,376 +0,0 @@
// Copyright (c) 2021, Facebook, Inc. and its affiliates
Review comment (Collaborator):

This is currently the only working bench we have. We cannot just drop it.

/// as the network or the consensus) and drives the core.
async fn run(&mut self) {
loop {
tokio::select! {
Review comment (Collaborator):

This runs in one big single-core loop -- so we are regressing in terms of our architecture, which currently uses all cores through shared memory, and tomorrow will use many machines that share a DB.

/// Create an `AuthorityServer` and spawn it in a new tokio task.
pub fn spawn(
state: AuthorityState,
rx_client_core_message: Receiver<(ClientToAuthorityCoreMessage, Replier)>,
Review comment (Collaborator):

I am no longer a great fan of losing the connection relation: we want to fairly share the time we spend between connections/IPs, so we should probably keep that information and have tasks that do all the work and map to these, so that tokio can fairly schedule (or we schedule). But this moves us to an architecture where we lose this information.

gdanezis (Collaborator) commented Mar 2, 2022

For shared objects, we need a way to receive inputs from consensus. The consensus may feed us its output either through a channel or over TCP.

Sounds great. Let's have a task that manages incoming consensus messages and holds a copy (of the Arc) of the AuthorityState to update the lock tables there as consensus results come in. It can also call handle_certificate on the AuthorityState if you like. Does that resolve the immediate issue?

The key to all this is: you can have concurrent access to the AuthorityState and, indirectly, the authority store. And we are engineering these to make it safe.
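A minimal sketch of this proposal, using std threads and channels in place of Tokio, and hypothetical stand-in types (`SharedObjectLocks`, `ConsensusOutput`) rather than the real Sui ones: a dedicated listener task drains consensus output and updates shared state through its own clone of the Arc, while other tasks keep serving client traffic concurrently.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative stand-ins, not the actual Sui types.
struct SharedObjectLocks {
    assigned: Mutex<Vec<u64>>,
}

struct ConsensusOutput {
    certificate_id: u64,
}

// A dedicated task drains consensus output and updates the shared state;
// other tasks holding clones of the Arc keep serving clients concurrently.
fn spawn_consensus_listener(
    state: Arc<SharedObjectLocks>,
    rx: mpsc::Receiver<ConsensusOutput>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for out in rx {
            state.assigned.lock().unwrap().push(out.certificate_id);
        }
    })
}

fn main() {
    let state = Arc::new(SharedObjectLocks { assigned: Mutex::new(Vec::new()) });
    let (tx, rx) = mpsc::channel();
    let handle = spawn_consensus_listener(Arc::clone(&state), rx);
    tx.send(ConsensusOutput { certificate_id: 1 }).unwrap();
    tx.send(ConsensusOutput { certificate_id: 2 }).unwrap();
    drop(tx); // closing the channel ends the listener
    handle.join().unwrap();
    println!("assigned locks: {:?}", state.assigned.lock().unwrap());
}
```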

asonnino (Contributor, author) commented Mar 2, 2022

Yes, we can try that. Should I then close this PR and start a new one, and think about the network another day in a separate PR?

gdanezis (Collaborator) commented Mar 2, 2022

Yep, the priority is what you suggest: making inputs from consensus work to support shared objects.

Some aspects of this PR would be lovely as separate smaller PRs (to not kill others who have to rebase to them):

  • Eliminating silly things like double wrapping our certificates for no reason.
  • Separate errors that go over the network, vs the ones that stay in client / authority.
  • Refactor tests away from mega files, if that does not increase the line count.
  • Mock sequencer.
  • Rename mutable to owned.

What I think needs more discussion:

  • Moving networking to a crate and seeing what may be re-used.
  • The interaction between networking and then the concurrency story of the core.
  • Bincode/Serde: there are reasons we use what we use in places, so we need to understand them.
  • Separate client/authority: yes, but we need to ensure you break this up in ways that do not make other people's PRs hell.

@asonnino asonnino closed this Mar 2, 2022
velvia (Contributor) commented Mar 3, 2022

I see this got closed, and there was a discussion on sharding. When we get to scaling our storage, we'll want to have a discussion on sharding and what it means, but of course that is a separate discussion from sharding an authority.

mwtian pushed a commit that referenced this pull request Sep 12, 2022
* crypto: expose recover pubkey feature for Secp256k1Signature

* better error handling and test
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 29, 2022
…abs#608)

* crypto: expose recover pubkey feature for Secp256k1Signature

* better error handling and test
4 participants