[perf] Improve bench, and tweak db options #818

Merged: gdanezis merged 7 commits into main from db-perf-reasonable on Mar 16, 2022

Collaborator

@gdanezis gdanezis commented Mar 13, 2022

This PR contains the reasonable changes I made pre-GDC to tune the DB and to improve the bench and its performance:

  • Added parallelism to the bench setup: rayon is now used to create transactions / certificates and to seed the DB on all threads.
  • Exposed per-table (column family) options, and tuned some tables for more efficient point lookups.
  • Locks are now created when we insert an object (unrelated, but nice).
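
The parallel bench setup from the first bullet can be sketched as follows. This is a minimal illustration and not the code in this PR: it uses `std::thread::scope` instead of rayon so that it has no dependencies, and `FakeTx`, `make_tx` and `make_txs_parallel` are hypothetical names standing in for the real transaction / certificate construction.

```rust
use std::thread;

/// Hypothetical stand-in for a signed transaction or certificate.
#[derive(Debug, Clone, PartialEq)]
struct FakeTx {
    account: u64,
    digest: u64,
}

/// Placeholder for the expensive per-account work (signing, building a certificate).
fn make_tx(account: u64) -> FakeTx {
    FakeTx {
        account,
        digest: account.wrapping_mul(0x9E37_79B9_7F4A_7C15),
    }
}

/// Build one item per account, splitting the accounts across `workers` threads.
/// Output order matches account order because chunks are joined in sequence.
/// With rayon this would be roughly `(0..n).into_par_iter().map(make_tx).collect()`.
fn make_txs_parallel(num_accounts: u64, workers: usize) -> Vec<FakeTx> {
    let accounts: Vec<u64> = (0..num_accounts).collect();
    let workers = workers.max(1);
    // Ceiling division, and at least 1 so that `chunks` never receives 0.
    let chunk_len = ((accounts.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn every worker first, then join, so the chunks run concurrently.
        let handles: Vec<_> = accounts
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(|&a| make_tx(a)).collect::<Vec<FakeTx>>()))
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<FakeTx>>()
    })
}
```

Calling `make_txs_parallel(480_000, 48)` would mirror the account count and core count used in the runs below; the real setup cost depends on the signing work, which this placeholder does not model.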

With this PR, on AWS (c5d.metal, 48 physical cores), we get:

$ cargo run --release --bin=bench -- --num-accounts 480000 --max-in-flight 48000 --use-move --tcp-connections 48 --db-dir /dataother/db --db-cpus 4 --send-timeout-us 20000000
> Total time: 12916005us, items: 480000, tx/sec: 37163.194037165515

$ cargo run --release --bin=bench -- --num-accounts 480000 --max-in-flight 48000 --use-move --tcp-connections 96 --db-dir /dataother/db --db-cpus 4 --send-timeout-us 20000000
> Total time: 11532369us, items: 480000, tx/sec: 41621.977236420375

$ cargo run --release --bin=bench -- --num-accounts 480000 --max-in-flight 96000 --use-move --tcp-connections 96 --db-dir /dataother/db --db-cpus 4 --send-timeout-us 20000000
> Total time: 11342484us, items: 480000, tx/sec: 42318.77250168481

The main branch baseline is:

$ cargo run --release --bin=bench -- --num-accounts 480000 --max-in-flight 96000 --use-move --tcp-connections 96 --db-dir /dataother/db --db-cpus 4 --send-timeout-us 20000000
> Total time: 13487272us, items: 480000, tx/sec: 35589.10949523373

And it is painfully slow to launch since the computation of certs / txs takes much longer than the bench.
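
The `tx/sec` figures above are the item count divided by the total time in seconds. A small sketch of that arithmetic, with hypothetical helper names (`tx_per_sec`, `gain_percent`) that are not part of the bench binary:

```rust
/// Throughput in transactions per second, from an item count and elapsed microseconds.
fn tx_per_sec(items: u64, total_time_us: u64) -> f64 {
    items as f64 / (total_time_us as f64 / 1_000_000.0)
}

/// Relative throughput gain of `new` over `baseline`, as a percentage.
fn gain_percent(baseline: f64, new: f64) -> f64 {
    (new / baseline - 1.0) * 100.0
}
```

Feeding in the two runs that use identical flags (`--max-in-flight 96000 --tcp-connections 96`), `tx_per_sec(480000, 11342484)` against `tx_per_sec(480000, 13487272)` gives a gain of roughly 19% over main.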

Contributor

@asonnino asonnino left a comment

Left a few comments.
Are we sure we are not overfitting for a specific AWS instance?

@@ -158,24 +159,23 @@ impl ClientServerBenchmark {

let mut opts = Options::default();
opts.increase_parallelism(self.db_cpus as i32);
let store = Arc::new(AuthorityStore::open(path, None));
opts.set_write_buffer_size(256 * 1024 * 1024);
Contributor

Why these numbers specifically?

Collaborator Author

The default is 64 MB, and even my laptop has far more memory than that. Since the DB is currently our own store, we allow the memtables to be a little bigger to accommodate spikes.

sui/src/bench.rs Outdated
Comment on lines 207 to 217
let state = AuthorityState::new(
    committee.clone(),
    public_auth0,
    Arc::pin(secret_auth0),
    store,
    genesis::clone_genesis_compiled_modules(),
    &mut genesis::get_genesis_context(),
)
.await;

state
Contributor

Suggested change
- let state = AuthorityState::new(
-     committee.clone(),
-     public_auth0,
-     Arc::pin(secret_auth0),
-     store,
-     genesis::clone_genesis_compiled_modules(),
-     &mut genesis::get_genesis_context(),
- )
- .await;
- state
+ AuthorityState::new(
+     committee.clone(),
+     public_auth0,
+     Arc::pin(secret_auth0),
+     store,
+     genesis::clone_genesis_compiled_modules(),
+     &mut genesis::get_genesis_context(),
+ )
+ .await

Collaborator Author

Thanks, changed that!

@@ -20,6 +22,8 @@ pub type AuthorityStore = SuiDataStore<true>;
#[allow(dead_code)]
pub type ReplicaStore = SuiDataStore<false>;

const NUM_SHARDS: usize = 2048;
Contributor

Is this optimised for the m5d.metal AWS machine?

Collaborator Author

This is also used locally, and we have not tuned it much, to be honest. Since we process about 10K-20K transactions per second on a laptop, I thought I should raise it a little to reduce contention.
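
To illustrate why a larger shard count reduces contention, here is a minimal sketch of a sharded lock table. It is an assumption about the general technique and not the store's actual implementation: `ShardedLocks`, `shard_index` and `lock` are hypothetical names, and only the value of `NUM_SHARDS` comes from the diff.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, MutexGuard};

const NUM_SHARDS: usize = 2048;

/// Hypothetical sharded lock table: one mutex per shard, keys map to shards by hash.
struct ShardedLocks {
    shards: Vec<Mutex<()>>,
}

impl ShardedLocks {
    fn new(num_shards: usize) -> Self {
        Self {
            shards: (0..num_shards).map(|_| Mutex::new(())).collect(),
        }
    }

    /// Map a key to a shard; the same key always lands on the same shard.
    fn shard_index<K: Hash>(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }

    /// Lock the shard that owns `key`. Writers to keys on other shards are not blocked.
    fn lock<K: Hash>(&self, key: &K) -> MutexGuard<'_, ()> {
        self.shards[self.shard_index(key)]
            .lock()
            .expect("shard mutex poisoned")
    }
}
```

Two independent, uniformly hashed keys share a shard with probability 1 / `NUM_SHARDS`, so more shards mean fewer unrelated writers waiting on each other, at the cost of one mutex per shard in memory.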

@asonnino
Contributor

> And it is painfully slow to launch since the computation of certs / txs takes much longer than the bench.

When possible I will add a benchmark client like this one: https://github.com/asonnino/fastpay/blob/extensions/benchmark_client/src/client.rs

@gdanezis gdanezis force-pushed the db-perf-reasonable branch from d5197e9 to e7821af Compare March 16, 2022 13:44
@gdanezis gdanezis force-pushed the db-perf-reasonable branch from a5f5e4b to 2b3fbf0 Compare March 16, 2022 15:09
@gdanezis gdanezis merged commit 0d2032d into main Mar 16, 2022
@gdanezis gdanezis deleted the db-perf-reasonable branch March 16, 2022 16:16