[Rest Server] stop/start/genesis endpoints #454

Merged
merged 9 commits into from
Feb 24, 2022
10 changes: 10 additions & 0 deletions sui/Cargo.toml
@@ -41,6 +41,12 @@ rustyline-derive = "0.6.0"
colored = "2.0.0"
unescape = "0.1.0"

# Deps for rest server
dropshot = "0.6.0"
http = "0.2.6"
hyper = "0.14.16"
schemars = "0.8.8"

move-package = { git = "https://github.com/diem/move", rev = "7683d09732dd930c581583bf5fde97fb7ac02ff7" }
move-core-types = { git = "https://github.com/diem/move", rev = "7683d09732dd930c581583bf5fde97fb7ac02ff7", features = ["address20"] }
move-bytecode-verifier = { git = "https://github.com/diem/move", rev = "7683d09732dd930c581583bf5fde97fb7ac02ff7" }
@@ -65,3 +71,7 @@ path = "src/sui.rs"
[[bin]]
name = "sui-move"
path = "src/sui-move.rs"

[[bin]]
name = "rest_server"
path = "src/rest_server.rs"
339 changes: 339 additions & 0 deletions sui/src/rest_server.rs
@@ -0,0 +1,339 @@
// Copyright (c) 2022, Mysten Labs, Inc.
// SPDX-License-Identifier: Apache-2.0

use dropshot::endpoint;
use dropshot::{
ApiDescription, ConfigDropshot, ConfigLogging, ConfigLoggingLevel, HttpError, HttpResponseOk,
HttpResponseUpdatedNoContent, HttpServerStarter, RequestContext,
};
use hyper::StatusCode;
use serde_json::json;
use sui::config::{Config, GenesisConfig, NetworkConfig, WalletConfig};
use sui::sui_commands;
use sui::wallet_commands::WalletContext;
use sui_core::authority_client::AuthorityClient;
use sui_core::client::{Client, ClientState};
use sui_types::base_types::*;
use sui_types::committee::Committee;

use futures::stream::{futures_unordered::FuturesUnordered, StreamExt as _};

use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::fs;
use std::net::{Ipv6Addr, SocketAddr};
use std::path::PathBuf;
use tokio::task::{self, JoinHandle};
use tracing::{error, info};

use std::sync::{Arc, Mutex};

#[tokio::main]
async fn main() -> Result<(), String> {
let config_dropshot: ConfigDropshot = ConfigDropshot {
bind_address: SocketAddr::from((Ipv6Addr::LOCALHOST, 5000)),
..Default::default()
};

let config_logging = ConfigLogging::StderrTerminal {
level: ConfigLoggingLevel::Info,
};
let log = config_logging
.to_logger("rest_server")
.map_err(|error| format!("failed to create logger: {}", error))?;

tracing_subscriber::fmt().init();

let mut api = ApiDescription::new();
api.register(start).unwrap();
api.register(genesis).unwrap();
api.register(stop).unwrap();

let api_context = ServerContext::new();

let server = HttpServerStarter::new(&config_dropshot, api, api_context, &log)
.map_err(|error| format!("failed to create server: {}", error))?
.start();

server.await
}

/**
* Server context (state shared by handler functions)
*/
struct ServerContext {
genesis_config_path: String,
wallet_config_path: String,
network_config_path: String,
authority_db_path: String,
client_db_path: Arc<Mutex<String>>,
authority_handles: Arc<Mutex<Vec<JoinHandle<()>>>>,
wallet_context: Arc<Mutex<Option<WalletContext>>>,
}

impl ServerContext {
pub fn new() -> ServerContext {
ServerContext {
genesis_config_path: String::from("genesis.conf"),
wallet_config_path: String::from("wallet.conf"),
network_config_path: String::from("./network.conf"),
authority_db_path: String::from("./authorities_db"),
Comment on lines +77 to +80
Contributor

This is brittle and somewhat panic-inducing: IIUC, if something goes wrong, we blow up on a started server whose endpoint the user has hit, not on starting the server itself. I would at least open an issue for the following:

  1. Instances of NetworkConfig, GenesisConfig, etc. should be created in the ServerContext constructor.
  2. That constructor should hit all panics due to invalid or non-existent paths, as well as basic validations (e.g. empty quorum).
  3. Ideally, those should re-use the config creation logic from the client (which should be made reusable).

Moreover, all of this setup is re-done from scratch on every endpoint hit. Is there a reason why this should happen?

Contributor Author

We are only exposing these endpoints because we plan on hosting instances of the rest server for game developers. That means that if they want to reset the SUI network the rest server is using, they need to be able to do so through the endpoints. In the long term none of this will matter, because we shouldn't be starting the Sui network from within the rest server anyway. I can open an issue discussing the future design of the rest server post-GDC.

> Moreover, all of this setup is re-done from scratch on every endpoint hit. Is there a reason why this should happen?

Because the server context is instantiated once here, all other changes to the server context happen via the endpoints. There are checks in the endpoints to see if the network is already running or the configs are already created.

Contributor

> if they wanted to reset the SUI network the rest server is using, they need to be able to do so using the endpoints

What's wrong with having, as we seem to, one endpoint that starts the network and calls the constructor of the server's state, and one that tears it down? It would be a contract with our early test developers that nothing interesting is going to happen if they don't hit that particular bootstrap first, and that they'll carry over the network's state if they don't tear it down before a restart.

Then we can have real state, with configuration that contains actual data rather than recreating everything on the fly.

My issue isn't with the re-startability; it's with the multiplication of panic-prone config read and scaffolding code in each endpoint handler.
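
A minimal sketch of the "validate once in the constructor" idea from this thread, assuming nothing about the real config types. `ValidatedContext`, `ContextError`, and the `authority_stakes` field are hypothetical names introduced for illustration only; they stand in for real `NetworkConfig` and `WalletConfig` instances and are not part of this PR.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error type for this sketch; it does not exist in the PR.
#[derive(Debug, PartialEq)]
enum ContextError {
    EmptyPath(&'static str),
    EmptyQuorum,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ContextError::EmptyPath(which) => write!(f, "{which} config path is empty"),
            ContextError::EmptyQuorum => write!(f, "no authorities configured"),
        }
    }
}

/// Hypothetical validated state: holds checked data rather than raw strings
/// that every handler would have to re-read and re-check.
#[derive(Debug)]
struct ValidatedContext {
    network_config_path: PathBuf,
    wallet_config_path: PathBuf,
    authority_stakes: Vec<usize>,
}

impl ValidatedContext {
    /// Runs all checks once, up front, and reports failures as errors
    /// instead of panicking later inside an endpoint handler.
    fn new(
        network_config_path: &str,
        wallet_config_path: &str,
        authority_stakes: Vec<usize>,
    ) -> Result<Self, ContextError> {
        if network_config_path.is_empty() {
            return Err(ContextError::EmptyPath("network"));
        }
        if wallet_config_path.is_empty() {
            return Err(ContextError::EmptyPath("wallet"));
        }
        if authority_stakes.is_empty() {
            return Err(ContextError::EmptyQuorum);
        }
        Ok(ValidatedContext {
            network_config_path: PathBuf::from(network_config_path),
            wallet_config_path: PathBuf::from(wallet_config_path),
            authority_stakes,
        })
    }

    /// Handlers can use the data directly; no further validation is needed.
    fn total_stake(&self) -> usize {
        self.authority_stakes.iter().sum()
    }
}
```

Under this shape, a bootstrap endpoint could store the result of `ValidatedContext::new` in the shared state and a teardown endpoint could clear it, so the other handlers would only ever see already-validated data.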

client_db_path: Arc::new(Mutex::new(String::new())),
authority_handles: Arc::new(Mutex::new(Vec::new())),
wallet_context: Arc::new(Mutex::new(None)),
}
}
}

/**
* 'GenesisResponse' returns the genesis of wallet & network config.
*/
#[derive(Deserialize, Serialize, JsonSchema)]
struct GenesisResponse {
wallet_config: serde_json::Value,
network_config: serde_json::Value,
}

/**
* [SUI] Use to provide server configurations for genesis.
*/
#[endpoint {
method = POST,
path = "/debug/sui/genesis",
}]
async fn genesis(
rqctx: Arc<RequestContext<ServerContext>>,
) -> Result<HttpResponseOk<GenesisResponse>, HttpError> {
let server_context = rqctx.context();
let genesis_config_path = &server_context.genesis_config_path;
let network_config_path = &server_context.network_config_path;
let wallet_config_path = &server_context.wallet_config_path;

let mut network_config = NetworkConfig::read_or_create(&PathBuf::from(network_config_path))
.map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Unable to read network config: {error}"),
)
})?;

if !network_config.authorities.is_empty() {
return Err(custom_http_error(
StatusCode::CONFLICT,
String::from("Cannot run genesis on an existing network, stop network to try again."),
));
}

let working_dir = network_config.config_path().parent().unwrap().to_owned();
let genesis_conf = GenesisConfig::default_genesis(&working_dir.join(genesis_config_path))
.map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Unable to create default genesis configuration: {error}"),
)
})?;

let wallet_path = working_dir.join(wallet_config_path);
let mut wallet_config =
WalletConfig::create(&wallet_path).map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Wallet config was unable to be created: {error}"),
)
})?;
// Need to use a random id because rocksdb locks on current process which means even if the directory is deleted
// the lock will remain causing an IO Error when a restart is attempted.
let client_db_path = format!("client_db_{:?}", ObjectID::random());
wallet_config.db_folder_path = working_dir.join(&client_db_path);
*server_context.client_db_path.lock().unwrap() = client_db_path;

sui_commands::genesis(&mut network_config, genesis_conf, &mut wallet_config)
.await
.map_err(|err| {
custom_http_error(
StatusCode::FAILED_DEPENDENCY,
format!("Genesis error: {:?}", err),
)
})?;

Ok(HttpResponseOk(GenesisResponse {
wallet_config: json!(wallet_config),
network_config: json!(network_config),
}))
}

/**
* [SUI] Start servers with specified configurations.
*/
#[endpoint {
method = POST,
path = "/debug/sui/start",
}]
async fn start(
rqctx: Arc<RequestContext<ServerContext>>,
) -> Result<HttpResponseOk<String>, HttpError> {
let server_context = rqctx.context();
let network_config_path = &server_context.network_config_path;

let network_config = NetworkConfig::read_or_create(&PathBuf::from(network_config_path))
.map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Unable to read network config: {error}"),
)
})?;

if network_config.authorities.is_empty() {
return Err(custom_http_error(
StatusCode::CONFLICT,
String::from("No authority configured for the network, please run genesis."),
));
}

{
if !(*server_context.authority_handles.lock().unwrap()).is_empty() {
return Err(custom_http_error(
StatusCode::FORBIDDEN,
String::from("Sui network is already running."),
));
}
}

let committee = Committee::new(
network_config
.authorities
.iter()
.map(|info| (*info.key_pair.public_key_bytes(), info.stake))
.collect(),
);
let mut handles = FuturesUnordered::new();

for authority in &network_config.authorities {
let server = sui_commands::make_server(
authority,
&committee,
vec![],
&[],
network_config.buffer_size,
)
.await
.map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Unable to make server: {error}"),
)
})?;
handles.push(async move {
// Convert a spawn failure into an HTTP error.
server.spawn().await.map_err(|err| {
custom_http_error(
StatusCode::FAILED_DEPENDENCY,
format!("Failed to start server: {}", err),
)
})
})
}

let num_authorities = handles.len();
info!("Started {} authorities", num_authorities);

while let Some(spawned_server) = handles.next().await {
// Return a spawn failure to the caller instead of panicking inside the task.
let spawned_server = spawned_server?;
server_context
.authority_handles
.lock()
.unwrap()
.push(task::spawn(async move {
if let Err(err) = spawned_server.join().await {
error!("Server ended with an error: {}", err);
}
}));
}
Comment on lines +211 to +252
Contributor

It seems we're maintaining a copy of this in sui_commands, called start_network.

Contributor Author

@arun-koshy arun-koshy Feb 23, 2022

I am handling the async portion of this differently than start_network does. I keep hold of the JoinHandle so that I can abort it later during stop. I don't think reusing sui_commands makes sense here, but correct me if I am wrong.

Contributor

Fair enough. Additionally, the pattern in start_network and genesis in sui_commands doesn't make sense: the task waits for each launched server to finish (rather than waiting for each server to start) here:
https://github.com/MystenLabs/fastnft/blob/dba291bfac0d9a9811e9a296f3f4712744ef2df6/sui/src/sui_commands.rs#L86-L88
As we've seen, that's not really usable in a task, which should keep a handle rather than block on completion.

Would you and @patrickkuo open an issue to track the refactoring of start_network in a way that:

  • has clear semantics on starting the server,
  • makes this usable in a tokio task (by returning the handle of the SpawnedServer rather than blocking on it),
  • de-duplicates the code between the present function and sui_commands?
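
The "keep a handle rather than block on completion" pattern discussed in this thread can be sketched as follows. To stay dependency-free, this sketch swaps tokio tasks and `JoinHandle::abort` for `std::thread` workers and a shared stop flag, so it illustrates the shape of the idea rather than the PR's actual mechanism. `Network` and its methods are hypothetical names, not part of sui_commands.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical registry of running workers, standing in for the
/// `authority_handles` field of the server context.
struct Network {
    stop_flag: Arc<AtomicBool>,
    handles: Mutex<Vec<JoinHandle<()>>>,
}

impl Network {
    fn new() -> Self {
        Network {
            stop_flag: Arc::new(AtomicBool::new(false)),
            handles: Mutex::new(Vec::new()),
        }
    }

    /// Spawns `n` workers and returns as soon as they are launched.
    /// The handles are stored instead of being joined here, so the
    /// caller is not blocked until the workers finish.
    fn start(&self, n: usize) -> Result<usize, String> {
        let mut handles = self.handles.lock().unwrap();
        if !handles.is_empty() {
            return Err("network is already running".to_string());
        }
        self.stop_flag.store(false, Ordering::SeqCst);
        for _ in 0..n {
            let stop = Arc::clone(&self.stop_flag);
            handles.push(thread::spawn(move || {
                // Placeholder for serving requests until asked to stop.
                while !stop.load(Ordering::SeqCst) {
                    thread::sleep(Duration::from_millis(5));
                }
            }));
        }
        Ok(handles.len())
    }

    /// Signals every worker to exit, then drains and joins the stored
    /// handles under a single lock. Returns how many were stopped.
    fn stop(&self) -> usize {
        self.stop_flag.store(true, Ordering::SeqCst);
        let mut handles = self.handles.lock().unwrap();
        let stopped = handles.len();
        for handle in handles.drain(..) {
            let _ = handle.join();
        }
        stopped
    }
}
```

Because `start` only launches workers and records their handles, a later `stop` call can find and end them, and a second `start` while workers are running is rejected, which mirrors the "already running" check in the handler above.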


let wallet_config_path = &server_context.wallet_config_path;

let wallet_config =
WalletConfig::read_or_create(&PathBuf::from(wallet_config_path)).map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Unable to read wallet config: {error}"),
)
})?;

let addresses = wallet_config
.accounts
.iter()
.map(|info| info.address)
.collect::<Vec<_>>();
let mut wallet_context = WalletContext::new(wallet_config).map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Can't create new wallet context: {error}"),
)
})?;

// Sync all accounts.
for address in addresses.iter() {
let client_state = wallet_context
.get_or_create_client_state(address)
.map_err(|error| {
custom_http_error(
StatusCode::CONFLICT,
format!("Can't create client state: {error}"),
)
})?;
sync_client_state(client_state).await?;
}

*server_context.wallet_context.lock().unwrap() = Some(wallet_context);

Ok(HttpResponseOk(format!(
"Started {} authorities",
num_authorities
)))
}

/**
* [SUI] Stop servers and delete storage.
*/
#[endpoint {
method = POST,
path = "/debug/sui/stop",
}]
async fn stop(
rqctx: Arc<RequestContext<ServerContext>>,
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
let server_context = rqctx.context();

// Abort every authority task and empty the list under a single lock.
for authority_handle in server_context.authority_handles.lock().unwrap().drain(..) {
authority_handle.abort();
}

fs::remove_dir_all(server_context.client_db_path.lock().unwrap().clone()).ok();
fs::remove_dir_all(&server_context.authority_db_path).ok();
fs::remove_file(&server_context.network_config_path).ok();
fs::remove_file(&server_context.wallet_config_path).ok();

Ok(HttpResponseUpdatedNoContent())
}

async fn sync_client_state(
client_state: &mut ClientState<AuthorityClient>,
) -> Result<(), HttpError> {
// synchronize with authorities
client_state.sync_client_state().await.map_err(|err| {
custom_http_error(
StatusCode::FAILED_DEPENDENCY,
format!("Sync error: {:?}", err),
)
})
}

fn custom_http_error(status_code: http::StatusCode, message: String) -> HttpError {
HttpError::for_client_error(None, status_code, message)
}