Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fastx client] Correct generic processing of orders and certificates. (Part 1) #378

Merged
merged 19 commits into from
Feb 8, 2022

Conversation

gdanezis
Copy link
Collaborator

@gdanezis gdanezis commented Feb 6, 2022

This PR addresses issue

In an nutshell is refactors sync_all_owned_objects to create a sync_all_given_objects that updates all authorities to the latest versions of the objects with IDs given. Then

  • A process_order function takes an order, updates all authorities with the objects it needs to be processed, and then submits it to get a certificate.
  • It also defines a process_certificate that submits a certificate to authorities, and even tries to update them if they are not up-to-date. This is designed to be called after process_order or when a certificate is available.
  • Cleaned up some test functions, added more test helpers, and test for new functions.

For next PRs:

  • Handle errors assigned to authorities better (probably separate PR).
  • Delete dead code, or older code that is no more correct.

@gdanezis gdanezis marked this pull request as draft February 6, 2022 20:05
@gdanezis
Copy link
Collaborator Author

gdanezis commented Feb 6, 2022

Hey @patrickkuo & @oxade -- this is to give you some early visibility into the resolution (finally) of #252.

@gdanezis gdanezis marked this pull request as ready for review February 7, 2022 18:04
@gdanezis gdanezis marked this pull request as draft February 7, 2022 18:21
@gdanezis gdanezis marked this pull request as ready for review February 7, 2022 18:26
Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main thing:

  • this is under-utilizing local structs in calls to quorum_map_then_reduce_with_timeout,
  • this is missing an opportunity to get help on client test setup through an issue

Comment on lines 741 to 755
let (_active_objects, _deleted_objects) = self
.sync_all_given_objects(required_ids, timeout_after_quorum)
.await?;

// TODO: check here that the latest versions necessary are available, AND that
// no newer versions are available.

let _required_id_version: Vec<(ObjectID, SequenceNumber)> = order
.input_objects()
.iter()
.map(|o| (o.object_id(), o.version()))
.collect();

// Now broadcast the order to all authorities.
let _digest = order.digest();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand those are here to materialize todos, but:

  • would it be possible to group them, one TODO comment per group?
  • could we deduce required_ids from required_id_version rather than re-compute it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after some reflection I do not think it is right to check that objects are at the right version. Since it is not incorrect to re-submit the same order even if some authorities have already processed it, in order to make other make progress (even though really you should then submit a cert). So I removed a lot of this.

Note that the _digest was there to check the handle_order response, but I will do that in separate functions as we dicussed.

Comment on lines +769 to +768
// NOTE: here I assume the AuthorityClient has done
// the checks on this order response, and it is well formed.
Box::pin(async move { client.handle_order(order_ref.clone()).await })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should those checks be performed?

Copy link
Collaborator Author

@gdanezis gdanezis Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is to define within authority_aggregator safe versions of the client authority interface, that check the returned values, and only use these that are pre-checked instead of doing checks everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 936 to 937
for (stake, effects) in state.values() {
if stake >= &threshold {
return Ok(effects.clone());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With match_opt for readability:

Suggested change
for (stake, effects) in state.values() {
if stake >= &threshold {
return Ok(effects.clone());
}
}
if Some(effects) = state.into_values().find_map(|x| match_opt!(x, (stake, effects) if stake >= &threshold => effects)) {
return Ok(effects)
}

if you're not into manual Option dissection:

state.into_values()
     .find_map(|x| match_opt!(x, (stake, effects) if stake >= &threshold => effects))
     .ok_or(FastPayError::ErrorWhileRequestingCertificate)

If you're conservative:

state.into_values()
     .find(|(stake, effects)| stake >= &threshold)
     .map(|x| x.1)
     .ok_or(FastPayError::ErrorWhileRequestingCertificate)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I do not find the iterator approach very readable. It requires the reader to know the semantics of about 3-4 rust specific functions or macros, rather than for, if and >=.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the only thing I would insist on: I believe you can iterate over state.into_values() and eliminate the final .clone().

I would also note that sometimes when people are exposed to functionality present in the standard library, they tend to find it useful (e.g. local structs) rather than daunting.

Comment on lines 238 to 242
pub fn secret(&self) -> &KeyPair {
&self.secret
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to make #380 harder, let's at least put a TODO here to remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets find the right way for tests to be able to make orders -- this is what this test function is used for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now straight up covered (to an acceptable compromise between correctness and development velocity) by #386

.await
{
votes.push((signed.authority, signed.signature));
order = Some(signed.order);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're retaining only the last returned signed.order and not testing discrepancies on this returned value?
Is it worth making that explicit by preceding this with an if order.is_none()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use this function to setup authority configurations, but I have now protected this by:

            if let Some(inner_order) = order {
                assert!(inner_order == signed.order);
            }
            order = Some(signed.order);

Comment on lines +3449 to +3530
let (authority_clients, committee) = init_local_authorities(4).await;
let auth_vec: Vec<_> = authority_clients.values().cloned().collect();

let mut client1 = make_client(authority_clients.clone(), committee.clone());
let framework_obj_ref = client1.get_framework_object_ref().await.unwrap();

let gas_object1 = ObjectID::random();
let gas_object2 = ObjectID::random();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue open to make the client tests less repetitive? It would be a good first issue, and list the things we need pretty much everywhere:

  • authorities, committee,
  • a few clients, their fas objects and corresponding funding.
  • a bunch of initial orders (prior to any assertions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version is much less verbose in the schedule of orders and certificates for each authority, but the setup is still long and repetitive. We could define a standard setup indeed.

Copy link
Contributor

@lxfind lxfind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be in a separate PR, but the quorum weight tracking / checking seems to be a common need and might as well be done inside quorum_map_then_reduce_with_timeout if possible. If not, maybe we could add a new quorum primitive latter.

address: FastPayAddress,
timeout_after_quorum: Duration,
) -> Result<(Vec<Object>, Vec<ObjectRef>), FastPayError> {
// First get a map of all objects at least a quorum of authorities think we hold.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that what get_all_owned_objects really returns, is the list of all objects at least one authority thinks we hold, not a quorum of authorities think we hold, as long as a quorum of authorities responded. Not sure if that's a bug in get_all_owned_objects or intended.
If that's intended, we may want to clarify in the comment here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed what we do, and here is why:

if we have a certificate that executes on 2f+1 authorities, we can only guarantee f+1 of these will not be faulty, and also f non faulty are not signers. So if now we sync with a fresh 2f+1 we can only guarantee that 1 non faulty intersects, and will give us the latest. So to get the latest objects from all previous executions we must listen to all objects, and check they are valid (through asking for certs and re-executing ideally).

// In all other cases we will not be able to use this response
// to make a certificate. If this happens for more than f
// authorities we just stop, as there is no hope to finish.
_ => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi in #375 @oxade was trying to distinguish the None case vs Err case as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the key thing here is to distinguish client errors (no network) from authority errors. Client errors are common, transient and recoverable (the client will just reconnect and re-try). On the other hand authorities going bad in very large number should be rare, and is sadly fatal and unrecoverable.

@gdanezis
Copy link
Collaborator Author

gdanezis commented Feb 8, 2022

Could be in a separate PR, but the quorum weight tracking / checking seems to be a common need and might as well be done inside quorum_map_then_reduce_with_timeout if possible. If not, maybe we could add a new quorum primitive latter.

The initial version had that but @huitseeker correctly identified a problem with it: the correctness condition to tally good versus bad stake vary significantly, and in some cases we even have to keep a tally per item returned. Hence we do it specifically for each function defined -- which is a little repetitive indeed but forces one to think what their conditions are.

@gdanezis gdanezis force-pushed the client_cleanup_aggregator branch from 781c7af to d39fc59 Compare February 8, 2022 08:10
@gdanezis gdanezis changed the title [fastx client] Correct generic processing of orders and certificates. [fastx client] Correct generic processing of orders and certificates. (Part 1) Feb 8, 2022
Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! :shipit:

.await?;

let object_latest_version: HashSet<_> =
object_map.keys().map(|object_ref| object_ref.0).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
object_map.keys().map(|object_ref| object_ref.0).collect();
object_map.keys().unzip().0;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object_ref is a reference, so I would have to copy for this to work I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try object_map.into_keys().unzip().0;

Comment on lines 933 to 937
for (stake, effects) in state.effects_map.values() {
if stake >= &threshold {
return Ok(effects.clone());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (stake, effects) in state.effects_map.values() {
if stake >= &threshold {
return Ok(effects.clone());
}
}
for (stake, effects) in state.effects_map.into_values() {
if stake >= threshold {
return Ok(effects);
}
}

Comment on lines 238 to 242
pub fn secret(&self) -> &KeyPair {
&self.secret
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now straight up covered (to an acceptable compromise between correctness and development velocity) by #386

@gdanezis gdanezis force-pushed the client_cleanup_aggregator branch from 60ac64a to 8a5a772 Compare February 8, 2022 17:05
.await?;

let object_latest_version: HashSet<_> =
object_map.keys().map(|object_ref| object_ref.0).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try object_map.into_keys().unzip().0;

// First get a map of all objects at one authority holds when at
// least a quorum of authorities is contacted.
let (object_map, _authority_list) = self
.get_all_owned_objects(address, timeout_after_quorum)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have other plans for get_all_owned_objects in the future? It seems we are only using the list of all objects ids, but it returns a ton of stuff

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have many many plans for this function, starting with #381 and beyond.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that zip(), unzip() work on 2-tuples, but refs are 3-tuples?

@gdanezis gdanezis merged commit 32a5352 into main Feb 8, 2022
@gdanezis gdanezis deleted the client_cleanup_aggregator branch February 8, 2022 17:49
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[fastx client] Synchronise certificates with authorities before making move call and publish
3 participants