-
Notifications
You must be signed in to change notification settings - Fork 11.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fastx client] Correct generic processing of orders and certificates. (Part 1) #378
Conversation
Hey @patrickkuo & @oxade -- this is to give you some early visibility into the resolution (finally) of #252. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main thing:
- this is under-utilizing local structs in calls to
quorum_map_then_reduce_with_timeout
, - this is missing an opportunity to get help on client test setup through an issue
let (_active_objects, _deleted_objects) = self | ||
.sync_all_given_objects(required_ids, timeout_after_quorum) | ||
.await?; | ||
|
||
// TODO: check here that the latest versions necessary are available, AND that | ||
// no newer versions are available. | ||
|
||
let _required_id_version: Vec<(ObjectID, SequenceNumber)> = order | ||
.input_objects() | ||
.iter() | ||
.map(|o| (o.object_id(), o.version())) | ||
.collect(); | ||
|
||
// Now broadcast the order to all authorities. | ||
let _digest = order.digest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand those are here to materialize todos, but:
- would it be possible to group them, one TODO comment per group?
- could we deduce
required_ids
fromrequired_id_version
rather than re-compute it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after some reflection I do not think it is right to check that objects are at the right version. Since it is not incorrect to re-submit the same order even if some authorities have already processed it, in order to make other make progress (even though really you should then submit a cert). So I removed a lot of this.
Note that the _digest
was there to check the handle_order
response, but I will do that in separate functions as we dicussed.
// NOTE: here I assume the AuthorityClient has done | ||
// the checks on this order response, and it is well formed. | ||
Box::pin(async move { client.handle_order(order_ref.clone()).await }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should those checks be performed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan is to define within authority_aggregator
safe versions of the client authority interface, that check the returned values, and only use these that are pre-checked instead of doing checks everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is PR
for (stake, effects) in state.values() { | ||
if stake >= &threshold { | ||
return Ok(effects.clone()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With match_opt
for readability:
for (stake, effects) in state.values() { | |
if stake >= &threshold { | |
return Ok(effects.clone()); | |
} | |
} | |
if Some(effects) = state.into_values().find_map(|x| match_opt!(x, (stake, effects) if stake >= &threshold => effects)) { | |
return Ok(effects) | |
} |
if you're not into manual Option dissection:
state.into_values()
.find_map(|x| match_opt!(x, (stake, effects) if stake >= &threshold => effects))
.ok_or(FastPayError::ErrorWhileRequestingCertificate)
If you're conservative:
state.into_values()
.find(|(stake, effects)| stake >= &threshold)
.map(|x| x.1)
.ok_or(FastPayError::ErrorWhileRequestingCertificate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I do not find the iterator approach very readable. It requires the reader to know the semantics of about 3-4 rust specific functions or macros, rather than for
, if
and >=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the only thing I would insist on: I believe you can iterate over state.into_values()
and eliminate the final .clone()
.
I would also note that sometimes when people are exposed to functionality present in the standard library, they tend to find it useful (e.g. local structs) rather than daunting.
fastpay_core/src/client.rs
Outdated
pub fn secret(&self) -> &KeyPair { | ||
&self.secret | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to make #380 harder, let's at least put a TODO here to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets find the right way for tests to be able to make orders -- this is what this test function is used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now straight up covered (to an acceptable compromise between correctness and development velocity) by #386
.await | ||
{ | ||
votes.push((signed.authority, signed.signature)); | ||
order = Some(signed.order); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're retaining only the last returned signed.order
and not testing discrepancies on this returned value?
Is it worth making that explicit by preceding this with an if order.is_none()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this function to setup authority configurations, but I have now protected this by:
if let Some(inner_order) = order {
assert!(inner_order == signed.order);
}
order = Some(signed.order);
let (authority_clients, committee) = init_local_authorities(4).await; | ||
let auth_vec: Vec<_> = authority_clients.values().cloned().collect(); | ||
|
||
let mut client1 = make_client(authority_clients.clone(), committee.clone()); | ||
let framework_obj_ref = client1.get_framework_object_ref().await.unwrap(); | ||
|
||
let gas_object1 = ObjectID::random(); | ||
let gas_object2 = ObjectID::random(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue open to make the client tests less repetitive? It would be a good first issue, and list the things we need pretty much everywhere:
- authorities, committee,
- a few clients, their fas objects and corresponding funding.
- a bunch of initial orders (prior to any assertions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version is much less verbose in the schedule of orders and certificates for each authority, but the setup is still long and repetitive. We could define a standard setup indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be in a separate PR, but the quorum weight tracking / checking seems to be a common need and might as well be done inside quorum_map_then_reduce_with_timeout
if possible. If not, maybe we could add a new quorum primitive latter.
address: FastPayAddress, | ||
timeout_after_quorum: Duration, | ||
) -> Result<(Vec<Object>, Vec<ObjectRef>), FastPayError> { | ||
// First get a map of all objects at least a quorum of authorities think we hold. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that what get_all_owned_objects
really returns, is the list of all objects at least one authority thinks we hold, not a quorum of authorities think we hold, as long as a quorum of authorities responded. Not sure if that's a bug in get_all_owned_objects
or intended.
If that's intended, we may want to clarify in the comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed what we do, and here is why:
if we have a certificate that executes on 2f+1 authorities, we can only guarantee f+1 of these will not be faulty, and also f non faulty are not signers. So if now we sync with a fresh 2f+1 we can only guarantee that 1 non faulty intersects, and will give us the latest. So to get the latest objects from all previous executions we must listen to all objects, and check they are valid (through asking for certs and re-executing ideally).
// In all other cases we will not be able to use this response | ||
// to make a certificate. If this happens for more than f | ||
// authorities we just stop, as there is no hope to finish. | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the key thing here is to distinguish client errors (no network) from authority errors. Client errors are common, transient and recoverable (the client will just reconnect and re-try). On the other hand authorities going bad in very large number should be rare, and is sadly fatal and unrecoverable.
The initial version had that but @huitseeker correctly identified a problem with it: the correctness condition to tally good versus bad stake vary significantly, and in some cases we even have to keep a tally per item returned. Hence we do it specifically for each function defined -- which is a little repetitive indeed but forces one to think what their conditions are. |
781c7af
to
d39fc59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work!
.await?; | ||
|
||
let object_latest_version: HashSet<_> = | ||
object_map.keys().map(|object_ref| object_ref.0).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object_map.keys().map(|object_ref| object_ref.0).collect(); | |
object_map.keys().unzip().0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The object_ref is a reference, so I would have to copy for this to work I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try object_map.into_keys().unzip().0;
for (stake, effects) in state.effects_map.values() { | ||
if stake >= &threshold { | ||
return Ok(effects.clone()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (stake, effects) in state.effects_map.values() { | |
if stake >= &threshold { | |
return Ok(effects.clone()); | |
} | |
} | |
for (stake, effects) in state.effects_map.into_values() { | |
if stake >= threshold { | |
return Ok(effects); | |
} | |
} |
fastpay_core/src/client.rs
Outdated
pub fn secret(&self) -> &KeyPair { | ||
&self.secret | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now straight up covered (to an acceptable compromise between correctness and development velocity) by #386
60ac64a
to
8a5a772
Compare
.await?; | ||
|
||
let object_latest_version: HashSet<_> = | ||
object_map.keys().map(|object_ref| object_ref.0).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try object_map.into_keys().unzip().0;
// First get a map of all objects at one authority holds when at | ||
// least a quorum of authorities is contacted. | ||
let (object_map, _authority_list) = self | ||
.get_all_owned_objects(address, timeout_after_quorum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have other plans for get_all_owned_objects
in the future? It seems we are only using the list of all objects ids, but it returns a ton of stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have many many plans for this function, starting with #381 and beyond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that zip(), unzip() work on 2-tuples, but refs are 3-tuples?
This PR addresses issue
In an nutshell is refactors
sync_all_owned_objects
to create async_all_given_objects
that updates all authorities to the latest versions of the objects with IDs given. Thenprocess_order
function takes an order, updates all authorities with the objects it needs to be processed, and then submits it to get a certificate.process_certificate
that submits a certificate to authorities, and even tries to update them if they are not up-to-date. This is designed to be called afterprocess_order
or when a certificate is available.For next PRs: