Add ReplicaStore
#668
Love the approach of using a type template! But instead of adding a new table for all objects, could we simply not run the following two lines in the store?
@gdanezis That table is keyed on |
Added hopefully useful comments 😄
Regarding one table vs two tables: The only way I could see a unified single table would be a table based on (ObjectID, Version). The Version field would be ignored or always set to 0 in the authority version of the store -- OR you use the right version, but delete the older versions. That would create more churn, but it might be OK since it fits the LSM pattern.
Actually I kinda like that idea slightly better than two tables. The difference in behaviour would then be whether you keep older versions of objects. But maybe it would mess up the read patterns in authorities.
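A minimal sketch of the single-table idea described above, under stated assumptions: `BTreeMap` stands in for `DBMap`, `SingleTableStore`, `keep_history`, `latest`, and `write` are hypothetical names, and objects are plain strings. It shows the authority-style behaviour (delete older versions on write) and the replica-style behaviour (keep them), with the latest version found as the last key in an object's range.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real types; names and sizes are assumptions.
type ObjectID = [u8; 20];
type Version = u64;

/// Single table keyed by (ObjectID, Version). BTreeMap stands in for DBMap.
struct SingleTableStore {
    /// false: authority-like (only the latest version survives).
    /// true: replica-like (all versions are kept).
    keep_history: bool,
    objects: BTreeMap<(ObjectID, Version), String>,
}

impl SingleTableStore {
    fn new(keep_history: bool) -> Self {
        Self { keep_history, objects: BTreeMap::new() }
    }

    /// Latest version of an object: the last key inside the id's key range.
    fn latest(&self, id: &ObjectID) -> Option<(Version, &String)> {
        self.objects
            .range((*id, Version::MIN)..=(*id, Version::MAX))
            .next_back()
            .map(|((_, v), obj)| (*v, obj))
    }

    /// Write a new version; older versions are deleted unless history is kept.
    fn write(&mut self, id: ObjectID, version: Version, obj: String) {
        if !self.keep_history {
            // Collect the existing keys first, then remove them (the extra churn).
            let old: Vec<(ObjectID, Version)> = self
                .objects
                .range((id, Version::MIN)..=(id, Version::MAX))
                .map(|(k, _)| *k)
                .collect();
            for k in old {
                self.objects.remove(&k);
            }
        }
        self.objects.insert((id, version), obj);
    }
}
```

The delete-then-insert path in `write` is the extra churn mentioned above; whether it is acceptable depends on the real storage engine, which this sketch does not model.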
/// AUTHORITY is used to distinguish how the SuiDataStore is being used, whether for
/// the authorities or for replicas.
pub struct SuiDataStore<const AUTHORITY: bool> {
Instead of calling it authority, can we instead tie it to the function which differentiates the store itself, i.e. the all-object-versions map? So I'd call it ALL_OBJ_VER: bool instead. The reason is that down the line, another user of the store could need this functionality.
I was also thinking that this flag could be used down the line, but in a different way: I think there will be other differences between authorities and non-authorities, not just whether the object versions are kept.
Right, that is true. My point was mainly that "Authority" might become a misnomer down the line: the functionality of the store becomes unclear, and I then have to look up what differences an authority might have. We can always add more types for different features, though that adds, well, more type complexity.
/// This is a map between the object ID and the latest state of the object, namely the
/// state that is needed to process new transactions. If an object is deleted its entry is
/// removed from this map.
objects: DBMap<ObjectID, Object>,

/// Stores all history versions of all objects.
/// This is not needed by an authority, but is needed by a replica.
all_object_versions: DBMap<ObjectRef, Object>,
An ObjectRef contains the ObjectID, version, and object digest.
Isn't the digest redundant here? There should never be two objects in the system with the same ID and version but different digests, correct?
I would create a new type which is just the ObjectID and version, or just use those as a tuple. Dropping the digest saves a lot of space in the key, enabling faster object lookups and less space used.
I'm still worried about this part. An ObjectID is 20 bytes and a sequence number 8 bytes, 28 bytes together. An object digest, the last part of an ObjectRef, is 32 bytes. If we don't need the full ObjectRef, that shaves the key size down to less than half of what an entire ObjectRef is. When you multiply this by many millions of objects it makes a big difference.
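The arithmetic behind this concern can be sketched as follows. The byte sizes are assumptions for illustration (a 20-byte ObjectID, an 8-byte `u64` sequence number, a 32-byte digest), the function names are hypothetical, and any serialization or storage-engine overhead is ignored.

```rust
// Assumed field sizes in bytes; not read from the actual codebase.
const OBJECT_ID_LEN: usize = 20;
const SEQUENCE_NUMBER_LEN: usize = 8;
const OBJECT_DIGEST_LEN: usize = 32;

/// Key length when the table is keyed by a full ObjectRef (id, version, digest).
const fn object_ref_key_len() -> usize {
    OBJECT_ID_LEN + SEQUENCE_NUMBER_LEN + OBJECT_DIGEST_LEN
}

/// Key length when the table is keyed by (ObjectID, version) only.
const fn id_version_key_len() -> usize {
    OBJECT_ID_LEN + SEQUENCE_NUMBER_LEN
}

/// Total key bytes saved across `num_entries` rows by dropping the digest.
fn key_bytes_saved(num_entries: u64) -> u64 {
    num_entries * (object_ref_key_len() - id_version_key_len()) as u64
}
```

Under these assumptions the key shrinks from 60 to 28 bytes, so a table with ten million rows would carry roughly 320 MB less key data before compression.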
What about changing its key to ObjectRef? |
Re: using one table instead of two tables: why do we prefer one over two here? Is it just for code clarity?
I do not want to block this forever, but just to clarify what the DBMap BTree allows us to do efficiently:
We can efficiently look up the latest ObjectID in a table indexed by ObjectRef by using the |
Let's start somewhere, and this is a fine place to start -- the refactor in the direction we want.
Would shared objects not be stored in the all_object_versions table? The tradeoffs for two vs one table are:
For a thin client 1) is not a concern, maybe for a full replica that wants to keep both tables it could be a concern, but the size of the all_objects table would quickly outgrow the other one, so probably not. I'm more worried about 2) since there are many more places we have to get it right. I believe you use batches which is a good start though. |
Let me iterate on this a bit |
6713dbb to d51cd84
Made the following changes:
I am still using two tables instead of one. I think that trying to unify them has too many implications for AuthorityStore and would complicate the authority implementation. For example, having to delete the old version as well as add the new version is going to be much more expensive and slower than simply overwriting the table entry by ID.
@lxfind thanks for the changes |
We can reuse the AuthorityStore for the store in a replica. There can be more differences, but the major one would be storing all historical versions of objects.
This PR adds a type template to the store so that we can use it for both authorities and replicas.
Since it's a type template, the cost should be minimal.
Open to suggestions on using a boolean flag field instead of a type template if there is a good reason.
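A rough sketch of the type-template approach, not the PR's actual code: `SketchStore`, `Object`, `insert_object`, and the integer key types are hypothetical simplifications, `HashMap` stands in for `DBMap`, and the mapping of `true`/`false` to the two aliases is an assumption. The `if !AUTHORITY` branch is decided per instantiation at compile time, which is why the cost should be minimal.

```rust
use std::collections::HashMap;

// Hypothetical simplified types; the real store uses DBMap, ObjectID, ObjectRef, Object.
type ObjectID = u64;
type Version = u64;

#[derive(Clone, Debug, PartialEq)]
struct Object {
    id: ObjectID,
    version: Version,
}

/// AUTHORITY = true: keep only the latest state of each object.
/// AUTHORITY = false: additionally keep every historical version.
struct SketchStore<const AUTHORITY: bool> {
    objects: HashMap<ObjectID, Object>,
    all_object_versions: HashMap<(ObjectID, Version), Object>,
}

impl<const AUTHORITY: bool> SketchStore<AUTHORITY> {
    fn new() -> Self {
        Self {
            objects: HashMap::new(),
            all_object_versions: HashMap::new(),
        }
    }

    fn insert_object(&mut self, object: Object) {
        // AUTHORITY is a compile-time constant, so each instantiation
        // contains only the branch it needs.
        if !AUTHORITY {
            self.all_object_versions
                .insert((object.id, object.version), object.clone());
        }
        // Both variants overwrite the latest state by object ID.
        self.objects.insert(object.id, object);
    }
}

// Assumed aliases for the two uses of the store.
type AuthorityStore = SketchStore<true>;
type ReplicaStore = SketchStore<false>;
```

A runtime boolean field would behave the same way functionally; the const generic moves the choice into the type, so an authority store and a replica store cannot be mixed up by accident.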