-
Notifications
You must be signed in to change notification settings - Fork 137
Conversation
Codecov Report
@@ Coverage Diff @@
## master #303 +/- ##
==========================================
+ Coverage 83.93% 84.10% +0.16%
==========================================
Files 50 50
Lines 9400 9411 +11
Branches 2508 2500 -8
==========================================
+ Hits 7890 7915 +25
+ Misses 953 940 -13
+ Partials 557 556 -1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
src/log.h
Outdated
/* Initialize an empty in-memory log of raft entries. */ | ||
void logInit(struct raft_log *l); | ||
void logInit(struct raft_log **l); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe struct raft_log *logInit(void);
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, that's a bit more idiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something -- if we're adding a version field to raft_uv_transport
to prepare for evolving it in the future, why don't we also add a reserved
field as for some of the other structs?
include/raft/fixture.h
Outdated
@@ -70,6 +72,7 @@ struct raft_fixture | |||
struct raft_fixture_event *event; /* Last event occurred. */ | |||
raft_fixture_event_cb hook; /* Event callback. */ | |||
struct raft_fixture_server *servers[RAFT_FIXTURE_MAX_SERVERS]; | |||
uint64_t _reserved[16]; /* For future expansion of struct. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this has an underscore where the other reserved
fields don't.
Maybe the difference is somewhat arbitrary, but I consider those structs with reserved space 'internal' structs, meaning that raft uses them for whatever it wants, but they can still be allocated by the user or embedded in another struct (was one of the design decisions), and that's why we need enough room to add functionality. The versioned structs are 'external', meaning the client fills out their functionality, they too can be allocated by the client or embedded in another struct. Based on the version, as long as the raft library doesn't read/write members that do not correspond to the version, memory accesses should be safe. In theory we could add padding to those structs, but I don't think it is needed imo. So for internal structs we don't want the version approach because it will hamper us to add transparent functionality purely because the user did not allocate enough space for it. |
81bbf58
to
da11b0a
Compare
@MathieuBordere Thanks, makes sense to me! |
include/raft.h
Outdated
* give meaning the `zero` value because the raft implementation needs to know | ||
* whether to other side knows the new field or not. `zero` means the other side | ||
* does not know the new field/functionality (this is why raft_tribool exists). | ||
* Implementers MUST always zero out unknown fields before passing them to raft. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite realize this when we introduced raft_tribool
, but now that I see the 0
-means-older-code-on-the-other-side approach generalized here I'm thinking that perhaps this not the most natural approach.
In most cases 0
is an important and natural value to use. For example let's say that we introduce a new counter field in a message. Do we then have to use 0
to mean "the other side did not include this field", 1
to mean 0
, 2
to mean 1
, etc?
Plus, it creates an asymmetry between fields that we already have and the just follow the natural semantics for their values and fields that we add in the future.
Actually we already see the asymmetry in struct raft_request_vote_result
, where the vote_granted
field is a regular bool
and the pre_vote
field is a raft_tribool
, which looks weird to me. If we had thought about the pre-vote issue addressed by pre_vote
field since the very first version of this library and not instead after a while, we would certainly have used bool
not raft_tribool
, which is something telling. The fact that we use raft_tribool
now seems a smell that we are mixing two separate concerns: one concern is to store a boolean value in a struct, the other concern is to know whether the other side did include this value in the message it sent us.
This makes me think that a more appropriate approach would be perhaps to separate these two concerns and be explicit about what the other side sent. There are various ways I can imagine to do that, for example adding a version
field to struct raft_message
or other variations of that idea. Before we discuss the detail about this possible approach though, I'd like to know how do you feel about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm certainly open to suggestions. The zero
approach feels indeed a bit like a 'plumbing' solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then my suggestion would be to add an int version
field to all the various message structures (struct raft_request_vote
, struct raft_request_vote_result
, struct raft_append_entries
, etc).
Note that I believe this is a bit better than adding the version
field to struct raft_message
as I was hinting above, both because what we're versioning is really the specific messages (and not struct raft_message
), and because as bonus points it should also minimize code changes (since we often pass around the specific messages, not whole struct raft_message
objects).
This would be pretty much similar to the version
fields in the raft_io
/raft_fsm
structures, making the various message structures effectively "external" structures (borrowing the same terminology that you used in this comment).
It makes sense considering that those message structures are really part of the raft_io
interface (specifically the send()
and raft_io_recv_cb
parts), so they "inherit" the fact of being external. When receiving messages via raft_io_recv_cb
, the raft_io
implementation should be in charge of telling raft
core what to fields to expect, and it can be done via this new version
field (like it does for its methods, via the raft_io->version
field). When sending messages raft
core should tell raft_io
what fields it's providing, again via the version
field.
Regarding encoding and decoding in the libuv-based raft_io
implementation, we currently use the very first 8 bytes of a message to store the message type (see here and here). I'd suggest to change that slightly, in a fully backward compatible way. We should use the first byte to store the message type (a range of 255 is more than enough even for the future), the second byte to store the version, and the last two bytes are unused and can be reserved for future use. We can decide that version 0
is the current version of all messages, and current servers are already setting the second byte to 0
, so we have backward compatibility.
Note that this also allows us to change the semantics of a message without changing its wire-level length, as at the moment we're using the message length as a "proxy" for its version.
As a side note, with this explicit message versioning approach we should not need to add padding to the various message structures, in the same way we don't use padding for raft_io
/raft_fsm
.
To keep things light, in this branch we could just add the version
field to all the message structures and nothing else. The actual implementation of the encoding and decoding could be done separately. Or we could add the implementation as well, it shouldn't be a lot of code.
edit: Hm I just realized that the wire level change would actually break old servers if we set the version to something other than 0
, so that needs a bit more thinking. I hope we can find something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use the first 2 bytes as the message type, the next 2 bytes as version and rest is unused. Then the cast to unsigned short
here should still be fine on old clients with the version set to non-0, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use the first 2 bytes as the message type, the next 2 bytes as version and rest is unused. Then the cast to
unsigned short
here should still be fine on old clients with the version set to non-0, or am I missing something?
I'm not sure what you mean. If version is set to non-0, old clients would get a broken message type no? Because they use byteFlip64
to get the type.
Unfortunately I believe the only option is to look at the message length, as we already do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, practically, we would still version the message structs, but instead of relying on an explicit version in the on wire message, we would rely on the size of the on wire message.
Exactly. As said, I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, practically, we would still version the message structs, but instead of relying on an explicit version in the on wire message, we would rely on the size of the on wire message.
Exactly. As said, I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.
I was doing it the other way around, only writing 2 bytes when encoding the message, but that's not strictly needed now, reading only 2 bytes when decoding is what is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, practically, we would still version the message structs, but instead of relying on an explicit version in the on wire message, we would rely on the size of the on wire message.
Exactly. As said, I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.
I was doing it the other way around, only writing 2 bytes when encoding the message, but that's not strictly needed now, reading only 2 bytes when decoding is what is needed.
Exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.
I downcast the uint64_t
type to uint16_t
, I thought that might be the least intrusive way for now to only use the first 2 bytes of the preamble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.
I downcast the
uint64_t
type touint16_t
, I thought that might be the least intrusive way for now to only use the first 2 bytes of the preamble.
Sounds right to me.
3d9a9fe
to
ddf3d5e
Compare
I'm working on the dqlite disk-mode in parallel, I'm currently figuring out a decent proposal for the snapshot mechanism, and analyze if anything needs to be added to this PR for that. |
ddf3d5e
to
f1c44fd
Compare
I think this is ready for review, let's maybe not merge it on a Friday as it would bump the .so number and break builds. Will remove draft state on Monday for this reason. |
edit: I can probably just add a field edit2: Or maybe we should call it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The individual commits look good to me, there are a couple places where it seems like we might have room to "opaquify" more but that's discretionary.
@@ -598,6 +596,8 @@ typedef void (*raft_close_cb)(struct raft *raft); | |||
struct raft_change; /* Forward declaration */ | |||
struct raft_transfer; /* Forward declaration */ | |||
|
|||
struct raft_log; | |||
|
|||
/** | |||
* Hold and drive the state of a single raft server in a cluster. | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not necessarily in favor of this, but did we consider making struct raft
partially or wholly opaque?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it yes, but the argument to leave that as-is is to allow for stack allocation and embedding the struct raft
in other structs (like dqlite is currently doing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay -- no strong feelings either way on my end, just wanted to check whether it was on the table.
raft_term term; /* Receiver's current term (candidate updates itself). */ | ||
bool vote_granted; /* True means candidate received vote. */ | ||
raft_tribool pre_vote; /* The response to a pre-vote RequestVote or not. */ | ||
bool pre_vote; /* The response to a pre-vote RequestVote or not. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice change, if I understand correctly the raft_tribool
type is not needed anymore because we version messages? However, won't there issues with backward compatibility?
I didn't quite look in detail, just asking to conform it's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backwards compatibility on the wire you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backwards compatibility on the wire you mean?
Yes, I believe 0
currently has a special meaning (i.e. neither true or false, iow, the other side didn't include it). If we move to regular bool
that will be false
. As said, I'm not entirely sure of the details, I was wondering if problems could arise from an old node talking to a new node.
In any case, all things considered, even if there were issue, I think they would be minor and unlikely enough to not bother and keep this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be okay, it's just a different translation to the new struct of a flags argument of RequestVoteResult
on the wire. In the past if we received the old message we would fill out the struct with raft_tribool_unknown
, now we fill it out with version = 1
and raft ignores the new bool pre_vote
struct field if version < 2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered that for canonical/dqlite#420 we want to support including a request ID in various places like struct raft_apply
, I don't think that's addressed by the current branch?
Edit: and see @freeekanayaka's comment in that issue about client sessions, which might raise other ABI/API concerns
Thanks, I'll take a look! |
@MathieuBordere here's how I imagine the config stuff working, hopefully we're on the same page. Writing it out in detail to clarify my own thinking & so that you can catch if I'm misunderstanding something :) We add a new field to
I think that should ensure that we always have a config stored in memory that can be safely rolled back to. |
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
To allow future extension of the struct. Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
- Add padding to different messages. - Version the messages.
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
2086d0a
to
af13184
Compare
raft_transfer doesn't go through the raft log, so it's a bit of a special case, but for uniformity I thought it was nicer to add the RAFT_REQUEST fields. I can image the `req_id` and `reserved` fields can come in handy one day. Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
af13184
to
4c8f360
Compare
I propose to merge this next Monday, unless @freeekanayaka has any objections. |
Proposal to update the raft API to allow for future extensions without breaking binaries linked against older versions.
Probably easiest to review by commit, it's draft so input is more then welcome.