Skip to content
This repository was archived by the owner on Mar 4, 2024. It is now read-only.

API update #303

Merged
merged 20 commits into from
Dec 12, 2022
Merged

API update #303

merged 20 commits into from
Dec 12, 2022

Conversation

MathieuBordere
Copy link
Contributor

Proposal to update the raft API to allow for future extensions without breaking binaries linked against older versions.

Probably easiest to review by commit, it's draft so input is more then welcome.

@codecov-commenter
Copy link

codecov-commenter commented Aug 31, 2022

Codecov Report

Merging #303 (af13184) into master (fde5378) will increase coverage by 0.16%.
The diff coverage is 86.22%.

❗ Current head af13184 differs from pull request most recent head 4c8f360. Consider uploading reports for the commit 4c8f360 to get more accurate results

@@            Coverage Diff             @@
##           master     #303      +/-   ##
==========================================
+ Coverage   83.93%   84.10%   +0.16%     
==========================================
  Files          50       50              
  Lines        9400     9411      +11     
  Branches     2508     2500       -8     
==========================================
+ Hits         7890     7915      +25     
+ Misses        953      940      -13     
+ Partials      557      556       -1     
Impacted Files Coverage Δ
src/recv.c 76.78% <0.00%> (-1.98%) ⬇️
src/recv_request_vote_result.c 58.18% <0.00%> (-3.64%) ⬇️
src/state.c 54.16% <0.00%> (ø)
src/uv.c 84.30% <25.00%> (-0.55%) ⬇️
src/uv_tcp.c 91.02% <40.00%> (-3.50%) ⬇️
src/start.c 75.72% <66.66%> (ø)
src/raft.c 85.03% <73.33%> (-1.74%) ⬇️
src/client.c 73.62% <80.00%> (ø)
src/membership.c 71.31% <80.00%> (ø)
src/replication.c 77.68% <92.30%> (ø)
... and 16 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

src/log.h Outdated
/* Initialize an empty in-memory log of raft entries. */
void logInit(struct raft_log *l);
void logInit(struct raft_log **l);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe struct raft_log *logInit(void); instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, that's a bit more idiomatic.

Copy link
Contributor

@cole-miller cole-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something -- if we're adding a version field to raft_uv_transport to prepare for evolving it in the future, why don't we also add a reserved field as for some of the other structs?

@@ -70,6 +72,7 @@ struct raft_fixture
struct raft_fixture_event *event; /* Last event occurred. */
raft_fixture_event_cb hook; /* Event callback. */
struct raft_fixture_server *servers[RAFT_FIXTURE_MAX_SERVERS];
uint64_t _reserved[16]; /* For future expansion of struct. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this has an underscore where the other reserved fields don't.

@MathieuBordere
Copy link
Contributor Author

Maybe I'm missing something -- if we're adding a version field to raft_uv_transport to prepare for evolving it in the future, why don't we also add a reserved field as for some of the other structs?

Maybe the difference is somewhat arbitrary, but I consider those structs with reserved space 'internal' structs, meaning that raft uses them for whatever it wants, but they can still be allocated by the user or embedded in another struct (was one of the design decisions), and that's why we need enough room to add functionality.

The versioned structs are 'external', meaning the client fills out their functionality, they too can be allocated by the client or embedded in another struct. Based on the version, as long as the raft library doesn't read/write members that do not correspond to the version, memory accesses should be safe. In theory we could add padding to those structs, but I don't think it is needed imo.

So for internal structs we don't want the version approach because it will hamper us to add transparent functionality purely because the user did not allocate enough space for it.

@cole-miller
Copy link
Contributor

@MathieuBordere Thanks, makes sense to me!

include/raft.h Outdated
* give meaning the `zero` value because the raft implementation needs to know
* whether to other side knows the new field or not. `zero` means the other side
* does not know the new field/functionality (this is why raft_tribool exists).
* Implementers MUST always zero out unknown fields before passing them to raft.
Copy link
Contributor

@freeekanayaka freeekanayaka Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite realize this when we introduced raft_tribool, but now that I see the 0-means-older-code-on-the-other-side approach generalized here I'm thinking that perhaps this not the most natural approach.

In most cases 0 is an important and natural value to use. For example let's say that we introduce a new counter field in a message. Do we then have to use 0 to mean "the other side did not include this field", 1 to mean 0, 2 to mean 1, etc?

Plus, it creates an asymmetry between fields that we already have and the just follow the natural semantics for their values and fields that we add in the future.

Actually we already see the asymmetry in struct raft_request_vote_result, where the vote_granted field is a regular bool and the pre_vote field is a raft_tribool, which looks weird to me. If we had thought about the pre-vote issue addressed by pre_vote field since the very first version of this library and not instead after a while, we would certainly have used bool not raft_tribool, which is something telling. The fact that we use raft_tribool now seems a smell that we are mixing two separate concerns: one concern is to store a boolean value in a struct, the other concern is to know whether the other side did include this value in the message it sent us.

This makes me think that a more appropriate approach would be perhaps to separate these two concerns and be explicit about what the other side sent. There are various ways I can imagine to do that, for example adding a version field to struct raft_message or other variations of that idea. Before we discuss the detail about this possible approach though, I'd like to know how do you feel about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm certainly open to suggestions. The zero approach feels indeed a bit like a 'plumbing' solution.

Copy link
Contributor

@freeekanayaka freeekanayaka Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then my suggestion would be to add an int version field to all the various message structures (struct raft_request_vote, struct raft_request_vote_result, struct raft_append_entries, etc).

Note that I believe this is a bit better than adding the version field to struct raft_message as I was hinting above, both because what we're versioning is really the specific messages (and not struct raft_message), and because as bonus points it should also minimize code changes (since we often pass around the specific messages, not whole struct raft_message objects).

This would be pretty much similar to the version fields in the raft_io/raft_fsm structures, making the various message structures effectively "external" structures (borrowing the same terminology that you used in this comment).

It makes sense considering that those message structures are really part of the raft_io interface (specifically the send() and raft_io_recv_cb parts), so they "inherit" the fact of being external. When receiving messages via raft_io_recv_cb, the raft_io implementation should be in charge of telling raft core what to fields to expect, and it can be done via this new version field (like it does for its methods, via the raft_io->version field). When sending messages raft core should tell raft_io what fields it's providing, again via the version field.

Regarding encoding and decoding in the libuv-based raft_io implementation, we currently use the very first 8 bytes of a message to store the message type (see here and here). I'd suggest to change that slightly, in a fully backward compatible way. We should use the first byte to store the message type (a range of 255 is more than enough even for the future), the second byte to store the version, and the last two bytes are unused and can be reserved for future use. We can decide that version 0 is the current version of all messages, and current servers are already setting the second byte to 0, so we have backward compatibility.

Note that this also allows us to change the semantics of a message without changing its wire-level length, as at the moment we're using the message length as a "proxy" for its version.

As a side note, with this explicit message versioning approach we should not need to add padding to the various message structures, in the same way we don't use padding for raft_io/raft_fsm.

To keep things light, in this branch we could just add the version field to all the message structures and nothing else. The actual implementation of the encoding and decoding could be done separately. Or we could add the implementation as well, it shouldn't be a lot of code.

edit: Hm I just realized that the wire level change would actually break old servers if we set the version to something other than 0, so that needs a bit more thinking. I hope we can find something along those lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the first 2 bytes as the message type, the next 2 bytes as version and rest is unused. Then the cast to unsigned short here should still be fine on old clients with the version set to non-0, or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the first 2 bytes as the message type, the next 2 bytes as version and rest is unused. Then the cast to unsigned short here should still be fine on old clients with the version set to non-0, or am I missing something?

I'm not sure what you mean. If version is set to non-0, old clients would get a broken message type no? Because they use byteFlip64 to get the type.

Unfortunately I believe the only option is to look at the message length, as we already do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, practically, we would still version the message structs, but instead of relying on an explicit version in the on wire message, we would rely on the size of the on wire message.

Exactly. As said, I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, practically, we would still version the message structs, but instead of relying on an explicit version in the on wire message, we would rely on the size of the on wire message.

Exactly. As said, I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.

I was doing it the other way around, only writing 2 bytes when encoding the message, but that's not strictly needed now, reading only 2 bytes when decoding is what is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, practically, we would still version the message structs, but instead of relying on an explicit version in the on wire message, we would rely on the size of the on wire message.

Exactly. As said, I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.

I was doing it the other way around, only writing 2 bytes when encoding the message, but that's not strictly needed now, reading only 2 bytes when decoding is what is needed.

Exactly.

Copy link
Contributor Author

@MathieuBordere MathieuBordere Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.

I downcast the uint64_t type to uint16_t, I thought that might be the least intrusive way for now to only use the first 2 bytes of the preamble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also change the code to read only the first 2 bytes to decode the message type, so maybe at some point down the road we'll feel that enough time has passed that it's very unlikely to have both old code and new code servers operating together.

I downcast the uint64_t type to uint16_t, I thought that might be the least intrusive way for now to only use the first 2 bytes of the preamble.

Sounds right to me.

@MathieuBordere MathieuBordere force-pushed the api-v3 branch 2 times, most recently from 3d9a9fe to ddf3d5e Compare September 7, 2022 12:37
@MathieuBordere
Copy link
Contributor Author

I'm working on the dqlite disk-mode in parallel, I'm currently figuring out a decent proposal for the snapshot mechanism, and analyze if anything needs to be added to this PR for that.

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Dec 2, 2022

I think this is ready for review, let's maybe not merge it on a Friday as it would bump the .so number and break builds. Will remove draft state on Monday for this reason.

@cole-miller cole-miller self-requested a review December 2, 2022 16:09
@MathieuBordere MathieuBordere marked this pull request as ready for review December 5, 2022 12:33
@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Dec 5, 2022

I'm probably going to add the fix and the changes to struct raft for #250 to this PR otherwise we'll have to immediately start using the reserved fields and can't put the last committed configuration close to the struct raft members it logically belongs near to.

edit: I can probably just add a field struct raft_configuration committed_configuration to struct raft and not use it immediately. I think we all agree we need to keep a copy of the last committed configuration around, and it's logical to store it in struct raft. Let me know what you think of that @cole-miller.

edit2: Or maybe we should call it previous_configuration. Because on startup, a raft node doesn't know which log entries are committed. In that way it's imaginable that a node loads 2 (or more) configurations from the log entries and either (a) both are committed or (b) the first one is committed and the last one is not committed. In this case we need a way to save the second to last configuration to possibly revert to it. In case (a) it would not be correct to set committed_configuration imo as that one should always contain the last committed configuration. (I hope I'm being clear)

@MathieuBordere MathieuBordere marked this pull request as draft December 5, 2022 13:51
@MathieuBordere MathieuBordere marked this pull request as ready for review December 5, 2022 14:46
Copy link
Contributor

@cole-miller cole-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The individual commits look good to me, there are a couple places where it seems like we might have room to "opaquify" more but that's discretionary.

@@ -598,6 +596,8 @@ typedef void (*raft_close_cb)(struct raft *raft);
struct raft_change; /* Forward declaration */
struct raft_transfer; /* Forward declaration */

struct raft_log;

/**
* Hold and drive the state of a single raft server in a cluster.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily in favor of this, but did we consider making struct raft partially or wholly opaque?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it yes, but the argument to leave that as-is is to allow for stack allocation and embedding the struct raft in other structs (like dqlite is currently doing)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay -- no strong feelings either way on my end, just wanted to check whether it was on the table.

@MathieuBordere MathieuBordere marked this pull request as draft December 5, 2022 15:59
@MathieuBordere MathieuBordere marked this pull request as ready for review December 5, 2022 16:17
@cole-miller cole-miller self-requested a review December 5, 2022 16:41
raft_term term; /* Receiver's current term (candidate updates itself). */
bool vote_granted; /* True means candidate received vote. */
raft_tribool pre_vote; /* The response to a pre-vote RequestVote or not. */
bool pre_vote; /* The response to a pre-vote RequestVote or not. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice change, if I understand correctly the raft_tribool type is not needed anymore because we version messages? However, won't there issues with backward compatibility?

I didn't quite look in detail, just asking to conform it's ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compatibility on the wire you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compatibility on the wire you mean?

Yes, I believe 0 currently has a special meaning (i.e. neither true or false, iow, the other side didn't include it). If we move to regular bool that will be false. As said, I'm not entirely sure of the details, I was wondering if problems could arise from an old node talking to a new node.

In any case, all things considered, even if there were issue, I think they would be minor and unlikely enough to not bother and keep this change.

Copy link
Contributor Author

@MathieuBordere MathieuBordere Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be okay, it's just a different translation to the new struct of a flags argument of RequestVoteResult on the wire. In the past if we received the old message we would fill out the struct with raft_tribool_unknown, now we fill it out with version = 1 and raft ignores the new bool pre_vote struct field if version < 2.

Copy link
Contributor

@cole-miller cole-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered that for canonical/dqlite#420 we want to support including a request ID in various places like struct raft_apply, I don't think that's addressed by the current branch?

Edit: and see @freeekanayaka's comment in that issue about client sessions, which might raise other ABI/API concerns

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Dec 5, 2022

I just remembered that for canonical/dqlite#420 we want to support support including a request ID in various places like struct raft_apply, I don't think that's addressed by the current branch?

Edit: and see @freeekanayaka's comment in that issue about client sessions, which might raise other ABI/API concerns

Thanks, I'll take a look!

@cole-miller
Copy link
Contributor

cole-miller commented Dec 5, 2022

@MathieuBordere here's how I imagine the config stuff working, hopefully we're on the same page. Writing it out in detail to clarify my own thinking & so that you can catch if I'm misunderstanding something :)

We add a new field to struct raft that I'll call known_good_config; this represents a configuration that we will roll back to in the event that the CHANGE log entry that controls the current raft->configuration is truncated out of the log. Here is how we update the new field in various situations:

  • at raft_start, we initially don't know any config information, so both configuration and known_good_config are empty
  • at raft_bootstrap or raft_recover, known_good_config is initialized to the passed-in config, just like configuration is
  • when loading a snapshot causes us to increase our last known log index (i.e. we're replaying the log from disk at startup, or a leader is getting us up to speed), we set known_good_config to the config stored with the snapshot -- this is because an uncommitted config should never be snapshotted (possibly we do snapshot uncommitted configs currently; that should be fixed)
  • when installing a snapshot that doesn't cover new entries, I don't think an update is required, because even if known_good_config refers to a log entry that gets snapshotted we've already cached it when that log entry was applied (and an entry that hasn't been applied shouldn't be snapshotted)
  • when applying a CHANGE entry to the log, known_good_config is updated to that config

I think that should ensure that we always have a config stored in memory that can be safely rolled back to.

Mathieu Borderé added 16 commits December 7, 2022 13:54
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
To allow future extension of the struct.

Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
- Add padding to different messages.
- Version the messages.
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Mathieu Borderé added 3 commits December 8, 2022 11:32
raft_transfer doesn't go through the raft log, so it's a bit of a
special case, but for uniformity I thought it was nicer to add the
RAFT_REQUEST fields. I can image the `req_id` and `reserved` fields can
come in handy one day.

Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
@MathieuBordere
Copy link
Contributor Author

I propose to merge this next Monday, unless @freeekanayaka has any objections.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants