don't block handler during add/del notify #66

Open
joel-bluedata opened this issue Oct 5, 2018 · 1 comment

joel-bluedata commented Oct 5, 2018

This isn't quite as important as issue #64 (split from issue #54), since notify processing is unlikely to be long-running. However, the idea is generally the same. It would also help with issue #41 and issue #18. And it will probably make the syncMembers state machine a bit clearer, so that we are not doing things on member A while processing member B.

The tricky bit with carrying a notify across handler invocations is that it has more context than our other member-processing operations. A possible solution involves stashing a list of "pending notifications" (each including the notify type and arg list) into the member status, although I'd need to think about whether that's OK when the notify arg list is long.
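To make the "pending notifications in the member status" idea concrete, here is a minimal sketch in Python, used purely for illustration. Every name in it (`PendingNotify`, `MemberStatus`, `pending_notifies`) is hypothetical and not taken from the project's code; the real member status carries more fields than shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class NotifyType(Enum):
    ADD = "add"
    DELETE = "delete"


@dataclass
class PendingNotify:
    # Hypothetical record: the notify type plus the argument list that
    # would otherwise be handed straight to the notify-processing script.
    notify_type: NotifyType
    args: List[str]


@dataclass
class MemberStatus:
    # Hypothetical, heavily simplified member status.
    name: str
    state: str
    pending_notifies: List[PendingNotify] = field(default_factory=list)


member = MemberStatus(name="worker-0", state="ready")
member.pending_notifies.append(
    PendingNotify(NotifyType.ADD, ["worker-1", "worker-2"])
)
```

Because the whole arg list is stored per member, the status grows with both the number of queued notifies and the length of each arg list, which is the size concern raised above.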

Let's say we did that. Roughly speaking, syncMembers could then do this:

  • In current spots where we do notifyReadyNodes, instead put the pending notify into a queue in each of those members' statuses. Unlike notifyReadyNodes, we should even do this for non-running nodes, to help with issue #18 (handle pods in Error state).

  • When putting a delete notify into a member's queue, see if it has a pending add notify for the same node. If so, they should cancel each other out.

  • The overall cluster state should not move back to ready while there are outstanding notifies for running nodes.

  • A node should not move from delete_pending to deleting state while there are outstanding delete notifies for it on running nodes. (Possibly; see below.)

  • syncMembers would have a final pass at the end of each role-handling (or after all role-handling?) that processes the running ready nodes to see if they have pending notifies.
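The queueing, cancellation, and gating rules in the list above could be sketched roughly as follows. This is an illustrative Python sketch under simplifying assumptions, not the operator's code: members are plain dicts with hypothetical `running` and `pending` keys, and a notify is reduced to a `(kind, node)` pair.

```python
from typing import List, Tuple

Notify = Tuple[str, str]  # (kind, node name); kind is "add" or "delete"


def enqueue_notify(member: dict, notify: Notify) -> None:
    """Queue a notify on a member; a delete cancels an undelivered add."""
    kind, node = notify
    if kind == "delete" and ("add", node) in member["pending"]:
        member["pending"].remove(("add", node))
        return
    member["pending"].append(notify)


def cluster_may_become_ready(members: List[dict]) -> bool:
    # Only running members hold the cluster back; non-running members keep
    # their queues for later replay.
    return not any(m["pending"] for m in members if m["running"])


def may_start_deleting(node: str, members: List[dict]) -> bool:
    # delete_pending -> deleting is held while a running member still has an
    # undelivered delete notify for this node.
    return not any(
        ("delete", node) in m["pending"] for m in members if m["running"]
    )


a = {"name": "a", "running": True, "pending": []}
b = {"name": "b", "running": False, "pending": []}
for m in (a, b):
    enqueue_notify(m, ("add", "c"))   # queued even on the non-running member
enqueue_notify(a, ("delete", "c"))    # cancels a's undelivered add for c
enqueue_notify(a, ("delete", "d"))    # no matching add, so it stays queued
```

In this run, member a ends up holding only the delete for d, which keeps the cluster out of ready and keeps d in delete_pending, while b's queued add does not block anything because b is not running.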

That last part is potentially the trickiest:

We have to be clear about the notify semantics we provide to the folks who are writing the notify-processing scripts. If there's going to be any coordination between notify handling across multiple nodes, we might have to be careful about how we decide which notifies to send out on a given handler pass.

Non-running members are a gotcha, though. When a non-running member is resurrected, our only option is going to be to replay a bunch of notifies to that one member. So we're probably going to have to insist that notify processing be local to the member: not relying on any other member's notify handler to do something, and not caring in general about how notifies across multiple members are interleaved.

If that's the case then maybe we don't need to worry much about organizing the notifies to running members either. Obviously they should be sent in order for that one member, but ordering between members may be a non-issue.
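A final delivery pass with those semantics (strict order within one member, no ordering promise between members) might look like the sketch below. It is Python for illustration only; `deliver_pending`, `send`, and the queue layout are hypothetical names, and `fake_send` stands in for actually running a notify script.

```python
from collections import deque
from typing import Callable, Deque, Dict, Set, Tuple

Notify = Tuple[str, str]  # (kind, node name)


def deliver_pending(
    queues: Dict[str, Deque[Notify]],
    running: Set[str],
    send: Callable[[str, Notify], bool],
) -> None:
    """Send each running member its queued notifies, oldest first."""
    for member, queue in queues.items():
        if member not in running:
            continue  # keep the queue; it is replayed when the member returns
        while queue:
            if not send(member, queue[0]):
                break  # stop here so later notifies cannot overtake this one
            queue.popleft()


sent = []


def fake_send(member: str, notify: Notify) -> bool:
    # Stand-in for the real notify delivery; fails for one specific notify.
    if member == "a" and notify == ("add", "d"):
        return False
    sent.append((member, notify))
    return True


queues = {
    "a": deque([("add", "c"), ("add", "d"), ("delete", "c")]),
    "b": deque([("add", "c")]),
    "z": deque([("add", "c")]),  # not running
}
deliver_pending(queues, running={"a", "b"}, send=fake_send)
```

A failed delivery leaves that notify at the head of the member's queue, so a later handler pass retries it before anything queued behind it.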

(Similarly, resurrected non-running members will have to be OK with getting a delete notify for a member that has already been deleted. So maybe that needs to be an OK situation for running member notifies too.)
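On the receiving side, this amounts to asking notify handlers to be idempotent. A minimal Python sketch of such a handler follows, assuming (hypothetically) that the member just tracks a set of known peers; real notify scripts will do application-specific work instead.

```python
def apply_notify(known_peers: set, kind: str, node: str) -> set:
    """Apply one notify to a member's local view of its peers."""
    if kind == "add":
        known_peers.add(node)       # repeating an add changes nothing
    elif kind == "delete":
        known_peers.discard(node)   # no error if the node is already gone
    else:
        raise ValueError(f"unknown notify kind: {kind}")
    return known_peers


peers = {"a", "b"}
apply_notify(peers, "delete", "c")  # delete for a node never seen locally
apply_notify(peers, "add", "d")
apply_notify(peers, "add", "d")     # replayed add
apply_notify(peers, "delete", "a")
```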

joel-bluedata (Member, Author) commented

PR #272 did the notify queue thing. We currently don't allow further spec changes while there are undelivered notifies.
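How PR #272 enforces that restriction is not shown in this thread. As a rough illustration only, a check of this kind could look like the following Python sketch, where `check_spec_change` and the dict-based member layout are hypothetical.

```python
def check_spec_change(old_spec: dict, new_spec: dict, members: list) -> None:
    """Reject a spec change while any member still has queued notifies."""
    if new_spec == old_spec:
        return  # nothing is changing, so nothing to block
    waiting = [m["name"] for m in members if m["pending"]]
    if waiting:
        raise ValueError(
            "spec change rejected; undelivered notifies on: " + ", ".join(waiting)
        )


members = [
    {"name": "a", "pending": []},
    {"name": "b", "pending": [("add", "c")]},
]
try:
    check_spec_change({"replicas": 2}, {"replicas": 3}, members)
    rejected = False
except ValueError:
    rejected = True
```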
