don't block handler during add/del notify #66

joel-bluedata · 2018-10-05T17:06:27Z

This isn't quite as important as issue #64 (split from issue #54), since notify processing is unlikely to be long-running. However, the idea is generally the same. It would also help with issue #41 and issue #18. And it will probably make the syncMembers state machine a bit clearer, so that we are not doing things on member A while processing member B.

The tricky bit with carrying a notify across handler invocations is that it has more context than our other member-processing operations. A possible solution involves stashing a list of "pending notifications" (each including the notify type and arg list) into the member status, although I'd need to think about whether that's OK when the notify arg list is long.
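To make the "pending notifications in the member status" idea concrete, here is a minimal sketch of what such a status could look like. All type and field names (`MemberStatus`, `PendingNotify`, `PendingNotifies`, `encodeStatus`) are hypothetical and not taken from the actual codebase; the JSON encoding is only there to show how the stored status grows with the length of each notify's arg list.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NotifyType distinguishes the two notify kinds discussed in this issue.
type NotifyType string

const (
	NotifyAdd    NotifyType = "add"
	NotifyDelete NotifyType = "delete"
)

// PendingNotify is one undelivered notify: its type plus the arg list
// that would be handed to the member's notify script. (Hypothetical.)
type PendingNotify struct {
	Type NotifyType `json:"type"`
	Args []string   `json:"args"`
}

// MemberStatus is a trimmed-down, hypothetical member status that
// carries the notify queue across handler invocations.
type MemberStatus struct {
	Pod             string          `json:"pod"`
	State           string          `json:"state"`
	PendingNotifies []PendingNotify `json:"pendingNotifies,omitempty"`
}

// encodeStatus serializes a status so its stored size can be inspected.
func encodeStatus(m MemberStatus) (string, error) {
	out, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	m := MemberStatus{Pod: "worker-0", State: "ready"}
	m.PendingNotifies = append(m.PendingNotifies, PendingNotify{
		Type: NotifyAdd,
		Args: []string{"--role", "worker", "--nodes", "worker-1"},
	})
	s, err := encodeStatus(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(s), s)
}
```

Each queued entry stores its whole arg list, so a long arg list is paid for once per member that has not yet received the notify. That is the size concern raised above.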

Let's say we did that. Roughly speaking then syncMembers could do this:

- In current spots where we do notifyReadyNodes, instead put the pending notify into a queue in each of those members' statuses. Unlike notifyReadyNodes, we should even do this for non-running nodes, to help with issue #18 (handle pods in Error state).
- When putting a delete notify into a member's queue, see if it has a pending add notify for the same node. If so, they should cancel each other out.
- The overall cluster state should not move back to ready while there are outstanding notifies for running nodes.
- A node should not move from delete_pending to deleting state while there are outstanding delete notifies for it on running nodes. (Possibly; see below.)
- syncMembers would have a final pass at the end of each role-handling (or after all role-handling?) that processes the running ready nodes to see if they have pending notifies.
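The queueing rules above can be sketched roughly as follows. This is only an illustration under assumptions: the `Member` and `Notify` types and the function names are hypothetical, and skipping the node that the notify is about is an assumption rather than something stated in this issue.

```go
package main

import "fmt"

type NotifyType string

const (
	Add    NotifyType = "add"
	Delete NotifyType = "delete"
)

// Notify says that Node was added to or deleted from the cluster.
type Notify struct {
	Type NotifyType
	Node string
}

// Member is a hypothetical stand-in for a member status entry.
type Member struct {
	Name    string
	Running bool
	Pending []Notify
}

// enqueue adds n to m's queue. A delete cancels a still-pending add for
// the same node, in which case neither notify is ever delivered.
func enqueue(m *Member, n Notify) {
	if n.Type == Delete {
		for i, p := range m.Pending {
			if p.Type == Add && p.Node == n.Node {
				m.Pending = append(m.Pending[:i], m.Pending[i+1:]...)
				return
			}
		}
	}
	m.Pending = append(m.Pending, n)
}

// enqueueAll queues n on every member, running or not, except the node
// the notify is about (an assumption for this sketch).
func enqueueAll(members []*Member, n Notify) {
	for _, m := range members {
		if m.Name != n.Node {
			enqueue(m, n)
		}
	}
}

// clusterMayBeReady reports whether no running member has outstanding notifies.
func clusterMayBeReady(members []*Member) bool {
	for _, m := range members {
		if m.Running && len(m.Pending) > 0 {
			return false
		}
	}
	return true
}

// mayStartDeleting reports whether node can leave delete_pending, i.e. no
// running member still has an undelivered delete notify for it.
func mayStartDeleting(node string, members []*Member) bool {
	for _, m := range members {
		if !m.Running {
			continue
		}
		for _, p := range m.Pending {
			if p.Type == Delete && p.Node == node {
				return false
			}
		}
	}
	return true
}

func main() {
	a := &Member{Name: "a", Running: true}
	b := &Member{Name: "b", Running: false}
	members := []*Member{a, b}

	enqueueAll(members, Notify{Type: Add, Node: "c"})
	enqueueAll(members, Notify{Type: Delete, Node: "c"}) // cancels the add
	fmt.Println(len(a.Pending), len(b.Pending))

	enqueueAll(members, Notify{Type: Delete, Node: "d"})
	fmt.Println(clusterMayBeReady(members), mayStartDeleting("d", members))
}
```

Note that only running members gate the cluster and node state transitions here, while non-running members still accumulate a queue for later replay.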

That last part is potentially the trickiest:

We have to be clear about the notify semantics we provide to the folks who are writing the notify-processing scripts. If there's going to be any coordination between notify handling across multiple nodes, we might have to be careful about how we decide which notifies to send out on a given handler pass.

Non-running members are a gotcha though. When a non-running member is resurrected our only option is going to be to replay a bunch of notifies to that one member. So we're probably going to have to insist that notify processing be only local to the member, not relying on any other member's notify handler to do something, and not caring in general about how notifies across multiple members are interleaved.

If that's the case then maybe we don't need to worry much about organizing the notifies to running members either. Obviously they should be sent in order for that one member, but ordering between members may be a non-issue.

(Similarly, resurrected non-running members will have to be OK with getting a delete notify for a member that has already been deleted. So maybe that needs to be an OK situation for running member notifies too.)
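Under those semantics (in-order delivery per member, no ordering guarantee across members, and tolerance for stale deletes), the final pass could look something like the sketch below. The names `deliverPending`, `peerSet`, and `errUnreachable` are hypothetical; `peerSet` merely stands in for a member-local notify script that keeps a list of peers.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a placeholder error for a failed delivery attempt.
var errUnreachable = errors.New("member unreachable")

type Notify struct {
	Type string // "add" or "delete"
	Node string
}

type Member struct {
	Name    string
	Running bool
	Pending []Notify
}

// deliverPending sends m's queued notifies in order. It does nothing for a
// non-running member, and on the first failed send it stops and leaves the
// remaining notifies queued for a later handler pass.
func deliverPending(m *Member, send func(*Member, Notify) error) error {
	if !m.Running {
		return nil
	}
	for len(m.Pending) > 0 {
		if err := send(m, m.Pending[0]); err != nil {
			return err
		}
		m.Pending = m.Pending[1:]
	}
	return nil
}

// peerSet models purely member-local notify processing.
type peerSet map[string]bool

func (p peerSet) handle(n Notify) {
	switch n.Type {
	case "add":
		p[n.Node] = true
	case "delete":
		// Deleting an unknown or already-deleted peer is a harmless no-op.
		delete(p, n.Node)
	}
}

func main() {
	peers := peerSet{}
	send := func(_ *Member, n Notify) error {
		peers.handle(n)
		return nil
	}
	m := &Member{Name: "a", Pending: []Notify{
		{Type: "add", Node: "b"},
		{Type: "delete", Node: "c"}, // "c" was never seen by this member
		{Type: "delete", Node: "b"},
	}}

	_ = deliverPending(m, send) // not running yet: queue is left untouched
	m.Running = true            // member resurrected: replay its queue
	if err := deliverPending(m, send); err != nil {
		fmt.Println("delivery stopped:", err)
	}
	fmt.Println(len(m.Pending), len(peers))
}
```

Because each member's handler depends only on its own queue, the order in which different members are drained does not matter in this sketch.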

joel-bluedata · 2020-02-15T23:26:45Z

PR #272 did the notify queue thing. We currently don't allow further spec changes while there are undelivered notifies.

joel-bluedata added Priority: High, Type: Enhancement, Project: Cluster Reconcile (beyond simple xlate of model to K8s spec) labels Oct 5, 2018

joel-bluedata self-assigned this Oct 5, 2018

joel-bluedata mentioned this issue Oct 8, 2018

member should not be notified if its role has no setup package #77

Closed

joel-bluedata added Priority: Low and removed Priority: High labels Oct 30, 2018

joel-bluedata mentioned this issue Feb 15, 2020

handle host/pod shutdown and reboot #272

Merged
