
Race in change processing can cause deadlock. #518

Closed
joel-bluedata opened this issue Oct 15, 2021 · 1 comment
Assignees: joel-bluedata
Labels: Priority: Blocker; Project: Cluster Reconcile beyond simple xlate of model to K8s spec; Type: Bug

Comments

@joel-bluedata (Member):

We have a race condition that can occur when connected resources change; it might be possible with other kinds of changes too.

In syncMembers there's a sequence like this:

  • Do all processing for currently "ready" members, i.e. handle reconfigs by applying new configmeta and/or generating notifications.
  • If ANY members in ready, creating, or createPending states don't yet have the latest configmeta JSON (and they should), bail out.
  • Proceed to handle all non-ready-state members.

Checking whether a member has the latest configmeta is done by comparing a member's LastConfigDataGeneration to its SpecGenerationToProcess. If LastConfigDataGeneration is nil, that's fine: this member has no idea of any configmeta yet, so we don't need to worry about it being stale. If, however, it is non-nil and less than SpecGenerationToProcess, then a new configmeta JSON injection is needed.
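
For reference, the staleness test described here boils down to something like the following sketch. The names and types follow this description rather than the real member status structs, and checkGenOkSketch just stands in for the actual checkGenOk helper:

    // Sketch only; real KubeDirector member status fields may differ in detail.
    type memberGenSketch struct {
        LastConfigDataGeneration *int64 // nil until configmeta is first injected
        SpecGenerationToProcess  int64  // bumped by syncCluster when new configmeta is needed
    }

    // checkGenOkSketch returns false if any member is working from stale configmeta.
    func checkGenOkSketch(members []memberGenSketch) bool {
        for _, m := range members {
            if m.LastConfigDataGeneration == nil {
                // This member has never seen any configmeta, so it can't be stale.
                continue
            }
            if *m.LastConfigDataGeneration < m.SpecGenerationToProcess {
                // Working from older configmeta than we now need; a new
                // configmeta JSON injection is required first.
                return false
            }
        }
        return true
    }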

LastConfigDataGeneration is initially nil for a member, but it is set to the SpecGenerationToProcess value in:

  • appConfig called from handleCreatingMembers (initial config case)
  • handleReadyMembers (reconfig case)

And the value-to-compare-to, SpecGenerationToProcess, is bumped (indicating the need for new configmeta JSON injection) in syncCluster for two reasons:

  • cluster membership is changing
  • connected resources are changing

Therefore, we can get this sequence:

  • A member starts up and gets into creating state, and its first handler pass sets LastConfigDataGeneration to some non-nil value.
  • While the member is chewing on its app configuration scripts (over multiple handler passes), the connected resources change.
  • So, SpecGenerationToProcess gets bumped up for that member.
  • Now we can't make any progress. In syncMembers we will always bail out before reaching handleCreatingMembers, which is what would finish the app config and move the member to a ready or error state.

This is unlikely to happen in "normal" operation but can certainly happen in the backup-restore case: things like create time and UID will change for the connected resources as they are restored, which makes the hash of those resources different from the last-known hash.
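
To make that concrete, here is a rough illustration (not KubeDirector's actual hashing code) of why a restore flips the hash even when the spec itself is unchanged:

    package sketch

    import (
        "crypto/sha256"
        "encoding/hex"
    )

    // resourceHashSketch is illustrative only. The point is that any hash
    // covering UID and creation time will change after a backup/restore even
    // if the spec is byte-for-byte identical, so the last-known hash no longer
    // matches and SpecGenerationToProcess gets bumped.
    func resourceHashSketch(uid, creationTimestamp, spec string) string {
        h := sha256.New()
        h.Write([]byte(uid))
        h.Write([]byte(creationTimestamp))
        h.Write([]byte(spec))
        return hex.EncodeToString(h.Sum(nil))
    }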

It's worth thinking about why this race hasn't arisen previously from cluster membership changes. I don't know yet! Maybe just lucky.

The "bail out if configmeta not updated" behavior came in with PR #272 (handling member container re-creation).

I think the specific vulnerability here happens only when a member stays in creating state for multiple handler passes. The first pass sets its LastConfigDataGeneration to non-nil, which then leaves it able to trigger the bailout case on subsequent passes.


My initial thoughts:

  • There's no need to do the spec generation check for createPending members; they will always have LastConfigDataGeneration = nil.
  • Members in creating state may have LastConfigDataGeneration = nil if they haven't been processed yet. Or, if their LastConfigDataGeneration is non-nil, they're already chewing on some version of the configmeta and we can't interrupt them anyway... we need to keep running their handlers.

So, to be concrete, the code currently looks like this:

    for _, r := range roles {
        if ready, readyOk := r.membersByState[memberReady]; readyOk {
            handleReadyMembers(reqLogger, cr, r, configmetaGenerator)
            if allMembersUpdated {
                allMembersUpdated = checkGenOk(ready)
            }
        }
        if allMembersUpdated {
            if createPending, createPendingOk := r.membersByState[memberCreatePending]; createPendingOk {
                allMembersUpdated = checkGenOk(createPending)
            }
        }
        if allMembersUpdated {
            if creating, creatingOk := r.membersByState[memberCreating]; creatingOk {
                allMembersUpdated = checkGenOk(creating)
            }
        }
    }
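    // If any ready, createPending, or creating member still needs a configmeta
    // update, bail out before handling any non-ready-state members.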
    if !allMembersUpdated {
        return nil
    }
    for _, r := range roles {
        if _, ok := r.membersByState[memberCreatePending]; ok {
            handleCreatePendingMembers(reqLogger, cr, r)
        }
        if _, ok := r.membersByState[memberCreating]; ok {
            handleCreatingMembers(reqLogger, cr, r, roles, configmetaGenerator)
        }
        if _, ok := r.membersByState[memberDeletePending]; ok {
            handleDeletePendingMembers(reqLogger, cr, r, roles)
        }
        if _, ok := r.membersByState[memberDeleting]; ok {
            handleDeletingMembers(reqLogger, cr, r)
        }
    }

Removing the checkGenOk stuff for createPending members is straightforward.

The other change I think we want to make is that we do always want to run handleCreatingMembers for any members whose LastConfigDataGeneration is already non-nil. The rough-and-ready way to do that would be to add a parameter to handleCreatingMembers indicating whether it is for members that need initial config (not ongoing config). Let's say that parameter is "true" for initial config; then this code block would look like:

    for _, r := range roles {
        if ready, readyOk := r.membersByState[memberReady]; readyOk {
            handleReadyMembers(reqLogger, cr, r, configmetaGenerator)
            if allMembersUpdated {
                allMembersUpdated = checkGenOk(ready)
            }
        }
        if allMembersUpdated {
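            // Run handleCreatingMembers here so members already mid-app-config keep
            // making progress; "false" means don't start any new initial configs yet.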
            handleCreatingMembers(reqLogger, cr, r, roles, configmetaGenerator, false)
            if creating, creatingOk := r.membersByState[memberCreating]; creatingOk {
                allMembersUpdated = checkGenOk(creating)
            }
        }
    }
    if !allMembersUpdated {
        return nil
    }
    for _, r := range roles {
        if _, ok := r.membersByState[memberCreatePending]; ok {
            handleCreatePendingMembers(reqLogger, cr, r)
        }
        if _, ok := r.membersByState[memberCreating]; ok {
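            // "true" means it's now OK to start initial config for members that
            // haven't begun app config yet.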
            handleCreatingMembers(reqLogger, cr, r, roles, configmetaGenerator, true)
        }
        if _, ok := r.membersByState[memberDeletePending]; ok {
            handleDeletePendingMembers(reqLogger, cr, r, roles)
        }
        if _, ok := r.membersByState[memberDeleting]; ok {
            handleDeletingMembers(reqLogger, cr, r)
        }
    }

(with appropriate comments added of course).

I'll give it a whirl.

@joel-bluedata joel-bluedata added this to the 0.7.0 milestone Oct 15, 2021
@joel-bluedata joel-bluedata self-assigned this Oct 15, 2021
@joel-bluedata (Member, Author):

Correction: createPending members can have a non-nil LastConfigDataGeneration if they are members with persistent storage that are being "rebooted" (container ID changed). That just makes it all the more necessary that this check not block handleCreatingMembers processing for members that are in the middle of config.

So that final new argument for handleCreatingMembers is not really about whether LastConfigDataGeneration is nil; it's about whether we're yet allowed to start a new initial config.
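
A minimal sketch of how that boolean might be used inside handleCreatingMembers, assuming the handler can tell which creating members haven't started app config yet (crudely represented here by a nil LastConfigDataGeneration). None of these names or helpers are the actual KubeDirector code:

    // Everything below is a sketch; allowInitialConfig, startAppConfig, and
    // continueAppConfig are illustrative names, not KubeDirector identifiers.
    type creatingMemberSketch struct {
        LastConfigDataGeneration *int64
    }

    func startAppConfig(m *creatingMemberSketch)    {} // kick off initial app config
    func continueAppConfig(m *creatingMemberSketch) {} // advance an in-progress config

    // The new boolean gates STARTING initial configs but never blocks members
    // that are already partway through configuration.
    func handleCreatingMembersSketch(members []*creatingMemberSketch, allowInitialConfig bool) {
        for _, m := range members {
            if m.LastConfigDataGeneration == nil {
                // No configmeta injected yet: only begin initial config when
                // the caller says it's OK to start one now.
                if allowInitialConfig {
                    startAppConfig(m)
                }
                continue
            }
            // Already chewing on some version of the configmeta; keep driving
            // its handlers regardless, since we can't interrupt it anyway.
            continueAppConfig(m)
        }
    }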

joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue Oct 15, 2021
See issue bluek8s#518.

Stimulated the case and made sure the deadlock didn't happen.