
[WIP] Enhance daemonset on rollingupdate. #1140

Conversation

zhangxiaoyu-zidif

We find that DaemonSet currently has a weak ability to do rolling updates, so we need more enhancements, such as pause, selector, and surge.

So we want to make some enhancements to DaemonSet rolling update.

fix kubernetes/kubernetes#80130

/cc @resouer @answer1991

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2019
@k8s-ci-robot k8s-ci-robot requested a review from resouer July 14, 2019 23:55
@k8s-ci-robot
Contributor

@zhangxiaoyu-zidif: GitHub didn't allow me to request PR reviews from the following users: answer1991.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

We find that DaemonSet currently has a weak ability to do rolling updates, so we need more enhancements, such as pause, selector, and surge.

So we want to make some enhancements to DaemonSet rolling update.

fix kubernetes/kubernetes#80130

/cc @resouer @answer1991

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 14, 2019
- Add new fields `Selector` and `Partition` for existing RollingUpdateDaemonSet.

### Non-Goals
- This inplement is only implemeted to affect to update strategy of Daemonset. No other behaviors of Daemonset will be changed.
Contributor

s/inplement/implement

Author

OK, thanks. I will fix it this weekend.

- Add `SurgingRollingUpdate` for DaemonSet as a new update strategy.
- Add two more fields, `partition` and `selector`, for `RollingUpdate` and `SurgingRollingUpdate`.
- `partition` is the number of DaemonSet pods that remain at the old version.
- `selector` is a label query over nodes; only nodes whose labels match are affected by the DaemonSet `RollingUpdate` or `SurgingRollingUpdate`.
Contributor

It would be tricky if the DaemonSet uses a host port to expose its service.

Author

Never mind, users can choose the update strategy that suits them; we just provide more choices. And in some cases surge is necessary; I will add those cases later.
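
For illustration, here is a minimal sketch of the API shape implied by the hunk quoted earlier in this thread. The strategy name, struct, field names, and JSON tags are assumptions for discussion, not the final KEP API; only `RollingUpdate` and `OnDelete` exist today.

```go
// Illustrative sketch only; not the merged DaemonSet API.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

type DaemonSetUpdateStrategyType string

const (
	// Existing strategies in apps/v1.
	RollingUpdateDaemonSetStrategyType DaemonSetUpdateStrategyType = "RollingUpdate"
	OnDeleteDaemonSetStrategyType      DaemonSetUpdateStrategyType = "OnDelete"
	// Hypothetical new strategy from this proposal.
	SurgingRollingUpdateDaemonSetStrategyType DaemonSetUpdateStrategyType = "SurgingRollingUpdate"
)

// SurgingRollingUpdateDaemonSet is a hypothetical spec for the surge-based strategy.
type SurgingRollingUpdateDaemonSet struct {
	// MaxSurge is how many extra (new-version) pods a node may run while the
	// old pod is still being drained.
	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
	// Partition is the number of DaemonSet pods left at the old version.
	Partition *int32 `json:"partition,omitempty"`
	// Selector restricts the update to nodes whose labels match it.
	Selector *metav1.LabelSelector `json:"selector,omitempty"`
}
```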

@kow3ns
Member

kow3ns commented Jul 15, 2019

/assign @janetkuo


1. When executing a DaemonSet update, even if the Pods run well, the user needs to pause the update to check whether the results of the new image version meet expectations. If not, the user can roll back the images to the stable version.

1. In some cases, users do not want to stop services during a rolling update, so SurgingRollingUpdate is a necessary choice.
Member

Would you clarify this more? This sentence is a bit vague. Try making it more concrete/convincing. Maybe include some real world examples.

Also note that DaemonSet rolling update is designed to be delete-first-create-later (and only create when the old Pod is gone) because most DaemonSet pods cannot tolerate two instances running in the same node at a time. For example, it's common to have hostPort specified - how do you handle that?

Ref a previous proposal for the same effort: kubernetes/community#977

Author

OK, I would give more real examples soon.

Author

In our cluster, we deploy a DNS resolver and other infrastructure containers via DaemonSet on every node. As an online e-commerce service handling 200K+ QPS, we cannot stop those services even for a very short time. So SurgingRollingUpdate is a necessary strategy for our production environment.

Author

PS. Adding partition and selector for SurgingRollingUpdate will help users catch errors such as hostPort conflicts within a limited scope, keeping a potential disaster (e.g. widespread 404s) contained.
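
For the pause scenario in this thread, a purely hypothetical sketch of what a Deployment-style pause knob could look like on the DaemonSet spec; the field does not exist today, and its name and placement are assumptions.

```go
// Hypothetical sketch only: a pause knob analogous to a Deployment's
// spec.paused, motivated by the pause scenario discussed above.
package v1alpha1

type DaemonSetSpec struct {
	// ... existing DaemonSetSpec fields (selector, template, updateStrategy, ...) omitted ...

	// Paused, when true, would tell the controller to stop rolling out new
	// pods while leaving already-updated pods running, so the user can verify
	// the new image before continuing or rolling back.
	Paused bool `json:"paused,omitempty"`
}
```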


## Proposal

### User Stories

#### Story 1
In some end users' clusters, a DNS resolver and other infrastructure containers are deployed via DaemonSet on every node. For an online e-commerce service serving 200K+ QPS, stopping those infrastructure services even for a very short time, as the current DaemonSet RollingUpdate strategy requires, would cause a cascading disaster. So SurgingRollingUpdate is a necessary strategy for such production environments.
Contributor

I don't think a rolling update would solve this problem; you can't replace a running pod that specifies a host port in its spec without stopping it. SurgingRollingUpdate does not make sense to me; deleting by partitions or in a specific order may mitigate this problem.

Author

SurgingRollingUpdate will solve the cases that don't bind a host port, so it could be a supplement to the existing update strategies. Operators should have the knowledge to choose a suitable one. As for partitions and selector, they can help avoid mistakes and reduce their cost.


@draveness Not sure if I'm understanding you correctly, but how is the rollout strategy related to host ports?

Contributor

@draveness Not sure if I'm understanding you correctly, but how is the rollout strategy related to host ports?

It is a common pattern to use a host port in a DaemonSet. Suppose we used a Deployment-like rolling update strategy: the DaemonSet would create a new pod with the same hostPort on the same machine, but it would fail to start because the port is still owned by the older DaemonSet pod. SurgingRollingUpdate does not hold, since we can't create more pods with the same port on the same machine.

SurgingRollingUpdate will solve the cases that don't bind a host port, so it could be a supplement to the existing update strategies.

If we use a DaemonSet without a hostPort, why not use a Deployment instead?

Contributor

We could use a DaemonSet for log collection, which does not need a hostPort, but I think two log-collecting pods running on the same machine could be problematic. Adding a maxUnavailable field to the DaemonSet instead of surging would make more sense.


I don't think that's the intention of this proposal, which is trying to propose an optional strategy rather than an enhancement to the existing ones.

But yeah, this KEP does not make sense to me either, since it mixes too many things together and the use cases are not clear.

### User Stories

#### Story 1
In some end users' clusters, a DNS resolver and other infrastructure containers are deployed via DaemonSet on every node. For an online e-commerce service serving 200K+ QPS, stopping those infrastructure services even for a very short time, as the current DaemonSet RollingUpdate strategy requires, would cause a cascading disaster. So SurgingRollingUpdate is a necessary strategy for such production environments.
@resouer resouer Aug 7, 2019

This narrative is problematic to me. A user story should be described in objective language and be as general as possible.

// Default value is 0.
// Maximum value is status.DesiredNumberScheduled, which means no pod will be updated.
// +optional
Partition *int32 `json:"partition,omitempty" protobuf:"varint,3,opt,name=partition"`

Not sure why the existing MaxUnavailable strategy is missing?

Author

Sorry for the misunderstanding; I used "..." to omit the existing fields.
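
As one reading of the partition semantics documented in the hunk above ("Maximum value is status.DesiredNumberScheduled, which means no pod will be updated"), here is a small illustrative helper. The function name and package are invented; this is a sketch of the arithmetic, not code from the DaemonSet controller.

```go
package daemonsetutil

// maxPodsToUpdate sketches the partition arithmetic described above:
// `partition` pods stay at the old version, so at most
// desiredNumberScheduled - partition pods may be updated in this rollout.
// Illustrative only; not the DaemonSet controller's actual logic.
func maxPodsToUpdate(desiredNumberScheduled, partition int32) int32 {
	if partition < 0 {
		partition = 0
	}
	if partition >= desiredNumberScheduled {
		// Partition at (or above) DesiredNumberScheduled: no pod is updated.
		return 0
	}
	return desiredNumberScheduled - partition
}
```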

@resouer resouer left a comment

It seems this KEP mixes at least three things in one doc, and barely provides concrete use cases in the narrative.

  1. Ability to Pause during DaemonSet rollout;
  2. Ability of Select & Partition during DaemonSet rollout;
  3. Ability of Surging during DaemonSet rollout.

IMO, each of these proposals will need its own KEP for both the API change and the detailed design. I would suggest you create an umbrella issue to link them together, and propose them one by one. Start from the easiest change, of course.


## Proposal

### User Stories

#### Story 1
In some end users' clusters, a DNS resolver and other infrastructure containers are deployed via DaemonSet on every node. For an online e-commerce service serving 200K+ QPS, stopping those infrastructure services even for a very short time, as the current DaemonSet RollingUpdate strategy requires, would cause a cascading disaster. So SurgingRollingUpdate is a necessary strategy for such production environments.

I don't think that's the intention of this proposal, which is trying to propose an optional strategy rather than an enhancement to the existing ones.

But yeah, this KEP does not make sense to me either, since it mixes too many things together and the use cases are not clear.


1. In some cases, users do not want to stop services during a rolling update, so SurgingRollingUpdate is a necessary choice.

1. When rollingUpdate or surgeRollingUpdate, user should have some chioce to select some specific or random nodes to update pods running on it.
Member

s/chioce/choices

@zhangxiaoyu-zidif
Author

It seems this KEP mixes at least three things in one doc, and barely provides concrete use cases in the narrative.

  1. Ability to Pause during DaemonSet rollout;
  2. Ability of Select & Partition during DaemonSet rollout;
  3. Ability of Surging during DaemonSet rollout.

IMO, each of these proposals will need its own KEP for both the API change and the detailed design. I would suggest you create an umbrella issue to link them together, and propose them one by one. Start from the easiest change, of course.

OK, I will split them soon and re-send those KEPs.

@damdo
Member

damdo commented Oct 30, 2019

Hey @zhangxiaoyu-zidif any update on this :) ?

@zhangxiaoyu-zidif
Author

Hey @zhangxiaoyu-zidif any update on this :) ?

Sorry, I am busy because of Alibaba's Annual Singles' Day Shopping Festival; this doc will be updated later.


Consider the following scenarios:

1. When executing a DaemonSet update, even if the Pods run well, the user needs to pause the update to check whether the results of the new image version meet expectations. If not, the user can roll back the images to the stable version.
Member

How do deployments handle this?

Author

Sorry, I made it confusing. I just want users to be able to run commands for DaemonSets the way they do for Deployments.

Contributor

Or, we should provide an observation period (pause for a while) for the upgraded Pod, to determine that the upgrade is really effective. :)

Author

It's a good idea to provide an observation channel for users, such as showing which nodes have finished the update and which have not during the whole update process. As for an observation period, let me think about how to bring in this feature. :)

Author

@wgliang we can add some fields:

// The nodes that are running an updated daemon pod.
// The key is the node name; the value is <Namespace/PodName>.
// +optional
UpdatedNumberScheduledNodes map[string]string `json:"updatedNumberScheduledNodes,omitempty" protobuf:"bytes,11,rep,name=updatedNumberScheduledNodes"`

to see the status of each node and its daemon pod. If you want to pause the sync, you can pause the DaemonSet sync action.
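
A rough sketch of how a sync loop might fill such a map, assuming the daemon pods carry the usual controller-revision-hash label; the helper name and surrounding wiring are invented for illustration, not taken from the controller.

```go
package daemonsetutil

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// updatedPodsByNode maps node name -> "<Namespace>/<PodName>" for daemon pods
// that already run the updated revision, which is roughly what the suggested
// UpdatedNumberScheduledNodes status field would report. Illustrative only.
func updatedPodsByNode(pods []*v1.Pod, updatedHash string) map[string]string {
	out := map[string]string{}
	for _, pod := range pods {
		if pod.Spec.NodeName == "" {
			continue // not scheduled yet
		}
		if pod.Labels["controller-revision-hash"] != updatedHash {
			continue // still running the old revision
		}
		out[pod.Spec.NodeName] = fmt.Sprintf("%s/%s", pod.Namespace, pod.Name)
	}
	return out
}
```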

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zhangxiaoyu-zidif
To complete the pull request process, please assign janetkuo
You can assign the PR to them by writing /assign @janetkuo in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@zhangxiaoyu-zidif: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 2589459 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


Consider the following scenarios:

1. When executing a DaemonSet update, even if the Pods run well, the user needs to pause the update to check whether the results of the new image version meet expectations. If not, the user can roll back the images to the stable version.
Contributor

Or, we should provide an observation period (pause for a while) for the upgraded Pod, to determine that the upgrade is really effective. :)


1. When executing a DaemonSet update, even if the Pods run well, the user needs to pause the update to check whether the results of the new image version meet expectations. If not, the user can roll back the images to the stable version.

1. When doing a rollingUpdate, users should have some choices to select specific or random nodes and update the pods running on them.
Contributor

Or, we can control the progress of the upgrade, such as setting the target percentage of the upgrade.

Author

Yes, that's why we want to use the partition field to enhance the update strategy.

Contributor

This is really essential in a complex private cloud environment.

Contributor

We are also rolling out DaemonSets to upgrade some agent components, and we have encountered these problems as well. From the current point of view, the only feasible solution is a CRD. Looking forward to this KEP landing.

Author

Yes, we will ship our implementation in another project to release this enhancement at the same time. But if it could be adopted by k/k, that would be the best choice.

Contributor

Cool, agree with you.

@smarterclayton
Contributor

Did we split surge into a new enhancement?

I think surge is the most obvious improvement for availability of existing DaemonSets (since there's no other easy way to mitigate the duplication). I suspect we could get fairly fast agreement on #51161 with a default value of zero. The primary user is host networking components that expect to coexist (and are not using host ports on pod specs, which are advisory only). I think bypassing host port requesting is a reasonable limitation for those sorts of use cases. @soltysh @dcbw because of feedback from networking providers.

- Add `SurgingRollingUpdate` for DaemonSet as a new update strategy.
- Add two more fields, `partition` and `selector`, for `RollingUpdate`.
- `partition` is the number of DaemonSet pods that remain at the old version.
- `selector` is a label query over nodes; only nodes whose labels match are affected by the DaemonSet `RollingUpdate`.


DaemonSet already has a nodeSelector which selects where to run the pods; why do we need to select a subset of those nodes with another selector? What is the real use case?

@smarterclayton
Contributor

I'm going to open a surge proposal since it has been increasingly difficult to maintain availability for node-level components in daemonsets without it. I have a working implementation, mirroring Deployments, which I'm using to test with node services that must remain available during upgrade (i.e. CNI, CSI, or other node plugins).

@smarterclayton
Contributor

I opened #1591 and #1590 to carve off the surge aspect. I believe (and have prototyped) the implementation of MaxSurge consistent with Deployments is fairly straightforward and would require very little novel change to Kubernetes.
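
For reference, a sketch of what a Deployment-style surge field on the existing RollingUpdateDaemonSet could look like. The field comments, tags, and defaults here are illustrative; the API actually proposed in #1591 / #1590 may differ.

```go
package v1alpha1

import "k8s.io/apimachinery/pkg/util/intstr"

// RollingUpdateDaemonSet sketches the surge-only direction: keep the existing
// maxUnavailable knob and add a maxSurge analogous to Deployments.
// Illustrative only, not the merged API.
type RollingUpdateDaemonSet struct {
	// MaxUnavailable: how many nodes may be without a running daemon pod at a
	// time during the update (the existing field).
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
	// MaxSurge: allow a new pod to start on a node before the old one is
	// deleted; a default of 0 would keep today's delete-first behavior.
	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
}
```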

// A label query over nodes that are managed by the daemon set RollingUpdate.
// Must match in order to be controlled.
// It must match the node's labels.
Selector *metav1.LabelSelector `json:"selector" protobuf:"bytes,2,opt,name=selector"`
Member

Would you explain more about how this selector works with rolling update?

Note that DaemonSet works with node selector.
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#running-pods-on-only-some-nodes
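
One possible reading of how such a rolling-update selector could interact with the existing node selection: the DaemonSet's nodeSelector/affinity still decides where pods run, while this selector would only scope which of those nodes get updated in a given rollout. A sketch under that assumption; the helper name and wiring are invented.

```go
package daemonsetutil

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// nodesEligibleForUpdate filters the nodes the DaemonSet already targets down
// to those whose labels match the rolling-update selector. Illustrative only.
func nodesEligibleForUpdate(nodes []*v1.Node, sel *metav1.LabelSelector) ([]*v1.Node, error) {
	if sel == nil {
		return nodes, nil // no selector: every targeted node is eligible, as today
	}
	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		return nil, err
	}
	var eligible []*v1.Node
	for _, node := range nodes {
		if s.Matches(labels.Set(node.Labels)) {
			eligible = append(eligible, node)
		}
	}
	return eligible, nil
}
```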

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 1, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 1, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 31, 2020
@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Successfully merging this pull request may close these issues.

Daemonset need more RollingUpdateStrategy