[WIP] Enhance daemonset on rollingupdate. #1140
Conversation
@zhangxiaoyu-zidif: GitHub didn't allow me to request PR reviews from the following users: answer1991. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
- Add new fields `Selector` and `Partition` for existing RollingUpdateDaemonSet.

### Non-Goals
- This inplement is only implemeted to affect to update strategy of Daemonset. No other behaviors of Daemonset will be changed.
s/inplement/implement
OK, thanks. I will fix it this weekend.
- Add `SurgingRollingUpdate` for Daemonset as a new update strategy.
- Add two more fields, `partition` and `selector` for `RollingUpdate` and `SurgingRollingUpdate`.
  - `partition` is the number of DaemonSet pods remained to be old version.
  - `selector` is to query over nodes whoes labels are matched by the Daemonset `RollingUpdate` or `SurgingRollingUpdate`.
It would be tricky if the daemonset uses a host port to expose its service.
Never mind; users can choose the update strategy that suits them, we just provide more choices. And in some cases surge is necessary; I will add those cases later.
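For readers skimming the thread, here is a minimal sketch, assembled from the hunks quoted in this PR, of roughly what the combined strategy type could look like; the exact field names, tags, and defaults below are assumptions, not the PR's final API.

```go
// Hedged sketch only (assembled from the hunks quoted in this thread, not the
// PR's final API): how RollingUpdateDaemonSet could look with the two proposed
// fields added next to the existing MaxUnavailable.
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

type RollingUpdateDaemonSet struct {
	// Existing: the maximum number of DaemonSet pods that can be unavailable
	// during the update.
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`

	// Proposed: a label query over nodes; only daemon pods on matching nodes
	// are considered for the rollout.
	Selector *metav1.LabelSelector `json:"selector,omitempty"`

	// Proposed: the number of DaemonSet pods that should remain at the old
	// version (0 means update everything).
	Partition *int32 `json:"partition,omitempty"`
}
```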
/assign @janetkuo

1. When executing Daemonset update, even those Pods run well, user need to pause updation to check if some results of new version image meet expect. If not, user can rollback images to stable version.

1. In some case, user do not want to stop some services when rolling update, so SurgingRollingUpdate is a must choice.
Would you clarify this more? This sentence is a bit vague. Try making it more concrete/convincing. Maybe include some real world examples.
Also note that DaemonSet rolling update is designed to be delete-first-create-later (and only create when the old Pod is gone) because most DaemonSet pods cannot tolerate two instances running on the same node at a time. For example, it's common to have hostPort specified - how do you handle that?
Ref a previous proposal for the same effort: kubernetes/community#977
OK, I will give more real examples soon.
In our cluster, we deploy DNS resolvers and other infrastructure containers as DaemonSets on every node. For an online e-commerce service with 200K+ QPS, we cannot stop those services even for a very short time, so SurgingRollingUpdate is a necessary strategy for our production environment.
PS. Adding partition and selector for SurgingRollingUpdate will help users catch errors such as hostPort conflicts within a limited scope, keeping the blast radius of a potential outage small.
## Proposal

### User Stories

#### Story 1
In some end users' clusters, they deploy DNS resolver and other containers serving infrastructure by daemonset in every nodes. To an online e-commercial services which serve 200K+ QPS, it must cause cascading disaster to stop those infrastructure services even in a very short time by current Daemonset RollingUpdate strategy. So surgingRollingUpdate is a necessary strategy for our production environment.
I don't think a rolling update would solve this problem; you can't replace a running pod that specifies a host port in its spec without stopping it. SurgingRollingUpdate does not make sense to me; deleting by partitions or in a specific order may mitigate this problem.
SurgingRollingUpdate will handle the cases that don't bind a host port, so it could be a supplement to the existing update strategy. Operators should have the knowledge to choose a suitable one. As for partition and selector, they can help avoid mistakes and reduce their cost.
@draveness Not sure if I'm understanding you correctly, but how is the rollout strategy related to host ports?
@draveness Not sure if I'm understanding you correctly, but how is the rollout strategy related to host ports?

It is a common pattern to use host ports in a DaemonSet. Suppose we used a rolling update strategy like a Deployment's: the DaemonSet would create a new pod with the same hostPort on the same machine, but it would fail to start since the port on that machine is still owned by the older DaemonSet pod. SurgingRollingUpdate does not hold, because we can't create more pods with the same port on the same machine.

SurgingRollingUpdate will handle the cases that don't bind a host port, so it could be a supplement to the existing update strategy.

If we use a DaemonSet without a hostPort, why not use a Deployment instead?
We could use a DaemonSet for log collection, which does not need a hostPort, but I think two log-collecting pods running on the same machine could be problematic. Adding a maxUnavailable field to the DaemonSet instead of surging would make more sense.
I don't think that's the intention of this proposal, which is trying to propose an optional strategy rather than an enhancement to existing ones. But yeah, this KEP does not make sense to me either, since it mixes too many things together and the use cases are not clear.
### User Stories

#### Story 1
In some end users' clusters, they deploy DNS resolver and other containers serving infrastructure by daemonset in every nodes. To an online e-commercial services which serve 200K+ QPS, it must cause cascading disaster to stop those infrastructure services even in a very short time by current Daemonset RollingUpdate strategy. So surgingRollingUpdate is a necessary strategy for our production environment.
This narrative is problematic to me. A user story should be described in objective language and be as general as possible.
// Default value is 0.
// Maximum value is status.DesiredNumberScheduled, which means no pod will be updated.
// +optional
Partition *int32 `json:"partition,omitempty" protobuf:"varint,3,opt,name=partition"`
Not sure why the existing MaxUnavailable field is missing?
Sorry for the misunderstanding; I used "..." to omit the existing fields.
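As a rough illustration of how the quoted `Partition` field could bound a rollout (an assumption about the intended semantics, not code from this PR): with `status.DesiredNumberScheduled` pods in total, at most `DesiredNumberScheduled - Partition` pods would be moved to the new revision.

```go
// Rough illustration (assumption, not the PR's controller code): keep
// `partition` daemon pods on the old revision and update the rest.
package sketch

// maxPodsToUpdate returns how many daemon pods may be updated given the
// desired number of scheduled pods and the partition value.
func maxPodsToUpdate(desiredNumberScheduled, partition int32) int32 {
	toUpdate := desiredNumberScheduled - partition
	if toUpdate < 0 {
		// partition >= desiredNumberScheduled means no pod is updated,
		// matching the "Maximum value is status.DesiredNumberScheduled" comment.
		return 0
	}
	return toUpdate
}
```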
Seems this KEP mixes at least three things in one doc, and barely provides concrete use cases in the narrative:

- Ability to `Pause` during DaemonSet rollout;
- Ability to `Select & Partition` during DaemonSet rollout;
- Ability to `Surge` during DaemonSet rollout.

IMO, each of these proposals will need its own KEP for both the API change and the detailed design. I would suggest you create an umbrella issue to link them together, and propose them one by one, starting from the easiest change of course.
1. In some case, user do not want to stop some services when rolling update, so SurgingRollingUpdate is a must choice.

1. When rollingUpdate or surgeRollingUpdate, user should have some chioce to select some specific or random nodes to update pods running on it.
s/chioce/choices
OK, I will split them soon and re-send those KEPs.
Hey @zhangxiaoyu-zidif any update on this :) ?
Sorry, I am busy because of Alibaba's Annual Singles' Day Shopping Festival; this doc will be updated later.
Consider the following scenarios:-

1. When executing Daemonset update, even those Pods run well, user need to pause updation to check if some results of new version image meet expect. If not, user can rollback images to stable version.
How do deployments handle this?
Sorry, I made it confusing. I just want users to be able to run commands the way they do for Deployments.
Or, we could provide an observation period (pause for a while) for the upgraded Pod, to determine that the upgrade is real and effective. :)
It's a good idea to provide an observation channel for users, such as showing which nodes have finished the update and which have not during the whole update process. As for the observation period, let me think about how to introduce this feature. :)
@wgliang we can add some fields:

// The total nodes that are running the updated daemon pod.
// The key is the node name and the value is the pod's namespaced name: <Namespace/PodName>.
// +optional
UpdatedNumberScheduledNodes map[string]string `json:"updatedNumberScheduledNodes,omitempty" protobuf:"bytes,11,rep,name=updatedNumberScheduledNodes"`

to see the status of each node and its daemon pod. If you want to pause the sync, you can pause the action of the DS sync.
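As a hypothetical usage sketch of the map suggested above (the field is only a proposal in this thread, not part of the DaemonSet API), a client could diff it against the node list to report per-node progress:

```go
// Hypothetical usage of the UpdatedNumberScheduledNodes map proposed above
// (not an existing API field): report which nodes already run an updated pod.
package sketch

import "fmt"

// printUpdateProgress lists which nodes already run an updated daemon pod and
// which are still on the old revision.
func printUpdateProgress(updatedNodes map[string]string, allNodes []string) {
	for _, node := range allNodes {
		if pod, ok := updatedNodes[node]; ok {
			fmt.Printf("node %s: updated (pod %s)\n", node, pod)
		} else {
			fmt.Printf("node %s: still on old revision\n", node)
		}
	}
}
```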
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: zhangxiaoyu-zidif The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@zhangxiaoyu-zidif: The following test failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
1. When executing Daemonset update, even those Pods run well, user need to pause updation to check if some results of new version image meet expect. If not, user can rollback images to stable version.

1. When rollingUpdate, user should have some chioces to select some specific or random nodes to update pods running on it.
Or, we can control the progress of the upgrade, such as setting the target percentage of the upgrade.
Yes, that's why we want to use the partition field to enhance the update strategy.
This is really essential in a complex private cloud environment.
We are also using DaemonSets to upgrade some agent components and have encountered these problems. From the current point of view, the only feasible solution is a CRD. Looking forward to this KEP landing.
Yes, we will contribute our implementation to another project to release this enhancement at the same time. But if it could be adopted by k/k, that would be the best choice.
Cool, agree with you.
Did we split surge into a new enhancement? I think surge is the most obvious improvement for availability of existing DaemonSets (since there's no other easy way to mitigate the duplication). I suspect we could get fairly fast agreement on #51161 with a default value of zero. The primary user is host networking components that expect to coexist (and are not using host ports on pod specs, which are advisory only). I think bypassing host port requesting is a reasonable limitation for those sorts of use cases. @soltysh @dcbw because of feedback from networking providers.
- Add `SurgingRollingUpdate` for Daemonset as a new update strategy.
- Add two more fields, `partition` and `selector` for `RollingUpdate`
  - `partition` is the number of DaemonSet pods remained to be old version.
  - `selector` is to query over nodes whoes labels are matched by the Daemonset `RollingUpdate`.
DaemonSet already has a nodeSelector which selects where to run the pods; why do we need to select a subset of those nodes with another selector? What is the real use case?
I'm going to open a surge proposal since it has been increasingly difficult to maintain availability for node level components in daemonsets without it. I have a working implementation which mirrors deployments which I'm using to test with node services that must remain available during upgrade (i.e. CNI, CSI, or other node plugins).
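For illustration only (an assumption loosely mirroring Deployment-style surge semantics, not the implementation referenced above): with surge, the new daemon pod is created first and the old one is deleted only once the new pod is ready, at the cost of two pods briefly coexisting on the node.

```go
// Illustration only (assumption loosely mirroring Deployment-style surge
// semantics, not the implementation referenced in this thread): decide the
// next rollout step for a single node under a surge strategy.
package sketch

// nextAction describes what a surging rollout would do on one node.
func nextAction(oldPodExists, newPodExists, newPodReady bool) string {
	switch {
	case !newPodExists:
		return "create new pod alongside the old one" // the surge step
	case newPodReady && oldPodExists:
		return "delete old pod" // new pod is serving, safe to remove the old one
	default:
		return "wait for new pod to become ready"
	}
}
```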
// A label query over nodes that are managed by the daemon set RollingUpdate.
// Must match in order to be controlled.
// It must match the node's labels.
Selector *metav1.LabelSelector `json:"selector" protobuf:"bytes,2,opt,name=selector"`
Would you explain more about how this selector works with rolling update?

Note that DaemonSet already works with node selectors:
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#running-pods-on-only-some-nodes
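One possible reading of the question above (an assumption about the proposed semantics, not something settled in this thread): the strategy-level selector would not change where daemon pods run; it would only gate which nodes' existing pods are eligible for the rollout.

```go
// One possible reading of the proposed strategy-level selector (assumption,
// not settled in this thread): it only gates which nodes' daemon pods are
// eligible to be rolled out, without changing where pods are scheduled.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// nodeEligibleForRollout reports whether a node's labels match the rollout selector.
func nodeEligibleForRollout(node *corev1.Node, sel *metav1.LabelSelector) (bool, error) {
	if sel == nil {
		return true, nil // no selector: every node is eligible, as today
	}
	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		return false, err
	}
	return s.Matches(labels.Set(node.Labels)), nil
}
```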
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We find that DaemonSet currently has a weak rollingUpdate ability, so we want to enhance DaemonSet rolling update with more features, such as pause, selector, and surge.
fix kubernetes/kubernetes#80130
/cc @resouer @answer1991