
[WIP] Add restart cluster command #408

Closed

Conversation

Member

@tao12345666333 tao12345666333 commented Mar 27, 2019

xref: #148

  • Add restart cluster command
  • The containers' IPs will change when the nodes (containers) are restarted. We need to update admin.conf and other files.
    • Update HAProxy's config file so it forwards requests to the correct control plane.
    • Make the control plane work correctly, including the api-server, etcd, and some certs.
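As a sketch of the HAProxy point above, the load balancer's config carries the control-plane container IPs in its backend, and those are the entries that would need rewriting after a restart. The names and addresses below are hypothetical:

```
# haproxy.cfg sketch -- names/IPs are hypothetical
frontend control-plane
    bind *:6443
    mode tcp
    default_backend kube-apiservers

backend kube-apiservers
    mode tcp
    # one "server" line per control-plane node; these IPs go stale
    # whenever the node containers restart and get new addresses
    server kind-control-plane  172.17.0.2:6443 check
    server kind-control-plane2 172.17.0.3:6443 check
```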

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 27, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tao12345666333
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: bentheelder

If they are not already assigned, you can assign the PR to them by writing /assign @bentheelder in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 27, 2019
Member

@neolit123 neolit123 left a comment


@tao12345666333 thanks for working on this!

The containers' IPs will change when the nodes (containers) are restarted. We need to update the admin.conf.

this seems tricky. if the IP of a load balancer or a control plane node changes, this means that all workers have to rejoin the cluster. is there a way to make docker restart keep the old IP/port?

@tao12345666333
Member Author

is there a way to make docker restart keep the old IP/port?

if we use docker run with the --ip flag, that may resolve it.

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
@tao12345666333
Member Author

after #461 is merged we can make this simpler, although the core issue to be addressed is still the IP changes.

I will complete this PR as soon as possible after the end of this holiday.

I still need to complete the following steps:

  • Change all old IPs to the new IPs under /etc/kubernetes.
  • Regenerate the apiserver and etcd peer certs.
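The first step could be sketched as a plain search-and-replace pass, shown here against a throwaway copy of admin.conf rather than the real /etc/kubernetes. The IPs and the file are placeholder values:

```shell
# Sketch: rewrite the old node IP to the new one, the way admin.conf and
# the static pod manifests under /etc/kubernetes would need rewriting.
# OLD_IP/NEW_IP and the temp file are hypothetical placeholders.
OLD_IP=172.17.0.2
NEW_IP=172.17.0.4
dir=$(mktemp -d)
printf 'server: https://%s:6443\n' "$OLD_IP" > "$dir/admin.conf"
# a real implementation would loop over every file in /etc/kubernetes
sed "s/$OLD_IP/$NEW_IP/g" "$dir/admin.conf" > "$dir/admin.conf.new"
mv "$dir/admin.conf.new" "$dir/admin.conf"
cat "$dir/admin.conf"
```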

@aojea
Contributor

aojea commented Apr 30, 2019

just sharing my thoughts, what's the best option regarding the IP address problem?

  1. Try to keep the same ip addresses on all the nodes after restart?
  2. Regenerate all certificates after restart matching the new ip addresses?

I guess that 1 is the option closest to real user scenarios, and looking at docker networking it seems possible to assign static IP addresses to the nodes, so a possible solution could be:

  1. Get node ip address
  2. Stop node
  3. Start node with address obtained in 1
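Those three steps might look like the dry-run sketch below. The `run` helper only prints the docker invocations instead of executing them; the container/network names and the IP are hypothetical, and note that static --ip assignment only works on a user-defined network, not the default bridge:

```shell
# Dry-run sketch of: 1) get node IP, 2) stop node, 3) start with same IP.
# `run` echoes instead of executing; swap it for run() { "$@"; } to run
# the commands for real.
run() { echo "+ $*"; }
NODE=kind-control-plane
NET=kind
# step 1: in reality the IP would come from `docker inspect`
IP=172.17.0.2
# step 2
run docker stop "$NODE"
# step 3: reattach with the saved address, then start
run docker network disconnect "$NET" "$NODE"
run docker network connect --ip "$IP" "$NET" "$NODE"
run docker start "$NODE"
```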

@tao12345666333
Member Author

Thanks for your thoughts.

My current approach is the first one.

The scenario we first encountered (#148) was caused by "docker restart", which does not guarantee that the original IP is not already taken. So it may not be suitable to start the node with its original static IP.

Of course, if the restart is done as in 1, then we should select higher IPs from the address pool at startup to avoid them being taken.

@tao12345666333
Member Author

@BenTheElder @neolit123 do you have any suggestions? I think it would be easier to do this by using kubeadm.

return err
}

if !node.WaitForDocker(time.Now().Add(time.Second * 60)) {
Member


i think this PR will be affected by the changes in:
#461

Member Author


yes. after that PR has been merged, this code will be removed.

Member


I need to rework HAproxy in the new world ™️ as well, so those parts will be getting a PR soon and also will need updating... sorry about that!

Member Author


no worries. as long as we can push things forward, that will suffice.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2019
@k8s-ci-robot
Contributor

@tao12345666333: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Contributor

aojea commented May 2, 2019

@tao12345666333 I think the problem is that we are using docker restart, which adds the complexity of dealing with IP assignment.
IIUIC the goal is to simulate a node restart. Since the nodes are using systemd, I think it could be possible to implement a "simulated" restart by just restarting all services inside the container (systemctl restart kubelet containerd ...), getting rid of the docker restart command.
@neolit123 @fabriziopandini @BenTheElder any thoughts?
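The simulated restart described above might be a single exec per node, as in this dry-run sketch (the `run` helper echoes instead of executing, and the node names and service list are assumptions):

```shell
# Dry-run sketch of the "simulated restart": restart the systemd services
# inside each node container, so the container (and its IP) never goes away.
run() { echo "+ $*"; }   # swap for run() { "$@"; } to execute for real
for node in kind-control-plane kind-worker; do
  run docker exec "$node" systemctl restart kubelet containerd
done
```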

@neolit123
Member

the original issue #148 (comment)
talks about docker on the host being restarted which renders the cluster broken.

with this comment:
#148 (comment)
we are one step further, but i don't think the IP update is avoidable here.

can we just use container networking instead of IPs (e.g. run --network)?
kubeadm should be fine with that...

@BenTheElder
Member

  • we should explore networks with --network
  • the primary goal AFAIK is to survive container restarts, mostly from users on docker for mac after restarting the daemon i'd guess.
  • after we sort out networking, this should just be starting the containers with matching labels, fixmounts etc. is gone now :-)
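The --network idea could be sketched as below. On a user-defined docker network, containers resolve one another by name, so kubeconfigs and the load balancer could point at stable names instead of IPs that change across restarts. This is a dry run (`run` echoes rather than executes), and the network, container names, and image are hypothetical:

```shell
# Dry-run sketch: put nodes on a user-defined docker network so they can
# address one another by container name instead of by (unstable) IP.
run() { echo "+ $*"; }
run docker network create kind
run docker run -d --network kind --name kind-control-plane kindest/node
# a second container on the same network can reach
# https://kind-control-plane:6443 by name, whatever IP it gets
run docker run -d --network kind --name kind-worker kindest/node
```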

@tao12345666333
Member Author

I will open another PR to handle the network problem (using --network instead of IPs). After that is done, I will continue with this PR.

Thanks all. 👍

@tao12345666333 tao12345666333 mentioned this pull request May 5, 2019
@BenTheElder BenTheElder added this to the 0.4 milestone May 15, 2019
@BenTheElder BenTheElder modified the milestones: v0.4.0, v0.5.0 Jun 20, 2019
@BenTheElder BenTheElder removed this from the v0.5.0 milestone Aug 20, 2019
@Ilyes512

Ilyes512 commented Sep 15, 2019


It would be really awesome if it were possible to restart kind cluster(s) after, for example, rebooting your (macOS) host. I (want to) use kind for local development. My end goal is to replace minikube.

Luckily recreating the cluster(s) is really fast, but it would still be nice to just (re)start them. I am currently looking at Velero to see if I can use it to restore the state after recreation.

@k8s-ci-robot
Contributor

@tao12345666333: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kind-conformance-parallel-1-14 0947f8d link /test pull-kind-conformance-parallel-1-14
pull-kind-conformance-parallel-1-15 0947f8d link /test pull-kind-conformance-parallel-1-15
pull-kind-unit 0947f8d link /test pull-kind-unit
pull-kind-conformance-parallel-1-16 0947f8d link /test pull-kind-conformance-parallel-1-16
pull-kind-e2e-kubernetes 0947f8d link /test pull-kind-e2e-kubernetes

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@BenTheElder
Member

this will need to be revisited on top of the provider work after we figure out the network issues

thanks again for the PR, we'll surely reference this when we've resolved the network part and come to revive restart support.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants