cri: handle the migration to the new configuration format #1251

mssola · 2020-07-14T06:33:13Z

Why is this PR needed?

Fixes SUSE/avant-garde#1679
Fixes SUSE/avant-garde#1785
See SUSE/avant-garde#1679
See SUSE/caasp-release-notes#51

What does this PR do?

Ensure that the local files of the user are properly migrated to the new cri-o configuration format.

Info for QA

To test this, first create a v4.2.2 cluster with an old skuba. Then use a skuba built from master with this patch applied, and call skuba cluster upgrade config. You should see:

The old addons/cri/default_flags has been removed.
addons/cri/conf.d now exists, with 01-caasp.conf and README in it.

Anything else a reviewer needs to know?

With this, SUSE/caasp-release-notes#51 should not be needed.

mjura

I would rename this new command to skuba cluster update config, because skuba cluster upgrade apply suggests that it will upgrade all nodes in the cluster.

kkaempf · 2020-07-14T07:26:20Z

How can this be tested?

evrardjp

This indeed handles the migration of the local files (I haven't tested it).
However, I think we are missing the step in upgrades that also replaces the sysconfig on the destination node.
Will this be in a separate PR or just a separate commit in this PR?

pkg/skuba/actions/cluster/init/init.go

evrardjp · 2020-07-14T07:37:50Z

internal/pkg/skuba/deployments/ssh/cri.go

+
+func criGenerateLocalConfiguration() error {
+	cfg := clusterinit.InitConfiguration{
+		PauseImage:        kubernetes.ComponentContainerImageForClusterVersion(kubernetes.Pause, kubernetes.LatestVersion()),


Is LatestVersion correct, or should that be the next version?

At least this is giving me this result: registry.suse.de/devel/caasp/5/containers/containers/caasp/v5/pause:3.2, which I believe is fine, no?

It is fine for now, but it might not be in the future, if the user is lagging behind more than a version. No?

Well, in this case they should be using the proper skuba for each scenario (same as other upgrades). I don't think we have a next version somewhere in the code, right?

Mmm interesting, I thought we had that, because we have a path in the plan and apply commands, when dealing with multiple k8s versions, but I didn't check the code.

internal/pkg/skuba/deployments/ssh/cri.go

cmd/skuba/cluster/upgrade.go

mssola · 2020-07-14T13:36:26Z

How can this be tested?

Updated the PR.

PR has been updated, need a new review.

mjura

LGTM

evrardjp

A good start. Here are a few changes to clean up the duties of each of the functions called.

pkg/skuba/actions/node/upgrade/apply.go

evrardjp · 2020-07-15T11:27:00Z

cmd/skuba/cluster/upgrade.go

+		Use:   "config",
+		Short: "Upgrades the local configuration",
+		Run: func(cmd *cobra.Command, args []string) {
+			if err := ssh.CriMigrate(); err != nil {


This is very misleading. You say it migrates the local configuration, yet this function runs things on the remote node.

For me, the upgrade localconfig should deal only with local things, while the things happening on target nodes can be dealt with during node upgrade.

This function does not touch files on remote nodes. It just updates the local configuration. Thus, I see two things:

It should not go into the ssh package, as you said in another comment.

We may want to change the subcommand name from config to localconfig, so as to reinforce the idea that changes only happen locally, as has been suggested. What do you think?

internal/pkg/skuba/deployments/ssh/cri.go

evrardjp · 2020-07-15T11:32:41Z

internal/pkg/skuba/deployments/ssh/testdata/addons/cri/default_flags

@@ -0,0 +1,7 @@
+## Path           : System/Management


We should NOT provide this. This should be deleted. We should trust the sysconfig coming from the package instead, and copy it from the package (in /usr/ iirc), to the /etc/sysconfig/.

Bear in mind that this is just test data. This is not planned to be installed anywhere, and for testing purposes I will not rely on developers having the file already on their machines.

pkg/skuba/actions/cluster/init/init.go

pkg/skuba/actions/node/upgrade/apply.go

evrardjp

A few more changes required IMO.

evrardjp · 2020-07-16T08:36:52Z

internal/pkg/skuba/deployments/ssh/cri.go

-	}
-
-	if _, _, err = t.ssh("mv -f /etc/sysconfig/crio /etc/sysconfig/crio.backup"); err != nil {
+	if _, _, err := t.ssh("mv -f /usr/share/fillup-templates/sysconfig.crio /etc/sysconfig/crio"); err != nil {


Not sure if mv is a good idea: it would remove the file from its expected location. Is that fine for zypper?
Otherwise the location is fine for me.

evrardjp · 2020-07-16T08:38:50Z

internal/pkg/skuba/deployments/ssh/cri.go

 		return err
 	}
-	_, _, err = t.ssh("mv -f /tmp/cri.d/default_flags /etc/sysconfig/crio")
-	return err
+	return t.target.UploadFile(skuba.CriDockerDefaultsConfFile(), filepath.Join("/etc/cri/default_flags.local"))


I am not even sure we need to do this anymore. I think we can just remove it, as people will just upload their own config file anyway... WDYT?

evrardjp · 2020-07-16T08:41:28Z

internal/pkg/skuba/upgrade/cluster/cri.go

+	files := clusterinit.CriScaffoldFiles["criconfig"]
+	for _, file := range files {
+		if err := clusterinit.WriteScaffoldFile(file, cfg); err != nil {
+			_ = os.RemoveAll(skuba.CriConfDir())


Why do we remove the CriConfDir? It means that if the customer has custom files, they will be removed, no?

Not exactly. clusterinit.WriteScaffoldFile(file, cfg) will write scaffold files into that directory. Thus, any files with the same names will be effectively replaced. This means that customers have to back things up before performing the migration (as with any migration).

That being said, if, for whatever reason, we fail to write these files, we should remove everything instead of being stuck in an intermediate state (e.g. imagine that one file was successfully written but the other wasn't). We don't want these kinds of scenarios, it's all or nothing.

Therefore, I think that this is actually the safest approach.

> Not exactly. clusterinit.WriteScaffoldFile(file, cfg) will write scaffold files into that directory. Thus, any files with the same names will be effectively replaced. This means that customers have to back things up before performing the migration (as with any migration).

I think in this case, the files being replaced are the "caasp defaults", which is fine, and the customer shouldn't touch them, so we are good. Yes, it's indeed safer to back up, though not necessary.

> That being said, if, for whatever reason, we fail to write these files, we should remove everything instead of being stuck in an intermediate state (e.g. imagine that one file was successfully written but the other wasn't). We don't want these kinds of scenarios, it's all or nothing.

That's what I can't get my head around. For me, we should fail, and say "There is something wrong that happened here, maybe you can figure it out with this error: %err". Not delete the whole folder, which might contain user configuration on top of our default configs. That's very disruptive to me.

I agree it's better to bail, but I don't think it's better to delete the folder.

> Therefore, I think that this is actually the safest approach.

Do you mean that, because this migration only runs ONCE, it should be okay to delete the folder, as there is no user configuration yet, and that code will probably never be used for something else? I am fine with that, but I would prefer to state this explicitly in the comments. A comment just before the dir removal should be enough, to clarify that "this is okay to do because this process only runs once, so there should be no failure, and there should be no user configuration at this point yet".

internal/pkg/skuba/upgrade/cluster/cri.go

evrardjp

I would like to see an extra comment because future me will read this badly.

internal/pkg/skuba/upgrade/cluster/cri.go

evrardjp · 2020-07-17T10:05:19Z

internal/pkg/skuba/upgrade/cluster/cri.go

mjura

LGTM

mssola added the enhancement (New feature or request), do not merge, and v5 (5.0.0) labels Jul 14, 2020

mssola requested review from evrardjp and mjura July 14, 2020 06:33

mssola self-assigned this Jul 14, 2020

mjura reviewed Jul 14, 2020

evrardjp previously requested changes Jul 14, 2020

mssola force-pushed the crio-migration branch 2 times, most recently from 418c8e5 to 2d525dd Compare July 14, 2020 13:31

mssola force-pushed the crio-migration branch from 2d525dd to 16c9e12 Compare July 14, 2020 14:48

mssola marked this pull request as ready for review July 14, 2020 14:49

mssola force-pushed the crio-migration branch from 16c9e12 to db45245 Compare July 14, 2020 14:51

mssola removed the do not merge label Jul 15, 2020

mssola force-pushed the crio-migration branch from db45245 to 7a25e3f Compare July 15, 2020 10:34

chentex previously approved these changes Jul 15, 2020

mjura previously approved these changes Jul 15, 2020

evrardjp suggested changes Jul 15, 2020

mssola dismissed stale reviews from mjura and chentex via 9bd4585 July 15, 2020 14:04

mssola force-pushed the crio-migration branch 2 times, most recently from 9bd4585 to 4b773e0 Compare July 16, 2020 07:36

evrardjp suggested changes Jul 16, 2020

mssola force-pushed the crio-migration branch from 4b773e0 to b051294 Compare July 16, 2020 14:14

evrardjp suggested changes Jul 17, 2020

cri: handle the migration to the new configuration format

6bbe87f

Ensure that the local files of the user are properly migrated to the new cri-o configuration format. Signed-off-by: Miquel Sabaté Solà <msabate@suse.com>

mssola force-pushed the crio-migration branch from b051294 to 6bbe87f Compare July 17, 2020 10:18

evrardjp approved these changes Jul 17, 2020

mjura approved these changes Jul 17, 2020

mssola merged commit ee9666c into SUSE:master Jul 17, 2020

mssola deleted the crio-migration branch July 17, 2020 13:58

mssola mentioned this pull request Jul 20, 2020

[backport] cri: handle the migration to the new configuration format #1267

Merged

jenting mentioned this pull request Jul 21, 2020

[WIP] Update cluster init before upgrade #1110

Closed
