Skip to content
This repository was archived by the owner on Feb 6, 2025. It is now read-only.

cri: handle the migration to the new configuration format #1251

Merged
merged 1 commit into from
Jul 17, 2020

Conversation

mssola
Copy link
Contributor

@mssola mssola commented Jul 14, 2020

Why is this PR needed?

Fixes SUSE/avant-garde#1679
Fixes SUSE/avant-garde#1785
See SUSE/avant-garde#1679
See SUSE/caasp-release-notes#51

What does this PR do?

Ensure that the local files of the user are properly migrated to the new cri-o configuration format.

Info for QA

In order to test this you should first create a v4.2.2 cluster with an old skuba. Then, use a skuba built with master with this patch applied, and call skuba cluster upgrade config. You should see:

  • The old addons/cri/default_flags should've been removed.
  • Now you should have addons/cri/conf.d with 01-caasp.conf and README in it.

Anything else a reviewer needs to know?

With this SUSE/caasp-release-notes#51 should not be needed.

@mssola mssola requested review from evrardjp and mjura July 14, 2020 06:33
@mssola mssola self-assigned this Jul 14, 2020
Copy link
Contributor

@mjura mjura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this new command to skuba cluster update config or skuba cluster update config because skuba cluster upgrade apply suggests that it will upgrade all nodes in the cluster

@kkaempf
Copy link
Member

kkaempf commented Jul 14, 2020

How can this be tested ?

Copy link
Contributor

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indeed handles the migration (I haven't tested it) of the local files.
However, I think we miss the step in upgrades to also replace the sysconfig in the destination node.
Will this be in a separate PR or just a separate commit in this PR?


func criGenerateLocalConfiguration() error {
cfg := clusterinit.InitConfiguration{
PauseImage: kubernetes.ComponentContainerImageForClusterVersion(kubernetes.Pause, kubernetes.LatestVersion()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is LatestVersion correct, or should that be the next version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least this is giving me this result: registry.suse.de/devel/caasp/5/containers/containers/caasp/v5/pause:3.2, which I believe is fine, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine for now, but it might not be in the future, if the user is lagging behind more than a version. No?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in this case they should be using the proper skuba for each scenario (same as other upgrades). I don't think we have a next version somewhere in the code, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm interesting, I thought we had that, because we have a path in the plan and apply commands, when dealing with multiple k8s versions, but I didn't check the code.

@mssola mssola force-pushed the crio-migration branch 2 times, most recently from 418c8e5 to 2d525dd Compare July 14, 2020 13:31
@mssola
Copy link
Contributor Author

mssola commented Jul 14, 2020

How can this be tested ?

Updated the PR.

@mssola mssola marked this pull request as ready for review July 14, 2020 14:49
@evrardjp evrardjp dismissed their stale review July 15, 2020 11:06

PR has been updated, need a new review.

chentex
chentex previously approved these changes Jul 15, 2020
mjura
mjura previously approved these changes Jul 15, 2020
Copy link
Contributor

@mjura mjura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good start. Here are a few changes to clean up the duties of each of the functions called.

Use: "config",
Short: "Upgrades the local configuration",
Run: func(cmd *cobra.Command, args []string) {
if err := ssh.CriMigrate(); err != nil {
Copy link
Contributor

@evrardjp evrardjp Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very misleading. You are saying migrating local configuration, yet this function run things on the remote node.

For me the upgrade localconfig should be dealing with just local things, while the things happening on target nodes can be dealt with during node upgrade.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not touch files on remote nodes. It just updates the local configuration. Thus, I see two things:

  1. It should not go into the ssh package, as you said in another comment.
  2. We may want to change the subcommand name from config to localconfig, so to reinforce the idea of changes only happening locally, as it has been suggested. What do you think?

@@ -0,0 +1,7 @@
## Path : System/Management
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should NOT provide this. This should be deleted. We should trust the sysconfig coming from the package instead, and copy it from the package (in /usr/ iirc), to the /etc/sysconfig/.

Copy link
Contributor Author

@mssola mssola Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear in mind that this is just test data. This is not planned to be installed anywhere, and for testing purposes I will not rely on developers having the file already on their machines.

@mssola mssola dismissed stale reviews from mjura and chentex via 9bd4585 July 15, 2020 14:04
@mssola mssola force-pushed the crio-migration branch 2 times, most recently from 9bd4585 to 4b773e0 Compare July 16, 2020 07:36
Copy link
Contributor

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more changes required IMO.

}

if _, _, err = t.ssh("mv -f /etc/sysconfig/crio /etc/sysconfig/crio.backup"); err != nil {
if _, _, err := t.ssh("mv -f /usr/share/fillup-templates/sysconfig.crio /etc/sysconfig/crio"); err != nil {
Copy link
Contributor

@evrardjp evrardjp Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if mv is a good idea, that would remove the file from its expected location, is that fine for zypper?
Otherwise the location is fine for me.

return err
}
_, _, err = t.ssh("mv -f /tmp/cri.d/default_flags /etc/sysconfig/crio")
return err
return t.target.UploadFile(skuba.CriDockerDefaultsConfFile(), filepath.Join("/etc/cri/default_flags.local"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not even sure we need to do this anymore. I think we can just remove it, as ppl will just upload their own config file anyway... WDYT?

files := clusterinit.CriScaffoldFiles["criconfig"]
for _, file := range files {
if err := clusterinit.WriteScaffoldFile(file, cfg); err != nil {
_ = os.RemoveAll(skuba.CriConfDir())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we remove the CriConfDir ? It means if the customer has custom files, they will be removed, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. clusterinit.WriteScaffoldFile(file, cfg) will write scaffold files into that directory. Thus, any files with the same names will be effectively replaced. This means that customers have to back things up before performing the migration (as in with any migration).

That being said, if, for whatever reason, we fail to write these files, we should remove everything instead of sticking into an intermediary result (e.g. imagine that one file was successfully written but the other didn't). We don't want these kinds of scenarios, it's all or nothing.

Therefore, I think that this is actually the safest approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. clusterinit.WriteScaffoldFile(file, cfg) will write scaffold files into that directory. Thus, any files with the same names will be effectively replaced. This means that customers have to back things up before performing the migration (as in with any migration).

I think in this case, the files being replaced are the "caasp defaults", which is fine, and the customer shouldn't touch them, so we are good. Yes, it's indeed safer to backup, while not necessary.

That being said, if, for whatever reason, we fail to write these files, we should remove everything instead of sticking into an intermediary result (e.g. imagine that one file was successfully written but the other didn't). We don't want these kinds of scenarios, it's all or nothing.

That's what I am not getting my head around. For me, we should fail, and say "There is something wrong that happened here, maybe you can figure it out with this error: %err". Not deleting the whole folder, which might contain user configuration on top of our default configs. That's very disruptive to me.

I agree it's better to bail, but I don't think it's better to delete the folder.

Therefore, I think that this is actually the safest approach.

Do you mean that, because this migration only runs ONCE, it should be okay to delete the folder, as there is no user configuration yet, and that code will probably never be used for something else? I am fine with that, but I would prefer to add this explicitly in the comments. A comment just before the dir removal should be enough, to clarify that "this is okay to do so because this process only runs once, so there should be no failure, and there should be no user configuration at this point yet"

Copy link
Contributor

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see an extra comment because future me will read this badly.

files := clusterinit.CriScaffoldFiles["criconfig"]
for _, file := range files {
if err := clusterinit.WriteScaffoldFile(file, cfg); err != nil {
_ = os.RemoveAll(skuba.CriConfDir())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. clusterinit.WriteScaffoldFile(file, cfg) will write scaffold files into that directory. Thus, any files with the same names will be effectively replaced. This means that customers have to back things up before performing the migration (as in with any migration).

I think in this case, the files being replaced are the "caasp defaults", which is fine, and the customer shouldn't touch them, so we are good. Yes, it's indeed safer to backup, while not necessary.

That being said, if, for whatever reason, we fail to write these files, we should remove everything instead of sticking into an intermediary result (e.g. imagine that one file was successfully written but the other didn't). We don't want these kinds of scenarios, it's all or nothing.

That's what I am not getting my head around. For me, we should fail, and say "There is something wrong that happened here, maybe you can figure it out with this error: %err". Not deleting the whole folder, which might contain user configuration on top of our default configs. That's very disruptive to me.

I agree it's better to bail, but I don't think it's better to delete the folder.

Therefore, I think that this is actually the safest approach.

Do you mean that, because this migration only runs ONCE, it should be okay to delete the folder, as there is no user configuration yet, and that code will probably never be used for something else? I am fine with that, but I would prefer to add this explicitly in the comments. A comment just before the dir removal should be enough, to clarify that "this is okay to do so because this process only runs once, so there should be no failure, and there should be no user configuration at this point yet"

Ensure that the local files of the user are properly migrated to the new cri-o
configuration format.

Signed-off-by: Miquel Sabaté Solà <msabate@suse.com>
Copy link
Contributor

@mjura mjura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
5.0.0 enhancement New feature or request v5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants