A guide for migration to CaaSP 4.5 #979

Merged: 10 commits into SUSE:master on Sep 2, 2020

Conversation

@atighineanu (author)

@r0ckarong self-assigned this Aug 27, 2020
@r0ckarong requested review from evrardjp and kkaempf August 27, 2020 09:27
@r0ckarong added the 4.5.0 (CaaSP v5 release without Rancher), AdminGuide (Fix will change the Admin Guide) and ReleaseNotes (Fix has impact that needs to be mentioned in the release notes) labels Aug 27, 2020
@r0ckarong added this to the Sprint 37 milestone Aug 27, 2020
@evrardjp (Contributor) left a review comment

I would change a few things for clarity:

First, remove all the automation (licenses, non-interactive).

Then, clarify that the zypper migration feature is for direct access to SCC as mentioned.
It's not impossible to migrate and upgrade without SCC, but I think it needs more documentation, if possible in another section. This should refer to the SLE docs IMO.

As this SP migration is a prerequisite for the rest of the upgrade, I think we could split the docs in two sections. I am not a docs architect, so I am not sure. I trust your experience y'all : )

Here is what I thought, for example:

header1: Before the upgrade, migrate your repositories and upgrade to SP2
header1a: for SCC connected installs
header1b: with other installs
header2: Trigger the upgrades as usual

@r0ckarong (Contributor)

I would change a few things for clarity:

First, remove all the automation (licenses, non-interactive).

Then, clarify that the zypper migration feature is for direct access to SCC as mentioned.
It's not impossible to migrate and upgrade without SCC, but I think it needs more documentation, if possible in another section. This should refer to the SLE docs IMO.

As this SP migration is a prerequisite for the rest of the upgrade, I think we could split the docs in two sections. I am not a docs architect, so I am not sure. I trust your experience y'all : )

Here is what I thought, for example:

header1: Before the upgrade, migrate your repositories and upgrade to SP2
header1a: for SCC connected installs
header1b: with other installs
header2: Trigger the upgrades as usual

OK, so who can perform the testing for these scenarios and provide the resulting procedures?

@r0ckarong added the EngineeringInput (Needs engineering input) and Blocked (Blocked by lack of information or external factors) labels Aug 27, 2020
@evrardjp (Contributor) commented Aug 27, 2020

@r0ckarong this process, as already written, IS what has been tested.

Maybe I am just confused by your comment... Do you mean you want an extra round of QA?

This content is written by @atighineanu, who is X-squad's QA member :)

@kkaempf (Member) left a review comment

I'm fine with the content of the PR now. Some wording fixes remain.

@evrardjp (Contributor) commented Aug 27, 2020

Agreeing with @kkaempf's comments.

mssola added a commit to mssola/caasp-release-notes that referenced this pull request Aug 27, 2020
See SUSE/doc-caasp#979
See SUSE/avant-garde#1918

Signed-off-by: Miquel Sabaté Solà <msabate@suse.com>
@thehejik (Contributor)

Maybe we should add a note to this PR on how to drain a node before performing a manual reboot after zypper migration (which is covered by neither skuba-update nor kured). But maybe a simple sudo touch /var/run/reboot-needed would be sufficient for kured to reboot the node automatically after a while.
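
Roughly something like this, as a sketch (the node name and the drain flags are placeholders, not taken from the guide):

```bash
# On the management host: make the node unschedulable and evict its workloads
# (node name and drain flags are placeholders)
kubectl cordon worker-0
kubectl drain worker-0 --ignore-daemonsets --delete-local-data

# On the node itself: run the migration, then either reboot manually ...
sudo zypper migration
sudo reboot

# ... or create kured's sentinel file and let it reboot the node for you
sudo touch /var/run/reboot-needed

# Once the node is back, make it schedulable again
kubectl uncordon worker-0
```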

@kkaempf (Member) commented Aug 28, 2020

Good point! Each node needs to be drained/cordoned before starting the migration. We had this discussion on RC before.

@evrardjp (Contributor) commented Aug 31, 2020

Good point! Each node needs to be drained/cordoned before starting the migration. We had this discussion on RC before.

This is now included in skuba automatically. But yes, it does not cover the case in which folks want to perform a manual reboot.

@thehejik (Contributor)

Good point! Each node needs to be drained/cordoned before starting the migration. We had this discussion on RC before.

This is now included in skuba automatically. But yes, it does not cover the case in which folks want to perform a manual reboot.

But we need to reboot right after zypper migration, which is one step before calling skuba. Or do you mean that once I start skuba node upgrade, it will first reboot the node and only after that try to upgrade the packages from v4.5?

@Martin-Weiss (Contributor)

Not sure if the full customer requirements are covered here already.

Please ensure that we can also do an upgrade with manual repo changes (zypper ar / zypper rr) and using "zypper dup" instead of "zypper migration"! We have many enterprise customers that do NOT register their servers to SCC/SMT/RMT and have their own repo sources (ZCM, SUSE Manager). In these cases we have to adjust the repos manually and use zypper dup...

@kkaempf (Member) commented Aug 31, 2020

Please ensure that we can also do an upgrade with manual repo changes (zypper ar / zypper rr) and using "zypper dup" instead of "zypper migration"!

Is this a supported scenario for SLES migrations?

In any case, there's not too much we can document for zypper repos, except

  • switch your SLE repos from 15 SP1 to 15 SP2 (incl. containers module for nodes and public cloud for the management/skuba host)
  • switch from caasp-4.2 to caasp-4.5

@Martin-Weiss (Contributor)

Please ensure that we can also do an upgrade with manual repo changes (zypper ar / zypper rr) and using "zypper dup" instead of "zypper migration"!

Is this a supported scenario for SLES migrations?

AFAIK, yes - we often have to do this in SUMA-assisted deployments, and we have to do this with SES, too.

In any case, there's not too much we can document for zypper repos, except

* switch your SLE repos from 15 SP1 to 15 SP2 (incl. containers module for nodes and public cloud for the management/skuba host)

* switch from caasp-4.2 to caasp-4.5

Documentation is fine - I believe we just need to list the required repos properly.
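
For a node without SCC/RMT registration that would boil down to something like the following sketch (repo aliases and URLs are made-up placeholders for whatever the local ZCM / SUSE Manager / mirror serves; the real repo list is exactly what the docs need to spell out):

```bash
# Check and remove the old SLE 15 SP1 / CaaSP 4.2 repositories
# (aliases are placeholders - list yours with `zypper lr` first)
sudo zypper lr
sudo zypper rr SLE15-SP1-Pool SLE15-SP1-Updates CaaSP-4.2-Pool CaaSP-4.2-Updates

# Add the SLE 15 SP2 and CaaSP 4.5 repositories from the local repo source
# (URLs are placeholders)
sudo zypper ar https://repo.example.com/SLE15-SP2-Pool/    SLE15-SP2-Pool
sudo zypper ar https://repo.example.com/SLE15-SP2-Updates/ SLE15-SP2-Updates
sudo zypper ar https://repo.example.com/CaaSP-4.5-Pool/    CaaSP-4.5-Pool
sudo zypper ar https://repo.example.com/CaaSP-4.5-Updates/ CaaSP-4.5-Updates

# Full distribution upgrade against the new repositories
sudo zypper refresh
sudo zypper dup
```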

@r0ckarong removed the Blocked (Blocked by lack of information or external factors) label Sep 1, 2020
@r0ckarong (Contributor)

@kkaempf @Martin-Weiss From my side the requested information for this step is there now. Please give me a thumbs up and I will merge.

@r0ckarong requested a review from kkaempf September 2, 2020 10:05
@r0ckarong merged commit 5b24a3b into SUSE:master Sep 2, 2020
r0ckarong pushed a commit to SUSE/caasp-release-notes that referenced this pull request Sep 7, 2020
@evrardjp (Contributor)

@r0ckarong I just realised that the upgrade of the local config for cri-o is only in release notes. Shouldn't it be also here, in the process? (skuba cluster upgrade localconfig is documented in https://github.com/SUSE/caasp-release-notes/pull/51/files )

@r0ckarong (Contributor) commented Sep 15, 2020

@r0ckarong I just realised that the upgrade of the local config for cri-o is only in release notes. Shouldn't it be also here, in the process? (skuba cluster upgrade localconfig is documented in https://github.com/SUSE/caasp-release-notes/pull/51/files )

@evrardjp Isn't this automatically done when you run skuba addon upgrade apply? AFAIU you can refresh manually from local configs with the command from the release notes but skuba will do this anyway when running addon upgrade. Is this not the case?

@Martin-Weiss (Contributor)

I thought "addon upgrade apply" just deploys yaml and does not need ssh? Do we control the crio config via yaml and can skuba distribute the crio.conf.d changes with addon apply and without ssh?

@evrardjp (Contributor) commented Sep 15, 2020

Correct, it's not automatically done.
The reason is that it doesn't happen at the same time as the addon upgrade.

For example, for version 1.17.9, the addons are at version x, and for 1.18.4, the addons are at version y (fictional numbers).
When following the process, skuba addon upgrade apply is done first, which applies addon upgrades from "x" (1.17.9). It's similar to a post-upgrade case. You then upgrade your cluster to a new version, and generally we had no action to take (as things didn't need manual intervention, we could auto-migrate them). However, for 1.18.4 this is not the case, as y is not backwards compatible with x. The y migration needs to happen just before upgrading your node BUT needs manual intervention/checking from the deployer (which prevents us from doing things automatically). This is why this command has appeared. You should check with the contributors of this command, and the card implementing it. @mssola can help there.
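
If I read that correctly, from the deployer's point of view the sequence is roughly the following sketch (it only mirrors the description above, not the authoritative procedure; cluster folder, node and user names are placeholders):

```bash
# In the cluster definition folder on the management host
cd my-cluster

# 1. Apply the addon upgrades for the currently running version
skuba addon upgrade apply

# 2. Migrate the local cri-o configuration to the new structure and
#    review the result (the command documented in the release notes)
skuba cluster upgrade localconfig

# 3. Only then upgrade the nodes themselves
skuba node upgrade apply --target worker-0 --user sles --sudo
```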

@evrardjp (Contributor) commented Sep 15, 2020

I thought "addon upgrade apply" just deploys yaml and does not need ssh? Do we control the crio config via yaml and can skuba distribute the crio.conf.d changes with addon apply and without ssh?

  1. It's too late to change the design ;)
  2. The crio config is not managed through yaml, but is managed manually in the users' cluster definition folder, like all other addons. A user can change it manually and it will automatically get applied, as it used to be. However, it is far from convenient to tell someone "Please rename those files, add another one there, and you are ready", when we can simply provide a command that is smart and does the job.
  3. We probably don't want to store keys for connecting to nodes, for security reasons.

@Martin-Weiss (Contributor)

  1. then we need to fix the bug in the design ASAP ;-)
  2. we need a solution for customers that makes it easy for them to upgrade from 4.x to 4.5 and also takes care of their existing customizations of crio.conf (especially the global_auth_file). At the moment we do NOT have a working and "easy" solution, as it is neither implemented "fully automated" nor "documented" in a way a customer can follow.

I would recommend documenting the step-by-step process properly and hopefully getting this fixed in one of the next releases.
(This is the third or fourth time we have had to struggle with crio.conf since CaaSP 4.0.)

@r0ckarong (Contributor)

So we are missing the step of refreshing the config manually during the migration. What is the correct way to do this and at what point in the current instructions does this need to be done?

@atighineanu Did you guys test with a modified crio.conf?

@evrardjp (Contributor) commented Sep 15, 2020

we need a solution for customers that makes it easy for them to upgrade from 4.x to 4.5 and also takes care of their existing customizations of crio.conf (especially the global_auth_file). At the moment we do NOT have a working and "easy" solution, as it is neither implemented "fully automated" nor "documented" in a way a customer can follow.

We do! That's indeed the scope of that command I mentioned above, which is incorrectly documented only in release notes :)

@evrardjp (Contributor)

@r0ckarong I think @atighineanu indeed tested with a modified crio, else it wouldn't work ;)
We just forgot to document it. But I will let @atighineanu speak, instead of talking for him :p

@evrardjp (Contributor) commented Sep 16, 2020

What was tested during development was only the features that were enabled through feature flags. Manually overridden data still needs manual intervention; this is why we don't automate everything. We can't assume what people have done with their config, as it would be quite a wide gamut of changes. IMO we should just encourage deployers to try the automation tool to migrate to the new configuration structure, and kindly ask them to review the changes.
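
Concretely, "try the tool and review the changes" could look like this sketch (the cluster definition folder name is a placeholder):

```bash
# Keep a copy of the cluster definition folder before migrating
cp -r my-cluster my-cluster.bak

# Let skuba migrate the local configuration to the new structure
cd my-cluster
skuba cluster upgrade localconfig

# Review what the migration changed before rolling anything out to the nodes
diff -ru ../my-cluster.bak .
```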

@evrardjp (Contributor)

  1. then we need to fix the bug in the design ASAP ;-)

I agree on that: if there is something we should iterate, let's iterate! Let's just not redesign from scratch.
