kdcluster backup/restore enablement #500
Comments
"We lose the kdcluster status stanza, when that kdcluster resource is re-created. There's important stuff in there!" Can you please elaborate on this line? What is the workflow you are talking about? |
Backup/restore solutions like Velero don't have direct access to etcd; they just read a resource from the API, store it, and then later use the API to re-create that resource. When you create a resource you can't specify what its status stanza should look like. (And honestly it would often not be a good idea to do so anyway, as the status of the new resource could differ from the old one.) In a kdcluster, the status stanza contains info about the process of reconciliation, for example whether there are pending "add" notifies that should be sent to a currently down container once it comes back up. So we have to restore or re-create that status.
Got it. When Velero backs up the kdcluster resource, can't it get the status from the backed-up kdcluster resource when it re-creates it? Why do we need a new CRD?
Ignore this; KD needs a K8s-native resource, and you already explained that a configmap is a bad choice for various reasons.
Right. I should also add here that the (reasonable) approach by Velero is that it doesn't provide any hooks for any controllers to directly intervene in how it re-creates various resources -- this is true for native K8s resources as well as for CRs. Velero just re-creates the spec, and then it is up to the relevant controller to figure out how to go from there.
Backup-and-restore for kdclusters currently has three issues:
- A kdcluster needs to be restored AFTER the relevant kdapp.
- Depending on how we attack this, the kdcluster probably also needs to be restored after the native K8s resources that make up the kdcluster.
- We lose the kdcluster status stanza when that kdcluster resource is re-created. There's important stuff in there!
Thoughts about ordering w.r.t. the kdapp:
If we capture the specific distro ID and version of the kdapp in the kdcluster status, then we could do something like the sketch shown after these notes.
(To detect when we are restoring a kdcluster, rather than creating a new one; e.g. we could check for the presence of KD's finalizer.)
It's important to track & honor the distro ID and version, to make sure we really are reconnecting with the same kdapp -- not some other kdapp resource created in the interim between backup and restore that happens to have the same name as the old one but different contents.
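Here is a minimal sketch in Go to make the detection and identity check concrete. This is NOT KubeDirector's actual code: the finalizer ID, type names, and fields are all assumptions for illustration only.

```go
// Hypothetical helpers showing how the reconciler could (a) notice that a
// kdcluster was re-created by a restore tool rather than created fresh, and
// (b) verify that the kdapp it reconnects to has the distro ID and version
// recorded before the backup.
package restore

import "fmt"

// kdFinalizer is a stand-in for whatever finalizer KD places on a kdcluster
// it has started managing.
const kdFinalizer = "kubedirector/cleanup"

// AppRef is the kdapp identity we propose to capture in the kdcluster status.
type AppRef struct {
	Name          string
	DistroID      string
	DistroVersion string
}

// looksRestored reports whether a kdcluster that KD has not yet reconciled in
// this lifetime already carries KD's finalizer -- a hint that it was
// re-created from a backup rather than created fresh by a user.
func looksRestored(finalizers []string) bool {
	for _, f := range finalizers {
		if f == kdFinalizer {
			return true
		}
	}
	return false
}

// checkAppIdentity refuses to reconnect a restored kdcluster to a kdapp whose
// distro ID or version differs from what was recorded at backup time, guarding
// against a same-named but different kdapp created in the interim.
func checkAppIdentity(recorded, current AppRef) error {
	if recorded.DistroID != current.DistroID ||
		recorded.DistroVersion != current.DistroVersion {
		return fmt.Errorf(
			"kdapp %q is not the one recorded at backup time (%s/%s vs %s/%s)",
			current.Name,
			recorded.DistroID, recorded.DistroVersion,
			current.DistroID, current.DistroVersion)
	}
	return nil
}
```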
Thoughts about ordering w.r.t. component K8s resources:
This one is hard to handle programmatically, as from KD's perspective there's no detectable difference between "the restore process hasn't restored my statefulset yet" vs. "my statefulset is missing and must be re-created".
We seem to have only two choices: either require that the kdcluster gets restored last, or have a post-restoration kdcluster NOT start reconciliation until it is explicitly told to do so. (The "explicit tell" could take many forms; one possible form is sketched below.)
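As an illustration of the second choice, here is a hedged sketch of one possible "explicit tell": a hypothetical hold annotation that defers reconciliation until it is cleared. This is not an existing KubeDirector feature; the annotation name and mechanism are assumptions.

```go
// One possible shape for the "explicit tell", sketched with a hypothetical
// hold annotation.
package restore

// holdAnnotation would be set by the restore tooling (or an admin) when the
// kdcluster is re-created, and cleared once the component statefulsets,
// services, etc. have also been restored.
const holdAnnotation = "kubedirector/hold-reconcile"

// shouldHold reports whether reconciliation of this kdcluster should be
// deferred, so KD does not mistake "not restored yet" for "missing, re-create".
func shouldHold(annotations map[string]string) bool {
	_, held := annotations[holdAnnotation]
	return held
}

// In the reconcile handler, the gate would be checked first, e.g.:
//
//	if shouldHold(cluster.Annotations) {
//		// do nothing yet; check again later
//		return reconcile.Result{RequeueAfter: time.Minute}, nil
//	}
```

The same gate could just as easily be a spec field or a separate "resume" resource; the point is only that reconciliation does not begin until something outside KD confirms the component resources are back.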
Thoughts about losing the status stanza:
Ideally, perhaps, our important state would all live out in the native K8s resources. Some thoughts about that (and why we aren't doing it currently) are in this gist: https://gist.github.com/joel-bluedata/ce39dd74f960fe773ad20a011eb7086d
Another alternative to using the status stanza would be to use an actual database rather than storing stuff in etcd documents. This isn't currently under consideration, but we're aware that it's a possibility.
The track we're pursuing at the moment involves storing the state for a kdcluster in some other document. There would be some advantages to stashing these documents in their own special namespace, but that complicates some common backup/restore scenarios. So it looks like such a document would live in the same namespace as its kdcluster.
It would be somewhat natural to use a configmap or secret for this purpose, but there are drawbacks.
Using a CR for this purpose would solve the above issues, so probably we'll need to add a CR that is configmap-like but used only for storing kdcluster state.
The other issue to settle is the relationship between the status stanza and this new resource. A good end state could be to have this new resource be the authoritative state tracker for the kdcluster, with the status stanza only duplicating parts of that info as a convenience for the end user. However, to tackle this work in stages, and to minimally disrupt existing stuff that looks at the kdcluster status stanza, as a first cut we're thinking of making this new resource just a mirror of the status stanza. The status stanza is still authoritative; the mirror is used to restore the status stanza when it goes missing (a rough sketch of that flow follows).
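Below is a rough sketch of that first-cut mirror flow, using hypothetical types that stand in for the kdcluster status stanza and the proposed configmap-like CR. None of these names are real KubeDirector APIs.

```go
// Hypothetical mirror flow: the status stanza stays authoritative, the mirror
// is written after each reconcile pass, and it is only read back when the
// status has gone missing (e.g. after a backup/restore re-create).
package restore

// ClusterStatus stands in for the kdcluster status stanza (reconciliation
// state, pending notifies, recorded kdapp identity, and so on).
type ClusterStatus struct {
	// fields omitted
}

// StatusMirror stands in for the proposed CR that lives in the same namespace
// as its kdcluster and only stores a copy of the status.
type StatusMirror struct {
	ClusterName string
	Saved       ClusterStatus
}

// syncMirror copies the authoritative status into the mirror after each
// successful reconcile pass.
func syncMirror(m *StatusMirror, status ClusterStatus) {
	m.Saved = status
}

// recoverStatus returns the status to use for a kdcluster: the existing one
// when present, otherwise the copy saved in the mirror (the backup/restore
// case).
func recoverStatus(existing *ClusterStatus, m *StatusMirror) ClusterStatus {
	if existing != nil {
		return *existing
	}
	if m == nil {
		return ClusterStatus{}
	}
	return m.Saved
}
```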