failure if fractional CPU resource requested for kdcluster #466
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue on May 12, 2021:
… check

There are situations where a client is not allowed to change a role spec: if the previous change is still being processed, or if the role has nonzero member count. As per issue bluek8s#466 though, there are some cases where a resource quantity is getting its format changed, e.g. from "0.5" to "500m", without its quantity actually being changed. Sometimes this is even happening as part of a status write from KD itself. We're not explicitly asking for this format change -- something along the way in client-go or K8s is doing it -- but whatever the reason, we need to allow it to happen.

So before we compare old spec to new spec, we'll copy any quantity-equivalent resource values from new spec into old spec. That way the subsequent bulk compare of the two specs will not see those values as changing.
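In code terms, the approach the commit describes amounts to something like the following Go sketch. This is not the actual patch; `syncEquivalentQuantities` and the variable names are hypothetical stand-ins, and the real change lives in KubeDirector's validator code.

```go
package main

import (
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// syncEquivalentQuantities copies a quantity from newList into oldList
// whenever the two values are numerically equal (Cmp == 0), so that a
// subsequent bulk compare of the enclosing specs does not report a
// format-only difference such as "0.5" vs "500m" as a spec change.
func syncEquivalentQuantities(oldList, newList corev1.ResourceList) {
	for name, newQty := range newList {
		if oldQty, ok := oldList[name]; ok && oldQty.Cmp(newQty) == 0 {
			oldList[name] = newQty
		}
	}
}

func main() {
	oldSpec := corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("0.5")}
	newSpec := corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("500m")}

	fmt.Println(reflect.DeepEqual(oldSpec, newSpec)) // false: same value, different form
	syncEquivalentQuantities(oldSpec, newSpec)
	fmt.Println(reflect.DeepEqual(oldSpec, newSpec)) // true: equivalent values now identical
}
```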
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue on May 14, 2021:
Motivated by issue bluek8s#466 initially, but really we should be doing this everywhere. Along with general regression-test creations of existing kdapp types while monitoring the KD logs, did these specific tests:
- creating kdcluster with 0.5 CPU (works now)
- changing kdcluster role spec to something "semantically equivalent", for role with existing members (works now)
- changing kdcluster role spec to something non-equivalent, for role with existing members (blocked, as it should be)
- changing the members count for a scaleout kdcluster role (works, as it should)
- manually editing a kdcluster status stanza (blocked, as it should be)
- manually editing a kdconfig status stanza (blocked, as it should be)
aleshchynskyi pushed a commit to aleshchynskyi/kubedirector that referenced this issue on Jun 10, 2021.
Observed on K8s 1.20.2.
If I try to create a kdcluster with a "0.5" CPU resource, it doesn't reach a final configured state. The process is blocked with these errors in the KD log:
{"level":"error","ts":1619800365.0318282,"logger":"controller_kubedirectorcluster","msg":"trying status update again in 16s; failed","Request.Namespace":"import-test","Request.Name":"centos8","error":"admission webhook "validate-cr.kubedirector.hpe.com" denied the request: \nChange to spec not allowed before previous spec change has been processed.", ... }
If I enable more detailed API server logging for kdcluster requests, I see that the problematic request includes a spec that changes the CPU resource from "0.5" to "500m". The two values are obviously quantitatively equivalent, but the format change trips the "is the spec being changed" check in the validator, which is just doing an object comparison.
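For reference, the two renderings really are the same quantity as far as apimachinery's resource.Quantity is concerned, and that type's documentation notes that values are serialized in canonical form (e.g. 1.5 serializes as "1500m"), which is the likely source of the rewrite somewhere on the round trip. A quick check:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	a := resource.MustParse("0.5")
	b := resource.MustParse("500m")

	// Cmp returns 0 when two quantities are numerically equal.
	fmt.Println(a.Cmp(b)) // 0

	// Quantity serializes in canonical form, so "0.5" comes back
	// as "500m" after a round trip through String()/marshaling.
	fmt.Println(a.String()) // 500m
}
```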
If I just use "500m" in the initial creation, everything works fine.
A few questions at this point:
- Is this behavior new to 1.20, or have we just not seen the issue before because people tend to request whole-number CPU resource values?
- The KD code is writing directly to the CR's status subresource, so why is the API server receiving a PUT to the CR's spec? The rejection of this PUT is surfacing on the status write, so the spec update must be happening somewhere along that code path -- maybe in the controller-runtime client code?
- Where is this resource value getting changed from "0.5" to "500m"? We're not doing it explicitly in the KD code; again, it may be something further down that status-update callstack.
We could work around this issue by adding logic to KD, probably in the block of code currently marked by "Shortcut out of here if the spec is not being changed" in admitClusterCR: dig into the resource properties, check whether old and new are quantitatively equivalent, and if so copy the new values into the old object before doing the object compare.
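One thing worth noting while weighing that workaround: apimachinery already ships a semantic comparator, apiequality.Semantic.DeepEqual, which compares resource.Quantity fields by value (Cmp == 0) rather than by representation. A sketch of how the shortcut check could lean on it; the roleSpec type here is a hypothetical stand-in for the real kdcluster role structs:

```go
package main

import (
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	"k8s.io/apimachinery/pkg/api/resource"
)

// roleSpec is a hypothetical stand-in for the role spec compared
// in admitClusterCR.
type roleSpec struct {
	Members   int32
	Resources corev1.ResourceRequirements
}

func main() {
	oldSpec := roleSpec{
		Members: 2,
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("0.5")},
		},
	}
	newSpec := roleSpec{
		Members: 2,
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("500m")},
		},
	}

	// A plain object compare flags the format-only difference...
	fmt.Println(reflect.DeepEqual(oldSpec, newSpec)) // false

	// ...while the semantic compare resolves quantities via Cmp.
	fmt.Println(apiequality.Semantic.DeepEqual(oldSpec, newSpec)) // true
}
```

Whether semantic equality is wanted for the whole spec compare is a separate question, since the validator deliberately blocks most spec changes; the copy-equivalent-values approach confines the leniency to resource quantities.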
But it would be nice to understand a bit more about what is happening.