
failure if fractional CPU resource requested for kdcluster #466

Closed
joel-bluedata opened this issue Apr 30, 2021 · 0 comments
Assignees
Labels
Priority: High · Project: Cluster Reconcile beyond simple xlate of model to K8s spec · Type: Bug

Comments

@joel-bluedata
Member

Observed on K8s 1.20.2.

If I try to create a kdcluster with a "0.5" CPU resource, it doesn't reach a final configured state. The process is blocked with these errors in the KD log:

{"level":"error","ts":1619800365.0318282,"logger":"controller_kubedirectorcluster","msg":"trying status update again in 16s; failed","Request.Namespace":"import-test","Request.Name":"centos8","error":"admission webhook "validate-cr.kubedirector.hpe.com" denied the request: \nChange to spec not allowed before previous spec change has been processed.", ... }

If I enable more detailed API server logging for kdcluster requests, I see that the problematic request includes a spec that changes the CPU resource from "0.5" to "500m". The two values are obviously quantitatively equivalent, but the difference triggers the "is the spec being changed" check in the validator, which is just doing an object comparison.

If I just use "500m" in the initial creation, everything works fine.
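
For illustration, here is a minimal standalone Go check of that equivalence (not KubeDirector code), using the k8s.io/apimachinery resource package:

```go
// Minimal sketch: "0.5" and "500m" parse to the same quantity, but
// serializing the parsed value yields the canonical "500m" form, so a
// plain object comparison of old vs. new spec sees a "change" even
// though nothing changed semantically.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	half := resource.MustParse("0.5")
	fiveHundredMilli := resource.MustParse("500m")

	// Cmp compares numeric values; 0 means quantitatively equal.
	fmt.Println(half.Cmp(fiveHundredMilli)) // 0

	// String() emits the canonical form, not the original input string.
	fmt.Println(half.String()) // 500m
}
```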

A few questions at this point:

  • Is this behavior new to 1.20, or have we just not seen the issue before because people tend to request whole-number CPU resource values?

  • The KD code is writing directly to the CR's status subresource, so why is the API server receiving a PUT to the CR's spec? The rejection of that PUT is what the status write sees, so the spec update must be happening somewhere along that code path. Maybe in the controller-runtime client code?

  • Where is this resource value getting changed from "0.5" to "500m"? We're not doing it explicitly in the KD code; again, it may be something further down that status-update call stack.

We could work around this issue by adding logic to KD, probably in the block of code currently marked by "Shortcut out of here if the spec is not being changed" in admitClusterCR. We could specifically dig into the resource properties, see if they are quantitatively equivalent, and if so then copy the new properties into the old object before doing the object compare.
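
A rough sketch of that workaround (hypothetical helper name, and assuming the role resources are plain corev1.ResourceLists; not the actual KubeDirector change):

```go
// Hypothetical helper for admitClusterCR (illustrative only): before the
// bulk "has the spec changed" comparison, copy any quantity-equivalent
// resource values from the new spec into the old spec, so a pure format
// change such as "0.5" -> "500m" is not treated as a spec change.
package validator

import (
	corev1 "k8s.io/api/core/v1"
)

// normalizeEquivalentResources overwrites oldList entries with the
// corresponding newList value whenever the two quantities compare equal,
// so a later reflect.DeepEqual of the enclosing specs ignores
// formatting-only differences.
func normalizeEquivalentResources(oldList, newList corev1.ResourceList) {
	for name, newQty := range newList {
		oldQty, present := oldList[name]
		if present && oldQty.Cmp(newQty) == 0 {
			oldList[name] = newQty
		}
	}
}
```

Something like this would be run over each role's resource requests and limits just before the existing spec comparison, which is essentially what the commits referenced below describe.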

But it would be nice to understand a bit more about what is happening.

@joel-bluedata joel-bluedata added Priority: High Project: Cluster Reconcile beyond simple xlate of model to K8s spec Type: Bug labels Apr 30, 2021
@joel-bluedata joel-bluedata self-assigned this May 12, 2021
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue May 12, 2021
… check

There are situations where a client is not allowed to change a role spec: if the previous change is still being processed, or if the role has nonzero member count.

As per issue bluek8s#466 though, there are some cases where a resource quantity is getting its format changed, e.g. from "0.5" to "500m", without its quantity actually being changed. Sometimes this is even happening as part of a status write from KD itself. We're not explicitly asking for this format change -- something along the way in client-go or K8s is doing it -- but whatever the reason, we need to allow it to happen.

So before we compare old spec to new spec, we'll copy any quantity-equivalent resource values from new spec into old spec. That way the subsequent bulk compare of the two specs will not see those values as changing.
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue May 12, 2021
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue May 13, 2021
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue May 14, 2021
Motivated by issue bluek8s#466 initially, but really we should be doing this everywhere.

Along with general regression-test creations of existing kdapp types while monitoring the KD logs, did these specific tests:

- creating kdcluster with 0.5 CPU (works now)
- changing kdcluster role spec to something "semantically equivalent", for role with existing members (works now)
- changing kdcluster role spec to something non-equivalent, for role with existing members (blocked, as it should be)
- changing the members count for a scaleout kdcluster role (works, as it should)
- manually editing a kdcluster status stanza (blocked, as it should be)
- manually editing a kdconfig status stanza (blocked, as it should be)
joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue May 14, 2021
aleshchynskyi pushed a commit to aleshchynskyi/kubedirector that referenced this issue Jun 10, 2021