control: Ensure endpoints are driven to readiness #1014
Merged
When there are multiple replicas of a controller--especially the destination controller--the proxy creates a load balancer to distribute requests across all controller pods.

linkerd/linkerd2#6146 describes a situation where controller connections fail to be established because the client stalls for 50s+ between initiating a connection and sending a TLS ClientHello, long after the server has timed out the idle connection.

As it turns out, the controller client does not necessarily drive all of its endpoints to readiness. Because load balancers are designed to process requests when only a subset of endpoints are available, the load balancer cannot be responsible for driving all endpoints in a service to readiness and we need a `SpawnReady` layer that is responsible for driving individual endpoints to readiness. While the outbound proxy's balancers are instrumented with this layer, the controller clients were not configured this way when load balancers were introduced.

We likely have not encountered this previously because the balancer should effectively hide this problem in most cases: as long as a single endpoint is available requests should be processed as expected; and if there are no endpoints available, the balancer would drive at least one to readiness in order to process requests.
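The stall described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the proxy's code: `Endpoint`, `balancer_poll_ready`, `spawn_ready`, and the two helper functions are hypothetical names invented for illustration, a "poll" stands in for one step of connection setup, and the background task is replaced by a plain loop.

```rust
/// Toy endpoint: needs `polls_left` more polls (think TCP connect, then TLS
/// handshake steps) before it is ready. Hypothetical type, not a linkerd API.
struct Endpoint {
    polls_left: u32,
}

impl Endpoint {
    /// Makes one unit of progress; returns true once the endpoint is ready.
    fn poll_ready(&mut self) -> bool {
        self.polls_left = self.polls_left.saturating_sub(1);
        self.polls_left == 0
    }

    fn is_ready(&self) -> bool {
        self.polls_left == 0
    }
}

/// Balancer-style readiness: polls each pending endpoint once per call and
/// reports ready as soon as any single endpoint is ready.
fn balancer_poll_ready(endpoints: &mut [Endpoint]) -> bool {
    let mut any_ready = false;
    for ep in endpoints.iter_mut() {
        if ep.is_ready() || ep.poll_ready() {
            any_ready = true;
        }
    }
    any_ready
}

/// Stand-in for a spawned background task: keeps polling one endpoint until
/// it is ready, independently of whether the balancer is being polled.
fn spawn_ready(ep: &mut Endpoint) {
    while !ep.poll_ready() {}
}

fn count_ready(endpoints: &[Endpoint]) -> usize {
    endpoints.iter().filter(|ep| ep.is_ready()).count()
}

/// The caller polls the balancer only until it can send a request, so any
/// endpoint that is still mid-handshake at that point stops making progress.
fn ready_without_spawn(polls: &[u32]) -> usize {
    let mut eps: Vec<Endpoint> = polls.iter().map(|&p| Endpoint { polls_left: p }).collect();
    while !eps.is_empty() && !balancer_poll_ready(&mut eps) {}
    count_ready(&eps)
}

/// Every endpoint is driven to readiness on its own, regardless of whether
/// the balancer already has a ready endpoint to use.
fn ready_with_spawn(polls: &[u32]) -> usize {
    let mut eps: Vec<Endpoint> = polls.iter().map(|&p| Endpoint { polls_left: p }).collect();
    for ep in eps.iter_mut() {
        spawn_ready(ep);
    }
    count_ready(&eps)
}
```

With three endpoints needing 1, 3, and 3 polls, `ready_without_spawn(&[1, 3, 3])` leaves two endpoints partway through setup once the first becomes ready, while `ready_with_spawn(&[1, 3, 3])` finishes all three. In this model the half-set-up endpoints correspond to connections that have been initiated but have not yet sent a ClientHello.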
kleimkuhler
approved these changes
May 20, 2021
Looks good
hawkw
approved these changes
May 20, 2021
lgtm!
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
May 27, 2021
* Controller clients of components with more than one replica could fail to drive all connections to completion. This could result in timeouts showing up in logs, but would not have prevented proxies from communicating with controllers. #6146
* linkerd/linkerd2-proxy#992 made the `l5d-dst-override` header required for ingress-mode proxies. This behavior has been reverted so that requests without this header are forwarded to their original destination.
* OpenCensus trace spans for HTTP requests no longer include query parameters.

---

* ci: Update/pin action dependencies (linkerd/linkerd2-proxy#1012)
* control: Ensure endpoints are driven to readiness (linkerd/linkerd2-proxy#1014)
* Make span name without query string (linkerd/linkerd2-proxy#1013)
* ingress: Restore original dst address routing (linkerd/linkerd2-proxy#1016)
* ci: Restict permissions in Actions (linkerd/linkerd2-proxy#1019)
* Forbid unsafe code in most module (linkerd/linkerd2-proxy#1018)