Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Windows workers #17555

Merged
merged 11 commits into from
Aug 6, 2021

Conversation

davinci26
Copy link
Member

@davinci26 davinci26 commented Jul 30, 2021

Commit Message:

Fixing an issue where Envoy workers are not picking up connections on Windows.

The root cause of the issue is in the Windows kernel.

There are two issues that we need to consider:

  1. If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket.
  2. If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking.

The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows.

Signed-off-by: Sotiris Nanopoulos sonanopo@microsoft.com
Additional Description:
Risk Level: low windows only
Testing: integration tests added
Docs Changes: Added
Release Notes: Added
Platform Specific Features: windows only
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #17555 was opened by davinci26.

see: more, trace.

@davinci26 davinci26 force-pushed the attemptingToFixConcurrency branch from 476c6b3 to 4c348c9 Compare July 30, 2021 21:11
Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
@davinci26 davinci26 force-pushed the attemptingToFixConcurrency branch from d21592f to 3ce5724 Compare August 2, 2021 01:01
Sotiris Nanopoulos added 3 commits August 2, 2021 12:36
Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
…urrency

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Sotiris Nanopoulos added 2 commits August 2, 2021 17:46
Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
@davinci26 davinci26 changed the title [WIP] Draft CI, Enable Windows workers Enable Windows workers Aug 3, 2021
@davinci26 davinci26 marked this pull request as ready for review August 3, 2021 16:19
Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this generally makes sense to me modulo small comments. I would definitely have a release note for this, and also a TODO and tracking issue for a better OS level solution?

/wait

// We set the counters for the two workers to see how many connections each handles.
uint64_t w0_ctr = 0;
uint64_t w1_ctr = 0;
constexpr int loops = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be considered about flakes here. Can you make sure to run this test 1000 times? I think with REUSE_PORT on linux and exact balance on Windows this is probably OK, but it might flake on OSX. I'm not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a mac to test it. I have put this in a loop on Windows to verify the fix and it did not flake.

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Sotiris Nanopoulos added 3 commits August 4, 2021 09:39
…weird whitespace

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM modulo small comment typo.

/wait

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
@ggreenway
Copy link
Contributor

@davinci26 can you write a commit message for this? I don't think what is in the PR description is accurate for what this is.

@mattklein123 mattklein123 merged commit 1260c5c into envoyproxy:main Aug 6, 2021
davinci26 pushed a commit to davinci26/envoy that referenced this pull request Aug 23, 2021
Fixing an issue where Envoy workers are not picking up connections on Windows.

The root cause of the issue is in the Windows kernel.

There are two issues that we need to consider:

If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket.
If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking.
The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows.

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
davinci26 pushed a commit to davinci26/envoy that referenced this pull request Aug 23, 2021
Fixing an issue where Envoy workers are not picking up connections on Windows.

The root cause of the issue is in the Windows kernel.

There are two issues that we need to consider:

If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket.
If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking.
The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows.

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
wrowe pushed a commit that referenced this pull request Aug 31, 2021
* Enable Windows workers (#17555)

Fixing an issue where Envoy workers are not picking up connections on Windows.

The root cause of the issue is in the Windows kernel.

There are two issues that we need to consider:

If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket.
If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking.
The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows.

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* Guard the test to be Windows only, since we can't gurantee it without setting resuse_port on linux

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* trigger ci

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* reorder current.rst

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* revert change

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
wrowe pushed a commit that referenced this pull request Aug 31, 2021
* Enable Windows workers (#17555)

Fixing an issue where Envoy workers are not picking up connections on Windows.

The root cause of the issue is in the Windows kernel.

There are two issues that we need to consider:

If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket.
If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking.
The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows.

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* fix spelling mistake

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* fix spelling v2

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* trigger ci

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>

* revert changelog changes

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Fixing an issue where Envoy workers are not picking up connections on Windows.

The root cause of the issue is in the Windows kernel.

There are two issues that we need to consider:

If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket.
If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking.
The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows.

Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants