Skip to content

Commit ce71ae0

Browse files
author
Sotiris Nanopoulos
authored
[Backport 1.18] Enable Windows workers (#17555) (#17814)
* Enable Windows workers (#17555) Fixing an issue where Envoy workers are not picking up connections on Windows. The root cause of the issue is in the Windows kernel. There are two issues that we need to consider: If we want to listen from a duplicated socket then we need to duplicate it after we call listen on the original socket. If duplicated sockets try to accept at the same time, then one of the accept calls might block. Even if the sockets are non-blocking. The best way to work around that issue is to only listen/accept connections from the first worker thread and then use ExactConnectionBalance to dispatch the connection to another worker thread. This is a temporary solution until the underlying issue is fixed on Windows. Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com> * fix spelling mistake Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com> * fix spelling v2 Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com> * trigger ci Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com> * revert changelog changes Signed-off-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
1 parent f27442c commit ce71ae0

File tree

4 files changed

+71
-0
lines changed

4 files changed

+71
-0
lines changed

docs/root/intro/arch_overview/intro/threading_model.rst

+4
Original file line numberDiff line numberDiff line change
@@ -24,3 +24,7 @@ to have Envoy forcibly balance connections between worker threads. To support th
2424
Envoy allows for different types of :ref:`connection balancing
2525
<envoy_v3_api_field_config.listener.v3.Listener.connection_balance_config>` to be configured on each :ref:`listener
2626
<arch_overview_listeners>`.
27+
28+
On Windows the kernel is not able to balance the connections properly with the async IO model that Envoy is using.
29+
Until this is fixed by the platform, Envoy will enforce listener connection balancing on Windows. This allows us to
30+
balance connections between different worker threads. This behavior comes with a performance penalty.

docs/root/version_history/current.rst

+3
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,9 @@ Minor Behavior Changes
99

1010
Bug Fixes
1111
---------
12+
*Changes expected to improve the state of the world and are unlikely to have negative effects*
13+
14+
* listener: fixed an issue on Windows where connections are not handled by all worker threads.
1215

1316
Removed Config or Runtime
1417
-------------------------

source/server/listener_impl.cc

+14
Original file line numberDiff line numberDiff line change
@@ -523,6 +523,19 @@ void ListenerImpl::buildFilterChains() {
523523
void ListenerImpl::buildSocketOptions() {
524524
// TCP specific setup.
525525
if (connection_balancer_ == nullptr) {
526+
#ifdef WIN32
527+
// On Windows we use the exact connection balancer to dispatch connections
528+
// from worker 1 to all workers. This is a perf hit but it is the only way
529+
// to make all the workers do work.
530+
// TODO(davinci26): We can be faster here if we create a balancer implementation
531+
// that dispatches the connection to a random thread.
532+
ENVOY_LOG(warn,
533+
"ExactBalance was forced enabled for TCP listener '{}' because "
534+
"Envoy is running on Windows."
535+
"ExactBalance is used to load balance connections between workers on Windows.",
536+
config_.name());
537+
connection_balancer_ = std::make_shared<Network::ExactConnectionBalancerImpl>();
538+
#else
526539
// Not in place listener update.
527540
if (config_.has_connection_balance_config()) {
528541
// Currently exact balance is the only supported type and there are no options.
@@ -531,6 +544,7 @@ void ListenerImpl::buildSocketOptions() {
531544
} else {
532545
connection_balancer_ = std::make_shared<Network::NopConnectionBalancerImpl>();
533546
}
547+
#endif
534548
}
535549

536550
if (config_.has_tcp_fast_open_queue_length()) {

test/integration/integration_test.cc

+50
Original file line numberDiff line numberDiff line change
@@ -142,6 +142,56 @@ TEST_P(IntegrationTest, PerWorkerStatsAndBalancing) {
142142
check_listener_stats(0, 1);
143143
}
144144

145+
// Make sure all workers pick up connections
146+
#ifdef WIN32
147+
// We can only guarantee this on Windows without the reuse_port changes.
148+
TEST_P(IntegrationTest, AllWorkersAreHandlingLoad) {
149+
concurrency_ = 2;
150+
initialize();
151+
152+
std::string worker0_stat_name, worker1_stat_name;
153+
if (GetParam() == Network::Address::IpVersion::v4) {
154+
worker0_stat_name = "listener.127.0.0.1_0.worker_0.downstream_cx_total";
155+
worker1_stat_name = "listener.127.0.0.1_0.worker_1.downstream_cx_total";
156+
} else {
157+
worker0_stat_name = "listener.[__1]_0.worker_0.downstream_cx_total";
158+
worker1_stat_name = "listener.[__1]_0.worker_1.downstream_cx_total";
159+
}
160+
161+
test_server_->waitForCounterEq(worker0_stat_name, 0);
162+
test_server_->waitForCounterEq(worker1_stat_name, 0);
163+
164+
// We set the counters for the two workers to see how many connections each handles.
165+
uint64_t w0_ctr = 0;
166+
uint64_t w1_ctr = 0;
167+
constexpr int loops = 5;
168+
for (int i = 0; i < loops; i++) {
169+
constexpr int requests_per_loop = 4;
170+
std::array<IntegrationCodecClientPtr, requests_per_loop> connections;
171+
for (int j = 0; j < requests_per_loop; j++) {
172+
connections[j] = makeHttpConnection(lookupPort("http"));
173+
}
174+
175+
auto worker0_ctr = test_server_->counter(worker0_stat_name);
176+
auto worker1_ctr = test_server_->counter(worker1_stat_name);
177+
auto target = w0_ctr + w1_ctr + requests_per_loop;
178+
while (test_server_->counter(worker0_stat_name)->value() +
179+
test_server_->counter(worker1_stat_name)->value() <
180+
target) {
181+
timeSystem().advanceTimeWait(std::chrono::milliseconds(10));
182+
}
183+
w0_ctr = test_server_->counter(worker0_stat_name)->value();
184+
w1_ctr = test_server_->counter(worker1_stat_name)->value();
185+
for (int j = 0; j < requests_per_loop; j++) {
186+
connections[j]->close();
187+
}
188+
}
189+
190+
EXPECT_TRUE(w0_ctr > 1);
191+
EXPECT_TRUE(w1_ctr > 1);
192+
}
193+
#endif
194+
145195
TEST_P(IntegrationTest, RouterDirectResponseWithBody) {
146196
const std::string body = "Response body";
147197
const std::string file_path = TestEnvironment::writeStringToFileForTest("test_envoy", body);

0 commit comments

Comments
 (0)