Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: resolve race condition in test-net-write-fully-async-* #57022

Merged
merged 5 commits into from
Feb 25, 2025

Conversation

mcollina
Copy link
Member

@mcollina mcollina commented Feb 13, 2025

No description provided.

Signed-off-by: Matteo Collina <hello@matteocollina.com>
@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Feb 13, 2025
@mcollina
Copy link
Member Author

This removes the flakiness on my Mac OS X machine.

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 13, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 13, 2025
@nodejs-github-bot

This comment was marked as outdated.

Signed-off-by: Matteo Collina <hello@matteocollina.com>
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 13, 2025
// Wait for the server to be ready to receive data
// otherwise the test might be flaky as the mustCall above
// might not be invoked.
conn.once('data', common.mustCall(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. The connection was just established, why we need to wait for data from the server?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the active while loop above was starving the event loop to the point that the server was closed before it had the chance to run the JS code in the listener.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I still don't understand, what listener?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server listener was never executed in some cases because the server was shut down early in https://github.com/nodejs/node/pull/57022/files#diff-b2324542b7f8149fd690cc0f9c263464d266e54d4e6b7740a241ab107e662ed1R21.

This test failed 1/10000 on my mac before this.

Copy link
Member

@lpinca lpinca Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's odd, the server is closed after 20 'drain' events. Are you sure that the connection is established when the test fails?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce only with this

diff --git a/test/parallel/test-net-write-fully-async-buffer.js b/test/parallel/test-net-write-fully-async-buffer.js
index 3a3426ba70..6f6123731c 100644
--- a/test/parallel/test-net-write-fully-async-buffer.js
+++ b/test/parallel/test-net-write-fully-async-buffer.js
@@ -16,7 +16,7 @@ const server = net.createServer(common.mustCall(function(conn) {
     let count = 0;
 
     function writeLoop() {
-      if (count++ === 200) {
+      if (count++ === 1) {
         conn.destroy();
         server.close();
         return;

but that is expected (1 'drain' event vs 200).

Can you please also apply the patch from #57022 (comment)? I get some 'read ECONNRESET' errors without it when testing

diff --git a/test/parallel/test-net-write-fully-async-buffer.js b/test/parallel/test-net-write-fully-async-buffer.js
index 3a3426ba70..e3652dcfcc 100644
--- a/test/parallel/test-net-write-fully-async-buffer.js
+++ b/test/parallel/test-net-write-fully-async-buffer.js
@@ -16,8 +16,8 @@ const server = net.createServer(common.mustCall(function(conn) {
     let count = 0;
 
     function writeLoop() {
-      if (count++ === 200) {
-        conn.destroy();
+      if (count++ === 1) {
+        conn.end();
         server.close();
         return;
       }

which is also expected.

Copy link
Member Author

@mcollina mcollina Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I'm lost what patch do you want me to apply or what is wrong with the current state of the PR.

Copy link
Member

@lpinca lpinca Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This #57022 (comment), move server.close() under conn.resume(). After the change from conn.destroy() to conn.end() an ECONNRESET error can be emitted on the client if the same issue we are discussing occurs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still puzzled as to how this problem could occur, but now the test is better regardless.

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 13, 2025
@nodejs-github-bot

This comment was marked as outdated.

Copy link

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.10%. Comparing base (2bd5694) to head (2f9f49c).
Report is 190 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #57022      +/-   ##
==========================================
- Coverage   89.16%   89.10%   -0.07%     
==========================================
  Files         665      665              
  Lines      192602   193203     +601     
  Branches    37050    37221     +171     
==========================================
+ Hits       171732   172147     +415     
- Misses      13678    13769      +91     
- Partials     7192     7287      +95     

see 93 files with indirect coverage changes

Signed-off-by: Matteo Collina <hello@matteocollina.com>
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2025
@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

Signed-off-by: Matteo Collina <hello@matteocollina.com>
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2025
@nodejs-github-bot

This comment was marked as outdated.

Co-authored-by: Luigi Pinca <luigipinca@gmail.com>
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2025
@nodejs-github-bot

This comment was marked as outdated.

@lpinca
Copy link
Member

lpinca commented Feb 15, 2025

Changes are ok, but the more I think about it, the more it does not make sense. The client was writing at least 200 MB of data and was emitting 200 'drain' events with data not flowing on the other end (socket.resume() never called). This seems more a bug than a race condition in the test ¯_(ツ)_/¯.

@nodejs-github-bot

This comment was marked as outdated.

@mcollina
Copy link
Member Author

Seems to be blocked by nodejs/build#4022.

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 16, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 16, 2025
@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot
Copy link
Collaborator

@lpinca lpinca added commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. commit-queue Add this label to land a pull request using GitHub Actions. labels Feb 25, 2025
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Feb 25, 2025
@nodejs-github-bot nodejs-github-bot merged commit 9840ca3 into nodejs:main Feb 25, 2025
65 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 9840ca3

aduh95 pushed a commit that referenced this pull request Mar 9, 2025
Signed-off-by: Matteo Collina <hello@matteocollina.com>
PR-URL: #57022
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. needs-ci PRs that need a full CI run. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants