Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket leak with large YAML test files #313

Open
scottwallacesh opened this issue Mar 22, 2022 · 3 comments
Open

Socket leak with large YAML test files #313

scottwallacesh opened this issue Mar 22, 2022 · 3 comments

Comments

@scottwallacesh
Copy link
Contributor

I have a YAML file with nearly 2000 tests in it. When invoked from the command line, I run out of open file handles due to large amounts of sockets left open:

ERROR: gabbi-runner.input_/foo/bar/__test_l__
	[Errno 24] Too many open files

By default a Linux user has 1024 file handles:

$ ulimit -n
1024

Inspecting the open file handles:

$ ls -l /proc/$(pgrep gabbi-run)/fd | awk '{print $NF}' | cut -f1 -d: | sort | uniq -c
      1 0
      2 /path/to/a/file.txt
      1 /path/to/another/file.yaml
   1021 socket
@scottwallacesh
Copy link
Contributor Author

scottwallacesh commented Mar 22, 2022

I've looked into this and have come up with a hypothesis: the test suite is being built ahead of time and each test is populated with an instance of httpclient.get_http(...) which itself is an instance of Http which inherits from urllib3.PoolManager. This means each test uses a connection pool once, but then the connection pool is left in memory with an open socket that isn't used again. After around 1021 tests, we run out of file handles.

A potential, working solution for this could be the following but it feels more like a hack than anything else.

--- a/gabbi/httpclient.py
+++ b/gabbi/httpclient.py
@@ -51,6 +51,8 @@ class Http(urllib3.PoolManager):
         headers = response.headers
         headers['status'] = str(status)
         headers['reason'] = reason
+
+        self.clear()
         return headers, content
 

@cdent
Copy link
Owner

cdent commented Mar 22, 2022

Good find.

The main reason pool's aren't reused is so that each test can set its own requirements for hostname, verbosity, and cert validation.

And the reason they don't go out of scope is because every test in a single yaml file is kept in memory for the duration of the suite, as this allows the back references ($RESPONSE, $HISTORY, etc). So if you're oriented towards big test suites you should keep that in mind as it might create some memory issues for you. The original design of gabbi (before gabbi-run was a thing) was oriented towards being able to parallelise suites, with each suite being a short "arc" of an API interaction. So not much attention was paid to what would happen with a single big yaml file.

Also, using PoolManager was a late addition. Earlier versions used httplib2, which is much simpler, but it looked like it was no longer being maintained, so a switch was made.

self.clear() is probably a pretty good solution in this case, since there isn't a simple way to reuse the Pools while preserving the goal of tests being able to define the inputs to get_http individually. If we wanted to be clever we could memoize calls to get_http, but that hardly seems worth it here.

Thoughts?

@scottwallacesh
Copy link
Contributor Author

We could clear the pool, as I've done above, for a short-term solution whilst thinking about a better long-term solution. My initial thoughts are to utilise a shared HTTP pool somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants