
Spawn worker in custom environment #1739

Merged: 15 commits into SeldonIO:master on May 17, 2024

Conversation

lhnwrk (Contributor) commented May 6, 2024

As per #1461, MLServer currently spawns workers with the same Python version as the main process, so a user-provided environment is forced to match the Python version of the server. However, multiprocessing allows setting the worker's executable path; this PR extends support for spawning workers in custom environments.
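For context, the multiprocessing mechanism the PR builds on can be sketched in a few lines. This is a minimal, hypothetical example (the interpreter path is made up, and this is not MLServer's actual worker code) showing a spawned child running under a different interpreter and replying over a Queue:

import multiprocessing as mp
import sys

def worker(q):
    # Runs in the child: report which interpreter actually executed us.
    q.put(sys.executable)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    # Point multiprocessing at a custom environment's interpreter
    # (hypothetical path); spawned children will use it instead of
    # the parent's sys.executable.
    ctx.set_executable("/opt/envs/custom-env/bin/python")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # prints the custom environment's interpreter path
    p.join()

Note that the child still has to be able to unpickle what the parent sends it over the Queue, which is where the version-compatibility discussion below comes in.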

sakoush (Member) commented May 9, 2024

@lhnwrk many thanks for your contribution adding the ability to run mlserver workers on different Python versions.

Internally mlserver communicates with these parallel workers using multiprocessing.Queue (check here), and therefore our implementation requires that the main process and the workers use the same Python version (or at least compatible versions) in terms of communication and serialisation. We have not actually tested which versions of Python can interoperate; it is unlikely, though, that workers can be on Python 2.x, for example.

Having said that, the change you propose is slightly orthogonal to the above point, as there is an argument to be made that we should call multiprocessing.set_executable with the custom env's interpreter anyway.

We suggest the following for us to be able to accept this change:

  • Extend the unit tests so that workers are created using different Python versions, to back up your change.
  • Provide a list of cases in the docs where this change is / is not applicable.

We can provide more pointers if you are happy to address the above.

cc: @jesse-c, @lc525

lhnwrk (Contributor, Author) commented May 11, 2024

AFAIK we only need the worker process to have the same pickle protocol as the main process for multiprocessing.Queue to pass serialized objects between them. Unfortunately we don't get to control which protocol multiprocessing uses; its ForkingPickler simply falls back to the default pickle protocol. Version 4 has been the default since Python 3.8, so any version since should be compatible with the others, as pickle guarantees backwards compatibility.
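To make the compatibility claim concrete, here is a small standard-library-only check that can be run under each interpreter involved:

import pickle
import sys

# multiprocessing's ForkingPickler serialises queue items with the
# default pickle protocol, so both ends agree as long as this matches
# (protocol 4 is the default on every Python >= 3.8).
print(sys.version_info[:3], "default pickle protocol:", pickle.DEFAULT_PROTOCOL)

# Readers are backwards compatible: a protocol-4 payload produced by
# one interpreter loads fine on another that also defaults to 4.
payload = pickle.dumps({"inputs": [1, 2, 3]})
print(pickle.loads(payload))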

I've updated the docs to point out the caveats for users, and extended the test suites to include custom environments with different python versions. For now they include the main process's python, the minimum tested python (3.9), and the maximum tested python (3.10) so we cover all cases of running a worker process with the same, lower, or higher python version.

Let me know if there's anything else!

sakoush (Member) left a comment:

@lhnwrk many thanks for addressing the comments. I left minor clarification questions.

MIN_PYTHON_VERSION,
marks=pytest.mark.skipif(
MIN_PYTHON_VERSION >= PYTHON_VERSION,
reason="requires lower Python version",
sakoush (Member):

I am not sure what the reason means here. For example, if the current system Python version is 3.8, then the parameter (3, 9) is not going to be used, and therefore the reason is probably misleading. Could you clarify please?

lhnwrk (Contributor, Author):

My original thought was to test three cases of a worker environment with the same, lower, or higher Python version than the main process, so MIN_PYTHON_VERSION was only tested when it was lower than the main Python version, for example. This is now updated to simply test all Python versions between MIN_PYTHON_VERSION and MAX_PYTHON_VERSION.
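As an illustration, a parametrisation over the whole range might look like the following sketch (the constants mirror those quoted above; the test body and its helper logic are hypothetical, not the PR's actual test):

import pytest

MIN_PYTHON_VERSION = (3, 9)
MAX_PYTHON_VERSION = (3, 10)

# Every minor version between the tested minimum and maximum, inclusive.
ALL_PYTHON_VERSIONS = [
    (3, minor)
    for minor in range(MIN_PYTHON_VERSION[1], MAX_PYTHON_VERSION[1] + 1)
]

@pytest.mark.parametrize("python_version", ALL_PYTHON_VERSIONS)
def test_worker_in_custom_env(python_version):
    # Build or reuse an environment with the requested interpreter and
    # assert that a worker spawned from it can serve inference requests.
    ...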

using pickled objects. Custom environments therefore **must** use the same
version of MLServer and a compatible version of Python with the same [default
pickle protocol](https://docs.python.org/3/library/pickle.html#pickle.DEFAULT_PROTOCOL)
as the main process.
sakoush (Member):

could you also add a note about the specific Python versions that are currently supported? i.e. a table showcasing main process and parallel worker Python versions.

lhnwrk (Contributor, Author):

Added a table here to clarify the supported/tested worker environments!

@@ -2,7 +2,7 @@ name: custom-runtime-environment
 channels:
   - conda-forge
 dependencies:
-  - python == 3.8
+  - python == 3.9
sakoush (Member):

good spot

sakoush (Member) commented May 15, 2024

@lhnwrk could you rebase on top of master to make sure that tests are passing on this PR?

lhnwrk (Contributor, Author) commented May 15, 2024

@sakoush Rebased on master!

   - scikit-learn == 1.0.2
   - pip:
-    - mlserver == 1.3.0.dev2
+    - git+${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git@${GITHUB_REF}
lhnwrk (Contributor, Author):

I switched to installing mlserver from git here, since the worker environment has to match the main process for this PR. GitHub Actions should set these environment variables to the fork/branch so that the worker environment installs the same mlserver; otherwise they default to SeldonIO/MLServer's master branch in tox.ini.
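For readers unfamiliar with the pattern, tox can supply such fallbacks through its {env:KEY:default} substitution; a hypothetical sketch of the tox.ini side (the defaults shown are illustrative):

[testenv]
setenv =
    GITHUB_SERVER_URL = {env:GITHUB_SERVER_URL:https://github.com}
    GITHUB_REPOSITORY = {env:GITHUB_REPOSITORY:SeldonIO/MLServer}
    GITHUB_REF = {env:GITHUB_REF:master}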

sakoush (Member):

@lhnwrk is it possible to install from the local mlserver directory? That would be simpler logic, and when testing locally we might want to pick up changes made locally.

lhnwrk (Contributor, Author):

I did go with this option first, but as it turns out it breaks the docker build tests, since the local mlserver directory is not available inside the container and the template Dockerfile only copies the environment file. It has been a hassle to test locally though; what do you think about using a separate yaml with a pinned version of mlserver for the CLI build test and installing the local directory for the others?

sakoush (Member) left a comment:

@lhnwrk many thanks for the changes. I added a couple more minor comments, but it looks great otherwise.

I will also restart the failing CI tests.

@@ -24,6 +25,14 @@
InferencePoolHook = Callable[[Worker], Awaitable[None]]


def _spawn_worker(settings: Settings, responses: Queue, env: Optional[Environment]):
sakoush (Member):

add return type hint
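For illustration only, the requested hint might look like this, assuming the function returns the spawned Worker (the actual return type is whatever the implementation settles on):

# Hypothetical sketch of the requested annotation on the signature
# quoted above; body elided.
def _spawn_worker(
    settings: Settings, responses: Queue, env: Optional[Environment]
) -> Worker:
    ...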

sakoush (Member) left a comment:

lgtm! nice one @lhnwrk.

sakoush merged commit b848836 into SeldonIO:master on May 17, 2024
25 checks passed