
Spawn worker in custom environment #1739

Merged: 15 commits into SeldonIO:master on May 17, 2024

Conversation

lhnwrk (Contributor) commented May 6, 2024

As per #1461, MLServer currently spawns workers with the same Python version as the main process, so a user-provided environment is forced to match the Python version of the server. However, multiprocessing allows setting the worker's executable path; this PR extends support for spawning workers in custom environments.
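For context, the multiprocessing mechanism the PR builds on can be sketched in a few lines. This is a minimal, hypothetical example (the interpreter path is made up, and this is not MLServer's actual worker code) showing a spawned child running under a different interpreter and replying over a Queue:

import multiprocessing as mp
import sys

def worker(q):
    # Runs in the child: report which interpreter actually executed us.
    q.put(sys.executable)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    # Point multiprocessing at a custom environment's interpreter
    # (hypothetical path); spawned children will use it instead of
    # the parent's sys.executable.
    ctx.set_executable("/opt/envs/custom-env/bin/python")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # prints the custom environment's interpreter path
    p.join()

Note that the child still has to be able to unpickle what the parent sends it over the Queue, which is where the version-compatibility discussion below comes in.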

sakoush (Member) commented May 9, 2024

@lhnwrk many thanks for your contribution adding the ability to run mlserver workers on different Python versions.

Internally mlserver communicates with these parallel workers using multiprocessing.Queue (check here), and therefore our implementation requires that the main process and the workers use the same Python version (or at least compatible versions) in terms of communication and serialisation. We have not actually tested which versions of Python can interoperate; it is unlikely, though, that workers can be on Python 2.x, for example.

Having said that, the change you propose is slightly orthogonal to the above point, as there is an argument to be made that we should call multiprocessing.set_executable with the custom env's interpreter anyway.

We suggest the following for us to be able to accept this change:

  • Extend the unit tests so that workers are created using different Python versions, to back up your change.
  • Provide a list of cases in the docs where this change is / is not applicable.

We can provide more pointers if you are happy to address the above.

cc: @jesse-c, @lc525

lhnwrk (Contributor, Author) commented May 11, 2024

AFAIK we only need the worker process to have the same pickle protocol as the main process for multiprocessing.Queue to pass serialized objects between them. Unfortunately we don't get to control which protocol multiprocessing uses; its ForkingPickler simply falls back to the default pickle protocol. Version 4 has been the default since Python 3.8, so any version since should be compatible with the others, as pickle guarantees backwards compatibility.
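To make the compatibility claim concrete, here is a small standard-library-only check that can be run under each interpreter involved:

import pickle
import sys

# multiprocessing's ForkingPickler serialises queue items with the
# default pickle protocol, so both ends agree as long as this matches
# (protocol 4 is the default on every Python >= 3.8).
print(sys.version_info[:3], "default pickle protocol:", pickle.DEFAULT_PROTOCOL)

# Readers are backwards compatible: a protocol-4 payload produced by
# one interpreter loads fine on another that also defaults to 4.
payload = pickle.dumps({"inputs": [1, 2, 3]})
print(pickle.loads(payload))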

I've updated the docs to point out the caveats for users, and extended the test suites to include custom environments with different python versions. For now they include the main process's python, the minimum tested python (3.9), and the maximum tested python (3.10) so we cover all cases of running a worker process with the same, lower, or higher python version.

Let me know if there's anything else!

sakoush (Member) left a comment:

@lhnwrk many thanks for addressing the comments. I left minor clarification questions.

MIN_PYTHON_VERSION,
marks=pytest.mark.skipif(
MIN_PYTHON_VERSION >= PYTHON_VERSION,
reason="requires lower Python version",
sakoush (Member):

I am not sure what the reason means here. For example, if the current system Python version is 3.8, then the parameter (3, 9) is not going to be used, and therefore the reason is probably misleading. Could you clarify please?

lhnwrk (Contributor, Author):

My original thought was to test three cases of a worker environment with the same, lower, or higher Python version than the main process, so MIN_PYTHON_VERSION was only tested when it was lower than the main Python version, for example. This is now updated to simply test all Python versions between MIN_PYTHON_VERSION and MAX_PYTHON_VERSION.
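As an illustration, a parametrisation over the whole range might look like the following sketch (the constants mirror those quoted above; the test body and its helper logic are hypothetical, not the PR's actual test):

import pytest

MIN_PYTHON_VERSION = (3, 9)
MAX_PYTHON_VERSION = (3, 10)

# Every minor version between the tested minimum and maximum, inclusive.
ALL_PYTHON_VERSIONS = [
    (3, minor)
    for minor in range(MIN_PYTHON_VERSION[1], MAX_PYTHON_VERSION[1] + 1)
]

@pytest.mark.parametrize("python_version", ALL_PYTHON_VERSIONS)
def test_worker_in_custom_env(python_version):
    # Build or reuse an environment with the requested interpreter and
    # assert that a worker spawned from it can serve inference requests.
    ...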

using pickled objects. Custom environments therefore **must** use the same
version of MLServer and a compatible version of Python with the same [default
pickle protocol](https://docs.python.org/3/library/pickle.html#pickle.DEFAULT_PROTOCOL)
as the main process.
sakoush (Member):

could you also add a note about the specific Python versions that are currently supported? i.e. a table showcasing main process and parallel worker Python versions.

lhnwrk (Contributor, Author):

Added a table here to clarify the supported/tested worker environments!

@@ -2,7 +2,7 @@ name: custom-runtime-environment
 channels:
   - conda-forge
 dependencies:
-  - python == 3.8
+  - python == 3.9
sakoush (Member):

good spot

sakoush (Member) commented May 15, 2024

@lhnwrk could you rebase on top of master to make sure that tests are passing on this PR?

lhnwrk (Contributor, Author) commented May 15, 2024

@sakoush Rebased on master!

   - scikit-learn == 1.0.2
   - pip:
-    - mlserver == 1.3.0.dev2
+    - git+${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git@${GITHUB_REF}
lhnwrk (Contributor, Author):

I switched to installing mlserver from git here, since the worker environment has to match the main process for this PR. GitHub Actions should set these environment variables to the fork/branch so that the worker environment installs the same mlserver; otherwise they default to SeldonIO/MLServer's master branch in tox.ini.
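For readers unfamiliar with the pattern, tox can supply such fallbacks through its {env:KEY:default} substitution; a hypothetical sketch of the tox.ini side (the defaults shown are illustrative):

[testenv]
setenv =
    GITHUB_SERVER_URL = {env:GITHUB_SERVER_URL:https://github.com}
    GITHUB_REPOSITORY = {env:GITHUB_REPOSITORY:SeldonIO/MLServer}
    GITHUB_REF = {env:GITHUB_REF:master}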

sakoush (Member):

@lhnwrk is it possible to install from the local mlserver directory? That would be simpler logic, and when testing locally we might want to pick up changes made locally.

lhnwrk (Contributor, Author):

I did go with this option first, but as it turns out it breaks the docker build tests, since the local mlserver directory is not available inside the container and the template Dockerfile only copies the environment file. It has been a hassle to test locally though; what do you think about using a separate yaml with a pinned version of mlserver for the CLI build test and installing the local directory for the others?

sakoush (Member) left a comment:

@lhnwrk many thanks for the changes. I added a couple more minor comments, but it looks great otherwise.

I will also restart the failing CI tests.

@@ -24,6 +25,14 @@
InferencePoolHook = Callable[[Worker], Awaitable[None]]


def _spawn_worker(settings: Settings, responses: Queue, env: Optional[Environment]):
sakoush (Member):

add return type hint
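For illustration only, the requested hint might look like this, assuming the function returns the spawned Worker (the actual return type is whatever the implementation settles on):

# Hypothetical sketch of the requested annotation on the signature
# quoted above; body elided.
def _spawn_worker(
    settings: Settings, responses: Queue, env: Optional[Environment]
) -> Worker:
    ...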

sakoush (Member) left a comment:

lgtm! nice one @lhnwrk.

sakoush merged commit b848836 into SeldonIO:master on May 17, 2024
25 checks passed