
GH-18036: [Packaging] Build Python wheel for musllinux #45470

Open · wants to merge 17 commits into main

Conversation


@nveloso nveloso commented Feb 9, 2025

Rationale for this change

Please check #18036.

What changes are included in this PR?

Almost everything needed for building and testing Python wheels for musllinux.
The service python-wheel-musllinux-test-unittests is currently broken (see the next section), and I still need to test running the alpine-linux-verify-rc Docker image.

Are these changes tested?

I was able to successfully generate a musllinux wheel by running the following:

docker-compose build python-wheel-musllinux-1-2
docker-compose run python-wheel-musllinux-1-2

I was also able to run python-wheel-musllinux-test-imports with no errors.

I'm not able to run python-wheel-musllinux-test-unittests because there are 2 tests failing, and I don't think they are related to my changes. Can you please confirm?
The failing tests are:

  • test_uwsgi_integration
  • test_print_stats

I believe both failures share the same root cause, related to this:
/arrow/cpp/src/arrow/filesystem/s3fs.cc:3461: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
!!! uWSGI process 3487 got Segmentation Fault !!!

Do you have any idea of what it might be?
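For what it's worth, the warning quoted above can be silenced from user code by finalizing S3 explicitly before interpreter shutdown. A minimal sketch, assuming a pyarrow build with S3 support (which exports finalize_s3 from pyarrow.fs):

```python
import atexit

# Sketch: register an explicit S3 teardown so FinalizeS3 runs before exit,
# which is what the warning in the log above asks for. Guarded in case
# pyarrow is missing or was built without S3.
try:
    from pyarrow.fs import finalize_s3
except ImportError:
    finalize_s3 = None

if finalize_s3 is not None:
    atexit.register(finalize_s3)
```

This doesn't explain the segfault itself, but it would confirm whether the missing finalization is the trigger.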

Here are some logs of the failed tests:

====================================================================================== FAILURES ======================================================================================
_______________________________________________________________________________ test_uwsgi_integration _______________________________________________________________________________

    @pytest.mark.s3
    def test_uwsgi_integration():
        # GH-44071: using S3FileSystem under uwsgi shouldn't lead to a crash at shutdown
        try:
            subprocess.check_call(["uwsgi", "--version"])
        except FileNotFoundError:
            pytest.skip("uwsgi not installed on this Python")

        port = find_free_port()
        args = ["uwsgi", "-i", "--http", f"127.0.0.1:{port}",
                "--wsgi-file", os.path.join(here, "wsgi_examples.py")]
        proc = subprocess.Popen(args, stdin=subprocess.DEVNULL)
        # Try to fetch URL, it should return 200 Ok...
        try:
            url = f"http://127.0.0.1:{port}/s3/"
            start_time = time.time()
            error = None
            while time.time() < start_time + 5:
                try:
                    with urlopen(url) as resp:
                        assert resp.status == 200
                    break
                except OSError as e:
                    error = e
                    time.sleep(0.1)
            else:
                pytest.fail(f"Could not fetch {url!r}: {error}")
        finally:
            proc.terminate()
        # ... and uwsgi should gracefully shutdown after it's been asked above
>       assert proc.wait() == 30  # UWSGI_END_CODE = 30
E       AssertionError: assert -11 == 30
E        +  where -11 = wait()
E        +    where wait = <Popen: returncode: -11 args: ['uwsgi', '-i', '--http', '127.0.0.1:49245', '...>.wait

usr/local/lib/python3.9/site-packages/pyarrow/tests/test_fs.py:2052: AssertionError
-------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------
2.0.28
-------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------
*** Starting uWSGI 2.0.28 (64bit) on [Sat Feb  8 18:56:14 2025] ***
compiled with version: 13.2.1 20231014 on 31 October 2024 19:02:44
os: Linux-6.8.0-50-generic #51-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov  9 18:03:35 UTC 2024
nodename: ae5a02215122
machine: aarch64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /
detected binary path: /usr/local/bin/python3.9
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 127.0.0.1:49245 fd 4
spawned uWSGI http 1 (pid: 3488)
uwsgi socket 0 bound to TCP address 127.0.0.1:40033 (port auto-assigned) fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Python version: 3.9.19 (main, Mar 20 2024, 20:45:15)  [GCC 12.2.1 20220924]
--- Python VM already initialized ---
Python main interpreter initialized at 0xeff90822b840
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 72904 bytes (71 KB) for 1 cores
*** Operational MODE: single process ***
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xeff90822b840 pid: 3487 (default app)
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
spawned uWSGI worker 1 (and the only) (pid: 3487, cores: 1)
[pid: 3487|app: 0|req: 1/1] 127.0.0.1 () {30 vars in 346 bytes} [Sat Feb  8 18:56:14 2025] GET /s3/ => generated 12 bytes in 20 msecs (HTTP/1.1 200) 1 headers in 44 bytes (1 switches on core 0)
/arrow/cpp/src/arrow/filesystem/s3fs.cc:3461:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
!!! uWSGI process 3487 got Segmentation Fault !!!
________________________________________________________________________ test_print_stats[system_memory_pool] ________________________________________________________________________

pool_factory = <cyfunction system_memory_pool at 0xe04d9b5c3ad0>

    @pytest.mark.parametrize('pool_factory', supported_factories())
    def test_print_stats(pool_factory):
        code = f"""if 1:
            import pyarrow as pa

            pool = pa.{pool_factory.__name__}()
            buf = pa.allocate_buffer(64, memory_pool=pool)
            pool.print_stats()
            """
        res = subprocess.run([sys.executable, "-c", code], check=True,
                             universal_newlines=True, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        if sys.platform == "linux":
            # On Linux at least, all memory pools should emit statistics
>           assert res.stderr.strip() != ""
E           AssertionError: assert '' != ''
E            +  where '' = <built-in method strip of str object at 0xe04d9c3ec6f0>()
E            +    where <built-in method strip of str object at 0xe04d9c3ec6f0> = ''.strip
E            +      where '' = CompletedProcess(args=['/usr/local/bin/python', '-c', 'if 1:\n        import pyarrow as pa\n\n        pool = pa.system...= pa.allocate_buffer(64, memory_pool=pool)\n        pool.print_stats()\n        '], returncode=0, stdout='', stderr='').stderr

usr/local/lib/python3.9/site-packages/pyarrow/tests/test_memory.py:295: AssertionError

There are also a lot of skipped tests (603), and I'm not sure if that's OK. Here is the final report:
============================================= 2 failed, 7200 passed, 603 skipped, 12 xfailed, 2 xpassed, 5 warnings in 80.21s (0:01:20) ==============================================
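As a side note on the first failure: a negative Popen return code is the negated number of the signal that killed the child, so the -11 in `assert proc.wait() == 30` means the uWSGI worker died from SIGSEGV instead of exiting with UWSGI_END_CODE (30). A quick illustration:

```python
import signal

# Popen.wait() returns -N when the child was killed by signal N,
# so -11 corresponds to SIGSEGV (signal number 11 on Linux).
returncode = -11
killed_by = signal.Signals(-returncode)
print(killed_by.name)  # SIGSEGV
```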

Are there any user-facing changes?

I don't think so.


github-actions bot commented Feb 9, 2025

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@github-actions github-actions bot added the awaiting review Awaiting review label Feb 9, 2025
@kou
Member

kou commented Feb 10, 2025

@github-actions crossbow submit wheel-musllinux-*


Revision: 732fc35

Submitted crossbow builds: ursacomputing/crossbow @ actions-d80189b582

Task Status
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-amd64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-arm64 GitHub Actions

@kou kou changed the title GH-18036: [Packaging] Build python wheel for musl linux GH-18036: [Packaging] Build Python wheel for musllinux Feb 10, 2025
Member

@raulcd raulcd left a comment


Thanks for the PR

@pitrou
Member

pitrou commented Feb 10, 2025

I'm not able to run python-wheel-musllinux-test-unittests because there are 2 tests failing and I don't think they are related to my changes. Can you please confirm? The failing tests are:

* test_uwsgi_integration

This one should be investigated, as it ends with a crash in uWSGI, even though the test is meant to check that uWSGI doesn't crash.

* test_print_stats

This one looks like the test is too strict (it assumes that Linux implies glibc), we should probably relax it on musllinux.

@nveloso
Author

nveloso commented Feb 15, 2025

I'm not able to run python-wheel-musllinux-test-unittests because there are 2 tests failing and I don't think they are related to my changes. Can you please confirm? The failing tests are:

* test_uwsgi_integration

This one should be investigated, as it ends with a crash in uWSGI, even though the test is meant to check that uWSGI doesn't crash.

This looks related to #44071. It's leaking S3 structures on purpose, but it's still failing. I was able to verify that it fails when calling GetClientFinalizer(). Do you know how to fix this?

EDIT: I was able to make this pass by not leaking S3ClientFinalizer. I don't know what the implications of this are. Can you please check whether this makes any sense?

* test_print_stats

This one looks like the test is too strict (it assumes that Linux implies glibc), we should probably relax it on musllinux.

Yes, I checked the glibc malloc source code: it provides malloc_stats, but the musl implementation does not expose an equivalent function. Because of that, I added a condition to only check stderr when musl is not detected on Linux.
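A rough sketch of how that detection could look (an assumption, not necessarily the helper used in the actual change): platform.libc_ver() scans the interpreter binary for a glibc version string, so on a musl-based Linux it comes back empty.

```python
import platform
import sys

def running_on_musl() -> bool:
    # Heuristic: glibc embeds a "glibc X.Y" string that
    # platform.libc_ver() can find; musl does not, so the
    # reported libc name is empty on musl-based systems.
    libc, _version = platform.libc_ver()
    return sys.platform == "linux" and libc != "glibc"
```

With that, the stderr assertion can be limited to glibc-based Linux.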

@pitrou
Member

pitrou commented Feb 17, 2025

I get the following error when trying to run the new Alpine Python container:

$ archery docker run alpine-linux-python bash
Traceback (most recent call last):
  ...
ValueError: Found errors with docker-compose:
 - Service `alpine-linux-cpp - alpine-linux-python` is defined in `x-hierarchy` but not in `services`
 - Service `alpine-linux-cpp` is defined in `services` but not in `x-hierarchy`
 - Service `alpine-linux-python` is defined in `services` but not in `x-hierarchy`

@pitrou
Member

pitrou commented Feb 17, 2025

EDIT: I was able to make this successful by not leaking S3ClientFinalizer. I don't know what are the implications of this. Can you please check if this makes any sense?

It might, but it would be nice to get a gdb backtrace of the uWSGI crash before going any further. It is weird that creating a piece of leaked memory would trigger a crash.

@nveloso
Author

nveloso commented Feb 17, 2025

It might, but it would be nice to get a gdb backtrace of the uWSGI crash before going any further. It is weird that creating a piece of leaked memory would trigger a crash.

Here is the stacktrace from gdb:

(gdb) bt full
#0  0x0000f1ed0376e7e0 in ?? () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#1  0x0000f1ed0376f630 in ?? () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#2  0x0000f1ed037cc734 in _Py_VaBuildStack () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#3  0x0000f1ed03717998 in ?? () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#4  0x0000f1ed0378d9a8 in PyObject_CallFunction () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#5  0x0000f1ed037c7468 in PyImport_Import () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#6  0x0000f1ed037c737c in PyImport_ImportModule () from /usr/local/bin/../lib/libpython3.9.so.1.0
No symbol table info available.
#7  0x0000f1ed0321aa60 in get_uwsgi_pydict () from /usr/local/lib/python3.9/site-packages/pyuwsgi.cpython-39-aarch64-linux-gnu.so
No symbol table info available.
#8  0x0000f1ed03217c5c in uwsgi_python_atexit () from /usr/local/lib/python3.9/site-packages/pyuwsgi.cpython-39-aarch64-linux-gnu.so
No symbol table info available.
#9  0x0000f1ed03204928 in uwsgi_plugins_atexit () from /usr/local/lib/python3.9/site-packages/pyuwsgi.cpython-39-aarch64-linux-gnu.so
No symbol table info available.
#10 0x0000f1ed0396deb8 in ?? () from /lib/ld-musl-aarch64.so.1
No symbol table info available.
#11 0x0000f1ed032b99f0 in ?? () from /usr/local/lib/python3.9/site-packages/pyuwsgi.cpython-39-aarch64-linux-gnu.so
No symbol table info available.

@pitrou
Member

pitrou commented Feb 18, 2025

Here is the stacktrace from gdb:

This is weird, as it doesn't seem to call into Arrow at all? The entire stacktrace is in the Python interpreter.

@pitrou
Member

pitrou commented Feb 18, 2025

I think we can simply decide to skip the uwsgi-based test on musllinux, it's not a very important one anyway.
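Skipping could look like the following sketch (names and the skip reason are assumptions for illustration, reusing the libc_ver() heuristic since glibc embeds a version string and musl does not):

```python
import platform
import sys

import pytest  # the test module already depends on pytest

# Heuristic musl detection: platform.libc_ver() returns ('', '')
# when the interpreter binary carries no glibc version string.
ON_MUSL = sys.platform == "linux" and platform.libc_ver()[0] != "glibc"

@pytest.mark.skipif(ON_MUSL, reason="uwsgi worker segfaults at shutdown on musl")
def test_uwsgi_integration():
    ...
```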

@assignUser
Member

@nveloso Thanks for this great first contribution to Arrow!

Sidenote: the wheel builds seem to be pretty inefficient, are we building libarrow 12 times instead of once for amd64 and once for arm64? Of course not something for this PR :)

@nveloso
Author

nveloso commented Feb 20, 2025

Thanks for this great first contribution to Arrow!

Sure! Thanks for all the reviews so far!

Sidenote: the wheel builds seem to be pretty inefficient, are we building libarrow 12 times instead of once for amd64 and once for arm64? Of course not something for this PR :)

That's weird... I don't see a reason why that's happening. How did you check that it's building libarrow 12 times?
Do you think @kou's comment might be related? Will it improve with Mono installed?

@kou
Member

kou commented Feb 22, 2025

@github-actions crossbow submit wheel-musllinux-*

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 22, 2025

This comment was marked as outdated.

@assignUser
Member

assignUser commented Feb 22, 2025

I don't see a reason to why that's happening.

@nveloso That's just because we build libarrow for each Python version (i.e. each line in this table is a job and a separate build of libarrow), which is not necessary; we could build it once per arch and reuse it in the wheel builds. But the current way the jobs are structured doesn't allow for that. Again, that's not on you or something for this PR :)

@nveloso nveloso force-pushed the python-wheel-for-alpine branch from 6894b10 to f61a9e0 Compare February 25, 2025 09:50
@kou
Member

kou commented Feb 25, 2025

@github-actions crossbow submit wheel-musllinux-*


Revision: f61a9e0

Submitted crossbow builds: ursacomputing/crossbow @ actions-8e3be683cb

Task Status
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-amd64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-arm64 GitHub Actions

@nveloso
Author

nveloso commented Feb 27, 2025

It failed to build wheels for Python 3.13 because there is no Alpine 3.18 image with Python 3.13. Do you see any issues with upgrading Alpine from 3.18 to 3.21?

The free-threaded builds also failed. I have to add new Dockerfiles for them, and I'll also have to build Python 3.13 free-threaded myself because Alpine does not provide an image with that Python version.

@kou
Member

kou commented Feb 28, 2025

It seems that "musllinux_1_2" is based on Alpine Linux 3.21: https://github.com/pypa/manylinux?tab=readme-ov-file#musllinux_1_2-alpine-linux-321-based-313-compatible

Why do we use Alpine Linux 3.18 for building wheels?

We use the official Docker image for "musllinux_1_2": quay.io/pypa/musllinux_1_2_x86_64

It must use Alpine Linux 3.21.

@raulcd
Member

raulcd commented Feb 28, 2025

We can override the default ALPINE_LINUX version from .env by updating the build args on docker-compose. We could also potentially update the default one in .env, but that's not strictly required for this PR.

@nveloso
Author

nveloso commented Feb 28, 2025

Yes, we use the official Docker image for "musllinux_1_2" (quay.io/pypa/musllinux_1_2_x86_64) for building the wheel.
The issue here is when we try to run the tests: the tests use an Alpine base image, and the version is the one defined in the .env file. Right now the ALPINE_LINUX variable there is set to 3.18. I was wondering if we could update that variable in .env to 3.21 instead. Maybe I can just override it as @raulcd said.

@kou
Member

kou commented Feb 28, 2025

OK. Can we use something like the following?

diff --git a/dev/tasks/python-wheels/github.linux.yml b/dev/tasks/python-wheels/github.linux.yml
index 0aa8bf9b23..6a7d86c321 100644
--- a/dev/tasks/python-wheels/github.linux.yml
+++ b/dev/tasks/python-wheels/github.linux.yml
@@ -31,6 +31,9 @@ jobs:
     runs-on: ubuntu-24.04-arm
     {% endif %}
     env:
+      {% if linux_wheel_name == "musl" and linux_wheel_version == "1_2" %}
+      ALPINE_LINUX: "3.21"
+      {% endif %}
       # archery uses these environment variables
       {% if arch == "amd64" %}
       ARCH: amd64

… binary does not work on that architecture

Add free-threaded docker services to test the musl wheel
@nveloso nveloso force-pushed the python-wheel-for-alpine branch from 4e8c71a to 870a758 Compare March 1, 2025 13:05
@nveloso
Author

nveloso commented Mar 5, 2025

@kou Can you please re-trigger the bot to see if the new changes work?

@kou
Member

kou commented Mar 5, 2025

@github-actions crossbow submit wheel-musllinux-*


github-actions bot commented Mar 5, 2025

Revision: 870a758

Submitted crossbow builds: ursacomputing/crossbow @ actions-582a8974a4

Task Status
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-amd64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-arm64 GitHub Actions

@kou
Member

kou commented Mar 6, 2025

@github-actions crossbow submit wheel-musllinux-*


github-actions bot commented Mar 6, 2025

Revision: c10ed15

Submitted crossbow builds: ursacomputing/crossbow @ actions-e8aca3b5a3

Task Status
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-amd64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-arm64 GitHub Actions

Add packaging dependency to requirements-wheel-test.txt
Sort parameters of github action in alphabetical order