Intermittent segmentation fault running Node emulated with target arm64, host amd64 #215
Comments
I have stumbled on a similar thing in a different context and ecosystem: docker/setup-qemu-action#188. We have a Ruby app that depends on quite a few "native" extension packages, that is, packages that have to be built through gcc. Targeting we have consistent Similar thing happened with a Go project that needed some stuff built with gcc. I have also managed to reproduce it locally with qemu v7. What worked was updating to qemu v8; give it a try and see if it helps. |
Thanks @smoke. This repo only supports up to v7 though, right? In the meantime I will try to get a stack; I can see it has been removed in the core dump logout above. |
I've got a similar thing with regular segfaults in gcc while building C or Go applications. When I create an image with a more recent version of qemu (8.1.5, the latest in the repo) and use that to install the emulators, the segfault does not happen. Is there a reason that versions later than v7 have not been pushed to Docker Hub as the latest version? |
Hitting similar issues when compiling php extensions while building a base image on top of neither v7.0.0 nor 8.1.5 are working in this case. |
When you say locally, do you mean outside of a container, just on the CPU? I've just tried to reproduce with simply It could be a (Edited, realised I'm running into this on both ubuntu 20.04 and 22.04) |
I've also started hitting segmentation faults today when building a multi-platform Docker image on GitHub Actions with a In my case, the segmentation faults occur when compiling the CPP application (that is deployed via the Docker image) during the build of the
The error occurs at random steps in the compilation process. I've had no issues building this Docker image hundreds of times over the past year until today; the last successful build with the same code revision before the segmentation faults started happening was at 00:00 GMT today (23.01.2025), so it looks like something must've changed/been updated since then (or I somehow got lucky before and didn't hit the issue; that seems a bit unlikely though, considering I've had a segmentation fault in every single one of the six workflow runs I did today).
Edit: Using QEMU 8.1.5 (as suggested by @smoke) instead of 7.0.0 (which is currently still tagged as
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3
  with:
    image: tonistiigi/binfmt:qemu-v8.1.5
Edit 2: Using QEMU 9.2.0 also works fine for me:
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3
  with:
    image: tonistiigi/binfmt:qemu-v9.2.0
|
This works around the compilation issues observed with QEMU v7 (which is still used by docker/setup-qemu-action by default) as described in tonistiigi/binfmt#215.
This works around the compilation issues observed with QEMU 7.x (which is still used by docker/setup-qemu-action by default) as described in tonistiigi/binfmt#215.
@ajbarber I have used locally |
I'm also encountering this. I'm using the github qemu action to build a variety of ruby versions for different architectures. Since the last 3 days, Some of the errors I encountered
I run my actions on ubuntu-24.04, and coincidentally there was a runner image update right when this started. I compared builds and with It updates buildx from
By limiting it to the only used platform we hopefully can work around issues where files magically matched the ppc64 handler. Example output from dmesg: segfault at 116643c0 ip 00000000004fa380 sp 00007ffe80c32758 error 4 in qemu-ppc64-static[fa380,401000+340000] likely on CPU 6 (core 0, socket 0) Xref: tonistiigi/binfmt#215 Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
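To find out which emulator binary is crashing, the dmesg line quoted above can be picked apart mechanically. The following is a minimal sketch, assuming kernel segfault messages follow the layout of the example in that commit message; the regular expression and the helper name `crashing_binary` are illustrative and not part of binfmt or QEMU.

```python
import re

# Pattern modeled on the dmesg example quoted above (an assumption:
# other kernels may format the message slightly differently).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) "
    r"in (?P<binary>[^\[\s]+)\["
)


def crashing_binary(line):
    """Return the name of the binary the fault occurred in, or None."""
    match = SEGFAULT_RE.search(line)
    return match.group("binary") if match else None


line = (
    "segfault at 116643c0 ip 00000000004fa380 sp 00007ffe80c32758 "
    "error 4 in qemu-ppc64-static[fa380,401000+340000] "
    "likely on CPU 6 (core 0, socket 0)"
)
print(crashing_binary(line))  # qemu-ppc64-static
```

If the reported binary is an emulator for a platform the build never targets (here ppc64 during an arm64 build), that would be consistent with the handler mismatch the commit message describes.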
This fixes a qemu build issue observed on the (non versioned) tonistiigi/binfmt:latest@sha256:f6b82a01e1... qemu-user-static deploy image. Example failure as observed in the GitHub Actions: Traceback (most recent call last): File "/usr/bin/py3compile", line 323, in <module> main() File "/usr/bin/py3compile", line 302, in main compile(files, versions, File "/usr/bin/py3compile", line 187, in compile cfn = interpreter.cache_file(fn, version) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/share/python3/debpython/interpreter.py", line 212, in cache_file (fname[:-3], self.magic_tag(version), last_char)) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/share/python3/debpython/interpreter.py", line 246, in magic_tag return self._execute('import imp; print(imp.get_tag())', version) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/share/python3/debpython/interpreter.py", line 359, in _execute raise Exception('{} failed with status code {}'.format(command, output['returncode'])) Exception: ('python3.11', '-c', 'import imp; print(imp.get_tag())') failed with status code -11 Xref: tonistiigi/binfmt#215 Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
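The "status code -11" in the traceback above is worth decoding: in Python's `subprocess` module, a negative return code on POSIX means the child was terminated by the signal with that number, and signal 11 is SIGSEGV. A minimal sketch of that decoding follows; the helper name `describe_returncode` is hypothetical, and it is an assumption that debpython reports the raw `subprocess` return code.

```python
import signal


def describe_returncode(returncode):
    """Turn a subprocess-style return code into a readable description."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    try:
        # Negative values mean "killed by signal N" on POSIX systems.
        name = signal.Signals(-returncode).name
    except ValueError:
        # Signal number not known to this platform.
        name = f"signal {-returncode}"
    return f"killed by {name}"


print(describe_returncode(-11))  # killed by SIGSEGV
```

So the `python3.11` child in that build did not fail with a Python error; it segfaulted under emulation, which matches the other reports in this thread.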
Update: So having excluded qemu by running their binary directly, now I suspect the issue is somewhere within |
@ajbarber After some digging I'm pretty sure this issue relates to a kernel hardening change. This also explains why various qemu versions are affected. More details can be found in this Debian bug: [1]. The bug first appeared after [2] was applied (which was later reverted), and the revert was itself reverted [3] after a fix for QEMU in Debian was available. Probably Ubuntu included just the kernel patch (revert-revert) but not the QEMU patch, which then broke things again. |
This apparently fixes sporadic crashes of arm64 image builds, see also [1] and [2]. Ubuntu's version of qemu-user does not seem to have this fixed yet either, therefore inject the current Debian package. In addition, this moves away from the floating docker.io/tonistiigi/binfmt:latest that docker/setup-qemu-action@v3 uses. This loose coupling is questionable, not only in the light of this issue. [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087822 [2] tonistiigi/binfmt#215 Co-Developed-by: Felix Moessbauer <felix.moessbauer@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Thanks very much @fmoessbauer. Reading some of the materials you linked, the qemu segfault in question depends not only on release versions, but also on the configuration flags passed at build time. We have maintainers of qemu saying clearly not to configure with binfmt/scripts/configure_qemu.sh Line 65 in 85908cc
To confirm @fmoessbauer's hypothesis, I also replicated the crash/no crash behaviour of I think qemu was forward patched in 8.1 to deal with things either way: https://gitlab.com/qemu-project/qemu/-/issues/1763#note_1508827541 So we need either to remove the line above in @tonistiigi do you accept PRs? |
Yes. Do you have example repro as well for this case? |
I couldn't find this commit anywhere in this repo Edit: oops i searched the wrong repo |
He is talking about the docker/setup-qemu-action repository but it is not a solution. |
@saltydk yes exactly, unfortunately it crashes in the build process now. It's very unreliable; it worked for 2 weeks after my last comment |
The current theory is that the kernel bump led the runners into this issue https://gitlab.com/qemu-project/qemu/-/issues/1913 which was fixed only recently in version 9.2.2, which the container used with latest does not run yet. |
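A quick way to tell whether a given emulator predates the release named in the comment above is to compare version tuples. This is a minimal sketch, assuming the version banner contains text of the form "version X.Y.Z"; the sample banner strings and the helper names are illustrative, not captured output.

```python
import re

# Release the comment above names as containing the fix for issue 1913.
FIXED_IN = (9, 2, 2)


def parse_qemu_version(banner):
    """Extract (major, minor, patch) from a banner such as
    'qemu-aarch64 version 9.2.2' (assumed format)."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no version found in {banner!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(banner):
    """True if the banner reports a version at or above FIXED_IN."""
    return parse_qemu_version(banner) >= FIXED_IN


# Hypothetical banners for the versions discussed in this thread.
for banner in ("qemu-aarch64 version 7.0.0",
               "qemu-aarch64 version 8.1.5",
               "qemu-aarch64 version 9.2.2"):
    print(banner, "->", has_fix(banner))
```

This only checks against the one release mentioned for that QEMU issue. Other comments in this thread report 8.1.5 working and 9.2.2 still crashing in some setups, so a passing check is a hint rather than a guarantee.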
Works for me now with:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
        with:
          image: tonistiigi/binfmt:master
Look like the |
No, I made a PR to update qemu to 9.2.2 as suggested here. I was waiting for my builds to finish, and with that qemu version everything is green for me as well. Hopefully this works for others too, and once the latest tag is switched over it's finally fixed |
this worked for me! https://github.com/cagnulein/qdomyos-zwift/actions/runs/13582493111 |
This reverts commit 3dbfd17. This shouldn't be necessary any longer; QEMU in tonistiigi/binfmt:latest has been updated to 9.2.2, which has this bug fixed - see docker/setup-qemu-action#198 (comment) tonistiigi/binfmt#215 (comment) and https://gitlab.com/qemu-project/qemu/-/issues/1913.
Pin to uraimo/run-on-arch-action@51a6e8b, which includes QEMU 9.2.2 from tonistiigi/binfmt:latest that fixes this - see: uraimo/run-on-arch-action#160 tonistiigi/binfmt#215 (comment) and https://gitlab.com/qemu-project/qemu/-/issues/1913.
Pin to uraimo/run-on-arch-action@51a6e8b, which includes QEMU 9.2.2 from tonistiigi/binfmt:latest that fixes the current CI failure on ppc64le - see: uraimo/run-on-arch-action#160 tonistiigi/binfmt#215 (comment) and https://gitlab.com/qemu-project/qemu/-/issues/1913.
Upgrade `uraimo/run-on-arch-action` to v3, which includes QEMU 9.2.2 from `tonistiigi/binfmt:latest` that fixes the current CI failure on ppc64le - see: uraimo/run-on-arch-action#160 tonistiigi/binfmt#215 (comment) and https://gitlab.com/qemu-project/qemu/-/issues/1913.
We are still facing the segfault while building docker images for I tried a few different combinations and they all faced the exact same segfault:
Looks like #244 is also able to reproduce the segfault on the latest 9.2.2. |
@jaywonchung I think you likely need to put |
Wow that did the trick. Thank you so much @bmhowe23. |
I am still facing a similar issue building I am using the latest qemu DIND:
segmentation fault occurs during the apt-get update.
Any help on this would be greatly appreciated! |
Hello, thanks for binfmt.
When running
on a Dockerfile which installs node on ubuntu, `node --version` in the container build intermittently segfaults. I captured it happening with `QEMU_STRACE=1`, as below.
For comparison I also provide a log with `node --version` working, at the same point where the crash occurred above, as it is an intermittent problem.
Systemd coredump:
Dockerfile: