Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FATAL: bazel crashed due to an internal error: UnsatisfiedLinkError nativeIsSymlinkOrJunction #20677

Closed
jayconrod opened this issue Dec 26, 2023 · 9 comments
Labels
area-Windows Windows-specific issues and feature requests team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: bug untriaged

Comments

@jayconrod
Copy link
Contributor

Description of the bug:

When running bazel --output_user_root=C:\_b clean --expunge in our Windows CI environment, we get the following error.

java.lang.UnsatisfiedLinkError: 'int com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(java.lang.String, boolean[], java.lang.String[])'
	at com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(Native Method)
	at com.google.devtools.build.lib.windows.WindowsFileOperations.isSymlinkOrJunction(WindowsFileOperations.java:141)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.isSymlinkOrJunction(WindowsFileSystem.java:275)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.fileIsSymbolicLink(WindowsFileSystem.java:135)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.stat(WindowsFileSystem.java:158)
	at com.google.devtools.build.lib.vfs.FileSystem.statNullable(FileSystem.java:525)
	at com.google.devtools.build.lib.vfs.FileSystem.readdir(FileSystem.java:646)
	at com.google.devtools.build.lib.vfs.Path.readdir(Path.java:268)
	at com.google.devtools.build.lib.exec.BinTools.scanDirectoryRecursively(BinTools.java:132)
	at com.google.devtools.build.lib.exec.BinTools.forProduction(BinTools.java:80)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:1299)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:1024)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:769)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:95)

Oddly, our regular bazel test command succeeds, when run immediately after. The only difference in configuration I can think of is that the bazel test command is invoked with --nosystem_rc. Our system rc file just contains startup --output_user_root=C:\\_b, so I don't think it makes a difference for this purpose.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The command that fails is:

bazel --output_user_root=C:\_b clean --expunge

I'm not sure what causes this. I've seen it happen in our CI environment, but not when that command is run directly from a shell. I couldn't reproduce it on a dev machine.

Which operating system are you running Bazel on?

Windows Server 2022, on x86_64

What is the output of bazel info release?

release 8.0.0-pre.20231030.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

This is via Bazelisk v1.19.0.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

n/a

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Not sure. We first saw this about a week after we upgraded from 7.0.0-pre.20230926.1 to 8.0.0-pre.20231030.2.

Have you found anything relevant by searching the web?

#19250 has the same error message, but that happened on ARM64 whereas I'm seeing this on x86_64. It also seems like that was more consistently reproducible.

Any other information, logs, or outputs that you want to share?

cc @meteorcloudy in case you might know something from #19250 which would affect x86_64 too.

@sgowroji sgowroji added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Dec 27, 2023
@sgowroji sgowroji added the area-Windows Windows-specific issues and feature requests label Dec 27, 2023
@meteorcloudy
Copy link
Member

I couldn't reproduce this issue on a Windows VM.

pcloudy@BAZEL-WINDOWS-P C:\Users\pcloudy\workdir\bazel>bazel version
Bazelisk version: v1.16.0
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
Build label: 8.0.0-pre.20231030.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Nov 14 23:02:05 2023 (1700002925)
Build timestamp: 1700002925
Build timestamp as int: 1700002925

pcloudy@BAZEL-WINDOWS-P C:\Users\pcloudy\workdir\bazel>bazel --output_user_root=C:\_b --nosystem_rc clean --expunge
Starting local Bazel server and connecting to it...
INFO: Starting clean.

Anything else useful for debugging? Does it still happen with USE_BAZEL_VERSION=last_green?

@jayconrod
Copy link
Contributor Author

I tried setting USE_BAZEL_VERSION=last_green, still crashes. I noticed bazel version also crashes in this configuration, so it's not an issue with bazel clean specifically.

As I mentioned earlier, we do have a script that runs bazel test successfully. It's in a different configuration, but I can go through and figure out what the difference is. I'm guessing this is a misconfiguration on my side.

What would cause this error though? I'm guessing Bazel needs to extract its JNI DLL and then fails to load that, but why might that happen?

Here's the output from bazel version

WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
c:\programdata\bazel.bazelrc
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from c:\actionsrunner\_work\engflow\engflow\.bazelrc: --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1 --host_jvm_args=-XX:InitialRAMPercentage=25 --host_jvm_args=-XX:MaxRAMPercentage=70
Server crashed during startup. Now printing c:\_b\bpc5uood\server\jvm.out
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.UnsatisfiedLinkError: 'int com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(java.lang.String, boolean[*** java.lang.String[***)'
	at com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(Native Method)
	at com.google.devtools.build.lib.windows.WindowsFileOperations.isSymlinkOrJunction(WindowsFileOperations.java:141)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.isSymlinkOrJunction(WindowsFileSystem.java:275)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.fileIsSymbolicLink(WindowsFileSystem.java:135)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.stat(WindowsFileSystem.java:158)
	at com.google.devtools.build.lib.vfs.FileSystem.statNullable(FileSystem.java:5[25](https://github.com/EngFlow/engflow/actions/runs/7388647371/job/20099887654#step:3:25))
	at com.google.devtools.build.lib.vfs.FileSystem.readdir(FileSystem.java:646)
	at com.google.devtools.build.lib.vfs.Path.readdir(Path.java:[26](https://github.com/EngFlow/engflow/actions/runs/7388647371/job/20099887654#step:3:26)8)
	at com.google.devtools.build.lib.exec.BinTools.scanDirectoryRecursively(BinTools.java:132)
	at com.google.devtools.build.lib.exec.BinTools.forProduction(BinTools.java:80)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:1302)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:10[27](https://github.com/EngFlow/engflow/actions/runs/7388647371/job/20099887654#step:3:27))
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:771)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:95)

@tjgq
Copy link
Contributor

tjgq commented Jan 2, 2024

The DLL where the nativeIsSymlinkOrJunction method is defined is loaded elsewhere (specifically here). If it hasn't been loaded by the time we get to that code, it's too late. But, for reasons, a failure to load the JNI doesn't produce an error, it only prints a warning to the server log.

Is there anything in the server log (cat $(bazel info server_log)) indicating that the JNI wasn't successfully loaded?

@jayconrod
Copy link
Contributor Author

Thanks @tjgq, I didn't know where to look for the server log. bazel info also crashed, but it was easy enough to find based on other output.

The root cause is that our CI script deletes not only the contents of $TEMP but the directory itself, so Bazel fails to extract the JNI DLL. The stack is below.

240102 18:51:06.927:WT 1 [com.google.devtools.build.lib.jni.JniLoader.*** Failed to load JNI library from resource
java.nio.file.NoSuchFileException: C:\Users\ADMINI~1\AppData\Local\Temp\bazel-jni.15087218107965058839
	at java.base/sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
	at java.base/sun.nio.fs.WindowsFileSystemProvider.createDirectory(Unknown Source)
	at java.base/java.nio.file.Files.createDirectory(Unknown Source)
	at java.base/java.nio.file.TempFileHelper.create(Unknown Source)
	at java.base/java.nio.file.TempFileHelper.createTempDirectory(Unknown Source)
	at java.base/java.nio.file.Files.createTempDirectory(Unknown Source)
	at com.google.devtools.build.lib.jni.JniLoader.loadLibrary(JniLoader.java:91)
	at com.google.devtools.build.lib.jni.JniLoader.(JniLoader.java:60)
	at com.google.devtools.build.lib.platform.SystemSuspensionModule.(SystemSuspensionModule.java:30)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unknown Source)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(Unknown Source)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(Unknown Source)
	at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(Unknown Source)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.createModules(BlazeRuntime.java:[78](https://github.com/EngFlow/engflow/actions/runs/7389414975/job/20102203579#step:3:78)5)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:761)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:[95](https://github.com/EngFlow/engflow/actions/runs/7389414975/job/20102203579#step:3:95))

It's reasonable for Bazel to assume the temp directory exists and crash if it doesn't, so perhaps this issue can be closed without action. However, I'd love a more helpful error message. If the server crashes on startup for whatever reason, it would be great for the client to print that log or tell the user where to find it.

@tjgq
Copy link
Contributor

tjgq commented Jan 2, 2024

Yes, we should have failed earlier with the more informative error message (and not expect users to go look for it in the log). I have a pending change to do so.

copybara-service bot pushed a commit to bazelbuild/intellij that referenced this issue Jan 2, 2024
Unlike on Unix, we can't fall back to an alternative implementation. Failure to emit an early error would cause later native method calls to produce a less informative error message.

See bazelbuild/bazel#20677 for a recent example where this would have been helpful.

Also note some minor changes:
* Since we attempt to load the JNI in two different ways, make sure to propagate both exceptions (one suppressed by the other).
* The default switch case is unnecessary, as errorprone already enforces exhaustiveness.

PiperOrigin-RevId: 595206761
copybara-service bot pushed a commit that referenced this issue Jan 2, 2024
Unlike on Unix, we can't fall back to an alternative implementation. Failure to emit an early error would cause later native method calls to produce a less informative error message.

See #20677 for a recent example where this would have been helpful.

Also note some minor changes:
* Since we attempt to load the JNI in two different ways, make sure to propagate both exceptions (one suppressed by the other).
* The default switch case is unnecessary, as errorprone already enforces exhaustiveness.

PiperOrigin-RevId: 595206761
Change-Id: I4b6258d59d9a4fcc9f83418375acc51a32d7f4a4
@tjgq
Copy link
Contributor

tjgq commented Jan 3, 2024

Yes, we should have failed earlier with the more informative error message (and not expect users to go look for it in the log). I have a pending change to do so.

See 2d93597.

@tjgq tjgq closed this as not planned Won't fix, can't repro, duplicate, stale Jan 3, 2024
mai93 pushed a commit to bazelbuild/intellij that referenced this issue Jan 15, 2024
Unlike on Unix, we can't fall back to an alternative implementation. Failure to emit an early error would cause later native method calls to produce a less informative error message.

See bazelbuild/bazel#20677 for a recent example where this would have been helpful.

Also note some minor changes:
* Since we attempt to load the JNI in two different ways, make sure to propagate both exceptions (one suppressed by the other).
* The default switch case is unnecessary, as errorprone already enforces exhaustiveness.

(cherry picked from commit e95d1ee)
tjgq added a commit to tjgq/bazel that referenced this issue Jan 26, 2024
…error.

Unlike on Unix, we can't fall back to an alternative implementation. Failure to emit an early error would cause later native method calls to produce a less informative error message.

See bazelbuild#20677 for a recent example where this would have been helpful.

Also note some minor changes:
* Since we attempt to load the JNI in two different ways, make sure to propagate both exceptions (one suppressed by the other).
* The default switch case is unnecessary, as errorprone already enforces exhaustiveness.

PiperOrigin-RevId: 595206761
Change-Id: I4b6258d59d9a4fcc9f83418375acc51a32d7f4a4
github-merge-queue bot pushed a commit that referenced this issue Jan 27, 2024
…error. (#21090)

Unlike on Unix, we can't fall back to an alternative implementation.
Failure to emit an early error would cause later native method calls to
produce a less informative error message.

See #20677 for a recent
example where this would have been helpful.

Also note some minor changes:
* Since we attempt to load the JNI in two different ways, make sure to
propagate both exceptions (one suppressed by the other).
* The default switch case is unnecessary, as errorprone already enforces
exhaustiveness.

PiperOrigin-RevId: 595206761
Change-Id: I4b6258d59d9a4fcc9f83418375acc51a32d7f4a4
@matts1
Copy link
Contributor

matts1 commented May 29, 2024

FWIW, I had a very similar issue.

I had the following error message pop up:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.UnsatisfiedLinkError: 'void com.google.devtools.build.lib.profiler.SystemNetworkStats.getNetIfAddrsNative(java.util.List)'
	at com.google.devtools.build.lib.profiler.SystemNetworkStats.getNetIfAddrsNative(Native Method)
	at com.google.devtools.build.lib.profiler.SystemNetworkStats.getNetIfAddrs(SystemNetworkStats.java:118)
	at com.google.devtools.build.lib.profiler.NetworkMetricsCollector.getLocalLoopbackInterfaces(NetworkMetricsCollector.java:127)
	at com.google.devtools.build.lib.profiler.NetworkMetricsCollector.collectSystemNetworkUsages(NetworkMetricsCollector.java:70)
	at com.google.devtools.build.lib.profiler.CollectLocalResourceUsage$Collector.run(CollectLocalResourceUsage.java:259)

Upon following @tjgq's instructions of opening the server log, I found the root cause to be:

240529 00:16:54.381:I 1 [com.google.devtools.build.lib.analysis.BlazeVersionInfo.logVersionInfo] Bazel version info: Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Thu Jan 01 00:00:00 1970 (0)
Build timestamp: Thu Jan 01 00:00:00 1970 (0)
Build timestamp as int: 0
240529 00:16:54.586:WT 1 [com.google.devtools.build.lib.jni.JniLoader.<clinit>] Failed to load JNI library
java.lang.UnsatisfiedLinkError: /tmp/bazel-jni.1234945722920452513/libunix_jni.so: libstdc++.so.6: cannot open shared object file: No such file or directory
	at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
	at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:350)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:216)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:158)
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2399)
	at java.base/java.lang.Runtime.load0(Runtime.java:852)
	at java.base/java.lang.System.load(System.java:2030)
	at com.google.devtools.build.lib.jni.JniLoader.loadLibrary(JniLoader.java:104)
	at com.google.devtools.build.lib.jni.JniLoader.<clinit>(JniLoader.java:45)
	at com.google.devtools.build.lib.platform.SystemSuspensionModule.<clinit>(SystemSuspensionModule.java:30)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1160)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(MethodHandleAccessorFactory.java:340)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(MethodHandleAccessorFactory.java:103)
	at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(ReflectionFactory.java:173)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Constructor.java:549)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.createModules(BlazeRuntime.java:799)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:775)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:103)

Should we treat the failure to load libunix_jni.so as an error too? One of the problems with not treating it as an error is that bazel info server_log was also crashing bazel for me 😬 .

@comius comius reopened this May 29, 2024
@coeuvre
Copy link
Member

coeuvre commented May 29, 2024

Which Bazel version did you use? This specific error should be fixed by 388f687 where it checks whether the JNI was successfully loaded before calling into native methods.

@matts1
Copy link
Contributor

matts1 commented May 29, 2024

I was using last_green from a few days ago. I can confirm that when I sync to head your fix appears to work for my specific issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Windows Windows-specific issues and feature requests team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: bug untriaged
Projects
None yet
Development

No branches or pull requests

9 participants