Bazel crashes with javax.net.ssl.SSLException ... BAD_DECRYPT #18965

Closed
coryan opened this issue Jul 18, 2023 · 4 comments
Labels: P2 (We'll consider working on this in future; assignee optional), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: bug

Comments

@coryan

coryan commented Jul 18, 2023

Description of the bug:

We use GCS as a remote cache. Our builds on macOS (and only macOS so far) fail from time to time with errors such as:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=ConfiguredTargetKey{label=//google/cloud/aiplatform:v1_samples_vizier_client_samples, 

File:[[<execution_root>]bazel-out/darwin-fastbuild/bin]_solib_darwin_x86_64/libexternal_Scom_Ugoogle_Uprotobuf_Ssrc_Sgoogle_Sprotobuf_Slibany_Uproto.upb.dylib, File:[[<execution_root>]bazel-out/darwin-fastbuild/bin]google/cloud/aiplatform/v1_samples_vizier_client_samples]}', ...)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:642)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:480)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more
Caused by: javax.net.ssl.SSLException: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1071)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1365)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1305)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1392)
	at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:216)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1342)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1235)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1284)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
	... 21 more

https://github.com/googleapis/google-cloud-cpp/actions/runs/5582043990/jobs/10200901653

Running the build again succeeds.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I do not have any simple repro.

Which operating system are you running Bazel on?

macOS 12.6.6 21G646

What is the output of bazel info release?

release 6.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

N/A

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

N/A

Have you found anything relevant by searching the web?

I found a previous report:

#15142

It was closed because (I think) the original bug requested an upgrade of netty, and that was done, but the underlying motivation (fixing this crash) was never addressed.

I also found:

netty/netty#11815

Any other information, logs, or outputs that you want to share?

https://github.com/googleapis/google-cloud-cpp/actions/runs/5582043990/jobs/10200901653 may be handy, though I think it will expire in 90d or so.

@Pavank1992 Pavank1992 added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Jul 18, 2023
@wilwell wilwell added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jul 18, 2023
@coeuvre
Member

coeuvre commented Jul 18, 2023

Thanks for reporting the issue. I think the error is from upstream and we cannot easily fix it. However, I do understand it's annoying to see this transient error.

I am proposing to add this error as a retriable HTTP error so that Bazel can automatically resend the network request in this case. I don't know the security implications of retrying an SSL error, so I am going to wait a few days before I start working on the fix. Please speak up if you don't agree.
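
To make the proposal concrete, here is a minimal Java sketch of what "treat this error as retriable" could look like. It is an illustration only, not Bazel's actual retry code: the class `BadDecryptRetrySketch` and the methods `isRetriable` and `callWithRetry` are hypothetical names, and matching on the `BAD_DECRYPT` substring of the exception message is an assumption based on the stack trace in the report. The cause chain is walked because, in that trace, the `SSLException` arrives wrapped in a `DecoderException`.

```java
import java.util.concurrent.Callable;
import javax.net.ssl.SSLException;

/** Hypothetical sketch; names and matching rule are assumptions, not Bazel's code. */
final class BadDecryptRetrySketch {
  private BadDecryptRetrySketch() {}

  /** Returns true if any exception in the cause chain is an SSLException mentioning BAD_DECRYPT. */
  public static boolean isRetriable(Throwable error) {
    // Walk the cause chain, since the SSLException may be wrapped by another exception.
    for (Throwable t = error; t != null; t = t.getCause()) {
      if (t instanceof SSLException
          && t.getMessage() != null
          && t.getMessage().contains("BAD_DECRYPT")) {
        return true;
      }
    }
    return false;
  }

  /** Runs the request, re-issuing it on a retriable error, for at most maxAttempts attempts. */
  public static <T> T callWithRetry(Callable<T> request, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return request.call();
      } catch (Exception e) {
        // Give up on non-retriable errors or once the attempt budget is spent.
        if (attempt >= maxAttempts || !isRetriable(e)) {
          throw e;
        }
        // A real implementation would likely back off here before retrying.
      }
    }
  }
}
```

A caller would wrap the cache request, for example `BadDecryptRetrySketch.callWithRetry(() -> download(digest), 3)`, where `download` and `digest` stand in for whatever performs the HTTP call. Errors that do not match are rethrown immediately, so only this specific transient failure is retried.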

@coryan
Author

coryan commented Jul 28, 2023

Retrying sounds like a great idea in this case.

@coryan
Author

coryan commented Jul 28, 2023

coeuvre added a commit to coeuvre/bazel that referenced this issue Aug 28, 2023
Fixes bazelbuild#18965.

Closes bazelbuild#19321.

PiperOrigin-RevId: 560034218
Change-Id: I19a595fb6ba8e0a4cf46f25b0b33807282ca586c
@iancha1992
Member

A fix for this issue has been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

devbww added a commit to devbww/google-cloud-cpp that referenced this issue Oct 19, 2023
Now that the build uses Bazel v6.4.0, which includes the fix
for bazelbuild/bazel#18965, we can
remove the retry.
devbww added a commit to googleapis/google-cloud-cpp that referenced this issue Oct 20, 2023
Now that the build uses Bazel v6.4.0, which includes the fix
for bazelbuild/bazel#18965, we can
remove the retry added in #12127.
6 participants