oci_image based on pytorch fails with "could not parse reference" #436
Comments
Could you please share the version of rules_oci you are using? pytorch is known to have huge layers, which might not fit into memory, so the process may be getting OOM killed.
Sorry, forgot: it is the latest (1.4.3). Yes, the layers are pretty large. Is there any way we can test this theory? I suppose I could try to find another big image to see if I get the same error.
You can try another big image to see if it also fails.
Cross-reference: bazelbuild/bazel#17368
Yes, Sahin, you are right. I tested
I guess my only question at this point is whether the failure message can hint at this being the problem. It took way too much effort to figure this out.
OOM kills are hard to detect, unfortunately. There are some efforts on the Bazel side to estimate resource usage for actions, but that is not fixed yet.
We can do better here, though: we can print an error message to the log if the process gets killed for any reason and hint that it might be due to insufficient memory.
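The idea in the comment above (report a killed process and hint at memory pressure) can be sketched roughly as follows. This is a minimal illustration in Python, not the change that rules_oci actually made. The helper names `describe_exit` and `run_with_hint` are hypothetical, and the sketch assumes a POSIX system where an OOM kill shows up as termination by SIGKILL.

```python
import signal
import subprocess
import sys
from typing import List, Optional


def describe_exit(returncode: int) -> Optional[str]:
    """Return a hint if the return code means 'killed by a signal', else None.

    On POSIX, subprocess reports death by signal N as returncode == -N.
    """
    if returncode >= 0:
        # Normal exit (successful or not); nothing to add.
        return None
    sig = -returncode
    try:
        name = signal.Signals(sig).name
    except ValueError:
        name = "signal {}".format(sig)
    hint = "process was terminated by {}".format(name)
    # Assumption: the kernel OOM killer uses SIGKILL, so SIGKILL *may* mean
    # the process ran out of memory. Other causes are possible.
    if sig == signal.SIGKILL:
        hint += "; this may be due to insufficient memory (OOM kill)"
    return hint


def run_with_hint(cmd: List[str]) -> int:
    """Run cmd, and log a hint to stderr if it was killed by a signal."""
    proc = subprocess.run(cmd)
    hint = describe_exit(proc.returncode)
    if hint is not None:
        print("ERROR: {}".format(hint), file=sys.stderr)
    return proc.returncode
```

A wrapper like this cannot prove that memory was the cause; it only turns a silent kill into a message that points in a plausible direction.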
Fixed by #560
I am trying to build a simple "empty" image based on pytorch:
Building pytorch_image fails with an error I cannot decipher:
This only happens when using the pytorch image; any other base image we are using is OK.
I'd love to share a full reproduction, but so far I can only reproduce this in our custom RBE environment, which is not easy to share here. My hope is that somebody can tell me whether this is a problem with rules_oci, pytorch, or RBE.
Thanks so much!