Add LC_ALL, LANG, and TF_FORCE_GPU_ALLOW_GROWTH ENVs to Dockerfile #212
Conversation
LGTM
Agreeing on the l10n variables, but surprised to see …
Or is this something that we do in fact need to see as a deployment detail (deciding to prevent multiple use of the GPU in some cases, and allowing it in others)?
@bertsky
Yes, that was my previous question. I have no problem imagining this to be useful in some circumstances. For example, forcing one processor per GPU makes runtime races for GPU resources more controllable (and allows early CPU fallback). But if the processors may grow the allocated GPU RAM, then whether or not OOM occurs might depend on the input data (image sizes) and other incidental factors (like which combination of workflow steps happens to run at the same time).
That's a good argument, but you could also … Anyway, I think it's enough to remember that this might become an issue and adopt the envvar solution for now.
@bertsky In the case you described, a user could also force a single GPU per process by …
Yep, good idea!
It depends on the kind of GPU (how much RAM) and the kind of compute task (RAM requirements). For the case where one job already takes more than half of the memory, exclusive allocation is okay, and early CPU fallback is better than late OOM failure.
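For illustration, the per-process GPU pinning discussed above could look like the following sketch. The thread does not show the exact mechanism the commenters had in mind; `CUDA_VISIBLE_DEVICES` is assumed here as the standard CUDA device mask, and the processor invocations are illustrative:

```sh
# Sketch: pin each recognition job to a single GPU so concurrent processors
# never race for the same device. CUDA_VISIBLE_DEVICES is the standard CUDA
# device mask; the file groups below are illustrative, not from this thread.
CUDA_VISIBLE_DEVICES=0 ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-1 &
CUDA_VISIBLE_DEVICES=1 ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-2 &
wait
```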
The modification is fine, thank you, although more comments would have been nice.
As discussed in OCR-D/ocrd_calamari#46 (comment), the Dockerfile currently omits a number of important environment variables: `LC_ALL`, `LANG`, and `TF_FORCE_GPU_ALLOW_GROWTH`.

The `LC_ALL` and `LANG` environment variables

Unless the `LC_ALL` and `LANG` environment variables are configured on the host device to use Unicode, running Python in the produced Docker image fails with an encoding error. This pull request overrides the values of the `LC_ALL` and `LANG` environment variables to make the Dockerfile portable.
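The locale override might look like this in the Dockerfile (a sketch; the PR's actual diff is not shown in this conversation, and `C.UTF-8` is only one common Unicode locale choice):

```dockerfile
# Force a Unicode locale inside the image, independent of the host's settings.
# C.UTF-8 is an assumed value; any UTF-8 locale available in the image works.
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
```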
The `TF_FORCE_GPU_ALLOW_GROWTH` environment variable

Unless the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable is `true`, a single `calamari-recognize` process consumes all VRAM, although it only needs ca. 4 GB. This pull request sets the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable to `true` so that TensorFlow only allocates GPU memory as needed.
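The corresponding Dockerfile line might look as follows (again a sketch of the described change, not the verbatim diff):

```dockerfile
# Make TensorFlow allocate GPU memory incrementally instead of reserving
# (nearly) all VRAM at startup.
ENV TF_FORCE_GPU_ALLOW_GROWTH=true
```

The same behavior can also be enabled programmatically in TensorFlow 2.x via `tf.config.experimental.set_memory_growth`; the environment variable has the advantage of requiring no code changes in the processors.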