
Add LC_ALL, LANG, and TF_FORCE_GPU_ALLOW_GROWTH ENVs to Dockerfile #212

Merged (1 commit, Oct 20, 2020)

Conversation

@Witiko (Contributor) commented on Oct 19, 2020

As discussed in OCR-D/ocrd_calamari#46 (comment), the Dockerfile currently omits a number of important environment variables: LC_ALL, LANG, and TF_FORCE_GPU_ALLOW_GROWTH.

The LC_ALL and LANG environment variables

Unless the LC_ALL and LANG environment variables are configured on the host device to use Unicode, running Python in the produced Docker image fails with the following error:

RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/en/7.x/python3/ for mitigation steps.

This system supports the C.UTF-8 locale which is recommended.
You might be able to resolve your issue by exporting the
following environment variables:

    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

This pull request overrides the values of the LC_ALL and LANG environment variables to make the Dockerfile portable.
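
Concretely, the change boils down to ENV instructions along these lines (a sketch based on the error message above; see the commit for the exact form):

    ENV LC_ALL=C.UTF-8
    ENV LANG=C.UTF-8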

The TF_FORCE_GPU_ALLOW_GROWTH environment variable

Unless the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set to true, a single calamari-recognize process will consume all available VRAM, although it only needs ca. 4 GB:

Sat Oct 17 19:35:37 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3F:00.0 Off |                    0 |
| N/A   50C    P0    28W /  70W |  14850MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:B2:00.0 Off |                    0 |
| N/A   48C    P0    27W /  70W |    106MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4697      C   .../headless-tf1/bin/python3    14847MiB |
|    1   N/A  N/A      4697      C   .../headless-tf1/bin/python3      103MiB |
+-----------------------------------------------------------------------------+

This pull request sets the TF_FORCE_GPU_ALLOW_GROWTH environment variable to true so that TensorFlow only allocates GPU memory as needed.
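
Again as a sketch (see the commit for the exact form), this is a single ENV instruction:

    ENV TF_FORCE_GPU_ALLOW_GROWTH=true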

@kba (Member) left a comment:

LGTM

@bertsky (Collaborator) commented on Oct 19, 2020

Agreeing on the l10n variables, but surprised to see TF_FORCE_GPU_ALLOW_GROWTH here. This is an implementation detail for just one package of one module. The TF processors should instead use config.gpu_options.allow_growth = True IMHO.
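
For illustration, a minimal sketch of this per-processor alternative using the TF 1.x session API (added here for reference; the placement inside a processor is hypothetical and not part of the original comment):

    import tensorflow as tf

    # Allocate GPU memory on demand instead of reserving all VRAM up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    tf.keras.backend.set_session(session)  # only needed if the model runs through Keras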

@bertsky (Collaborator) commented on Oct 19, 2020

but surprised to see TF_FORCE_GPU_ALLOW_GROWTH here. This is an implementation detail for just one package of one module. The TF processors should instead use config.gpu_options.allow_growth = True IMHO.

Or is this something that we do in fact need to treat as a deployment detail (deciding to prevent multiple use of the GPU in some cases and allowing it in others)?

@Witiko (Contributor, Author) commented on Oct 19, 2020

@bertsky TF_FORCE_GPU_ALLOW_GROWTH could certainly be replaced with patches to the individual modules (although I have only encountered issues with calamari, this likely affects other modules as well). However, fixing the allocation issue with an environment variable seems to give the same result with less work and less deviation from upstream. Do you suppose allowing TF to allocate the whole VRAM even though it only needs a fraction could be useful for some users?

@bertsky (Collaborator) commented on Oct 19, 2020

Do you suppose allowing TF to allocate the whole VRAM even though it only needs a fraction could be useful for some users?

Yes, that was my previous question. I have no problem imagining this to be useful in some circumstances. For example, forcing one processor per GPU makes runtime races for GPU resources more controllable (and allows an early CPU fallback). But if the processors may grow the allocated GPU RAM, then whether or not OOM occurs might depend on the input data (image sizes) and other random factors (like which combination of workflow steps happens to run at the same time).

However, fixing the allocation issue with an environment variable seems to give the same result with less work and less deviation from upstream.

That's a good argument, but you could also set os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' in ocrd_calamari for that.

Anyway, I think it's enough to remember that this might become an issue and to adopt the envvar solution for now.
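
For illustration, a minimal sketch of that in-module variant (the placement inside ocrd_calamari is hypothetical; the variable has to be set before TensorFlow initializes the GPU):

    import os

    # Must be set before TensorFlow is imported and the GPU is initialized.
    os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

    import tensorflow as tf  # TensorFlow will now allocate GPU memory on demand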

@bertsky self-requested a review on Oct 19, 2020, 19:29
@Witiko (Contributor, Author) commented on Oct 19, 2020

@bertsky In the case you described, a user could also force a single GPU per process by docker run --env TF_FORCE_GPU_ALLOW_GROWTH=false. However, TF_FORCE_GPU_ALLOW_GROWTH=true seems like the saner default setting, since GPUs are costly and dedicating a whole GPU to each process is usually wasteful, especially on shared computing nodes.
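
For illustration (the image name and command are placeholders, not taken from this PR):

    # Override the Dockerfile default at run time.
    docker run --env TF_FORCE_GPU_ALLOW_GROWTH=false <image> <command>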

@bertsky (Collaborator) commented on Oct 20, 2020

In the case you described, a user could also force a single GPU per process by docker run --env TF_FORCE_GPU_ALLOW_GROWTH=false.

Yep, good idea!

However, TF_FORCE_GPU_ALLOW_GROWTH=true seems like the saner default setting, since GPUs are costly and dedicating a whole GPU to each process is usually wasteful, especially on shared computing nodes.

It depends on the kind of GPU (how much RAM it has) and the kind of compute task (its RAM requirements). In the case where one job already takes more than half of the memory, exclusive allocation is okay, and an early CPU fallback is better than a late OOM failure.

@stweil (Collaborator) left a comment:
The modification is fine, thank you, although more comments would have been nice.

@stweil merged commit 67bd9dd into OCR-D:master on Oct 20, 2020