Handle empty images gracefully #48

Closed
kba opened this issue Dec 17, 2020 · 3 comments

@kba
Member

kba commented Dec 17, 2020

OK, I've installed OCR-D for the first time. It mostly worked out of the box, and I was able to reproduce the problem. Your errors seem to be caused by OCR-D processors, not by calamari.
Somehow the line segmentation produces empty lines, or lines that lie outside of text regions. When these empty images are converted to numpy arrays (by ocrd_calamari, not by calamari), numpy throws an uncaught exception. You could fix it by inserting something like line_image = line_image if all(line_image.size) else [[0]] before line 77 in ocrd_calamari/recognize.py, but that's only a temporary hack to avoid the error. I'm also not sure whether OCR-D's workspace.image_from_segment, or even the line segmentation processor, is supposed to produce empty lines at all, so maybe the real problem lies deeper in the guts of the OCR-D machinery.

Originally posted by @andbue in Calamari-OCR/calamari#193 (comment)
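A minimal sketch of the guard suggested above, plus a "skip and log" alternative. It assumes line_image is a PIL-style image whose .size is (width, height), so all(size) is False when either dimension is 0. The helper names (guard_empty_line_image, non_empty_lines, FakeImage) are hypothetical illustrations; this is not the code that landed in #49.

```python
import logging
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Tuple

log = logging.getLogger(__name__)


@dataclass
class FakeImage:
    """Hypothetical stand-in for a PIL image: only .size = (width, height)."""
    size: Tuple[int, int]


def guard_empty_line_image(line_image: Any) -> Any:
    """The temporary hack from the comment: replace an empty image with [[0]].

    [[0]] is a nested list that numpy would turn into a 1x1 array,
    so the conversion no longer fails on a zero-sized image.
    """
    if all(line_image.size):  # False if width == 0 or height == 0
        return line_image
    return [[0]]


def non_empty_lines(lines: Iterable[Tuple[str, Any]]) -> Iterator[Tuple[str, Any]]:
    """Alternative (assumption): drop empty line images instead of padding them."""
    for line_id, image in lines:
        if not all(image.size):
            log.warning("Skipping empty line image for line %s", line_id)
            continue
        yield line_id, image
```

Padding with [[0]] keeps every line in the output but feeds the recognizer a meaningless 1x1 image; skipping the line avoids that at the cost of producing no text for it.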

@mikegerber
Collaborator

Workflow by @jbarth-ubhd that should reproduce the problem:

image: https://digi.ub.uni-heidelberg.de/diglitData/v/ocrd/hdz1886a_-_248_4.tif

workflow:

ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-001 -P model $HOME/ocrd_models/sbb/binarization/models
ocrd-cis-ocropy-deskew -I OCR-D-001 -O OCR-D-002
ocrd-sbb-textline-detector -I OCR-D-002 -O OCR-D-003 -P model $HOME/ocrd_models/sbb/textline
ocrd-calamari-recognize -I OCR-D-003 -O OCR-D-OCR -P checkpoint "$HOME/ocrd_models/calamari/calamari_models/gt4histocr/*.ckpt.json"

Calamari-OCR/calamari#193 (comment)
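The same four steps can be driven from Python, which makes the file-group chaining (each step's -O is the next step's -I) explicit. This is a sketch under assumptions: the model paths are copied from the workflow, the workspace's mets.xml is assumed to sit in workspace_dir, and STEPS, build_command and run_workflow are hypothetical names, not part of OCR-D.

```python
import os
import subprocess
from typing import Dict, List, Tuple

HOME = os.path.expanduser("~")  # same value as $HOME in the shell commands

# (input file group, output file group, processor, -P parameters)
Step = Tuple[str, str, str, Dict[str, str]]

STEPS: List[Step] = [
    ("OCR-D-IMG", "OCR-D-001", "ocrd-sbb-binarize",
     {"model": f"{HOME}/ocrd_models/sbb/binarization/models"}),
    ("OCR-D-001", "OCR-D-002", "ocrd-cis-ocropy-deskew", {}),
    ("OCR-D-002", "OCR-D-003", "ocrd-sbb-textline-detector",
     {"model": f"{HOME}/ocrd_models/sbb/textline"}),
    # As in the quoted shell command, the glob is passed through unexpanded.
    ("OCR-D-003", "OCR-D-OCR", "ocrd-calamari-recognize",
     {"checkpoint": f"{HOME}/ocrd_models/calamari/calamari_models/gt4histocr/*.ckpt.json"}),
]


def build_command(step: Step) -> List[str]:
    """Turn one step into an argv list equivalent to the shell line."""
    input_grp, output_grp, processor, params = step
    cmd = [processor, "-I", input_grp, "-O", output_grp]
    for key, value in params.items():
        cmd += ["-P", key, value]
    return cmd


def run_workflow(steps: List[Step], workspace_dir: str) -> None:
    """Run the steps in order; check=True stops at the first failing processor."""
    for step in steps:
        subprocess.run(build_command(step), cwd=workspace_dir, check=True)
```

Calling run_workflow(STEPS, "/path/to/workspace") would run all four processors in sequence; with check=True, a crash in ocrd-calamari-recognize raises subprocess.CalledProcessError instead of passing silently.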

@mikegerber
Collaborator

To reproduce, use this workspace: https://qurator-data.de/~mike.gerber/2021-01%20ocrd_calamari-issue-48/workspace.zip (created with the image and the commands above) and run

ocrd-calamari-recognize -I OCR-D-003 -O OCR-D-OCR -P checkpoint ".../path/to/gt4histocr/*.ckpt.json"

mikegerber added a commit that referenced this issue Jan 20, 2021
check for empty line image, ht @andbue, fix #48
@mikegerber
Collaborator

Alright, the bug is fixed by #49, which is now merged.
