Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion fails when adding metadata after cloning document root #3036

Closed
juwu-odoo opened this issue Jan 9, 2025 · 8 comments · Fixed by #3040
Closed

Assertion fails when adding metadata after cloning document root #3036

juwu-odoo opened this issue Jan 9, 2025 · 8 comments · Fixed by #3040
Labels
is-regression Regression introduced as a side-effect of another change PdfWriter The PdfWriter component is affected

Comments

@juwu-odoo
Copy link

What Happened?

Hello, we are using pypdf at Odoo to generate pdf invoices, but we've been encountering an error when using the latest versions when attempting to run add_metadata() after clone_reader_document_root() which seems to be a bug.

Please see a reference discussion here: odoo/odoo#185673

Environment

$ python -m platform
Linux-6.8.0-51-generic-x86_64-with-glibc2.35

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.1.0, crypt_provider=('cryptography', '3.4.8'), PIL=9.0.1

Happens with pypdf==5.0.0 as well.

Code + PDF

import pypdf

r = pypdf.PdfReader('INV_2025_00010.pdf')
w = pypdf.PdfWriter()
w.clone_reader_document_root(r)
w.add_metadata({"/foo": "bar"})

PDF file: INV_2025_00010.pdf

Traceback

Traceback (most recent call last):
  File "/home/odoo/git/pypdf-test/test.py", line 6, in <module>
    w.add_metadata({"/foo": "bar"})
  File "/home/odoo/work/multiverse/venvs/17.0/lib/python3.10/site-packages/pypdf/_writer.py", line 1620, in add_metadata
    assert isinstance(self._info, DictionaryObject)
AssertionError

Notes

pypdf 5.0.0 introduced a line in clone_reader_document_root which deletes _info_obj from the writer:
a790532#diff-a0fbeffe546f1499bac6a87052eb32cf536be2731684405ef6bb0266a6a4da5dR1202

An assertion then fails when later adding metadata, causing the traceback since self._info = None:

assert isinstance(self._info, DictionaryObject)

@xmo-odoo and I's best guess is that add_metadata() should have been updated to handle a possible self._info = None (and/or initialize it) with the change introduced in 5.0.0.

@stefan6419846 stefan6419846 added PdfWriter The PdfWriter component is affected is-regression Regression introduced as a side-effect of another change labels Jan 9, 2025
@stefan6419846
Copy link
Collaborator

Thanks for the report. Is there any reason for using this specific code in your case instead of using PdfWriter(clone_from=r) or similar?

@juwu-odoo
Copy link
Author

We're currently using clone_reader_document_root() so that we can perform some post-processing steps afterwards.
It seems that clone_from= calls clone_document_from_reader()that also copies the metadata, which I believe we are not using.

@pubpub-zz
Copy link
Collaborator

@juwu-odoo
can you test with @stefan6419846 proposal ? there might be some hidden initialization that could prevent the issue

@juwu-odoo
Copy link
Author

Yes, both PdfWriter(clone_from=r) and clone_document_from_reader() works in the example case since it reinitializes self._info_obj from reader._info:

self._info_obj = cast(

Though I'll have to test our specific usecase more to make sure the extra metadata doesn't interfere.

Are you saying this is intended behavior and we should use this function instead?

@xmo-odoo
Copy link
Contributor

Thanks for the report. Is there any reason for using this specific code in your case instead of using PdfWriter(clone_from=r) or similar?

clone_document_from_reader might be an option. The alternate constructor is not, as the project is still compatible with pypdf2 1.26 and 2.12 (because that's what's package in ubuntu 22.04, 24.04, and debian bookworm).

@stefan6419846
Copy link
Collaborator

While there might be older versions in the OS repositories and you want to support them, we offer no support for them anymore. I personally would never use these old versions in production as in the meantime there have been tons of fixes and enhancements.

As for the current regression, this is not intended, although I am not sure whether clone_reader_document_root should indeed be public API. Nevertheless, you are of course always invited to propose a proper fix through a PR with a corresponding test.

@xmo-odoo
Copy link
Contributor

xmo-odoo commented Jan 10, 2025

While there might be older versions in the OS repositories and you want to support them, we offer no support for them anymore.

Yes, we know and don't expect such, I'm just explaining why PdfWriter(clone_from=r) is not an option for us.

As for the current regression, this is not intended, although I am not sure whether clone_reader_document_root should indeed be public API. Nevertheless, you are of course always invited to propose a proper fix through a PR with a corresponding test.

Thanks. Would having add_metadata check the state of _info and initialise it (essentially what metadata also does) work? This branch could also be removed from metadata

One point of note, it might be a good idea to review uses of assert isinstance(...) in the codebase, as apparently for mypy it's essentially the same thing as a typing.cast with a runtime check. So in this case it hid that add_metadata is operating on an Optional without handling the None case.

@stefan6419846
Copy link
Collaborator

The add_metadata fix sounds suitable. As for the assertions, you are of course invited to review and update such things as well - there has not been a strict need for dealing differently with it in the past.

xmo-odoo added a commit to xmo-odoo/pypdf that referenced this issue Jan 10, 2025
The `assert isinstance` hid the information that `_info` is optional,
and `add_metadata` may try to add metadata to an `_info` which is
completely missing.

Properly initialise `_info` if it's not set.

Fixes py-pdf#3036
xmo-odoo added a commit to xmo-odoo/pypdf that referenced this issue Jan 10, 2025
The `assert isinstance` hid the information that `_info` is
`Optional`, in which case `add_metadata` would raise an assertion
error.

Properly initialise `_info` if it's not set.

Fixes py-pdf#3036
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-regression Regression introduced as a side-effect of another change PdfWriter The PdfWriter component is affected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants