ENH: Add Jupyter Notebook integration for PdfReader #2375

MartinThoma · 2023-12-28T08:45:45Z

Without this PR: [screenshot not preserved]

With this PR: [screenshot not preserved]

See:
* https://ipython.readthedocs.io/en/stable/config/integrating.html#MyObject._repr_mimebundle_
* https://discourse.jupyter.org/t/what-are-include-exclude-parameter-in-repr-mimebundle-for/23125
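The linked IPython page describes the hook this PR relies on. When a cell evaluates to an object, IPython/Jupyter looks for a `_repr_mimebundle_` method and renders whichever MIME types it returns. Below is a minimal sketch of that idea for a reader-like object. It assumes the PDF is held in a seekable binary stream and is exposed under the `application/pdf` MIME type. `PdfPreview` is a hypothetical class and is not pypdf's actual `PdfReader` code. Whether the PDF is then shown inline depends on the front end having a PDF renderer.

```python
import io
from typing import Any, Collection, Dict, Optional


class PdfPreview:
    """Hypothetical stand-in for a reader that keeps its PDF in a stream.

    Only the Jupyter hook is sketched; this is not pypdf's PdfReader.
    """

    def __init__(self, stream: io.BytesIO) -> None:
        self.stream = stream

    def _repr_mimebundle_(
        self,
        include: Optional[Collection[str]] = None,
        exclude: Optional[Collection[str]] = None,
    ) -> Dict[str, Any]:
        # Rewind first: parsing may already have moved the stream position,
        # and the front end needs the complete document.
        self.stream.seek(0)
        data: Dict[str, Any] = {"application/pdf": self.stream.read()}
        if include is not None:
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            data = {k: v for k, v in data.items() if k not in exclude}
        return data
```

In a notebook, a cell ending in `PdfPreview(io.BytesIO(pdf_bytes))` would hand the raw bytes to Jupyter instead of printing the default `<... object at 0x...>` repr.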

MartinThoma · 2023-12-28T08:48:47Z

I could not find any documentation regarding the include / exclude parameters, but ChatGPT thinks it should be used like this (which sounds reasonable):

```python
def _repr_mimebundle_(self, include=None, exclude=None):
    data = {
        'text/plain': 'This is a plain text representation.',
        'text/html': '<strong>This is an HTML representation.</strong>',
        'application/json': '{"key": "value"}'
    }

    if include is not None:
        # Filter representations based on include list
        data = {k: v for k, v in data.items() if k in include}

    if exclude is not None:
        # Remove representations based on exclude list
        data = {k: v for k, v in data.items() if k not in exclude}

    return data
```
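The filtering semantics can be checked without a notebook. The sketch below pulls the two filtering steps out into a standalone helper (`filter_mimebundle` is a hypothetical name) and calls it directly. Two consequences of the `is not None` checks are worth noting. First, `exclude` is applied after `include`, so it wins when a type appears in both lists. Second, an empty `include` list filters everything out, which is not the same as passing `None`.

```python
from typing import Any, Collection, Dict, Optional


def filter_mimebundle(
    data: Dict[str, Any],
    include: Optional[Collection[str]] = None,
    exclude: Optional[Collection[str]] = None,
) -> Dict[str, Any]:
    """Filter a MIME bundle by optional include/exclude lists."""
    if include is not None:
        # Keep only the MIME types that were asked for.
        data = {k: v for k, v in data.items() if k in include}
    if exclude is not None:
        # Then drop anything explicitly excluded.
        data = {k: v for k, v in data.items() if k not in exclude}
    return data


bundle = {
    "text/plain": "This is a plain text representation.",
    "text/html": "<strong>This is an HTML representation.</strong>",
    "application/json": '{"key": "value"}',
}

print(sorted(filter_mimebundle(bundle)))
# → ['application/json', 'text/html', 'text/plain']
print(sorted(filter_mimebundle(bundle, include=[])))
# → []
print(sorted(filter_mimebundle(bundle, include=["text/plain", "text/html"], exclude=["text/html"])))
# → ['text/plain']
```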

MartinThoma · 2023-12-28T08:50:12Z

We could add something similar for PdfWriter and PageObject. Maybe even for annotations (by creating a reader and a blank page, adding the annotation, and rendering the result).
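For PdfWriter and PageObject there is no input stream to rewind, so the object would first have to be serialized. Below is a minimal sketch of one way to share the hook across classes. It assumes a hypothetical `JupyterPdfMixin` whose subclasses only implement a hypothetical `_pdf_bytes()` method. `FakeWriter` stands in for a real writer. pypdf's `PdfWriter.write()` does accept a binary stream such as `io.BytesIO`. However, wiring this into pypdf's classes, or turning a single `PageObject` or annotation into a standalone PDF, is not shown here.

```python
import io
from typing import Any, Collection, Dict, Optional


class JupyterPdfMixin:
    """Hypothetical mixin: a class that can produce PDF bytes gets a
    Jupyter representation without re-implementing the filtering."""

    def _pdf_bytes(self) -> bytes:
        # Hypothetical hook; each subclass decides how to serialize itself.
        raise NotImplementedError

    def _repr_mimebundle_(
        self,
        include: Optional[Collection[str]] = None,
        exclude: Optional[Collection[str]] = None,
    ) -> Dict[str, Any]:
        data: Dict[str, Any] = {"application/pdf": self._pdf_bytes()}
        if include is not None:
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            data = {k: v for k, v in data.items() if k not in exclude}
        return data


class FakeWriter(JupyterPdfMixin):
    """Stand-in for a writer that serializes into a binary stream."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload

    def write(self, stream: io.BytesIO) -> None:
        stream.write(self.payload)

    def _pdf_bytes(self) -> bytes:
        # Serialize into an in-memory buffer and hand back its contents.
        buffer = io.BytesIO()
        self.write(buffer)
        return buffer.getvalue()
```

The mixin keeps the include/exclude logic in one place. Each class only decides how to produce its bytes.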

codecov · 2023-12-28T08:51:48Z

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (195d82e) 94.45% compared to head (33a627d) 94.35%.

| File | Patch % | Lines |
|---|---|---|
| pypdf/_reader.py | 11.11% | 8 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2375      +/-   ##
==========================================
- Coverage   94.45%   94.35%   -0.10%     
==========================================
  Files          43       43              
  Lines        7575     7584       +9     
  Branches     1515     1519       +4     
==========================================
+ Hits         7155     7156       +1     
- Misses        257      265       +8     
  Partials      163      163
```
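The percentages in the report can be reproduced from the diff table. The worked sketch below rests on two assumptions. The first is that coverage = hits / (hits + misses + partials), which matches the line totals above (7155 + 257 + 163 = 7575 and 7156 + 265 + 163 = 7584). The second is that values are rounded down to two decimals; rounding to the nearest value would give 94.46% for the base, so the report appears to round down. The 11.11% patch figure corresponds to 1 hit out of the 9 added lines, assuming none of them are partials.

```python
def coverage_bp(hits: int, misses: int, partials: int) -> int:
    """Coverage in basis points (hundredths of a percent), rounded down."""
    lines = hits + misses + partials
    # Integer floor division avoids float rounding surprises.
    return hits * 10_000 // lines


base = coverage_bp(hits=7155, misses=257, partials=163)   # 7575 lines
head = coverage_bp(hits=7156, misses=265, partials=163)   # 7584 lines
patch = coverage_bp(hits=1, misses=8, partials=0)         # 9 added lines

print(f"base  {base / 100:.2f}%")              # base  94.45%
print(f"head  {head / 100:.2f}%")              # head  94.35%
print(f"delta {(head - base) / 100:+.2f}%")    # delta -0.10%
print(f"patch {patch / 100:.2f}%")             # patch 11.11%
```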


stefan6419846 · 2023-12-28T09:08:02Z

Upstream implementation of includes/excludes: https://github.com/ipython/ipython/blob/d0e254420445c2204a2b39d28948cd6127717fb1/IPython/core/formatters.py#L151-L157
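IPython's integration docs also allow `_repr_mimebundle_` to return a `(data, metadata)` tuple instead of a single dict. In that case both dicts would presumably need the same filtering so they stay consistent. Below is a minimal sketch that assumes the tuple form. `filter_bundle` is a hypothetical helper, not the upstream IPython code linked above, and the metadata content is made up for illustration.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Bundle = Dict[str, Any]


def filter_bundle(
    data: Bundle,
    metadata: Bundle,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> Tuple[Bundle, Bundle]:
    """Apply the same include/exclude rules to data and metadata."""
    # Materialize once so a generator argument is not exhausted by lookups.
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude) if exclude is not None else None

    def keep(mime: str) -> bool:
        if include_set is not None and mime not in include_set:
            return False
        return exclude_set is None or mime not in exclude_set

    return (
        {k: v for k, v in data.items() if keep(k)},
        {k: v for k, v in metadata.items() if keep(k)},
    )


data = {"application/pdf": b"%PDF-1.7", "text/plain": "<PdfReader>"}
metadata = {"application/pdf": {"hypothetical": True}}
print(filter_bundle(data, metadata, exclude=["application/pdf"]))
# → ({'text/plain': '<PdfReader>'}, {})
```

A `_repr_mimebundle_` using this would simply `return filter_bundle(data, metadata, include, exclude)`.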

MartinThoma · 2023-12-28T09:53:34Z

Thanks! Then the ChatGPT code is perfect 🎉


## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have a layout-mode text extraction. This enables users who want to detect / extract tables with heuristics to give it a try.
* We deprecated a lot of the old PyPDF2 API that was either not following PEP8 naming styles or was not using a property. Users coming from PyPDF2 might want to switch to pypdf<4.0.0 first to get helpful error messages that show the new API for their specific cases.

A big 'Thank you!' to the whole pypdf community for your work. Thanks to you, pypdf is better than ever. Kudos to @shartzog, who added the layout mode with his first contribution!

### Deprecations (DEP)

- Drop Python 3.6 support (#2369) by @MartinThoma
- Remove deprecated code (#2367) by @MartinThoma
- Remove deprecated XMP properties (#2386) by @stefan6419846

### New Features (ENH)

- Add "layout" mode for text extraction (#2388) by @shartzog
- Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
- Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846

### Bug Fixes (BUG)

- PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
- Add support for GBK2K cmaps (#2385) by @stefan6419846

### Documentation (DOC)

- Add pmiller66 for #2406 as a contributor by @MartinThoma
- Add missing expand parameter (#2393) by @Atomnp
- Resolve build warnings (#2380) by @stefan6419846
- Fix testing prerequisites (#2381) by @stefan6419846
- Improve formatting of contributors page (#2383) by @stefan6419846
- Add Tobeabellwether as a contributor for #2341 by @MartinThoma

### Developer Experience (DEV)

- Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
- Fail on Sphinx issues (#2405) by @stefan6419846
- Move title check to own workflow (#2384) by @MasterOdin
- Write to temporary files instead of the working directory (#2379) by @stefan6419846
- Ensure that the PR titles have the correct format (#2378) by @stefan6419846

### Maintenance (MAINT)

- Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
- Return None instead of -1 when page is not attached (#2376) by @MartinThoma
- Replace warning with logging.error (#2377) by @MartinThoma

### Testing (TST)

- Add missing pytest.mark.samples annotations (#2412) by @kitterma
- Correctly close temporary files (#2396) by @stefan6419846
- Fix side effect #2379 (#2395) by @pubpub-zz
- Add test for layout extraction mode (#2390) by @MartinThoma

### Code Style (STY)

- Use the UserAccessPermissions enum (#2398) by @MartinThoma
- Run black (#2370) by @MartinThoma

[Full Changelog](3.17.4...4.0.0)

Commit 98e3ee6: ENH: Add Jupyter Notebook integration


MartinThoma added the is-feature label (A feature request) on Dec 28, 2023

Commit 33a627d: include/exclude

MartinThoma merged commit a91e9f6 into main Dec 28, 2023

MartinThoma deleted the jupyter-notebook-integration branch December 28, 2023 17:55
