BUG: Fix test_watermarking_reportlab_rendering() by Lucas-C · Pull Request #2203 · py-pdf/pypdf

Lucas-C · 2023-09-19T19:47:31Z

This fixes the issue spotted in #2191

The solution was to re-introduce calls to PageObject._push_pop_gs()
in PageObject._merge_page() and PageObject._merge_page_writer(),
while optimizing PageObject._push_pop_gs() by introducing a ContentStream.isolate_graphics_state() method.
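
To illustrate what isolating the graphics state amounts to, here is a minimal sketch on a toy class. `ToyContentStream` and its attributes are hypothetical stand-ins and not pypdf's actual implementation; the only point is that the `q ... Q` wrap can be applied in place, either on an already-parsed operation list or on the raw bytes, without re-parsing the stream.

```python
from typing import List, Optional, Tuple

# (operands, operator) pairs; this layout is an assumption for the sketch.
Operation = Tuple[list, bytes]


class ToyContentStream:
    """Hypothetical stand-in for a content stream; not pypdf's class."""

    def __init__(
        self, data: bytes = b"", operations: Optional[List[Operation]] = None
    ) -> None:
        self.data = data
        self.operations = list(operations or [])

    def isolate_graphics_state(self) -> None:
        """Wrap the content in q ... Q without re-parsing it."""
        if self.operations:
            # Already parsed: add the save/restore operators to the list.
            self.operations.insert(0, ([], b"q"))
            self.operations.append(([], b"Q"))
        elif self.data:
            # Still raw bytes: wrap the bytes directly.
            self.data = b"q\n" + self.data + b"\nQ\n"


raw = ToyContentStream(data=b"1 0 0 1 10 10 cm")
raw.isolate_graphics_state()

parsed = ToyContentStream(operations=[([1, 0, 0, 1, 10, 10], b"cm")])
parsed.isolate_graphics_state()
```

In this sketch an empty stream is left untouched, which is a design choice of the example rather than a statement about pypdf.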

Lucas-C · 2023-09-19T19:50:08Z

I wonder if we could use this opportunity to assume that use_original=True is always passed to PageObject._push_pop_gs(), and hence never re-create a ContentStream object, which could both simplify the code and improve performance...

Edit: I did that in the 2nd commit of this PR, and it seems to work fine.

tests/test_writer.py

pubpub-zz · 2023-09-19T20:42:09Z

> I wonder if we could use this opportunity to assume that use_original=True is always passed to PageObject._push_pop_gs(), and hence never re-create a ContentStream object, which could both simplify the code and improve performance...

What about doing the merge directly on the decoded byte streams? The conversion to operations takes some time, and we just need something like:
Newcontent.decodedstream.set_data(b"q\n" + page1.content.get_data() + b"Q\nq\n" + page2.rotation (if required) + page2.content.get_data() + b"Q\n")

No?
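
The idea in the draft line above can be sketched on plain bytes. This is only an illustration under stated assumptions: `merge_streams` and `overlay_transform` are hypothetical names, the inputs are assumed to be already-decoded content streams, and a real merge would also have to reconcile the resources (fonts, XObjects) that each stream refers to.

```python
def merge_streams(
    base: bytes, overlay: bytes, overlay_transform: bytes = b""
) -> bytes:
    """Concatenate two decoded content streams, each in its own q ... Q block.

    overlay_transform is an optional operator sequence (for example a
    rotation matrix followed by ``cm``) applied only to the overlay.
    """
    return (
        b"q\n" + base + b"\nQ\n"
        + b"q\n" + overlay_transform + overlay + b"\nQ\n"
    )


merged = merge_streams(
    b"BT /F1 12 Tf (base) Tj ET",
    b"BT /F1 12 Tf (stamp) Tj ET",
    overlay_transform=b"0 1 -1 0 0 0 cm\n",  # 90 degree rotation matrix
)
```

Because nothing is parsed into operations, the cost is a handful of byte concatenations, which is the performance argument made above.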

codecov · 2023-09-19T20:47:34Z

Codecov Report

Patch coverage: 92.30% and project coverage change: +0.01% 🎉

Comparison is base (34c6875) 94.37% compared to head (bef49af) 94.38%.
Report is 4 commits behind head on main.

❗ Current head bef49af differs from pull request most recent head 22fb6c5. Consider uploading reports for the commit 22fb6c5 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2203      +/-   ##
==========================================
+ Coverage   94.37%   94.38%   +0.01%     
==========================================
  Files          43       43              
  Lines        7588     7588              
  Branches     1496     1497       +1     
==========================================
+ Hits         7161     7162       +1     
+ Misses        263      262       -1     
  Partials      164      164

| Files Changed | Coverage Δ | |
|---|---|---|
| pypdf/generic/_data_structures.py | `91.88% <83.33%> (-0.07%)` | ⬇️ |
| pypdf/_page.py | `94.35% <100.00%> (+0.18%)` | ⬆️ |
| pypdf/_utils.py | `98.58% <100.00%> (ø)` | |


Lucas-C · 2023-09-20T05:34:30Z

> What about doing the merge directly on the decoded byte streams? The conversion to operations takes some time, and we just need something like:

Sorry, I don't quite understand your suggestion 😅
Where are you suggesting to perform that change exactly?
What is Newcontent.decodedstream in the line of code you mentioned?
Overall, isn't what you suggest already part of this PR?

pubpub-zz · 2023-09-20T17:10:31Z

> Sorry, I don't quite understand your suggestion 😅

It is quite a draft. I will make a test and prepare a PR if performance is good 😉

Lucas-C · 2023-09-20T19:30:51Z

> It is quite a draft. I will make a test and prepare a PR if performance is good 😉

Alright! 🙂

In the meantime, would you or @MartinThoma be OK to merge this PR?

pubpub-zz · 2023-09-20T20:44:58Z

For me, this PR is completely valid, and what I have in mind will not modify it.

tests/test_writer.py

stefan6419846 · 2023-09-21T19:04:13Z

In PyMuPDF, double-wrapping is prevented: https://github.com/pymupdf/PyMuPDF/blob/1c2f1da3eb2541f1c8dbd50acb3a916939c99d3e/src/__init__.py#L9141-L9146 (the is_wrapped check additionally uses a caching mechanism to improve performance, see https://github.com/pymupdf/PyMuPDF/blob/1c2f1da3eb2541f1c8dbd50acb3a916939c99d3e/src/__init__.py#L8864-L8876 as well).

Would this be a possible enhancement here as well? Otherwise, running pypdf on the same PDF page multiple times might generate lots of "useless" wrappers.
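
A minimal sketch of such a check, working on decoded content bytes. This is neither PyMuPDF's nor pypdf's code: `is_wrapped` here is a hypothetical helper, and the whitespace tokenizing is deliberately naive, so a `q` or `Q` inside a string literal or an inline image would be miscounted.

```python
def is_wrapped(data: bytes) -> bool:
    """Return True if the whole stream sits inside one outer q ... Q pair."""
    tokens = data.split()
    if len(tokens) < 2 or tokens[0] != b"q" or tokens[-1] != b"Q":
        return False
    depth = 0
    last = len(tokens) - 1
    for index, token in enumerate(tokens):
        if token == b"q":
            depth += 1
        elif token == b"Q":
            depth -= 1
            if depth < 0:
                return False  # unbalanced restore
            if depth == 0 and index != last:
                # The first q was closed early, so the stream is several
                # sibling blocks rather than one outer wrapper.
                return False
    return depth == 0


def isolate_once(data: bytes) -> bytes:
    """Wrap the stream in q ... Q only if it is not already wrapped."""
    return data if is_wrapped(data) else b"q\n" + data + b"\nQ\n"
```

Tracking the nesting depth matters because a stream such as `q ... Q q ... Q` starts with `q` and ends with `Q` without being isolated as a whole.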

MartinThoma · 2023-09-24T09:36:43Z

Thank you for taking care of this @Lucas-C 🙏 I love that you're so actively contributing now 🎉

To me the PR looks good. @stefan6419846 left two interesting comments. I will take care of the docstring part, but the double-wrapping might be something interesting for another PR.

Lucas-C · 2023-09-24T09:56:08Z

> To me the PR looks good. @stefan6419846 left two interesting comments. I will take care of the docstring part, but the double-wrapping might be something interesting for another PR.

I agree! 🙂

If I understand correctly, your suggestion @stefan6419846 would be to add an .is_wrapped attribute to ContentStream, in order to avoid re-wrapping it in q ... Q if that's already done?
I wonder if that can really happen often in practice: have you witnessed it?
Could you maybe provide a minimal test case reproducing this?

stefan6419846 · 2023-09-24T10:44:00Z

> If I understand correctly, your suggestion @stefan6419846 would be to add an .is_wrapped attribute to ContentStream, in order to avoid re-wrapping it in q ... Q if that's already done?

Yes, something like this.

> I wonder if that can really happen often in practice: have you witnessed it?
> Could you maybe provide a minimal test case reproducing this?

It generally is good practice to already have an isolated graphics state for each page of the PDF file, so a "clean" PDF file would already have wrapped pages. pypdf will currently add another wrapping layer each time.

An easy way to reproduce this is to add two different watermarks/overlays/backgrounds, for example:

from pypdf import PdfReader, PdfWriter


watermark1 = PdfReader('watermark1.pdf').pages[0]
watermark2 = PdfReader('watermark2.pdf').pages[0]

writer = PdfWriter(clone_from='file.pdf')
for page in writer.pages:
    page.merge_page(watermark1)
    page.merge_page(watermark2)

Suppose we start with self._data being q\n 841.680[...]0 Do\nQ\n\n \n. Running the above example leads to self._data being q\nq\nq\n 841[...]ET\nQ\nQ\n\nQ\n, thus we have isolated the graphics state three times, although once should be enough.
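
The growth described above can be modelled on plain bytes. This is a simplified model and not pypdf's merge code: `naive_merge` and `leading_saves` are hypothetical helpers, and the page and overlay contents are made-up placeholders.

```python
def naive_merge(page: bytes, overlay: bytes) -> bytes:
    """Model of a merge that always isolates the existing page content."""
    return b"q\n" + page + b"Q\n" + b"q\n" + overlay + b"Q\n"


def leading_saves(data: bytes) -> int:
    """Count the consecutive q operators at the start of the stream."""
    count = 0
    for token in data.split():
        if token != b"q":
            break
        count += 1
    return count


# A "clean" page whose content is already wrapped once.
page = b"q\n1 0 0 1 0 0 cm\nQ\n"
for overlay in (b"BT (one) Tj ET\n", b"BT (two) Tj ET\n"):
    page = naive_merge(page, overlay)

depth = leading_saves(page)
```

Under this model `depth` ends up at 3 after two merges: the wrapper the page started with plus one more per merge.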

pubpub-zz · 2023-09-24T16:20:25Z

@MartinThoma I've found a "regression" in the test: the decorator that skips it when Ghostscript is not present is missing.
Fixed in the referenced PR.


## What's new

### Bug Fixes (BUG)
- PDF size increases because of too high float writing precision (#2213) by @pubpub-zz
- Fix test_watermarking_reportlab_rendering() (#2203) by @LucasCimon

### Documentation (DOC)
- Fix typos and add a paragraph to ViewerPreferences docs (#2199) by @marcstober
- How to install pypi from any branch (#2209) by @pubpub-zz
- Update copyright footer in docs (#2207) by @marcstober

### Developer Experience (DEV)
- Let dependabot update Github Actions by @MartinThoma

### Maintenance (MAINT)
- Update .pre-commit-config.yaml by @MartinThoma

[Full Changelog](3.16.1...3.16.2)

Lucas-C commented Sep 19, 2023

tests/test_writer.py (resolved)

Lucas-C mentioned this pull request Sep 19, 2023

TST: Issue with merging pdfkit #2191

Merged

Lucas-C force-pushed the fix-test_watermarking_reportlab_rendering branch from f8e7fad to c256b74 on September 19, 2023 19:52

Fix test_watermarking_reportlab_rendering()

a8ba231

Lucas-C force-pushed the fix-test_watermarking_reportlab_rendering branch from c256b74 to a8ba231 on September 19, 2023 20:08

Getting rid of PageObject._push_pop_gs()

bef49af

Lucas-C requested a review from MartinThoma September 20, 2023 05:35

stefan6419846 reviewed Sep 21, 2023

tests/test_writer.py (outdated, resolved)

Adjust test docstring

22fb6c5

MartinThoma changed the title ~~Fix test_watermarking_reportlab_rendering()~~ BUG: Fix test_watermarking_reportlab_rendering() Sep 24, 2023

MartinThoma merged commit 91b6dcd into main Sep 24, 2023

MartinThoma deleted the fix-test_watermarking_reportlab_rendering branch September 24, 2023 09:39

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this pull request Sep 24, 2023

BUG : pdf size increases because of float writing precision

d076f76

closes py-pdf#1910 address regression from py-pdf#2203

pubpub-zz mentioned this pull request Sep 24, 2023

BUG: PDF size increases because of too high float writing precision #2213

Merged

MartinThoma pushed a commit that referenced this pull request Sep 24, 2023

BUG: PDF size increases because of too high float writing precision (#…

e3f60c1

…2213) See #1910 address regression from #2203

stefan6419846 mentioned this pull request Sep 26, 2023

Each page merge will add another graphics state isolation call #2219

Closed

MartinThoma mentioned this pull request Sep 27, 2023

Stamp is scaled #2221

Closed

pubpub-zz mentioned this pull request Sep 28, 2023

ENH: Merge improvement #2226

Draft
