Write() refactor to use new line wrapping code #346

gmischler · 2022-03-03T16:10:55Z

Fixes #340

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md
The PR description or comment contains a picture of a cute animal (not mandatory but encouraged 🦒 )

.write() now uses the new line wrapping code, and as a result it handles soft hyphens.

Besides .write() itself, changes were necessary in two other methods:

._render_styled_cell_text() needed to learn how to set the current position to the end of the rendered text (for .write() really a bit left of the actual end).
When MultiLineBreak.get_line_of_given_width() ends up with a single word taking more space than the whole line, it just splits it apart unceremoniously (without even a hyphen). Since .write() often starts on an already partially populated line where the remaining available space may be very short, this could happen frequently, but it is not desired in this context. I added an option no_wordsplit=False to prevent that when set to True. If the current line has characters added but no break hint, then it will rewind its internal state and return an empty line, causing .write() to switch to a new line.
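A minimal sketch of the no_wordsplit behaviour described above. It is not the MultiLineBreak implementation: widths are counted in characters instead of font metrics, spaces are the only break hints, and the helper names next_line() and wrap() are hypothetical. It assumes full_width is at least 1.

```python
def next_line(text, width, no_wordsplit=False):
    """Return (line, rest) for one output line of at most `width` characters."""
    if len(text) <= width:
        return text, ""
    # Last space that still keeps the line within `width` characters.
    cut = text.rfind(" ", 0, width + 1)
    if cut != -1:
        return text[:cut], text[cut + 1:]
    if no_wordsplit:
        # No break hint on this line: rewind and hand back an empty line,
        # so the caller can continue on a fresh line.
        return "", text
    # Default behaviour: split the oversized word, without adding a hyphen.
    return text[:width], text[width:]


def wrap(text, first_width, full_width):
    """Wrap `text` when only `first_width` characters remain on the first line."""
    lines = []
    # Only the first (partially filled) line refuses to split words.
    line, text = next_line(text, first_width, no_wordsplit=True)
    lines.append(line)
    while text:
        line, text = next_line(text, full_width)
        lines.append(line)
    return lines
```

With five characters left on the current line, wrap("incomprehensibilities are long", 5, 24) returns an empty first line followed by the text wrapped at full width, whereas next_line() with the default setting would cut the first word after "incom".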

I have refactored the purely internal ._render_styled_cell_text() to use newpos_x=X.RIGHT and newpos_y=Y.TOP in place of ln=0. This opens the way for many other options.

	newpos_x: Current position in x after the call.
		X.LEFT    - left end of the cell
		X.RIGHT   - right end of the cell (default)
		X.START   - start of actual text
		X.END     - end of actual text
		X.WCONT   - for write() to continue next (slightly left of X.END)
		X.CENTER  - center of actual text
		X.LMARGIN - left page margin (start of printable area)
		X.RMARGIN - right page margin (end of printable area)
	newpos_y: Current position in y after the call.
		Y.TOP     - top of the first line (default)
		Y.LAST    - top of the last line (same as TOP for single-line text)
		Y.NEXT    - top of next line (bottom of current text)
		Y.TMARGIN - top page margin (start of printable area)
		Y.BMARGIN - bottom page margin (end of printable area)

Not all of those have an immediately obvious use case, but they're essentially free and someone might find them practical for their purposes.
This change demonstrates that the concept works very nicely and intuitively. The next step will be to add them to the API of the public methods and to deprecate ln=# (along with center=) in a follow-up PR.
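To illustrate how such an enum can replace ln=0, here is a hedged sketch of resolving newpos_x to a coordinate. The member names come from the list above; the function resolve_x(), its parameters, and the exact offset used for X.WCONT (one c_margin left of X.END) are assumptions made for illustration, not the code of this PR. newpos_y could be resolved the same way.

```python
from enum import Enum, auto


class X(Enum):
    LEFT = auto()
    RIGHT = auto()
    START = auto()
    END = auto()
    WCONT = auto()
    CENTER = auto()
    LMARGIN = auto()
    RMARGIN = auto()


def resolve_x(mode, *, cell_x, cell_w, text_x, text_w,
              c_margin, l_margin, r_margin, page_w):
    """Return the x position after rendering a cell (illustrative only)."""
    positions = {
        X.LEFT: cell_x,                      # left end of the cell
        X.RIGHT: cell_x + cell_w,            # right end of the cell
        X.START: text_x,                     # start of actual text
        X.END: text_x + text_w,              # end of actual text
        X.WCONT: text_x + text_w - c_margin, # assumed offset for write()
        X.CENTER: text_x + text_w / 2,       # center of actual text
        X.LMARGIN: l_margin,                 # left page margin
        X.RMARGIN: page_w - r_margin,        # right page margin
    }
    return positions[mode]
```

A lookup table keeps every option a one-line addition, which matches the observation that the extra values are essentially free.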

There used to be no tests whatsoever for .write() itself. It only got tested incidentally because it is used by html.py. I've written some tests to verify the basic functionality, as well as that it handles page breaks and soft hyphens correctly.
While I was at it, I've renamed the directory test/cells to test/text and collected all text related tests in there.
Additionally I created tests to verify that '._render_styled_cell_text()' determines the new positions correctly, since not all of them are currently exposed to the public API.

To make the code more flexible for future additions, I had to move the logic that sets the word spacing for justified text from .multi_cell() to ._render_styled_cell_text(). While doing so, I managed to optimize the PDF file size a bit, by only emitting "Tw" commands when actually necessary, and shortening the explicit word positioning values for unicode fonts to three decimals (analogous to "Tw").
I also noticed that ._perform_page_break() caused unnecessary "Tw" entries in the PDF. It makes sense to reset "Tw" to the default 0 at the end of each page so that each page starts with a clean slate (and other software can extract individual pages without causing trouble). But recreating a non-0 value at the beginning of the new page is pointless, since typically each line will use a different value anyway. I removed that part, resulting in cleaner output.
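The "Tw" handling described in the two paragraphs above can be sketched roughly as follows. This is an illustration under assumptions, not the fpdf2 code: page_operators() and fmt() are hypothetical helpers, text is assumed to need no PDF string escaping, and each page is assumed to start with the default word spacing of 0.

```python
def fmt(value):
    """Format a number with at most three decimals, dropping trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def page_operators(lines):
    """Build the text operators for one page.

    `lines` is a list of (text, word_spacing) pairs.
    """
    ops = []
    current = 0.0  # every page starts with the default word spacing
    for text, word_spacing in lines:
        word_spacing = round(word_spacing, 3)
        if word_spacing != current:
            # Only emit "Tw" when the value actually changes.
            ops.append(f"{fmt(word_spacing)} Tw")
            current = word_spacing
        ops.append(f"({text}) Tj")
    if current != 0:
        # Reset at the end of the page so the next page starts clean.
        ops.append("0 Tw")
    return ops
```

Because the state is reset at the end of a page and never restored at the start of the next one, each page's content stream stays independent of the others.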

The following PDFs needed to be changed to pass existing tests:

text/test_multi_cell_justified_with_unicode_font.pdf - Eliminated unnecessary "Tw" entries and shortened word positioning values
text/multi_cell_markdown_with_ttf_fonts.pdf - Shortened word positioning values
html/html_headings_line_height.pdf - Visible change! The new algorithm managed to fit one more word onto a line in the "P" paragraph
outline/2_pages_outline.pdf - Fewer unnecessary "Tw" entries
outline/html_toc.pdf - Minimal change in invisible whitespace (write_html() produces a lot of spurious whitespace, which should probably get fixed...).

gmischler · 2022-03-03T16:17:17Z

=========================== short test summary info ============================
FAILED test/end_to_end_legacy/charmap/test_charmap.py::test_first_999_chars[DejaVuSans.ttf]
FAILED test/end_to_end_legacy/charmap/test_charmap.py::test_first_999_chars[DroidSansFallback.ttf]
FAILED test/end_to_end_legacy/charmap/test_charmap.py::test_first_999_chars[Roboto-Regular.ttf]
FAILED test/end_to_end_legacy/charmap/test_charmap.py::test_first_999_chars[cmss12.ttf]
================= 4 failed, 846 passed, 11 warnings in 45.36s ==================

I had wondered why those tests failed on my system, and assumed it was because I have different versions of these fonts installed.
Now they fail here as well.
What's going on there?

gmischler · 2022-03-03T22:15:11Z

What's going on there?

Heh, now I had a closer look...

Those tests use .write() to list all characters in the respective fonts, including \u00ad, a.k.a. the soft hyphen.

...which now gets eaten and never appears in the file. And to make things worse, eliminating that character causes all following characters to be indexed differently, which means the files are not different by just one character, but actually in most of the lines.

What is the best solution here?
a) Do we simply replace the files? Apparently PDF files show a soft-hyphen in the data as an actual hyphen, so there is also a visible difference.
b) Do we need a print_sh parameter to .write() and .multi_cell(), which just treats soft hyphens as printable characters (default False)? I guess that would be the cleaner solution.

Edit: I went with the second option. It was easy enough to implement.
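A small sketch of what the two treatments of \u00ad amount to. The functions visible_text() and split_at_soft_hyphen() are hypothetical and count characters instead of measuring glyph widths; only the print_sh name and its default come from the discussion above.

```python
SOFT_HYPHEN = "\u00ad"
HYPHEN = "-"


def visible_text(text, print_sh=False):
    """Text that ends up on a line that is not broken at a soft hyphen."""
    if print_sh:
        # Soft hyphens are ordinary printable characters.
        return text
    # Soft hyphens are only break hints and vanish when unused.
    return text.replace(SOFT_HYPHEN, "")


def split_at_soft_hyphen(word, width):
    """Break `word` at the last soft hyphen whose prefix, plus a visible
    hyphen, fits into `width` characters. Return None if none fits."""
    best = None
    for i, ch in enumerate(word):
        if ch == SOFT_HYPHEN:
            head = word[:i].replace(SOFT_HYPHEN, "") + HYPHEN
            if len(head) <= width:
                best = (head, word[i + 1:])
    return best
```

With the default print_sh=False, a character listing such as the charmap tests loses one character, which is why every following character shifts position in the output.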

codecov · 2022-03-03T23:00:06Z

Codecov Report

Merging #346 (3a72364) into master (ef745ca) will increase coverage by 0.26%.
The diff coverage is 95.26%.

@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
+ Coverage   90.58%   90.85%   +0.26%     
==========================================
  Files          20       20              
  Lines        5885     5936      +51     
  Branches     1182     1199      +17     
==========================================
+ Hits         5331     5393      +62     
+ Misses        331      320      -11     
  Partials      223      223

Impacted Files	Coverage Δ
fpdf/__init__.py	`100.00% <ø> (ø)`
fpdf/fpdf.py	`86.60% <93.16%> (+0.72%)`	⬆️
fpdf/line_break.py	`99.25% <100.00%> (+0.23%)`	⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef745ca...3a72364. Read the comment docs.

Lucas-C · 2022-03-04T20:49:46Z

I'm reviewing the code right now.
The change in test/html/html_headings_line_height.pdf seems a bit suspicious to me...
The text seemed better justified before. Now the "P" paragraph seems to bleed a little on the right.
Do you agree?

Lucas-C · 2022-03-04T20:24:00Z

fpdf/__init__.py

@@ -39,6 +41,8 @@
    "__license__",
    # Classes
    "FPDF",
+    "X",


This is really short for a class name...
It's not good if you read "from fpdf import X, Y" and really don't know what those X & Y objects are!
I suggest to rename those enums XAlign & YAlign.
What do you think @gmischler?

I see your point. They're not really alignments, though...
XPos & YPos ?

OK, I'm fine with that!

Lucas-C · 2022-03-04T20:25:10Z

fpdf/fpdf.py

        )

    def _render_styled_cell_text(
        self,
+        text_line,


The parameter class (TextLine) could be added as annotation here, to improve readability

What do you mean by "annotation"?
It is mentioned in the docstring.

I meant type hints, like this:

def _render_styled_cell_text(self, text_line: TextLine,

Ah, I've never done that before...
I'll figure it out!
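For readers who have not used type hints either, a self-contained sketch follows. The TextLine stand-in below has only two illustrative fields and the function body is a placeholder; neither mirrors the real fpdf2 definitions.

```python
from typing import NamedTuple, get_type_hints


class TextLine(NamedTuple):
    # Simplified stand-in with hypothetical fields.
    text: str
    text_width: float


def render_styled_cell_text(text_line: TextLine, h: float = 0.0) -> float:
    """Placeholder body: return the width the text would occupy."""
    return text_line.text_width


# Annotations are plain metadata; tools and readers can inspect them.
hints = get_type_hints(render_styled_cell_text)
```

Annotations do not change runtime behaviour, so adding them to an internal method is a low-risk readability improvement.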

Lucas-C · 2022-03-04T20:26:27Z

fpdf/line_break.py

@@ -75,6 +76,8 @@ def __init__(self):
        #     SpaceHint is used fo this purpose.
        # 3 - position of last inserted soft-hyphen
        #     HyphenHint is used fo this purpose.
+        #     If print_sh=True, soft-hyphen is treated as


Shouldn't this big comment be converted to a docstring,
so that it gets rendered in the API docs?
https://pyfpdf.github.io/fpdf2/fpdf/

I don't think this class is part of the public API, and most of the comment is about its internals.
But the part I added about the print_sh parameter should indeed go into a docstring.

Lucas-C · 2022-03-04T20:27:30Z

fpdf/line_break.py

            # HYPHEN is inserted instead of SOFT_HYPHEN
            character = HYPHEN
        return self.size_by_style(character, style)

-    def get_line_of_given_width(self, maximum_width):
+    # pylint: disable=too-many-return-statements
+    def get_line_of_given_width(self, maximum_width, no_wordsplit=False):


Parameters with "no" in their names should be avoided, regarding code readability...
Could we replace this by wordsplit=True maybe?

I named it this way because I like it when the default for an unusual condition is False... 😉
But yeah, wordsplit=True is probably easier on the eyes.

Lucas-C · 2022-03-04T20:28:40Z

fpdf/fpdf.py

        # Font styles preloading must be performed before any call to FPDF.get_string_width:
        txt = self.normalize_text(txt)
        styled_txt_frags = self._preload_font_styles(txt, markdown)
        return self._render_styled_cell_text(
+            TextLine(styled_txt_frags, 0.0, 0, False),


The text_width, number_of_spaces_between_words & justify parameters could be passed by name here, to improve readability.
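A sketch of the difference this suggestion makes at the call site. The three field names text_width, number_of_spaces_between_words and justify are taken from the comment above; the name of the first field (fragments) and the NamedTuple layout are assumptions.

```python
from typing import NamedTuple, Tuple


class TextLine(NamedTuple):
    fragments: Tuple[str, ...]  # assumed name for the first field
    text_width: float
    number_of_spaces_between_words: int
    justify: bool


# Positional form, as in the diff: the meaning of 0.0, 0 and False is opaque.
positional = TextLine(("Hello",), 0.0, 0, False)

# Keyword form: same value, but self-documenting.
by_name = TextLine(
    fragments=("Hello",),
    text_width=0.0,
    number_of_spaces_between_words=0,
    justify=False,
)
```

Both calls build equal tuples, so switching to keywords is purely a readability change.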

Lucas-C · 2022-03-04T20:41:15Z

fpdf/fpdf.py

@@ -2446,6 +2530,8 @@ def multi_cell(
            max_line_height (int): optional maximum height of each sub-cell generated
            markdown (bool): enable minimal markdown-like markup to render part
                of text as bold / italics / underlined. Default to False.
+            print_sh (bool): Treat a soft-hyphen (\\u00ad) as a normal printable
+                                character, instead of a line breaking opportunity. Default value: False


Just a detail: could you align the start of this docstring line to match the "of text as bold" line above please?

Vim getting confused about something... will do.

Lucas-C · 2022-03-04T20:54:36Z

fpdf/fpdf.py

-                Possible values are: `L` or empty string: left align (default value) ;
-                `C`: center ; `R`: right align
+            newpos_x (Enum X): New current position in x after the call.
+                X.LEFT    - left end of the cell


Shouldn't those details about all possible values be moved into docstrings in the enums definitions?

Since the Enums are meant for (eventual) public consumption...
Yeah, sounds like a good idea.

Lucas-C · 2022-03-04T20:56:32Z

fpdf/fpdf.py

-            # adjustment before each space
-            if self.ws and self.unifontsubset:
+
+            word_spacing = 0


What is the relation of this variable with self.ws?
Could a short comment be added here please, explaining things a little?

gmischler · 2022-03-04T21:17:40Z

Do you agree?

Hard to tell by eye. I was hoping that the new algorithm is just particularly efficient... 😉

On closer inspection, it does seem that the right margin is about one mm (~= c_margin) narrower than the left there. If I understand the workings of write_html() correctly, the whole paragraph gets fed to .write() in one piece. In that case, and starting at l_margin, it should behave exactly the same as .multi_cell(). I'll have to check if that produces the same effect with this font size, or what the difference might be.

Edit:
Ok, found the problem. I need to subtract c_margin from the printable width at least one more time than I actually did.
This is actually a quite confusing topic, because c_margin gets handled in many different places throughout the pipeline. I'll see if I can simplify this a bit. The user facing methods really shouldn't have to deal with it.

Lucas-C

Looks good to me!
Great work there @gmischler, thank you.
Ping me when you'd like this PR to be merged 😊

Lucas-C · 2022-03-05T09:41:40Z

Also, I have just merged #347 so you will have to rebase

gmischler · 2022-03-05T10:46:58Z

Also, I have just merged #347 so you will have to rebase

I've noticed... 😉

gmischler · 2022-03-05T13:41:32Z

Ok, I think this is ready to be merged now.

I changed newpos_[xy] to new[xy] for convenience, which should be equally understandable.
Once I had figured out annotations (I hope...), I also added them to other methods that I had already touched anyway.
I also fixed all the many instances where the docstrings incorrectly indicated argument types as "int" (often probably used as a placeholder for "number").

And after black had annoyed me about it all that time, I finally gave in and let it fix the (entirely unrelated) spacing on two lines of drawing.py.

There's an error with black in the 3.10 tests, but unfortunately it doesn't tell us what it thinks is wrong. Black doesn't complain here locally, so what's this about?

Lucas-C · 2022-03-06T19:32:24Z

I made some tests with black and Python 3.10 on the code from your write_refactor branch:

# python --version
Python 3.10.2

# pip install --upgrade black
...
# black --version
black, 22.1.0 (compiled: yes)

# black fpdf/drawing.py
reformatted fpdf/drawing.py

All done! ✨ 🍰 ✨
1 file reformatted.

# git diff
diff --git a/fpdf/drawing.py b/fpdf/drawing.py
index b3776be..738df9f 100644
--- a/fpdf/drawing.py
+++ b/fpdf/drawing.py
@@ -512,7 +512,7 @@ class Point(NamedTuple):
             The scalar result of the distance computation.
         """

-        return (self.x ** 2 + self.y ** 2) ** 0.5
+        return (self.x**2 + self.y**2) ** 0.5

     @force_document
     def __add__(self, other):
@@ -2660,7 +2660,7 @@ class Arc(NamedTuple):
         lam_da = (prime.x / radii.x) ** 2 + (prime.y / radii.y) ** 2

         if lam_da > 1:
-            radii = Point(x=(lam_da ** 0.5) * radii.x, y=(lam_da ** 0.5) * radii.y)
+            radii = Point(x=(lam_da**0.5) * radii.x, y=(lam_da**0.5) * radii.y)

         sign = (self.large != self.sweep) - (self.large == self.sweep)
         rxry2 = (radii.x * radii.y) ** 2

Lucas-C · 2022-03-06T19:33:42Z

I think you should leave fpdf/drawing.py as it was and/or maybe try to upgrade your local version of black?

gmischler · 2022-03-06T19:53:00Z

I think you should leave fpdf/drawing.py as it was and/or maybe try to upgrade your local version of black?

Weird.
black 22.1.0 just reverted the changes that black 21.9b0 had insisted on...
Wasn't the purpose of such a tool to bring more consistency?

Oh, and "ping", @Lucas-C 🔔

gmischler added 30 commits September 18, 2021 16:33

Fix parsing of csv template files

71bd53d

*Values of csv files are converted by position, instead of content * Updated tests to check for regression * Updated documentation and tests to include multiline text.

fixes suggested by static code check

d70bc37

Update template.py

6ed9686

restrict decimal seperator replacement to float fields

now it's dark.

fa62a8d

do some hardcoded template tests without multiline

f1d7802

first round Splitting Template() into FlexTemplate()

ec69b8f

offset and rotate for render(), first test

6771592

small fixes and cleanup

536e819

removing mistaken checkin

42e0d27

test for multipage Template(); Template.code39 with standard template…

5b1d889

… fields.

refer defaults to type handlers, x2 optional for barcodes

92c9e28

more template and flextemplate tests

0195db4

Merge remote-tracking branch 'upstream/master'

bb97a63

static check fixes

e5ab09c

more pylint

fdb03de

blackity-black

56f639e

even blacker

bef03d1

Expand docstrings, update help, hide private methods.

cad0264

Issues from PR review

0f99984

Merge remote-tracking branch 'upstream/master'

1af4365

Issue #226 solved: Rotate anything anywhere

bba1ca1

Issue #238 solved - split_multicell doesn't modify target document

b89147a

Documentation details and corrections

13e9739

breaking up long line

058f7ea

rotation fix slightly changed barcode output

5807548

Update CHANGELOG.md

557148a

Include _write() in template rotation test

ddba2ce

FlexTemplate.render() with scaling

4847792

empty text field - consistency between T and W

6f2c98f

Enforce user input types as early as possible

2b2d82e

gmischler added 3 commits March 2, 2022 23:26

Move word spacing code to _render_styled_cell_text()

427b61b

Merge branch 'PyFPDF:master' into master

27b85f4

Merge branch 'PyFPDF:master' into write_refactor

fb7dfac

gmischler requested a review from Lucas-C as a code owner March 3, 2022 16:10

gmischler added 2 commits March 3, 2022 23:56

print_sh option for write() and multi_cell()

c7d911b

Merge branch 'write_refactor' of https://github.com/gmischler/fpdf2 i…

2774b24

…nto write_refactor

tabs to spaces

ff3b19a

Lucas-C reviewed Mar 4, 2022

Apply PR review

5884cc8

Lucas-C reviewed Mar 5, 2022

Lucas-C approved these changes Mar 5, 2022

gmischler added 4 commits March 5, 2022 11:48

Merge branch 'PyFPDF:master' into master

c99ac7d

Merge branch 'master' into write_refactor

155f758

merge origin updates

baee4b7

newpos_[xy] to new[xy], annotations, docstring fixes

ce2fc7c

revert drawing.py, after black 22.1 made up its mind

3a72364

Lucas-C approved these changes Mar 7, 2022

Lucas-C merged commit a9ccc9f into py-pdf:master Mar 7, 2022

gmischler deleted the write_refactor branch March 7, 2022 08:33
