added in wrapping on zero width space to better support word wrapping in languages like Thai #1191

Merged: 19 commits, Jun 26, 2024

Commits
8e0ac53
added in modification to linebreak.py and accompanying test to wrap z…
carlhiggs Jun 3, 2024
b0b0183
ran black
carlhiggs Jun 3, 2024
c9209bc
updated documentation to clarify that both ' ' and U+200B count as sp…
carlhiggs Jun 3, 2024
4f95856
updated CHANGELOG.md to resolve conflict following rebasing with mast…
carlhiggs Jun 3, 2024
da0326f
made sure that the zero-width-space test string was not too long and …
carlhiggs Jun 3, 2024
096f8c6
made string concatenation explicit to make pylint happier
carlhiggs Jun 3, 2024
1ecd089
updated wrapping to account for additional space characters, and reve…
carlhiggs Jun 3, 2024
e22148b
updated line_break.py to list BREAKING_SPACE_SYMBOLS as a single list…
carlhiggs Jun 5, 2024
9c3cc31
partial update rebasing main and addressing some (but not yet all) up…
carlhiggs Jun 19, 2024
a75c41e
updated test reference PDFs using direct downloads from FPDF2 master …
carlhiggs Jun 21, 2024
8412b5b
addressed conflicts (updates to linespace break code and associated t…
carlhiggs Jun 3, 2024
4ded107
updated documentation to clarify that both ' ' and U+200B count as sp…
carlhiggs Jun 3, 2024
5ae4847
updated CHANGELOG.md
carlhiggs Jun 3, 2024
7eb4048
made sure that the zero-width-space test string was not too long and …
carlhiggs Jun 3, 2024
73e0e4a
made string concatenation explicit to make pylint happier
carlhiggs Jun 3, 2024
e82090c
updated wrapping to account for additional space characters, and reve…
carlhiggs Jun 3, 2024
29a786f
corrected pdfs
carlhiggs Jun 21, 2024
c2c7b7e
updated pdfs following fresh installation of dependencies as per #1191
carlhiggs Jun 24, 2024
a9a5c7f
corrected changelog and updated line_break.py to wrap on breaking spa…
carlhiggs Jun 25, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',

 ## [2.7.10] - Not released yet
 ### Added
+* Wrapping words on spaces now considers all common space symbols in addition to regular spaces (' '), addressing issues with word-wrapping for languages like Thai, as per [#1190](https://github.com/py-pdf/fpdf2/issues/1190) and [#1191](https://github.com/py-pdf/fpdf2/pull/1191).
 * [`Templates`](https://py-pdf.github.io/fpdf2/fpdf/Templates.html) can now be also defined in JSON files.
 * support to optionally set `wrapmode` in templates (default `"WORD"` can optionally be set to `"CHAR"` to support wrapping on characters for scripts like Chinese or Japanese) - _cf._ [#1159](https://github.com/py-pdf/fpdf2/issues/1159) - thanks to @carlhiggs
 * support for quadratic and cubic Bézier curves with [`FPDF.bezier()`](https://py-pdf.github.io/fpdf2/fpdf/Shapes.html#fpdf.fpdf.FPDF.bezier) - thanks to @awmc000
24 changes: 22 additions & 2 deletions fpdf/line_break.py
@@ -18,6 +18,24 @@
 SOFT_HYPHEN = "\u00ad"
 HYPHEN = "\u002d"
 SPACE = " "
+BREAKING_SPACE_SYMBOLS = [
+    " ",
+    "\u200b",  # | ZERO WIDTH SPACE
+    "\u2000",  # | EN QUAD
+    "\u2001",  # | EM QUAD
+    "\u2002",  # | EN SPACE
+    "\u2003",  # | EM SPACE
+    "\u2004",  # | THREE-PER-EM SPACE
+    "\u2005",  # | FOUR-PER-EM SPACE
+    "\u2006",  # | SIX-PER-EM SPACE
+    "\u2008",  # | PUNCTUATION SPACE
+    "\u2009",  # | THIN SPACE
+    "\u200A",  # | HAIR SPACE
+    "\u205F",  # | MEDIUM MATHEMATICAL SPACE
+    "\u3000",  # | IDEOGRAPHIC SPACE
+    "\u0009",  # | TAB
+]
+BREAKING_SPACE_SYMBOLS_STR = "".join(BREAKING_SPACE_SYMBOLS)
 NBSP = "\u00a0"
 NEWLINE = "\n"
 FORM_FEED = "\u000c"
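
As a standalone illustration of the constants in this hunk (it does not import fpdf2), the sketch below copies the list from the diff and shows how the joined string supports a membership test with `in`. The helper `is_break_opportunity` is a hypothetical name introduced only for this example. Because `in` on a string is a substring test, the helper also checks that exactly one character was passed; the PR's code tests one character at a time, so it does not need that guard.

```python
# Values copied from the diff; standalone illustration, not fpdf2 itself.
BREAKING_SPACE_SYMBOLS = [
    " ",
    "\u200b",  # ZERO WIDTH SPACE
    "\u2000",  # EN QUAD
    "\u2001",  # EM QUAD
    "\u2002",  # EN SPACE
    "\u2003",  # EM SPACE
    "\u2004",  # THREE-PER-EM SPACE
    "\u2005",  # FOUR-PER-EM SPACE
    "\u2006",  # SIX-PER-EM SPACE
    "\u2008",  # PUNCTUATION SPACE
    "\u2009",  # THIN SPACE
    "\u200a",  # HAIR SPACE
    "\u205f",  # MEDIUM MATHEMATICAL SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
    "\u0009",  # TAB
]
BREAKING_SPACE_SYMBOLS_STR = "".join(BREAKING_SPACE_SYMBOLS)
NBSP = "\u00a0"  # non-breaking space, deliberately absent from the list


def is_break_opportunity(character: str) -> bool:
    """Hypothetical helper: True if a line may be wrapped at this character."""
    # `in` on a str is a substring test, so require exactly one character.
    return len(character) == 1 and character in BREAKING_SPACE_SYMBOLS_STR


if __name__ == "__main__":
    print(is_break_opportunity("\u200b"))  # zero width space
    print(is_break_opportunity(NBSP))      # non-breaking space
```

Note that `NBSP` stays outside the list, so a non-breaking space is not treated as a wrap point.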
@@ -449,7 +467,7 @@ def add_character(
                 self.fragments.append(Fragment("", graphics_state, k, url))
         active_fragment = self.fragments[-1]

-        if character == SPACE:
+        if character in BREAKING_SPACE_SYMBOLS_STR:
             self.space_break_hint = SpaceHint(
                 original_fragment_index,
                 original_character_index,
@@ -668,7 +686,9 @@ def get_line(self):
                     trailing_form_feed=character == FORM_FEED,
                 )
             if current_line.width + character_width > max_width:
-                if character == SPACE:  # must come first, always drop a current space.
+                if (
+                    character in BREAKING_SPACE_SYMBOLS_STR
+                ):  # must come first, always drop a current space.
                     self.character_index += 1
                     return current_line.manual_break(self.align)
                 if self.wrapmode == WrapMode.CHAR:
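
The two hunks above record a break hint whenever a breaking space is added, and drop the space when it is the character that overflows the line. The following is a much-simplified sketch of that idea, not fpdf2's implementation. It assumes every character is one unit wide except U+200B, which is zero units wide, and it uses only a small subset of the breaking spaces. `wrap`, `char_width` and `BREAKING_SPACES` are hypothetical names for this example.

```python
# Simplified greedy wrapper; an illustration only, not fpdf2's algorithm.
ZERO_WIDTH_SPACE = "\u200b"
BREAKING_SPACES = " \u200b\u3000\t"  # small subset, assumed for brevity


def char_width(ch):
    """Assumed toy metric: zero width space is 0 units, anything else 1."""
    return 0 if ch == ZERO_WIDTH_SPACE else 1


def wrap(text, max_width):
    """Split text into lines no wider than max_width units."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    lines = []
    current = ""     # characters on the line being built
    width = 0        # width of `current` in units
    last_break = -1  # index in `current` of the latest breaking space

    def flush(line):
        # Breaking spaces at the end of a line are dropped.
        lines.append(line.rstrip(BREAKING_SPACES))

    for ch in text:
        if ch in BREAKING_SPACES and not current:
            continue  # never start a line with a breaking space
        w = char_width(ch)
        if width + w > max_width:
            if ch in BREAKING_SPACES:
                # The overflowing character is a space: drop it and break.
                flush(current)
                current, width, last_break = "", 0, -1
                continue
            if last_break >= 0:
                # Break at the last hint and carry the remainder over.
                flush(current[:last_break])
                current = current[last_break + 1:]
                width = sum(char_width(c) for c in current)
                last_break = -1
            if width + w > max_width:
                # No usable hint: fall back to a hard break.
                flush(current)
                current, width = "", 0
        if ch in BREAKING_SPACES:
            last_break = len(current)
        current += ch
        width += w
    if current:
        flush(current)
    return lines


if __name__ == "__main__":
    print(wrap("ab\u200bcd", 2))
    print(wrap("hello world", 5))
```

In this model a zero width space never overflows a line by itself, so it only ever acts as a recorded hint; a regular space can also be the overflowing character, in which case it is dropped at the break.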
28 changes: 28 additions & 0 deletions test/fonts/test_wraps_zerowidthspace.py
@@ -0,0 +1,28 @@
+from pathlib import Path
+
+from fpdf import FPDF
+
+from test.conftest import assert_pdf_equal
+
+HERE = Path(__file__).resolve().parent
+
+
+def test_wraps_zerowidthspace(tmp_path):
+    pdf = FPDF()
+    pdf.add_font(fname=HERE / "Waree.ttf")
+    pdf.set_font("Waree", size=12)
+    pdf.add_page()
+    pdf.write(
+        8,
+        "Thai (ideally wouldn't wrap after the space after '1000'): "
+        + "นโยบาย\u200Bสาธารณะ\u200Bมี\u200Bความ\u200Bสำคัญ\u200Bต่อ\u200B"
+        + "การ\u200Bสนับสนุน\u200Bการ\u200Bออก\u200Bแบบ\u200Bและ\u200Bการ"
+        + "\u200Bสร้าง\u200Bชุมชน\u200Bและ\u200Bเมือง\u200Bสุขภาพ\u200Bดี\u200B"
+        + "และ\u200Bยั่งยืน รายการ\u200Bตรวจ\u200Bสอบนโยบาย\u200Bความ\u200B"
+        + "ท้าทาย 1,000 เมือง\u200Bสำหรับ\u200Bใช้\u200Bเพื่อ\u200Bประเมิน\u200Bการ"
+        + "\u200Bมี\u200Bอยู่\u200Bและ\u200Bคุณภาพ\u200Bของ\u200Bนโยบาย\u200Bที่"
+        + "\u200Bสอด\u200Bคล้อง\u200Bกับ\u200Bหลัก\u200Bฐาน\u200Bและ\u200Bหลัก"
+        + "\u200Bการ\u200Bสำหรับ\u200Bเมือง\u200Bที่\u200Bมี\u200Bสุขภาพ\u200Bดี"
+        + "\u200Bและ\u200Bยั่งยืน",
+    )
+    assert_pdf_equal(pdf, HERE / "thai_wraps_zerowidthspace.pdf", tmp_path)
Binary file added test/fonts/thai_wraps_zerowidthspace.pdf
Binary file not shown.
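
The test string above marks Thai word boundaries with U+200B, because Thai is written without spaces between words. A minimal sketch of how such a string could be prepared follows, assuming the text has already been split into words (by hand or by a segmenter, which is outside this PR and not part of fpdf2). `join_with_zero_width_space` is a hypothetical helper name; the sample words are the first five from the test string.

```python
# Sketch: mark word boundaries with U+200B so a wrapper can break there.
ZERO_WIDTH_SPACE = "\u200b"


def join_with_zero_width_space(words):
    """Hypothetical helper: join pre-segmented words with U+200B."""
    return ZERO_WIDTH_SPACE.join(words)


# First five words of the test string, assumed to be already segmented.
words = ["นโยบาย", "สาธารณะ", "มี", "ความ", "สำคัญ"]
text = join_with_zero_width_space(words)

if __name__ == "__main__":
    # The separators are invisible when printed; count them instead.
    print(text.count(ZERO_WIDTH_SPACE))
```

The resulting string can then be passed to `pdf.write()` as in the test, with a font that covers Thai glyphs.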