Implement line wrap support #5

Open · wants to merge 2 commits into master


@johnnyshields commented Feb 23, 2025

Fixes #4

This PR is a hot mess 🔥☕🐶🔥. However, it may also be "good enough" to use.

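As described in #4, the core problem is that the prawn-rtl-support gem currently Bidi-reorders RTL text first, and only then does Prawn compute line wrapping. Because wrapping runs over text that is already in visual (reversed) order, the first line ends up holding the logical end of the paragraph, so the lines are effectively reversed, i.e. in bottom-to-top order.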
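A toy Ruby sketch of the ordering problem, under deliberately simplified assumptions: every letter is treated as RTL, so Bidi reordering of a line is modeled as a plain String#reverse, and wrapping is a greedy split on spaces with a fixed width in characters. The helper names (wrap, reorder, reorder_then_wrap, wrap_then_reorder) are hypothetical and not part of Prawn or the gem.

```ruby
# Toy model of the ordering bug from #4 (not the gem's code).
# Assumption: all letters are RTL, so Bidi reordering of a run is a reverse.

# Greedy word wrap with a maximum line width measured in characters.
def wrap(text, width)
  lines = []
  current = ""
  text.split(" ").each do |word|
    candidate = current.empty? ? word : "#{current} #{word}"
    if candidate.length <= width
      current = candidate
    else
      lines << current unless current.empty?
      current = word
    end
  end
  lines << current unless current.empty?
  lines
end

# Stand-in for Bidi visual reordering of an all-RTL run.
def reorder(text)
  text.reverse
end

# Current behavior: reorder the whole paragraph, then wrap it.
def reorder_then_wrap(text, width)
  wrap(reorder(text), width)
end

# Approach in this PR: wrap in logical order, then reorder each line.
def wrap_then_reorder(text, width)
  wrap(text, width).map { |line| reorder(line) }
end

paragraph = "abc def ghi jkl"
p reorder_then_wrap(paragraph, 7) # => ["lkj ihg", "fed cba"]
p wrap_then_reorder(paragraph, 7) # => ["fed cba", "lkj ihg"]
```

In the first result, the logical end of the paragraph ("ghi jkl") lands on the first line; wrapping first keeps each line's words in place and only reverses within the line.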

This PR "fixes" this as follows:

  1. The patch for Prawn::Text::Formatted::Box now only performs Arabic connecting. This is a text "preprocessing" step that happens before lines are broken, so we can't do reordering here.
  2. Prawn::Text::Formatted::Arranger is the class that handles "arranging" within each line. This is where this PR implements the reordering; the assumption is that splitting text into lines first, then reordering each line with Bidi, always yields the correct result. This has two sub-steps:
    • First, if the line contains any RTL chars, we try to join all neighboring "text fragments" within the line that have the same style, as this improves Bidi's ability to reorder (Bidi works better with larger chunks). I haven't looked in detail, but at least in my PDFs, Prawn often breaks Arabic into "one fragment per word", which is annoying. It may be because I am using Arabic as a fallback font; I'm not sure.
    • After the initial grouping of fragments, we need to reorder the fragments while preserving their styles. To do this, we run the fragments through a new method, Prawn::Rtl::Connector.reorder_fragments. This is a magic method that "tags" each fragment with marker chars on either side, runs the tagged text through Bidi reordering, and then splits the result back into pieces, each paired with the index of the fragment (and therefore the style) it came from. For example, given fragments ["foo ", "bar ", "baz"] (supposing foo, etc. are RTL chars), this will yield [[2, "zab"], [1, " rab"], [0, " oof"]].
  3. Prawn::Text::Formatted::Fragment is now un-patched, so it no longer does a naive character reversal when direction == :rtl.
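A minimal Ruby sketch of the two Arranger sub-steps described above, under stated assumptions: fragments are plain hashes { text:, styles: } rather than Prawn fragment objects; Bidi reordering is stubbed as a whole-string reverse (every letter RTL) instead of a real UAX #9 implementation; and one Private Use Area char per fragment index serves as the marker. merge_like_styled and bidi_reorder are hypothetical helpers; reorder_fragments only mirrors the idea of the PR's method, not its actual code.

```ruby
# Assumption: markers are BMP Private Use Area chars, one per fragment index.
PUA_START = 0xE000

# Sub-step 1: join neighboring fragments that share the same style.
def merge_like_styled(fragments)
  fragments.each_with_object([]) do |frag, merged|
    if !merged.empty? && merged.last[:styles] == frag[:styles]
      merged.last[:text] += frag[:text] # builds a new string; inputs are untouched
    else
      merged << frag.dup
    end
  end
end

# Stand-in for Bidi visual reordering of an all-RTL line.
def bidi_reorder(text)
  text.reverse
end

# Sub-step 2: tag each fragment with the same marker on both sides, reorder
# the whole line at once, then split it back into [fragment_index, text] pairs.
def reorder_fragments(texts)
  tagged = texts.each_with_index.map do |text, i|
    marker = (PUA_START + i).chr(Encoding::UTF_8)
    "#{marker}#{text}#{marker}"
  end.join
  bidi_reorder(tagged).scan(/([\u{E000}-\u{F8FF}])(.*?)\1/m).map do |marker, text|
    [marker.ord - PUA_START, text]
  end
end

p reorder_fragments(["foo ", "bar ", "baz"]) # => [[2, "zab"], [1, " rab"], [0, " oof"]]

# Usage: merge first, reorder, then look each piece's style up by index.
fragments = [
  { text: "foo ", styles: [:bold] },
  { text: "bar ", styles: [] },
  { text: "baz", styles: [] }
]
merged = merge_like_styled(fragments)
visual = reorder_fragments(merged.map { |f| f[:text] }).map do |index, text|
  { text: text, styles: merged[index][:styles] }
end
p visual
```

Using the same marker on both sides keeps the tags symmetric, so they survive the reversal and the backreference (\1) can find each fragment's boundaries again.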

Alternative approaches:

  • The line breaking could perhaps be implemented in Prawn::Text::Formatted::Box#form_fragments_from_like_font_glyph_pairs instead, but I'm not sure.
  • It may be necessary to introduce a concept of "text style groups" (i.e. runs of text having the same style) above fragments, which would eliminate the need for the marker mess. That way we could apply the Bidi algorithm to the text within a style group before it is fragmented. I need to study further how Bidi handles line breaking.

Limitations:

  1. Line-length computation is probably no longer exactly correct for Arabic and other languages with ligatures, because line length is calculated on the original, un-reordered text. It should be relatively close, but it may cause slight text overflow or underflow.
  2. Some aspects of inline formatting, such as multiple fonts on the same line, may be broken.
  3. Prawn::Rtl::Connector.reorder_fragments struggles with whitespace and punctuation. For example, we might expect "foo: bar" to become "rab :oof", but it becomes "rab: oof" instead. This could possibly be hacked around.
  4. Prawn::Rtl::Connector.reorder_fragments uses magic marker chars from the Unicode Private Use Area. This will break FontAwesome and other icon fonts that use chars in this area if they appear on the same line as RTL chars.
  5. The Arabic connector has some minor incorrect edge cases. For example, مرحبًا becomes مرحبـًا.
  6. Need to do a lot more testing in the wild.
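One possible mitigation for limitation 4, which this PR does not implement: check whether a line already contains Private Use Area code points before tagging it with markers, and skip marker-based reordering for that line if it does. This is a hypothetical Ruby sketch; markers_safe? is an invented name, and it only checks the BMP Private Use Area (U+E000 to U+F8FF), the same range assumed for the markers.

```ruby
# Hypothetical guard: the BMP Private Use Area, assumed to be the marker range.
PUA_REGEX = /[\u{E000}-\u{F8FF}]/

# True when no fragment text contains a PUA code point (e.g. an icon glyph),
# i.e. when marker chars cannot collide with real content on this line.
def markers_safe?(texts)
  texts.none? { |text| PUA_REGEX.match?(text) }
end

p markers_safe?(["foo ", "bar"])          # => true
p markers_safe?(["foo ", "\u{F005} bar"]) # => false
```

This only detects the conflict; deciding what to do with such a line (for example, reordering each fragment independently) is left open.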


Misc changes (for performance):

  1. Arabic connecting is skipped if the text does not contain basic Arabic chars.
  2. #include_rtl? now uses a direct regex match on the string rather than allocating TwitterCldr::Shared::Bidi objects.
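A Ruby sketch of both fast paths, assuming a script-based regex check. The exact character ranges the PR uses may differ: here RTL detection only covers the Hebrew and Arabic scripts via Onigmo's \p{...} properties (which also match Arabic presentation forms), and "basic Arabic" is assumed to mean the U+0600 to U+06FF block. connect_arabic and its connector block are hypothetical stand-ins for the gem's connecting step.

```ruby
# Assumption: Hebrew and Arabic scripts are enough to flag a line as RTL.
RTL_REGEX = /[\p{Hebrew}\p{Arabic}]/
# Assumption: "basic Arabic chars" means the Arabic block U+0600..U+06FF.
BASIC_ARABIC_REGEX = /[\u0600-\u06FF]/

# Regex match on the string; no Bidi objects are allocated.
def include_rtl?(text)
  RTL_REGEX.match?(text)
end

# Run the (hypothetical) connector only when the text has basic Arabic chars.
def connect_arabic(text, &connector)
  return text unless BASIC_ARABIC_REGEX.match?(text)

  connector.call(text)
end

p include_rtl?("hello") # => false
p include_rtl?("שלום")  # => true
```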

@oleksii-leonov (Member)

Hi @johnnyshields,

Thanks a lot! I will try to revive the project and add some basic tests and CI. We are still using prawn-rtl-support in production (it's not abandoned).

Some time ago, I started looking in another direction: using Harfbuzz, the de facto standard text-shaping library, which computes glyph placement with ligature support and has all the edge cases already covered.

There are Harfbuzz bindings for Ruby: https://github.com/jslabovitz/harfbuzz-gem (oh, I see you already found it: jslabovitz/harfbuzz-gem#2 :).

Ideally, we should pass the input characters to Harfbuzz, get back the proper glyph positions, and use them in Prawn to place the glyphs. The current approach (where we manually pick the proper character variants from a lookup table) worked as a quick fix to make Arabic text readable, but it's very limited, and the result is not pretty.
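A Ruby sketch of the last step, turning shaped output into absolute glyph positions. Assumptions: the shaper returns glyphs already in visual order, each with an id plus advances and offsets named after the x_advance, y_advance, x_offset and y_offset fields of HarfBuzz's hb_glyph_position_t. ShapedGlyph is a hypothetical stand-in, not harfbuzz-gem's actual class, and scaling from font units to PDF points (font size / units per em) is left out.

```ruby
# Hypothetical stand-in for one shaped glyph (font units).
ShapedGlyph = Struct.new(:glyph_id, :x_advance, :y_advance, :x_offset, :y_offset)

# Walk a pen across the glyphs: each glyph is drawn at pen + offset,
# then the pen moves by the glyph's advance.
def absolute_positions(glyphs, origin_x: 0, origin_y: 0)
  pen_x = origin_x
  pen_y = origin_y
  glyphs.map do |g|
    position = [g.glyph_id, pen_x + g.x_offset, pen_y + g.y_offset]
    pen_x += g.x_advance
    pen_y += g.y_advance
    position
  end
end

# A base glyph, a zero-advance combining mark shifted onto it, another base.
glyphs = [
  ShapedGlyph.new(10, 500, 0, 0, 0),
  ShapedGlyph.new(11, 0, 0, -250, 120),
  ShapedGlyph.new(12, 600, 0, 0, 0)
]
p absolute_positions(glyphs) # => [[10, 0, 0], [11, 250, 120], [12, 500, 0]]
```

Zero-advance marks with offsets are exactly what the current lookup-table approach cannot express, which is one reason shaped positions would look better.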

@johnnyshields (Author) commented Feb 24, 2025

@oleksii-leonov yep, I was looking at the same thing. Someone has already integrated Harfbuzz into fpdf2, an equivalent library for Python:

Successfully merging this pull request may close these issues: Multi-line wrapping support
2 participants