Refactor Reline::Unicode ed_ vi_ em_ methods #720

tompng · 2024-06-09T06:58:37Z

Depends on #759 (some commits were split out into it)

Refactor Reline::Unicode vi_ ed_ em_ methods

Rewrites them, removing about 250 lines.
Updates a questionable test case.

Description of refactor

Calling get_prev_mbchar_size and get_next_mbchar_size in a while loop is extremely inefficient.
Calling take_while on an array of grapheme clusters is simple and fast enough.
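
The take_while approach can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name forward_word_bytesize is hypothetical, and it assumes a UTF-8 line so no encoding conversion is needed.

```ruby
# Hypothetical helper for illustration only (not Reline's actual method).
# Returns how many bytes to advance from byte_pointer to reach the end of
# the next word, assuming `line` is a UTF-8 string.
def forward_word_bytesize(line, byte_pointer)
  # Split the rest of the line into grapheme clusters once.
  gcs = line.byteslice(byte_pointer..).grapheme_clusters
  # Skip leading non-word clusters, then consume the word clusters after them.
  nonwords = gcs.take_while { |c| c.match?(/\P{Word}/) }
  words = gcs.drop(nonwords.size).take_while { |c| c.match?(/\p{Word}/) }
  (nonwords + words).sum(&:bytesize)
end

forward_word_bytesize("  foo bar", 0) # skips 2 spaces, then "foo"
forward_word_bytesize("foo bar", 3)   # skips 1 space, then "bar"
```

The line is split into grapheme clusters once and then scanned linearly, rather than re-measuring the neighbouring character on every loop iteration.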

Performance

Scenario: paste 2KB text aaaaa...aaaaa to IRB and press OPTION+LEFT.

# before (slow because time complexity was O(n^2))
irb(main):002> Reline::Unicode.em_backward_word('a'*2000,2000)
processing time: 0.265469s
=> [2000, 2000]

# after
irb(main):002> Reline::Unicode.em_backward_word('a'*2000,2000) 
processing time: 0.001357s
=> 2000

st0012 · 2024-06-17T12:12:56Z

lib/reline/unicode.rb

-    end
-    [byte_size, width, new_str]
+    gcs = line.byteslice(byte_pointer..).grapheme_clusters
+    nonwords = gcs.take_while { |c| c.encode(Encoding::UTF_8).match?(/\P{Word}/) }


Does it make sense to encode gcs all at once before looping through them?
Also, all tests still pass even if I remove these encode calls. Can we add a few test cases that cover them?

tompng · 2024-06-17

encode changes the bytesize, so I need to fix every encode(Encoding::UTF_8).grapheme_clusters call in this file to return the correct bytesize in the original encoding. I'll also add a test.
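
As a small illustration of the bytesize point above (a sketch, not code from this PR; the helper name bytesize_in is hypothetical), the same grapheme clusters occupy a different number of bytes depending on the encoding they are measured in:

```ruby
# Hypothetical helper for illustration only.
# Total bytesize of the given grapheme clusters once converted to `encoding`.
def bytesize_in(clusters, encoding)
  clusters.sum { |c| c.encode(encoding).bytesize }
end

clusters = "\u3042a".grapheme_clusters   # "あ" and "a", as UTF-8 strings
bytesize_in(clusters, Encoding::UTF_8)   # 3 bytes + 1 byte
bytesize_in(clusters, Encoding::EUC_JP)  # 2 bytes + 1 byte
```

A byte count taken from the UTF-8 copy therefore cannot be used directly as an offset into a line that is stored in another encoding.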

ima1zumi

LGTM!

(ruby/reline#720)
* Refactor Reline::Unicode vi_ ed_ em_ methods
* Make Reline::Unicode's vi_ ed_ em_ method encoding safe
ruby/reline@cdd7288978

tompng force-pushed the unicode_edit_func_refactor branch 2 times, most recently from 789d090 to 85f4405 Compare June 10, 2024 13:59

st0012 reviewed Jun 17, 2024

tompng marked this pull request as draft June 17, 2024 16:13

tompng marked this pull request as ready for review June 19, 2024 16:08

tompng force-pushed the unicode_edit_func_refactor branch from 2f519c3 to 3365ab0 Compare October 9, 2024 16:17

tompng marked this pull request as draft October 9, 2024 16:18

tompng force-pushed the unicode_edit_func_refactor branch from 3365ab0 to e699b4f Compare November 12, 2024 16:57

tompng added 2 commits November 13, 2024 15:48

Refactor Reline::Unicode vi_ ed_ em_ methods

f14950a

Make Reline::Unicode's vi_ ed_ em_ method encoding safe

2ad27a7

tompng force-pushed the unicode_edit_func_refactor branch from e699b4f to 2ad27a7 Compare November 13, 2024 10:31

tompng changed the title ~~Refactor Reline::Unicode ed_ vi_ em_ split_by_width methods~~ Refactor Reline::Unicode ed_ vi_ em_ methods Nov 13, 2024

tompng marked this pull request as ready for review November 13, 2024 14:13

ima1zumi approved these changes Dec 15, 2024

ima1zumi merged commit cdd7288 into ruby:master Dec 15, 2024
44 of 46 checks passed

tompng deleted the unicode_edit_func_refactor branch December 16, 2024 06:05
