Refactor Reline::Unicode ed_ vi_ em_ methods #720

Merged 2 commits on Dec 15, 2024

@tompng tompng commented Jun 9, 2024

Depends on #759 (some commits were split out into it)

Refactor Reline::Unicode vi_ ed_ em_ methods

Rewrite them, removing about 250 lines.
Update a questionable test case.

Description of refactor

Using get_prev_mbchar_size and get_next_mbchar_size in a while loop is extremely inefficient.
Calling take_while on an array of grapheme clusters is simple and fast enough.
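
To make the idea concrete, here is a minimal sketch of the take_while approach. `backward_word_bytesize` is a hypothetical helper written for this illustration and is not Reline's actual implementation; it assumes a word is a run of `\p{Word}` characters and that the cursor is given as a byte offset.

```ruby
# Hypothetical helper, not Reline's actual code: returns how many bytes
# to move left from byte_pointer to reach the start of the previous word.
def backward_word_bytesize(line, byte_pointer)
  # Grapheme clusters before the cursor, nearest to the cursor first.
  gcs = line.byteslice(0, byte_pointer).grapheme_clusters.reverse
  # Skip the non-word characters directly before the cursor...
  nonwords = gcs.take_while { |c| c.match?(/\P{Word}/) }
  # ...then the word characters that precede them.
  words = gcs.drop(nonwords.size).take_while { |c| c.match?(/\p{Word}/) }
  (nonwords + words).sum(&:bytesize)
end

backward_word_bytesize("foo bar  ", 9) # => 5 (two spaces plus "bar")
```

The line is split into grapheme clusters once, so the cost is linear in the text before the cursor.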

Performance

Scenario: paste the 2KB text aaaaa...aaaaa into IRB and press OPTION+LEFT.

# before (slow because time complexity was O(n^2))
irb(main):002> Reline::Unicode.em_backward_word('a'*2000,2000)
processing time: 0.265469s
=> [2000, 2000]

# after
irb(main):002> Reline::Unicode.em_backward_word('a'*2000,2000) 
processing time: 0.001357s
=> 2000
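
A rough way to reproduce this kind of gap outside IRB is sketched below. Both methods are simplified stand-ins written for this illustration (they only step over word characters), and `prev_char_bytesize` is an assumption about how a per-character helper can end up costing O(n) per call; neither is the code from before or after this PR.

```ruby
require "benchmark"

# Hypothetical stand-in for a "size of the previous character" helper:
# it re-scans everything before the cursor on every call, so it is O(n).
def prev_char_bytesize(line, byte_pointer)
  gc = line.byteslice(0, byte_pointer).grapheme_clusters.last
  gc ? gc.bytesize : 0
end

# O(n^2) overall: one O(n) scan per character stepped over.
def backward_word_loop(line, byte_pointer)
  moved = 0
  while byte_pointer - moved > 0
    size = prev_char_bytesize(line, byte_pointer - moved)
    break unless line.byteslice(byte_pointer - moved - size, size).match?(/\p{Word}/)
    moved += size
  end
  moved
end

# O(n) overall: split into grapheme clusters once, then take_while.
def backward_word_take_while(line, byte_pointer)
  gcs = line.byteslice(0, byte_pointer).grapheme_clusters.reverse
  gcs.take_while { |c| c.match?(/\p{Word}/) }.sum(&:bytesize)
end

line = "a" * 2000
slow = Benchmark.realtime { backward_word_loop(line, line.bytesize) }
fast = Benchmark.realtime { backward_word_take_while(line, line.bytesize) }
puts format("loop: %.6fs, take_while: %.6fs", slow, fast)
```

Absolute timings depend on the machine and Ruby version; the point is only that the loop version grows quadratically with the input length.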

@tompng tompng force-pushed the unicode_edit_func_refactor branch 2 times, most recently from 789d090 to 85f4405 Compare June 10, 2024 13:59
end
[byte_size, width, new_str]
gcs = line.byteslice(byte_pointer..).grapheme_clusters
nonwords = gcs.take_while { |c| c.encode(Encoding::UTF_8).match?(/\P{Word}/) }
Member

Would it make sense to encode gcs all at once before looping through them?
Also, all tests still pass even when I remove these encoding calls. Can we add a few test cases for them?

Member Author

encode changes the bytesize, so I have to fix every encode(Encoding::UTF_8).grapheme_clusters call in this file so that it returns the correct bytesize in the original encoding. I'll also add a test.
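
The bytesize problem can be illustrated with a small sketch. Both helpers below are hypothetical and exist only for this example; they are not Reline methods. The example assumes an EUC-JP line, where each hiragana character takes 2 bytes instead of the 3 bytes it takes in UTF-8.

```ruby
# Hypothetical helpers for illustration; not Reline's actual methods.

# Problematic pattern: clusters come from a UTF-8 copy, so the sum is
# a UTF-8 byte count even when `line` uses another encoding.
def leading_word_bytesize_utf8(line)
  gcs = line.encode(Encoding::UTF_8).grapheme_clusters
  gcs.take_while { |c| c.match?(/\p{Word}/) }.sum(&:bytesize)
end

# Encoding-safe pattern: split in the original encoding and transcode
# each cluster only for the regexp test.
def leading_word_bytesize(line)
  gcs = line.grapheme_clusters
  gcs.take_while { |c| c.encode(Encoding::UTF_8).match?(/\p{Word}/) }.sum(&:bytesize)
end

line = "あい ab".encode(Encoding::EUC_JP)
leading_word_bytesize_utf8(line) # => 6 (UTF-8 bytes, wrong offset for this line)
leading_word_bytesize(line)      # => 4 (bytes in the line's own encoding)
```

Only the second result is usable as a byte offset into the original line.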

@tompng tompng marked this pull request as draft June 17, 2024 16:13
@tompng tompng marked this pull request as ready for review June 19, 2024 16:08
@tompng tompng force-pushed the unicode_edit_func_refactor branch from 2f519c3 to 3365ab0 Compare October 9, 2024 16:17
@tompng tompng marked this pull request as draft October 9, 2024 16:18
@tompng tompng force-pushed the unicode_edit_func_refactor branch from 3365ab0 to e699b4f Compare November 12, 2024 16:57
@tompng tompng force-pushed the unicode_edit_func_refactor branch from e699b4f to 2ad27a7 Compare November 13, 2024 10:31
@tompng tompng changed the title Refactor Reline::Unicode ed_ vi_ em_ split_by_width methods Refactor Reline::Unicode ed_ vi_ em_ methods Nov 13, 2024
@tompng tompng marked this pull request as ready for review November 13, 2024 14:13
Member

@ima1zumi ima1zumi left a comment

LGTM!

@ima1zumi ima1zumi merged commit cdd7288 into ruby:master Dec 15, 2024
44 of 46 checks passed
matzbot pushed a commit to ruby/ruby that referenced this pull request Dec 15, 2024
(ruby/reline#720)

* Refactor Reline::Unicode vi_ ed_ em_ methods

* Make Reline::Unicode's vi_ ed_ em_ method encoding safe

ruby/reline@cdd7288978
@tompng tompng deleted the unicode_edit_func_refactor branch December 16, 2024 06:05