Reorder reverse caps characters table for string lower case conversion #26760
Case mapping is covered in the Unicode case mapping FAQ: https://unicode.org/faq/casemap_charprop.html |
@bruvzg so you say that we should create a script to generate those mappings from that database? That's a ton of mappings and might decrease performance as well, hmm.
|
@Xrayez I'm sorry, but I think the unit test is slightly incorrect: doesn't your loop check just one letter? |
Probably. Mappings should have many long runs, so we could compact each run into a single record. Remap function: Table: |
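The run-compaction idea above can be sketched roughly as follows. This is a minimal illustration and not the code from the comment: the `CaseRun` record, the `kLowerRuns` table and the `remap_to_lower()` function are hypothetical names, and the table holds only a handful of well-known runs rather than the full Unicode data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record: one run of consecutive upper-case code points that
// all map to lower case by adding the same offset.
struct CaseRun {
    char32_t first; // first upper-case code point in the run
    char32_t last;  // last upper-case code point in the run (inclusive)
    int32_t delta;  // offset added to obtain the lower-case code point
};

// Illustrative subset only. The table must stay sorted by `first` and the
// runs must not overlap, otherwise the binary search below is not valid.
static const CaseRun kLowerRuns[] = {
    { 0x0041, 0x005A, 32 }, // Latin A..Z
    { 0x0391, 0x03A1, 32 }, // Greek Alpha..Rho
    { 0x03A3, 0x03AB, 32 }, // Greek Sigma..Upsilon with dialytika
    { 0x0400, 0x040F, 80 }, // Cyrillic U+0400..U+040F
    { 0x0410, 0x042F, 32 }, // Cyrillic A..Ya
};

// Binary search over runs instead of over individual code points.
// Returns `c` unchanged when no run contains it.
char32_t remap_to_lower(char32_t c) {
    std::size_t lo = 0;
    std::size_t hi = sizeof(kLowerRuns) / sizeof(kLowerRuns[0]);
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const CaseRun &run = kLowerRuns[mid];
        if (c < run.first) {
            hi = mid;
        } else if (c > run.last) {
            lo = mid + 1;
        } else {
            return static_cast<char32_t>(c + run.delta);
        }
    }
    return c;
}

// Small usage example: U+0401 (Ё) falls inside the U+0400..U+040F run.
void remap_example() {
    assert(remap_to_lower(0x0401) == 0x0451);
}
```

With this layout, five records cover more than a hundred mappings, so the table stays small and the search touches only a few entries per lookup.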
@breakmt thanks for pointing that out, it was a copy-paste mistake... And sure enough, the test doesn't pass again! It does work in GDScript, and I can't understand why it doesn't work in C++. Should I use Unicode literals like EDIT: no, using Cyrillic Unicode literals doesn't work either. |
Ah yes, forgot about different types of string literals in C++:
would be the correct unicode one that handles |
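As a general illustration of how C++ string literal prefixes differ (not necessarily the exact literal meant above), here is a minimal sketch using only the standard library. The helper names are hypothetical, and the narrow literal spells out its UTF-8 bytes explicitly so the result does not depend on the source file encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+0401 (CYRILLIC CAPITAL LETTER IO) written as explicit UTF-8 bytes in a
// narrow literal: two `char` elements, neither of which is the code point.
inline std::string narrow_io() {
    return "\xD0\x81";
}

// The same character in a wide literal: one `wchar_t` element, because
// U+0401 is in the Basic Multilingual Plane.
inline std::wstring wide_io() {
    return L"\u0401";
}

// The same character in a UTF-32 literal: always one `char32_t` element.
inline std::u32string utf32_io() {
    return U"\u0401";
}

// Usage example: only the wide and UTF-32 forms expose the code point
// directly, which is what a per-character case table needs to compare.
void literal_example() {
    assert(narrow_io().size() == 2);
    assert(wide_io()[0] == 0x0401);
    assert(utf32_io()[0] == 0x0401);
}
```

A per-character lookup that compares code points would therefore see two unrelated byte values when fed the narrow form, which is one plausible reason a test written with narrow literals behaves differently from the same test in GDScript.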
is this the right PR, or the other one? :) |
The binary search algorithm used to look up character codes in the table relies on the data being ordered. This fixes the `to_lower()` string method to convert upper case to lower case properly, so that the algorithm doesn't terminate prematurely. Co-authored-by: AndreevAndrei (avandrei) <avandrei@MacBookAAV.local>
@reduz @akien-mga oh, so my PR covers this bug with a unit test, which #27598 doesn't: (also reordered the commented-out "dotless I" the same way) Either way, I added @AndreevAndrei as a co-author. The only problem is that he isn't detected by GitHub (the git credentials don't match), so his contribution will not be registered... If @AndreevAndrei wants his contribution registered, he could add me as a co-author instead with these changes, or first fix his git credentials to match GitHub (email and username). 😃 |
Thanks! |
Cherry-picked for 3.1.1. |
The binary search algorithm used to look up character codes in the table relies on the data being ordered. This fixes the `to_lower()` string method to convert upper case to lower case properly, so that the algorithm doesn't terminate prematurely.
Fixes #26744, and potentially the same problem for other non-Latin scripts.
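To illustrate why the ordering matters, here is a minimal sketch of a binary search over an upper-to-lower pair table. It is not the engine's actual table or lookup code: `CasePair`, `find_lower()`, `is_sorted_by_upper()` and both sample tables are hypothetical, and the misplaced entry is chosen only for demonstration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical table entry: an upper-case code point and its lower-case form.
struct CasePair {
    char32_t upper;
    char32_t lower;
};

// Binary search that assumes `table` is sorted by `upper`.
// Returns `c` unchanged when the search ends without a match.
char32_t find_lower(const CasePair *table, std::size_t size, char32_t c) {
    std::size_t lo = 0;
    std::size_t hi = size;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (c < table[mid].upper) {
            hi = mid;
        } else if (c > table[mid].upper) {
            lo = mid + 1;
        } else {
            return table[mid].lower;
        }
    }
    return c;
}

// Guard that a unit test could run once over the whole table.
bool is_sorted_by_upper(const CasePair *table, std::size_t size) {
    return std::is_sorted(table, table + size,
            [](const CasePair &a, const CasePair &b) { return a.upper < b.upper; });
}

static const CasePair kSorted[] = {
    { 0x0041, 0x0061 }, // A
    { 0x0042, 0x0062 }, // B
    { 0x0401, 0x0451 }, // Cyrillic Io
    { 0x0410, 0x0430 }, // Cyrillic A
    { 0x0411, 0x0431 }, // Cyrillic Be
};

static const CasePair kMisordered[] = {
    { 0x0041, 0x0061 }, // A
    { 0x0042, 0x0062 }, // B
    { 0x0410, 0x0430 }, // Cyrillic A
    { 0x0411, 0x0431 }, // Cyrillic Be
    { 0x0401, 0x0451 }, // Cyrillic Io, deliberately out of order
};

// Usage example: the same lookup succeeds or silently fails depending on order.
void ordering_example() {
    assert(find_lower(kSorted, 5, 0x0401) == 0x0451);
    assert(find_lower(kMisordered, 5, 0x0401) == 0x0401); // entry is never reached
}
```

In the misordered table the search narrows to the first half after comparing against U+0410, so the entry for U+0401 stored at the end is never visited and the character comes back unconverted. A sortedness check like `is_sorted_by_upper()` catches this class of mistake without needing a test string for every script.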