Update ucaps.h to contain proper case matchings #90726
Ideally this should be generated from the UCD (https://www.unicode.org/reports/tr21/tr21-5.html#UnicodeData; for these functions only single-character mappings are useful, since TextServer already does the rest via ICU), not from Python (which might use a different UCD version).
Why version |
It should be the latest data, not 3.2.0 specifically, but it seems it was incorporated into other parts of the standard. We only need legacy handling here (without string length changes).
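To illustrate the "without string length changes" point, here is a small sketch using only the Python standard library. It is not part of the PR; the helper name `full_upper_changes_length` is made up for this example. It shows that Python's `str.upper()` applies full case mappings, which can turn one character into several, and it prints which UCD version the running interpreter ships with.

```python
import unicodedata


def full_upper_changes_length(ch: str) -> bool:
    """Return True if Python's full uppercase mapping changes the string length."""
    return len(ch.upper()) != len(ch)


# The UCD version depends on the Python build that runs this script.
print("UCD version bundled with this Python:", unicodedata.unidata_version)

# U+00DF (sharp s) uppercases to "SS" under full case mapping.
print(full_upper_changes_length("\u00df"))
# A plain ASCII letter maps one-to-one.
print(full_upper_changes_length("a"))
```

A table that stores one code point per entry cannot represent the first case, which is why only single-character mappings are relevant for these functions.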
Different UCD versions across Python versions are a problem only if we treat them as one: if we want the latest case matchings, we can always run the script with the latest Python version. However, if this is a dealbreaker, I could write a script that parses UnicodeData.txt (UCD 16.0.0) (also see: UnicodeData.txt's Property Table, UnicodeData File Format). Edit: Done, PR is ready for review :)
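For readers following along, a minimal sketch of what parsing UnicodeData.txt for single-character case mappings can look like. This is not the script from the PR. It assumes the standard 15-field, semicolon-separated layout, where field 12 is the simple uppercase mapping and field 13 the simple lowercase mapping. The function names and the emitted table name `example_caps_table` are hypothetical, and the three sample lines stand in for the real file.

```python
def parse_simple_case_mappings(lines):
    """Collect (source, target) code point pairs for simple upper/lower mappings."""
    to_upper, to_lower = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) < 15:
            continue  # Not a regular UnicodeData.txt record.
        code = int(fields[0], 16)
        if fields[12]:  # Simple uppercase mapping.
            to_upper.append((code, int(fields[12], 16)))
        if fields[13]:  # Simple lowercase mapping.
            to_lower.append((code, int(fields[13], 16)))
    return to_upper, to_lower


def format_table(name, pairs):
    """Render pairs as a C array; the layout is an assumption, not the real ucaps.h."""
    rows = "\n".join(f"\t{{ 0x{src:04X}, 0x{dst:04X} }}," for src, dst in pairs)
    return f"static const int {name}[][2] = {{\n{rows}\n}};\n"


# Three records in UnicodeData.txt format, used in place of the full file.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
]

to_upper, to_lower = parse_simple_case_mappings(SAMPLE)
print(format_table("example_caps_table", to_upper))
```

To run it against real data, pass an open file handle for a downloaded UnicodeData.txt instead of `SAMPLE`. Because the input file is pinned, the output no longer depends on which Python version runs the script.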
There are also caps tables in |
Will look into that later today, thanks for the heads-up. Edit: |
Looks good, I would apply the same changes proposed in #101878 (review) (header generation) to this script as well.
Sure thing, I'll look into it later today. Edit: Done, though I couldn't place the |
Thanks! |
Implement a fix @reduz mentioned in a Tweet a few days ago. Update ucaps.h, so that it contains the latest Unicode case matchings.