-
-
Notifications
You must be signed in to change notification settings - Fork 22k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose String::camelcase_to_underscore
#52141
Expose String::camelcase_to_underscore
#52141
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should be named with either snake_case
or snakecase
. "underscore" is ambiguous because there is also TO_HTML
which uses underscores but isn't lowercase.
<return type="String" /> | ||
<argument index="0" name="lowercase" type="bool" default="true" /> | ||
<description> | ||
Adds underscores before uppercase characters (except for acronyms and the first character of the string). If [code]lowercase[/code] is [code]true[/code], the string will be converted to lowercase. For [code]ToHtml5 ToHTMLElement toHtml to_html TO_HTML A_B-C.D[/code], it will return [code]to_html_5 _to_html_element to_html to_html _to__html _a__b-_c._d[/code]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
cannot be part of a valid identifier, so I don't think we need to document how -
behaves with this method, just like how we don't document +
or *
or /
behave.
Why ToHtml5
-> to_html_5
but ToHTMLElement
-> _to_html_element
? The latter should be to_html_element
. In other words, the first character of a PascalCase identifier should not be prefixed with an underscore, and the only way to get an underscore prefix should be to start with one like _ToHTML5
-> _to_html_5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
ToHtml5
->to_html_5
butToHTMLElement
->_to_html_element
? The latter should beto_html_element
. In other words, the first character of a PascalCase identifier should not be prefixed with an underscore, and the only way to get an underscore prefix should be to start with one like_ToHTML5
->_to_html_5
.
Because this is how it is implemented.
Details
Lines 976 to 1015 in c5f62fa
String String::camelcase_to_underscore(bool lowercase) const { | |
const char32_t *cstr = get_data(); | |
String new_string; | |
int start_index = 0; | |
for (int i = 1; i < this->size(); i++) { | |
bool is_upper = is_upper_case(cstr[i]); | |
bool is_number = is_digit(cstr[i]); | |
bool are_next_2_lower = false; | |
bool is_next_lower = false; | |
bool is_next_number = false; | |
bool was_precedent_upper = is_upper_case(cstr[i - 1]); | |
bool was_precedent_number = is_digit(cstr[i - 1]); | |
if (i + 2 < this->size()) { | |
are_next_2_lower = is_lower_case(cstr[i + 1]) && is_lower_case(cstr[i + 2]); | |
} | |
if (i + 1 < this->size()) { | |
is_next_lower = is_lower_case(cstr[i + 1]); | |
is_next_number = is_digit(cstr[i + 1]); | |
} | |
const bool cond_a = is_upper && !was_precedent_upper && !was_precedent_number; | |
const bool cond_b = was_precedent_upper && is_upper && are_next_2_lower; | |
const bool cond_c = is_number && !was_precedent_number; | |
const bool can_break_number_letter = is_number && !was_precedent_number && is_next_lower; | |
const bool can_break_letter_number = !is_number && was_precedent_number && (is_next_lower || is_next_number); | |
bool should_split = cond_a || cond_b || cond_c || can_break_number_letter || can_break_letter_number; | |
if (should_split) { | |
new_string += this->substr(start_index, i - start_index) + "_"; | |
start_index = i; | |
} | |
} | |
new_string += this->substr(start_index, this->size() - start_index); | |
return lowercase ? new_string.to_lower() : new_string; | |
} |
I just added the binding, documentation, and ported the code to C#.
Capitalized words, ACRONYMS and numbers are separated by adding _
before the beginning of the character group. Other characters are not considered. Because of this, there is a duplicate underscore, and this is probably a bug.
Perhaps an example with lowercase = false
will help you better understand how this function works.
ToHtml5 ToHTMLElement toHtml to_html TO_HTML A_B-C.D
To_Html_5 _To_HTML_Element to_Html to_html _TO__HTML _A__B-_C._D
to_html_5 _to_html_element to_html to_html _to__html _a__b-_c._d
If you think that this needs to be fixed, then this is a separate question. But I can try to fix it. Is this the option it should be:
ToHtml5 ToHTMLElement toHtml to_html TO_HTML A_B-C.D
To_Html_5 To_HTML_Element to_Html to_html TO_HTML A_B-C.D
to_html_5 to_html_element to_html to_html to_html a_b-c.d
?
-
cannot be part of a valid identifier, so I don't think we need to document how-
behaves with this method, just like how we don't document+
or*
or/
behave.
-
used in filenames, commands, identifiers in some programming and markup languages.
To do this, we have to rename the method in the core. |
Why is this needed in core? Sounds like a niche use case. How often would this be used? |
@neikeq The use cases are listed in the OP. Recently, in a Russian-language chat, a person asked for advice with regular expressions to convert file names in his project. Then I noticed that such a function already exists, but is not available in the user code. This function can be useful for converting identifiers, both in the project code and when creating different tools.
|
Ah, I didn't realize the method was already in the engine, just not exposed to scripts. In that case I would prefer if the C# version used an internal call instead of re-implementing the code. It's easier to maintain this way. |
Some functions use an internal call, some duplicate the code. I didn't know how best to proceed in this case, so I ported the C++ code, as in the case of |
Yeah, this should be an internal call for C#. Otherwise looks good to me. |
If you're adding one conversion method, why not add the others? Then this would close godotengine/godot-proposals#719. |
Generally an internal call has a cost: call overhead and marshaling. If that cost would has a very noticeable impact on the function performance then we re-implement the method in C#, otherwise an internal call is preferred for maintenance purposes. Another thing to keep in mind is the complexity of the method and how likely it is that it will be re-written in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested this but I can already see calls to isUpper
and isDigit
in C# code. Those functions should be PascalCase, if they exist.
I suggest replacing it with an internal call though.
b410a7b
to
5f58e28
Compare
I see it like this:
Are there any real use cases for |
5f58e28
to
15dbf2f
Compare
Needs a rebase, otherwise should be fine to merge. |
@dalexeev Can you rebase this? :) |
15dbf2f
to
f330fd4
Compare
Done. But I think we'd better implement this:
|
I think that's a good idea. Should we then close this PR and have someone (you?) work on that proposal (which solves godotengine/godot-proposals#719 I suppose)? |
|
This function can be useful in plugins for converting filenames or in code for converting identifiers between different languages.