Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose String::camelcase_to_underscore #52141

Closed

Conversation

dalexeev
Copy link
Member

@dalexeev dalexeev commented Aug 26, 2021

This function can be useful in plugins for converting filenames or in code for converting identifiers between different languages.

@dalexeev dalexeev requested review from a team as code owners August 26, 2021 23:02
@Calinou Calinou added this to the 4.0 milestone Aug 26, 2021
Copy link
Member

@aaronfranke aaronfranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be named with either snake_case or snakecase. "underscore" is ambiguous because there is also TO_HTML which uses underscores but isn't lowercase.

<return type="String" />
<argument index="0" name="lowercase" type="bool" default="true" />
<description>
Adds underscores before uppercase characters (except for acronyms and the first character of the string). If [code]lowercase[/code] is [code]true[/code], the string will be converted to lowercase. For [code]ToHtml5 ToHTMLElement toHtml to_html TO_HTML A_B-C.D[/code], it will return [code]to_html_5 _to_html_element to_html to_html _to__html _a__b-_c._d[/code].
Copy link
Member

@aaronfranke aaronfranke Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- cannot be part of a valid identifier, so I don't think we need to document how - behaves with this method, just like how we don't document + or * or / behave.

Why ToHtml5 -> to_html_5 but ToHTMLElement -> _to_html_element? The latter should be to_html_element. In other words, the first character of a PascalCase identifier should not be prefixed with an underscore, and the only way to get an underscore prefix should be to start with one like _ToHTML5 -> _to_html_5.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ToHtml5 -> to_html_5 but ToHTMLElement -> _to_html_element? The latter should be to_html_element. In other words, the first character of a PascalCase identifier should not be prefixed with an underscore, and the only way to get an underscore prefix should be to start with one like _ToHTML5 -> _to_html_5.

Because this is how it is implemented.

Details

godot/core/string/ustring.cpp

Lines 976 to 1015 in c5f62fa

String String::camelcase_to_underscore(bool lowercase) const {
const char32_t *cstr = get_data();
String new_string;
int start_index = 0;
for (int i = 1; i < this->size(); i++) {
bool is_upper = is_upper_case(cstr[i]);
bool is_number = is_digit(cstr[i]);
bool are_next_2_lower = false;
bool is_next_lower = false;
bool is_next_number = false;
bool was_precedent_upper = is_upper_case(cstr[i - 1]);
bool was_precedent_number = is_digit(cstr[i - 1]);
if (i + 2 < this->size()) {
are_next_2_lower = is_lower_case(cstr[i + 1]) && is_lower_case(cstr[i + 2]);
}
if (i + 1 < this->size()) {
is_next_lower = is_lower_case(cstr[i + 1]);
is_next_number = is_digit(cstr[i + 1]);
}
const bool cond_a = is_upper && !was_precedent_upper && !was_precedent_number;
const bool cond_b = was_precedent_upper && is_upper && are_next_2_lower;
const bool cond_c = is_number && !was_precedent_number;
const bool can_break_number_letter = is_number && !was_precedent_number && is_next_lower;
const bool can_break_letter_number = !is_number && was_precedent_number && (is_next_lower || is_next_number);
bool should_split = cond_a || cond_b || cond_c || can_break_number_letter || can_break_letter_number;
if (should_split) {
new_string += this->substr(start_index, i - start_index) + "_";
start_index = i;
}
}
new_string += this->substr(start_index, this->size() - start_index);
return lowercase ? new_string.to_lower() : new_string;
}

I just added the binding, documentation, and ported the code to C#.

Capitalized words, ACRONYMS and numbers are separated by adding _ before the beginning of the character group. Other characters are not considered. Because of this, there is a duplicate underscore, and this is probably a bug.

Perhaps an example with lowercase = false will help you better understand how this function works.

ToHtml5 ToHTMLElement toHtml to_html TO_HTML A_B-C.D
To_Html_5 _To_HTML_Element to_Html to_html _TO__HTML _A__B-_C._D
to_html_5 _to_html_element to_html to_html _to__html _a__b-_c._d

If you think that this needs to be fixed, then this is a separate question. But I can try to fix it. Is this the option it should be:

ToHtml5 ToHTMLElement toHtml to_html TO_HTML A_B-C.D
To_Html_5 To_HTML_Element to_Html to_html TO_HTML A_B-C.D
to_html_5 to_html_element to_html to_html to_html a_b-c.d

?

- cannot be part of a valid identifier, so I don't think we need to document how - behaves with this method, just like how we don't document + or * or / behave.

- used in filenames, commands, identifiers in some programming and markup languages.

@dalexeev
Copy link
Member Author

This method should be named with either snake_case or snakecase. "underscore" is ambiguous because there is also TO_HTML which uses underscores but isn't lowercase.

To do this, we have to rename the method in the core.

@neikeq
Copy link
Contributor

neikeq commented Aug 27, 2021

Why is this needed in core? Sounds like a niche use case. How often would this be used?

@dalexeev
Copy link
Member Author

@neikeq The use cases are listed in the OP. Recently, in a Russian-language chat, a person asked for advice with regular expressions to convert file names in his project. Then I noticed that such a function already exists, but is not available in the user code. This function can be useful for converting identifiers, both in the project code and when creating different tools.

capitalize is already exposed, so I see no reason why camelcase_to_underscore should remain hidden.

@neikeq
Copy link
Contributor

neikeq commented Aug 27, 2021

Ah, I didn't realize the method was already in the engine, just not exposed to scripts. In that case I would prefer if the C# version used an internal call instead of re-implementing the code. It's easier to maintain this way.

@dalexeev
Copy link
Member Author

Some functions use an internal call, some duplicate the code. I didn't know how best to proceed in this case, so I ported the C++ code, as in the case of capitalize.

@mhilbrunner
Copy link
Member

Yeah, this should be an internal call for C#. Otherwise looks good to me.

@AaronRecord
Copy link
Contributor

AaronRecord commented Aug 27, 2021

If you're adding one conversion method, why not add the others? Then this would close godotengine/godot-proposals#719.

@neikeq
Copy link
Contributor

neikeq commented Aug 27, 2021

Some functions use an internal call, some duplicate the code. I didn't know how best to proceed in this case, so I ported the C++ code, as in the case of capitalize.

Generally an internal call has a cost: call overhead and marshaling. If that cost would has a very noticeable impact on the function performance then we re-implement the method in C#, otherwise an internal call is preferred for maintenance purposes. Another thing to keep in mind is the complexity of the method and how likely it is that it will be re-written in the future.

Copy link
Contributor

@neikeq neikeq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested this but I can already see calls to isUpper and isDigit in C# code. Those functions should be PascalCase, if they exist.
I suggest replacing it with an internal call though.

@dalexeev dalexeev force-pushed the expose-camelcase-to-underscore branch from b410a7b to 5f58e28 Compare August 27, 2021 16:05
@dalexeev
Copy link
Member Author

If you're adding one conversion method, why not add the others?

I see it like this:

s.to_lower()
s.to_upper()
s.capitalize()
s.to_snakecase()
s.to_camelcase()
s.to_pascalcase()

Are there any real use cases for camelcase_to_underscore with lowercase = false? Languages_With_This_Case are rare.

@dalexeev dalexeev force-pushed the expose-camelcase-to-underscore branch from 5f58e28 to 15dbf2f Compare August 27, 2021 17:33
@akien-mga
Copy link
Member

Needs a rebase, otherwise should be fine to merge.

@mhilbrunner
Copy link
Member

@dalexeev Can you rebase this? :)

@dalexeev dalexeev force-pushed the expose-camelcase-to-underscore branch from 15dbf2f to f330fd4 Compare September 16, 2021 07:20
@dalexeev
Copy link
Member Author

Can you rebase this?

Done. But I think we'd better implement this:

s.to_lower()
s.to_upper()
s.capitalize()
s.to_snakecase()
s.to_camelcase()
s.to_pascalcase()

@akien-mga
Copy link
Member

Done. But I think we'd better implement this:

s.to_lower()
s.to_upper()
s.capitalize()
s.to_snakecase()
s.to_camelcase()
s.to_pascalcase()

I think that's a good idea. Should we then close this PR and have someone (you?) work on that proposal (which solves godotengine/godot-proposals#719 I suppose)?

@dalexeev dalexeev closed this Sep 21, 2021
@dalexeev
Copy link
Member Author

dalexeev commented Aug 4, 2022

I think that's a good idea. Should we then close this PR and have someone (you?) work on that proposal (which solves godotengine/godot-proposals#719 I suppose)?

#63902

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants