Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto indent issue: Lua multiline quotes [[...]] conflict with #210641 which removes brackets before passing text to indent rules #223606

Open
tomlau10 opened this issue Jul 25, 2024 · 3 comments · May be fixed by #230632
Assignees

Comments

@tomlau10
Copy link

tomlau10 commented Jul 25, 2024

Does this issue occur when all extensions are disabled?: Yes

  • VS Code Version: Version: 1.90.2; Commit: 5437499
  • OS Version: macOS Big Sur 11.6

I am working on improving Lua's auto indentation rules to fix keywords inside string literal will trigger false positive as described in #199223. If interested, you can see my progress in that issue. However I found another problem during my development.

My regex should be able to handle this case:

if s == [[then]]
  • my regex should be able to tell that then is not followed by pure trailing space / comment => should not be indented
  • however vscode will still indent it on enter
  • I then used the debugger in developer tool, and found that the actual string being evaluated is different from the actual text ‼️

Root Cause Analysis

After some searching, I found that a while ago the issue #209862 proposed to remove brackets from string or comments before passing to the regex module, which fix auto indent issue related to bracket detection. And it is implemented and released in PR #210641.

I see some discussion going on here #209519 (comment), and I understand the rationale behind. Nonetheless Lua has a multiline quote [[...]] which uses the bracket character, and it can even be nested [==[...]==].
ref: http://lua-users.org/wiki/StringsTutorial

I believe that the current logic of removing brackets from string or comments is to detect a string token type (in the above case is the [[then]]) and then remove all [ / ] from it (as they are defined as bracket in Lua's language config). But then after striping the surrounding [[]], the then token stands alone. And my regex have no way to deal with this case, which will cause a false positive ☹️

More example

To make things worse, Lua also has a multiline comment --[[...]] that can also be written in single line. 😂
Take the following as example:

if s == --[[then]] then
  • this should have indent on next line
  • but after [[ and ]] are stripped, the text becomes if s == --then then
  • and the regex will just skip the trailing comment part, because both of the then is after a -- comment

Quick thought

I don't have any bright idea right now. By just by reading the discussion mentioned above, I guess maybe removing all contents in string / comment, including the quotes / comment prefix would be a better solution? Because indentation rule generally will not depends on the content inside string / comment? 🤔
Using the examples above:

  • if s == [[then]] => becomes if s == => no more false positive ✅
  • if s == --[[then]] then => becomes if s == then => again just check for \bthen\b\s*$ is enough to match this ✅
@vscodenpa
Copy link

Thanks for creating this issue! It looks like you may be using an old version of VS Code, the latest stable release is 1.91.1. Please try upgrading to the latest version and checking whether this issue remains.

Happy Coding!

@tomlau10 tomlau10 changed the title Lua multiline quotes [[...]] does not goes well with #210641 which removes brackets before passing text to indent rules Auto indent issue: Lua multiline quotes [[...]] conflict with #210641 which removes brackets before passing text to indent rules Jul 25, 2024
@Tyriar Tyriar assigned aiday-mar and unassigned Tyriar Jul 25, 2024
@tomlau10
Copy link
Author

After a second thought, removing all content including the quotes seems NOT viable 🤔
It will cause troubles when the code is minified.

  • Consider the following example:
if"true"then
  • if we just removing all contents including the quotes, this line becomes ifthen
  • ifthen is a single word, and indent regex will never be able to recognize it 💥

@tomlau10
Copy link
Author

tomlau10 commented Jul 27, 2024

Hi, I think I come up with a good solution 😄

  • The problem here is that we want to remove entire string / comment, including quotes / prefix to avoid false positive detection of keywords inside them
    • But we cannot just replace them with empty string, that would cause trouble if the line is a minified code
    • => To fix this, we can replace it with empty space ‼️
  • However we should not do replacement if a line is purely string / comment
    • This might mess up the getPrecedingValidLine logic inside a multiline string / comment block, as the entire line becomes a single space
    • => To fix this, we need to do this replacement only if the line contains some of other token types

My proposed change

I believe the related code logic resides here:

tokens.forEach((tokenIndex: number) => {
const tokenType = tokens.getStandardTokenType(tokenIndex);
let text = tokens.getTokenText(tokenIndex);
if (shouldRemoveBracketsFromTokenType(tokenType)) {
text = text.replace(bracketsRegExp, '');
}

The simple change will be something like this:

let hasOtherTokenType = false;
tokens.forEach((tokenIndex: number) => {
    const tokenType = tokens.getStandardTokenType(tokenIndex);
    hasOtherTokenType = hasOtherTokenType || !shouldRemoveBracketsFromTokenType(tokenType);
})
tokens.forEach((tokenIndex: number) => {
    const tokenType = tokens.getStandardTokenType(tokenIndex);
    let text = tokens.getTokenText(tokenIndex);
    if (shouldRemoveBracketsFromTokenType(tokenType)) {
        if (hasOtherTokenType) {
            text = " "; // if this line contents other token type, we replace the whole content with single space
        } else {
            text = text.replace(bracketsRegExp, '');
        }
    }
  • hasOtherTokenType: first find out if the line has any token types other than string | comment | regex
  • if true, then in the token replacement forloop, instead of just replacing brackets, we replace the entire token as single space
  • this should be able to strip the quotes / comment prefix, and will not confuse indentation pattern anymore!

Expected Result

I tested this change locally, and this solves #199223 directly, along with all the test cases that I defined there. 🎉
I don't even have to modify Lua's current indentation rule, as what it lacks is just the ability to ignore keywords in string ("...", [[...]]) / single lined block comment (--[[...]]).

Please would @aiday-mar consider this with the team when you have time 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants