Confusing error message in the presence of unicode combining characters #100388

Open
Mandragorian opened this issue Aug 10, 2022 · 6 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-parser Area: The parsing of Rust source code to an AST A-Unicode Area: Unicode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Mandragorian
Contributor

Mandragorian commented Aug 10, 2022

Given the following code:

fn main() {
    println!("hello{}"̣, 1);
}

Playground link

The current output is:

error: unknown start of token: \u{323}
 --> src/main.rs:2:23
  |
2 |     println!("hello{}"̣, 1);
  |                       ^

error: could not compile `playground` due to previous error

Ideally the output should look like:

error: unknown start of token: \u{323}
 --> src/main.rs:2:23
  |
2 |     println!("hello{}"\u{323}, 1);
  |                       ^
help: Unicode character '\u{323}' might not be visible when rendered

error: could not compile `playground` due to previous error
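
The rendering proposed above can be pictured with a minimal sketch. It assumes a hypothetical helper, `escape_non_ascii`, which is not an existing rustc function. The helper rewrites a source line so that every non-ASCII character shows up as its `\u{...}` escape, using only the standard `char::escape_unicode`. Escaping all non-ASCII characters is a deliberately blunt assumption; a real diagnostic would probably escape only characters that are hard to see.

```rust
/// Hypothetical helper (an assumption for illustration, not rustc code):
/// returns `line` with every non-ASCII character replaced by its
/// `\u{...}` escape so that combining or invisible characters stand out.
fn escape_non_ascii(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    for c in line.chars() {
        if c.is_ascii() {
            out.push(c);
        } else {
            // `char::escape_unicode` yields the characters of `\u{NNNN}`.
            out.extend(c.escape_unicode());
        }
    }
    out
}
```

Applied to the offending line, the combining dot below (U+0323) becomes visible as `\u{323}` right after the closing quote.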
@Mandragorian Mandragorian added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 10, 2022
@chenyukang
Member

I think the current output is already helpful enough to point out the issue.
In my opinion, the source code shown in diagnostics should keep the same format as the original code.

@CAD97
Contributor

CAD97 commented Aug 11, 2022

That said, it could be useful to print a help message for any Bidi_Class=Nonspacing_Mark characters (as an approximation of the characters that could be difficult to spot).

@Mandragorian
Contributor Author

Just letting the user know that there might be "invisible" characters would help.

@chenyukang
Member

> Just letting the user know that there might be "invisible" characters would help.

Agree.
@rustbot claim

@chenyukang
Member

chenyukang commented Aug 12, 2022

@CAD97
Do you know whether there is a simple way in Rust to test whether a char is in the Nonspacing_Mark range?
Currently, we have a list at https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/lexer/unicode_chars.rs#L10 that gives suggestions for some special Unicode characters, but it is not complete.

I found there is a crate: https://github.com/swgillespie/unicode-categories
I think it may be too heavy to add a new crate for this? 😁

Maybe giving a help message whenever !c.is_ascii() would also be good?

Update: we also have a function in the UI tests:

fn is_unicode_nonspacing_mark(self) -> bool { false }
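
The `!c.is_ascii()` fallback floated above might look roughly like the following sketch. The function name `invisible_char_help` is hypothetical and the message wording is borrowed from the issue description; none of this is taken from rustc itself.

```rust
/// Hypothetical helper (an assumption for illustration, not rustc code):
/// returns a help note for a character the lexer rejected, but only when
/// the character is non-ASCII and may therefore be hard to see.
fn invisible_char_help(c: char) -> Option<String> {
    if c.is_ascii() {
        None
    } else {
        // `EscapeUnicode` implements `Display`, so it can be formatted directly.
        Some(format!(
            "help: Unicode character '{}' might not be visible when rendered",
            c.escape_unicode()
        ))
    }
}
```

This check is much broader than Nonspacing_Mark: it also fires for clearly visible characters such as `é` or `→`, which is the trade-off for not needing a Unicode table.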

@CAD97
Contributor

CAD97 commented Aug 13, 2022

The function in test/ui is part of a minimized repro of #29227, so not relevant. I just picked Nonspacing_Mark as a likely candidate for a property to check, not because the compiler already has access to it.

You'll need to use the unicode table generator to generate a table of the characters with General_Category=Nonspacing_Mark and/or Bidi_Class=Nonspacing_Mark to warn on. Adding the table to rustc is probably fine; adding it to std is undesirable.
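
As a rough sketch of what a lookup against such a table could look like: the two ranges below are a hand-picked, illustrative subset (the Combining Diacritical Marks and Combining Half Marks blocks), not the output of the table generator, and the names `NONSPACING_MARK_RANGES` and `is_nonspacing_mark` are hypothetical.

```rust
use std::cmp::Ordering;

/// Illustrative subset only (an assumption, NOT a generated table):
/// sorted, non-overlapping inclusive ranges of nonspacing marks.
const NONSPACING_MARK_RANGES: &[(char, char)] = &[
    ('\u{0300}', '\u{036F}'), // Combining Diacritical Marks
    ('\u{FE20}', '\u{FE2F}'), // Combining Half Marks
];

/// Hypothetical helper: binary-searches the range table for `c`.
fn is_nonspacing_mark(c: char) -> bool {
    NONSPACING_MARK_RANGES
        .binary_search_by(|&(lo, hi)| {
            if c < lo {
                // This range lies entirely above `c`.
                Ordering::Greater
            } else if c > hi {
                // This range lies entirely below `c`.
                Ordering::Less
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}
```

With this subset, `is_nonspacing_mark('\u{323}')` is true for the character from the report, while ordinary letters fall outside every range. A real table would need to cover far more ranges.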

@chenyukang chenyukang removed their assignment Nov 10, 2022
@jyn514 jyn514 added the A-parser Area: The parsing of Rust source code to an AST label Apr 17, 2023
@workingjubilee workingjubilee added the A-Unicode Area: Unicode label Jul 22, 2023
5 participants