Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give tip for unicode chars which might not be visible when rendered #100439

Closed
wants to merge 1 commit into from

Conversation

chenyukang
Copy link
Member

@chenyukang chenyukang commented Aug 12, 2022

Fixes #100388

@rust-highfive
Copy link
Collaborator

r? @petrochenkov

(rust-highfive has picked a reviewer for you, use r? to override)

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 12, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 12, 2022
@rust-log-analyzer

This comment has been minimized.

@@ -322,6 +322,8 @@ impl<'a> StringReader<'a> {
let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
if c == '\x00' {
err.help("source files must contain UTF-8 encoded text, unexpected null bytes might occur when a different encoding is used");
} else if let Ok(v) = &err.suggestions && v.len() == 0 && !c.is_ascii() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a different heuristic for "is not visible when rendered", since this triggers on every non-ascii char.

Also, why are you checking err.suggestions here?

Copy link
Member Author

@chenyukang chenyukang Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a different heuristic for "is not visible when rendered", since this triggers on every non-ascii char.

Also, why are you checking err.suggestions here?

Yes, I'm still thinking any better method for check a unicode...

I'm checking the err.suggestions here, because we already have a strategy to give suggestions for those possible substitution,

let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
.

My idea is if we already have such kind of suggestion, we won't add more, since it's clear enough:

 --> /home/cat/code/rust/src/test/ui/parser/unicode-quote-chars.rs:2:14
  |
2 |     println!(“hello world”);
  |              ^
  |
= help: Unicode character \u{323} might not be visible when rendered
help: Unicode characters '“' (Left Double Quotation Mark) and '”' (Right Double Quotation Mark) look like '"' (Quotation Mark), but are not

I think = help: Unicode character \u{323} might not be visible when rendered here will be redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking the err.suggestions here, because we already have a strategy to give suggestions for those possible substitution,

I think if you change your PR to only note characters that have a rendered width of zero (i.e. combining characters, etc) then this check is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found a better way to figure out "not visible when rendered" chars.
We have a test case in

fn is_es_whitespace(self) -> bool {

The unicode_space_separator, unicode_space_separator and combining_spacing_mark categories seem contain the chars which are hard to spot, but those are also not totaly 'not visible`, and I think introduce those big tables into compiler code base maybe too heavy ..

@petrochenkov
Copy link
Contributor

petrochenkov commented Aug 12, 2022

@rustbot author
(to address compiler-errors' comments)

@rustbot
Copy link
Collaborator

rustbot commented Aug 12, 2022

Error: Parsing shortcut command in comment failed: ...'bot author' | error: expected end of command at >| ' (to addre'...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 12, 2022
@CAD97
Copy link
Contributor

CAD97 commented Aug 13, 2022

You'll need to use the unicode table generator to generate a table of the characters with General Category=Nonspacing Mark and/or Bidi_Class=Nonspacing_Mark for which to warn on. Adding the table to rustc is probably fine, adding it to std is nondesirable.

The UI test is unrelated.

@bors
Copy link
Contributor

bors commented Sep 28, 2022

☔ The latest upstream changes (presumably #102302) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon
Copy link
Member

ping from triage:
@chenyukang

Can you please address the merge conflicts - and post your status on this PR?

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@chenyukang chenyukang closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing error message in the presence of unicode combining characters
9 participants