Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: syntax tree patterns #3875

Merged
merged 1 commit into from
Nov 23, 2022

Conversation

fkohlgrueber
Copy link
Contributor

@fkohlgrueber fkohlgrueber commented Mar 12, 2019

This PR adds an RFC for implementing lints using syntax tree patterns.

Rendered RFC (outdated)

Update: A more thorough description of the current concept is given in my thesis.

cc @oli-obk

Copy link
Member

@flip1995 flip1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed through this RFC. That sounds like an interesting project.

s/clippy/Clippy/


Following the motivation above, the first goal this RFC is to **simplify writing and reading lints**.

The second part of the motivation is clippy's dependence on unstable compiler-internal data structures. Clippy lints are currently written against the compiler's AST / HIR which means that even small changes in these data structures might break a lot of lints. The second goal of this RFC is to **make lints independant of the compiler's AST / HIR data structures**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
The second part of the motivation is clippy's dependence on unstable compiler-internal data structures. Clippy lints are currently written against the compiler's AST / HIR which means that even small changes in these data structures might break a lot of lints. The second goal of this RFC is to **make lints independant of the compiler's AST / HIR data structures**.
The second part of the motivation is clippy's dependence on unstable compiler-internal data structures. Clippy lints are currently written against the compiler's AST / HIR which means that even small changes in these data structures might break a lot of lints. The second goal of this RFC is to **make lints independent of the compiler's AST / HIR data structures**.


#### Applicability

Even though I'd expect that a lot of lints can be written using the proposed pattern syntax, it's unlikely that all lints can be expressed using patterns. I suspect that there will still be lints that need to be implemented by writing custom pattern matching code. This would lead to mix within clippy's codebase where some lints are implemented using patterns and others aren't. This inconsistency might be considered a drawback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this, I especially see potential to provide recuring patterns in the utils module, which can be used across lints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I'd eventually like to see is that all lints either use something from utils or something from the macro.

Eventually, we can even make these utils checks part of the macro, by allowing you to add additional checks like where named implements Trait

(This opens up the opportunity to write lints as text files devoid of rust code)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be nice to have an external crate that could be re-used by rustc and potentially other linters.

}
```

This pattern matches arrays that end with at least one literal. Now given the array `[x, 1, 2]`, should `1` be matched as part of the `_*` or the `Lit(_)+` part of the pattern? The difference is important because the named submatch `#literals` would contain 1 or 2 elements depending how the pattern is matched. In regular expressions, this problem is solved by matching "greedy" by default and "non-greedy" optionally.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for matching regex style. If the pattern macro is sold as "like regex" it should be as near as possible to the regex behavior, so it can be picked up more easily.

@flip1995 flip1995 added S-needs-discussion Status: Needs further discussion before merging or work can be started C-an-interesting-project Category: Interesting projects, that usually are more involved design/code wise. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Mar 12, 2019
@felix91gr
Copy link
Contributor

This is a very interesting idea! I think it could help a lot for newcomers specially because the rustc APIs used by most of Clippy, given their unstable status, aren't as documented as one might like, and this makes the learning curve steeper.

I have a question, from the maintainability point of view expressed in the RFC: this would shift the maintenance burden from "all the lint code" to just "the code that translates from the Syntax Tree Patterns to the current AST, HIR and MIR APIs", right?

@fkohlgrueber
Copy link
Contributor Author

@felix91gr Yes, that's the idea. Implementation detail changes (e.g. renaming ast::Expr to ast::ExprType) in the AST / HIR would only require adapting the IsMatch trait implementations, not all existing lints.

There are two restrictions to this though:

  1. If the AST / HIR changes in a way that requires the PatternTree to change as well (e.g. larger structural changes), we'd still have to fix (possibly many) lints.
  2. Using named submatches, a lint can get a reference to an AST / HIR node (see this section of the RFC). For example, the collapsible_if lint uses the span attributes of various AST nodes. Using these node references makes the lint dependent on the actual AST / HIR implementation (e.g. if the span attribute on AST nodes would be renamed, the collapsible_if lint would break).

Regarding (1), I don't really know how likely larger changes to the AST / HIR data structures are at this point. Are there any plans already? What are the most frequent changes to these data structures?

The second point could probably be solved by abstracting common additional checks (e.g. getting the span of a node) into functions in util. Then changes would only break some util functions instead of a lot of lints.

Does this answer your question?

@felix91gr
Copy link
Contributor

@fkohlgrueber yes it does! :)

Regarding the two limitations,

  • I think 1. makes a lot of sense. Whether or not changes to AST and HIR data structures are likely at this point, I don't know. There'll definitely be some more stable than others.
  • I agree with your point regarding 2.. Those attributes sound very abstractable, but at the same time, I think that starting with a smaller viable prototype like yours is best. After this feature is being used, fully covering the edge cases pointed out in 2. can be added :D

@felix91gr
Copy link
Contributor

PS: take a look at https://github.com/rust-lang/rust-clippy/projects/2, it would seem to me that this RFC has a place in there somewhere.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really well written and designed, kudos! I have some minor side comments but they shouldn't affect the RFC in its current form.

Thanks for working on this, I've wanted something like this for ages!

Block(Seq<Stmt>),
}
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we also need to deal with impls, structs, etc? A lot of clippy lints deal with items not expressions.

Of course as a first step this is still amazing :)


#### Applicability

Even though I'd expect that a lot of lints can be written using the proposed pattern syntax, it's unlikely that all lints can be expressed using patterns. I suspect that there will still be lints that need to be implemented by writing custom pattern matching code. This would lead to mix within clippy's codebase where some lints are implemented using patterns and others aren't. This inconsistency might be considered a drawback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I'd eventually like to see is that all lints either use something from utils or something from the macro.

Eventually, we can even make these utils checks part of the macro, by allowing you to add additional checks like where named implements Trait

(This opens up the opportunity to write lints as text files devoid of rust code)

}
```

The difference compared to the currently proposed two-stage filtering is that using early filtering, the condition (`!in_macro(#then.span)` in this case) would be evaluated as soon as the `Block(_)#then` was matched.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could allow where statements to be shoved in anywhere, allowing the user to specify when the filtering happens. Not necessary for now, but nice to have.


#### Match descendant

A lot of lints currently implement custom visitors that check whether any subtree (which might not be a direct descendant) of the current node matches some properties. This cannot be expressed with the proposed pattern syntax. Extending the pattern syntax to allow patterns like "a function that contains at least two return statements" could be a practical addition.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be expressed as a mix of a walking function in utils and patterns with the current proposal.

Lit(Alt<Lit>),
Array(Seq<Expr>),
Block_(Alt<BlockType>),
If(Alt<Expr>, Alt<BlockType>, Opt<Expr>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, perhaps users of opt can be made to use named parameters? I.e.

If(_, _), vs If(_, _, then=_) vs If(_, _, then=_?). It might be cleaner that way. I'm not sure.

@flip1995
Copy link
Member

flip1995 commented Jun 6, 2019

Friendly ping @fkohlgrueber. Any updates on this project?

@fkohlgrueber
Copy link
Contributor Author

Hey Philipp, thanks for reminding me. I wanted to post updates here some time ago but have forgotten to do so unfortunately. I've continued to work on the project over the last two months and implemented some additional features (e.g. functional composition). I'm currently focusing on writing my thesis report and after that is finished, I'll have time to work on the integration into Clippy again.

@fkohlgrueber
Copy link
Contributor Author

Hello everyone! It's been some time, but I can now share my finished thesis report with you:

https://fkohlgrueber.github.io/thesis.pdf

Over the next six months, I'll continue to work on this topic approximately one day per week. My goal is to integrate my thesis work into Clippy until February 2020.

I'm currently thinking about how to do the integration and would like to hear your thoughts on this. Since it's been quite some time since I did most of the coding, I'll first update my projects to work with the latest clippy / rustc nightly. After that, I'm not sure how to proceed further.

One possible way to go would be to integrate the pattern-based collapsible_if lint first and then implement more and more lints using patterns afterwards.

What do you think?

@flip1995
Copy link
Member

flip1995 commented Sep 3, 2019

Awesome!

I think the first step would be to integrate a MVP of the pattern code in Clippy, which can be used optionally. Then documentation with "How do I switch to the pattern based code?"-instructions would be great. This documentation could be created while making the first lint (collapsible_if) pattern based.

Once it works with the first lint, it would be great to create one or more tracking issues, where lints are grouped by:

  1. Should be relatively easy to make these lints pattern based
  2. It's possible, but requires an extension of the MVP
  3. It's quite hard to switch, but probably possible
  4. It probably won't be possible to make these lints pattern based.

Making such a grouping of lints could be time consuming, but I think that's the best way to get contributors to help you with the transition. (And "There are 311 lints included in this crate!", so I think you'll need help on this project 😉).

Once that is done we can swap to the patterns one lint at a time.

@Manishearth
Copy link
Member

@flip1995's plan sounds good to me! Probably should open a tracking issue for this.

@llogiq
Copy link
Contributor

llogiq commented Sep 3, 2019

I just found the time to read the RFC. Some thoughts:

  • There is prior art for this. For example PMD uses a related scheme, but using its own interpreted language for the patterns, IIRC.
  • A possible extension would be type patterns (note: I mean ty::Ty, not hir::Ty).

@fkohlgrueber
Copy link
Contributor Author

I should have been more explicit that the RFC linked above describes a preliminary version of the concept that has been extended afterwards. The thesis documents the current state and is a better description of the concept. The main differences are:

  • Functional composition (listed under "Future Possibilities" in the RFC) has been implemented
  • Support for multiple languages (and meta-lints in particular; see RFC) has been implemented

@llogiq I didn't know about PMD, thanks for the hint. Could you elaborate what you mean by type patterns? My understanding currently is that type information is only useful after name resolution has been performed, which would mean using HIR, right? What's the difference between ty::Ty and hir::Ty?

@JeanMertz
Copy link
Contributor

The thesis documents the current state and is a better description of the concept.

That makes sense. Perhaps it's a good idea to update the PR description and add a link to the thesis so that others can instantly find it instead of having to read the comments?

@fkohlgrueber
Copy link
Contributor Author

@JeanMertz That's true. I updated the PR's description.

@benmkw
Copy link

benmkw commented Aug 31, 2021

rust analyzer will probably integrate with this at some point rust-lang/rust-analyzer#8696 (comment) and rust-lang/rust-analyzer#8696 (comment)

currently there exists structural search and replace https://rust-analyzer.github.io/manual.html#structural-search-and-replace which is some small subset

@Manishearth
Copy link
Member

@rust-lang/clippy So it occurred to me that this PR is still open, and even though we're not working on this, do y'all think we should still merge this? Overall it's a very neat design and will likely work into the lang team's plans of doing custom lints at some point. I'd definitely like to have this doc in a spot that's a bit more permanent than just a PR 😄

@flip1995
Copy link
Member

flip1995 commented Apr 6, 2022

I plan to finish the work on the first version of the Clippy book on Saturday. So I would wait until the book is merged and then include this into the book. But sure, I'm fine with merging this 👍

@Manishearth
Copy link
Member

Sounds good. I figure this doesn't necessarily go in the book yet because it's a propoed design, but if we can incorporate it that would still be great!

@felix91gr
Copy link
Contributor

Omg omg omg omg this is such a cool proposal, and it's been a while so I feared it might never be merged. I'm glad y'all have regained time to work on it / think about it. I'm looking forward to it!

I hope things are okay for you both, @Manishearth and @flip1995. Times have been hard ❤️‍🩹

@flip1995
Copy link
Member

flip1995 commented Apr 6, 2022

I figure this doesn't necessarily go in the book

I guess I'll turn the "Roadmap" section that currently exists in the draft of the book into a "Proposal" section. This proposal section will then only include the 2021 Roadmap (that we aren't close to completing in 2022 lol) and this for now. But we leave things open for future bigger proposals for Clippy / linting in general.

I'm glad y'all have regained time to work on it

@felix91gr Finding time to work on this is out of scope for me for the foreseeable future. But I like the proposal in general, so I'm happy to do the smallest possible amount of work for moving this forward with merging it so it becomes more official and maybe someone else feels encouraged to work on it.

Hope you're also doing well!

@Manishearth
Copy link
Member

I guess I'll turn the "Roadmap" section that currently exists in the draft of the book into a "Proposal" section. This proposal section will then only include the 2021 Roadmap (that we aren't close to completing in 2022 lol) and this for now. But we leave things open for future bigger proposals for Clippy / linting in general.

Ah, makes sense!

@Manishearth
Copy link
Member

I totally dropped the ball on this but I'm at least goign to hit merge, and @flip1995 can look at moving it into the book?

@Manishearth Manishearth merged commit be172c5 into rust-lang:master Nov 23, 2022
bors added a commit that referenced this pull request Nov 25, 2022
Move syntax tree patterns RFC to the book

r? `@Manishearth`

Follow up to #3875

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-an-interesting-project Category: Interesting projects, that usually are more involved design/code wise. S-needs-discussion Status: Needs further discussion before merging or work can be started S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants