Just use trees for HIR? #8713

matklad · 2021-05-03T17:07:10Z

EDIT: this now has a design doc: https://hackmd.io/@matklad/rJzQhvk2u/edit

One of our early design decisions was to keep all semantic info out of the syntax trees. I think that worked well for the compiler side of rust-analyzer -- we managed to do a bunch of relatively clear IRs, which are relatively incremental.

However, I am not so convinced that this split works quite as well for the hir API. When implementing, eg, assists, working with syntax is easy. Patterns like node.ancestors().find_map(ast::Expr::cast) are emerging nice idioms. Going from syntax to semantics and back creates a lot of friction, both because there are two layers to start with, and because the bridges between them are unclear.

So I am thinking, what if hir exported API which looked exactly as our syntax-tree based API, but with additional semantic info? So that, you can ask .type for an expression, .resolve for path, etc.

When you ask for .parent of a module/file, you transparently get the module in another file.

For macro calls, there are two children: .arg() and .expansion(), and .expansion() goes into another file.

Similarly, .parent transparently goes via expansion stack.

The text was updated successfully, but these errors were encountered:

matklad · 2021-05-03T17:08:43Z

Note: got this idea after reading "early on, we decided that syntax trees would be the basic API of Roslyn" in some blog. My first reaction was "of course we can't do that, because macros", but then I realized that a macro can have both expansion and argument as children...

flodiebold · 2021-05-03T17:17:52Z

So, what would that concretely mean compared to the current API? I think 1. some types that currently are just data would gain an identity / location in the tree (e.g. so you can go back "up" from a Name), 2. most types gain a parent method if they don't have one, and some methods change to be more analogous to the syntax trees 3. we need to fill the current "holes" in the HIR (e.g. expressions)? 🤔 Or are you considering a more fundamental change to how the HIR types are represented? (Would the HIR methods still always take a database argument?)

I think this does sound like a good idea, and would probably give us some more guidance as to how to design the HIR APIs...

flodiebold · 2021-05-03T17:26:50Z

Would there be a HirNode analogous to SyntaxNode, so you can do some_expression_hir.ancestors().find_map(hir::Function::cast)? I wonder whether the HIR needs that flexibility, or whether there could instead be something like some_expression_hir.ancestor::<hir::Function>().

matklad · 2021-05-03T17:31:00Z

Yeah, I think that's roughly what I have in mind. I'd add the following as well:

we get a hir::Node untyped type, analogous to SyntaxNode, so that we can have .ancestrors() method (your version is also good though!)
macro expansion API is changed to be transparent (hir::MacroCall::expansion/argument)
we somehow fill-in syntactic gaps (hir::Struct::struct_keyword needs to be a thing probably, hir::get_token_at_offest is also needed)
(maybe) for syntax trees, we had quite a success with handles to nodes. When I think about hir::Struct as a handle, I think that maybe it's OK to embed a pointer to a database inside this handle?

matklad · 2021-05-06T22:46:29Z

Probably makes sense to take a second look at https://docs.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.semanticmodel?view=roslyn-dotnet-3.9.0, it feels like it has a lot of goodness

IsSpeculativeSemanticModel | Returns true if this is a speculative semantic model created with any of the TryGetSpeculativeSemanticModel methods

matklad · 2021-06-14T11:07:45Z

(Perpetual) Question: what should be the underlying storage strategy for such a hir?

I can see three different strategies:

// I. Copy type with reference to context
struct Store { ... }

#[derive(Clone, Copy)]
pub struct IfExpression<'a> {
    id: u32,
    store: &'a Store
}

impl IfExpression<'a> {
    pub fn cond(self) -> Expression<'a> { ... }
}

// II. Copy type with explicitly passed-in context
pub struct Store { ... }

#[derive(Clone, Copy)]
pub struct IfExpression {
    id: u32,
}

impl IfExpression {
    pub fn cond(self, store: &Store) -> Expression { ... }
}

// III. !Copy type with shared reference to the context
#[derive(Clone)]
pub struct IfExpression {
    id: u32,
    store: Rc<Store>
}

impl IfExpression {
    pub fn cond(self) -> Expression { ... }
}

Veykril · 2021-06-14T11:40:29Z

I feel like option 2 could get annoying when you need to pass the store as well as the database to some function calls.

flodiebold · 2021-06-14T11:45:57Z

I think I would go with option 2 for now, like with the current HIR. I think the store can and maybe has to contain a reference to the DB as well?

matklad · 2021-06-14T12:07:10Z

Yeah, the store would have to have a DB. How exactly it stores the db is also a question -- hir is polymorphic in the DB, and the caller can use a db with additional facts. I tried to account for that by making Semantics parametric over the database, but I don't really like this design -- seems to complicated, and breaks separate compilation somewhat. Seems better to keep just dyn HirDatabase in the store, and let the caller pass-in a better-typed reference themselves.

It also seems that the Store should be bigger than the database -- I think we should allow for some store-local caches, like we do in semantics. So, it'll probably be store: &'a Store<'a>.

matklad · 2021-06-14T22:55:31Z

Observation: III is basically how our syntax tree is setup actually.

flodiebold · 2021-06-15T08:33:21Z

Yeah. On the other hand, you don't usually need the DB to manipulate syntax trees, but you will need it to do things with the HIR. And I don't think we can put a DB reference into an Rc<Store> 🤔 So this would end up being option 2 as well anyway...

matklad · 2021-06-15T08:55:08Z

You don't need DB literary, but a reference to a single SyntaxNode does give access to the whole. The role of DB is played by the root node, which owns the underlying data (green node). I don't think we can put a reference to db, but, presumably, we can store Rc<Snapshot<DB>> or some such.

But it's a good point that, fundamentally, we have a lifetime here -- everything happens in a context of a running query. Even if we do fully owned API, that would be a footgun. If you store a hir node in some long-living datastructure, you'll notice that you can't actually apply modifications to the database, because the stored node will keep a database reference alive.

This I think rules out the third variant. Which is good -- it's the most tantalizing one!

flodiebold · 2021-06-15T08:57:08Z

Yeah. Of the remaining ones, option 1 is more inconvenient than 2, but it makes the connection of the nodes to the store clearer, and maybe that's a good thing 🤔

lnicola · 2021-06-15T17:45:51Z

The extra reference in option 1 is quite significant wrt. the memory usage, right?

flodiebold · 2021-06-16T15:24:33Z

Not sure, since we won't keep many of these around usually, I think?

matklad · 2021-06-20T16:12:41Z

Yeah, due to the lifetime, neither setup will be stored in long-term storage. Although optimizing transient memory usage and cache efficiency will be nice!

Couple of more considerations:

Option 2 can't implement useful Debug/Display directly on the hir nodes, which is inconvenient. (again, the fact that {} works for trees is a godsend when it comes to figuring out why a certain assist does not work).
At some point, we will expose this API for an unbounded set of people to use. If I think in that context, the "pass the store in explicitly" starts sounding more weird in comparison to only us using the API.

bjorn3 · 2021-06-20T16:50:16Z

Option 2 can't implement useful Debug/Display directly on the hir nodes, which is inconvenient. (again, the fact that {} works for trees is a godsend when it comes to figuring out why a certain assist does not work).

You could have a .display() function that takes the store and returns a value that implements Display.

matklad · 2021-06-22T14:09:45Z

I am warming up to the idea more and more. At the same time, I feel a bit scared -- it looks like this might be the last time we have a chance to re-design the whole API. I feel a bit of waterfall can be helpful here! So, I suggest using this document to collaboratively iterate on the design:

https://hackmd.io/@matklad/rJzQhvk2u/edit

It's very bare bones, I plan to fill it in in the coming weeks, but I want to lay the design framework here upfront!

xFrednet · 2022-08-09T12:18:55Z

Hello, I'm currently working on marker (formally rust-linting) where I try to create a stable driver independent linting interface for Rust. The IR will be based on the Rust Reference, to ensure that it's (as much as possible) driver independent.

From reading this discussion and the linked document, it looks the goal of the these project mostly align. The project will mostly focus on usability and less on performance, which was mentioned in the linked document. However, based on the input, I'll try to make a few more things lazy evaluated and leave the cashing/node storage up to the driver.

Right now, the project is in a very early stage and the current test representation has to be rewritten for various reasons. So, there is probably very little/nothing that can be done right now to help. I mainly wanted to get this idea out there. 🙃 Though, @HKalbasi has expressed interest to test rust-analyzer as a driver.

matklad added the C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) label May 3, 2021

flodiebold mentioned this issue May 9, 2021

Extract function assist will add async if required #8766

Merged

matklad mentioned this issue May 10, 2021

rust-analyzer reader-mode #8696

Open

matklad mentioned this issue Jun 22, 2021

internal: document source_to_def and it's connection to Kotlin&Roslyn #9369

Merged

Veykril added the Broken Window Bugs / technical debt to be addressed immediately label Feb 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Just use trees for HIR? #8713

Just use trees for HIR? #8713

matklad commented May 3, 2021 •

edited

Loading

matklad commented May 3, 2021

flodiebold commented May 3, 2021 •

edited

Loading

flodiebold commented May 3, 2021

matklad commented May 3, 2021

matklad commented May 6, 2021

matklad commented Jun 14, 2021

Veykril commented Jun 14, 2021

flodiebold commented Jun 14, 2021

matklad commented Jun 14, 2021

matklad commented Jun 14, 2021

flodiebold commented Jun 15, 2021

matklad commented Jun 15, 2021

flodiebold commented Jun 15, 2021

lnicola commented Jun 15, 2021 •

edited

Loading

flodiebold commented Jun 16, 2021

matklad commented Jun 20, 2021

bjorn3 commented Jun 20, 2021 •

edited

Loading

matklad commented Jun 22, 2021

xFrednet commented Aug 9, 2022 •

edited

Loading

Just use trees for HIR? #8713

Just use trees for HIR? #8713

Comments

matklad commented May 3, 2021 • edited Loading

matklad commented May 3, 2021

flodiebold commented May 3, 2021 • edited Loading

flodiebold commented May 3, 2021

matklad commented May 3, 2021

matklad commented May 6, 2021

matklad commented Jun 14, 2021

Veykril commented Jun 14, 2021

flodiebold commented Jun 14, 2021

matklad commented Jun 14, 2021

matklad commented Jun 14, 2021

flodiebold commented Jun 15, 2021

matklad commented Jun 15, 2021

flodiebold commented Jun 15, 2021

lnicola commented Jun 15, 2021 • edited Loading

flodiebold commented Jun 16, 2021

matklad commented Jun 20, 2021

bjorn3 commented Jun 20, 2021 • edited Loading

matklad commented Jun 22, 2021

xFrednet commented Aug 9, 2022 • edited Loading

matklad commented May 3, 2021 •

edited

Loading

flodiebold commented May 3, 2021 •

edited

Loading

lnicola commented Jun 15, 2021 •

edited

Loading

bjorn3 commented Jun 20, 2021 •

edited

Loading

xFrednet commented Aug 9, 2022 •

edited

Loading