Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Move devex] Move source code linters to explain language subset #20

Open
sblackshear opened this issue Dec 2, 2021 · 23 comments
Open
Labels
move Priority: Medium A nice to have feature or annoying bug, non-blocking and no delays expected if we punt on it Triaged Issue is associated with a milestone and has at least 1 label of each: status, type, & priority Type: Enhancement New feature or request
Milestone

Comments

@sblackshear
Copy link
Collaborator

FastX needs a number of custom bytecode verifier passes (see #18, #19, #16, #15) to keep users in the no-shared-memory subset of Move. This is sufficient for on-chain safety, but for an end-user trying to write code in the Move source language, finding out that you've strayed from the subset via a bytecode verifier error at module publish time is not a great experience.

We should address this is by adding custom linters that enforce the subset rules at the source level. See this for some examples in the existing compiler. The code for this linters should live in the fastNFT repo. We will probably need to extend the upstream compiler to make it easy to plug in custom linters.

@lxfind lxfind added this to the GDC milestone Jan 21, 2022
@awelc
Copy link
Contributor

awelc commented Feb 17, 2022

@sblackshear - this looks like a useful thing to have for GDC and Move seems to have some support for (custom?) linting already? I don't want to spend too much time looking into it if you think it's too much to chew for me at this point, which is very likely (I doubt I could write a Move's version of clippy from scratch including Move-side support for custom linting by the GDC deadline :-) ). Still, perhaps you can share some opinion and/or give some advice here?

@awelc awelc added the move label Mar 28, 2022
@awelc awelc changed the title [fastx devex] Move source code linters to explain language subset [Move devex] Move source code linters to explain language subset Mar 28, 2022
@bholc646 bholc646 removed this from the GDC milestone Apr 6, 2022
@bholc646 bholc646 removed the move label Apr 6, 2022
@awelc
Copy link
Contributor

awelc commented Apr 12, 2022

Here are some ideas on how to get this to work, after having some conversations on the topic with @tnowacki.

The main goal is to (mostly) provide support for writing linters to blockchain (and Move infrastructure) developers, rather than to the end users (programmers targeting a given blockchain and its Move flavor). This is mostly pragmatic, as the latter would be preferred, but considering resources and time required to create a linting framework supporting some kind of declarative linter specification is a bit beyond our reach right now (and probably unnecessary).

The idea the is to provide a rather simple set of primitives over the existing compiler that would support writing linting checks. We could provide a fairly "light" linting CLI that would offer the following:

  • expose one of the existing compiler ASTs for traversal - a given lint check would have to "manually" traverse the AST (we could provide some abstraction there, such as visitor of some kind) and check for a given property (this would be roughly equivalent to writing linters for Go using go/parser and go/ast packages as described here)
  • create a "driver" that would abstract a linter as struct that embeds a function pointer implementing a given AST traversal (plus some metadata if necessary), takes as an input a vector of such structs, and runs these in parallel (this would be roughly equivalent to writing linters for Go using go/analysis package as described here)
  • integrate diagnostics implementation from the Move repo to simplify generation of linter messages during traversal (these should also be generated at different levels, starting with, say, deny and warn).

Most of this code would live in the Move repo, including linters for core Move. A linter "client" (e.g. Sui) would use the driver implementation, optionally some/all core Move linters, and build a custom CLI tool with additional "abstracted" linters that would specify customized checks for a given client.

@tnowacki
Copy link
Contributor

I think most of the linting can be done over the typed AST, but I just remembered that there is this ObjectID escape analysis that is super general. It will need the abstract interpreter. Luckily exposing just that pass should be really really easy, as the abstract interpreter is already setup with a very general trait system

@tnowacki
Copy link
Contributor

I think an expression visitor could be a helpful utility, and would be helpful for writing the check that bans global storage operations. Writing that visitor is non-trivial due to the recursive structure.

Everything else for the typed AST should be easy enough

@tnowacki
Copy link
Contributor

integrate diagnostics implementation from the Move repo to simplify generation of linter messages during traversal (these should also be generated at different levels, starting with, say, deny and warn).

This one will be the most annoying for the compiler in its current architecture. There is this trait though for the diagnostic codes I set up, maybe we could make one just for sui things

@awelc
Copy link
Contributor

awelc commented Apr 12, 2022

I think most of the linting can be done over the typed AST, but I just remembered that there is this ObjectID escape analysis that is super general. It will need the abstract interpreter. Luckily exposing just that pass should be really really easy, as the abstract interpreter is already setup with a very general trait system

We should talk about this in more detail as I am not sure if ObjectID escape analysis comes before or after the typed AST is produced, nor why we necessarily have to care about it...

@awelc
Copy link
Contributor

awelc commented Apr 12, 2022

This one will be the most annoying for the compiler in its current architecture. There is this trait though for the diagnostic codes I set up, maybe we could make one just for sui things

I thought it would be pretty non-controversial in that we could re-use the diagnostics facility to easily generate error messages. Something like instantiating the Diagnostics struct with the location, error message etc. and re-using the existing facility to format and display it. If it's problematic, we could adopt a different (simpler?) way of reporting linting errors.

@awelc awelc added Type: Enhancement New feature or request Priority: Medium A nice to have feature or annoying bug, non-blocking and no delays expected if we punt on it Triaged Issue is associated with a milestone and has at least 1 label of each: status, type, & priority labels Apr 14, 2022
@awelc awelc added this to the Mainnet milestone Apr 14, 2022
@sblackshear
Copy link
Collaborator Author

I think we should consider using move-model for linters. It already exists, has hooks for reporting source diagnostics, and has an IR that has been used to write several linter-like tools (e.g., the escape analysis for the robust safety paper, some lints on prover specs) and is pretty nice to work with. Importantly, it supports interprocedural analysis, which I think will be important for a number of linters, but will be tougher to do inside the compiler because it heavily leverages modularity (as it should).

@tnowacki
Copy link
Contributor

The move model might be okay for some passes but really lacks for anything detailed.

The biggest thing here is it does not have any source level expressions for function definitions, just compiled bytecode.
This is going to lead to less than idea error messages in most places.

Importantly, it supports interprocedural analysis, which I think will be important for a number of linters

Interprocedural analysis something we have actively avoided for the source language. These sorts of passes are slow and brittle, and tend to lead to rather confusing error messages. They are really good for static analysis type checks, since if it is confusing or inaccurate, they can be ignored (which might certainly be true for certain "linter" style pases).

But if you want to use this for anything that is more a "type system" style pass, you really want the compiler and all of its tooling.

@tnowacki
Copy link
Contributor

@awelc, sorry I missed your comments

We should talk about this in more detail as I am not sure if ObjectID escape analysis comes before or after the typed AST is produced, nor why we necessarily have to care about it...

The current rules in the Sui verifier I believe are control flow sensitive. So you will need to use the CFGIR AST, and the abstract interpreter.
Alternatively, we can come up with a rule that is stricter than the verifier and use that.

I thought it would be pretty non-controversial in that we could re-use the diagnostics facility to easily generate error messages. Something like instantiating the Diagnostics struct with the location, error message etc. and re-using the existing facility to format and display it. If it's problematic, we could adopt a different (simpler?) way of reporting linting errors.

Now that I think about it, it might not be that bad? We just have the Sui linter have a diagnostic code that implements DiagnosticCode
The only blocker right now is the Category for the diagnostic is an enum. We will need to make it just some other trait+associated type or something so the Sui checks can have one

@sblackshear
Copy link
Collaborator Author

The biggest thing here is it does not have any source level expressions for function definitions, just compiled bytecode.
This is going to lead to less than idea error messages in most places.

I believe Move models produced from source have very detailed source context information. I have used this feature in diagnostics, anyway. I'm sure there are cases where it falls short of what you can do in the compiler, but my expectation is that the error messages we can produce will still be very high quality.

@sblackshear
Copy link
Collaborator Author

Interprocedural analysis something we have actively avoided for the source language. These sorts of passes are slow and brittle, and tend to lead to rather confusing error messages. They are really good for static analysis type checks, since if it is confusing or inaccurate, they can be ignored (which might certainly be true for certain "linter" style pases).

Definitely agree with avoiding this in the core analyses performed by the compiler. But for linters (as we're discussing in this task), I think we want to give the linter writer as much power/flexibility as we can. As you say, linters can be ignored, so there's no problem if someone writes a slow or flaky one.

@tnowacki
Copy link
Contributor

This isn't for "general" linters though. This is things like "do my key structs have ID fields?" and "Do my entry points have the correct signature?" These can't be ignored because the Sui bytecode verifier will yell at you.

I would consider them more "Move type system extensions" rather than the sort of checks you would find in a tool like clippy

@tnowacki
Copy link
Contributor

I believe Move models produced from source have very detailed source context information

And I would hope so as they are strictly a subset of the information in the AST ;)

@sblackshear
Copy link
Collaborator Author

I guess my question is: what would go wrong if we implemented our Sui linters as move-model-powered linters? I ask this because I would rather have one mechanism for extending the Move type system/analyzing Move code that we put a lot of work into instead of spreading our effort across two ways of doing this (at least at this point).

@awelc
Copy link
Contributor

awelc commented Apr 18, 2022

This isn't for "general" linters though. This is things like "do my key structs have ID fields?" and "Do my entry points have the correct signature?" These can't be ignored because the Sui bytecode verifier will yell at you.

I think we need both. The idea was to have a separate linting tool that could have different "severity" levels.

The "Move type system extensions" linting should probably be directly integrated with the system so that these checks cannot be bypassed.

@sblackshear
Copy link
Collaborator Author

I am certainly not opposed to having both. I think this is just a sequencing/prioritization question.

@awelc
Copy link
Contributor

awelc commented Apr 18, 2022

I am certainly not opposed to having both. I think this is just a sequencing/prioritization question.

Agreed. They could also reuse at least part of the common infrastructure, so it would be nice (though not strictly necessary) to agree on the infrastructure that could work for both linter types.

@tnowacki
Copy link
Contributor

I guess my question is: what would go wrong if we implemented our Sui linters as move-model-powered linters? I ask this because I would rather have one mechanism for extending the Move type system/analyzing Move code that we put a lot of work into instead of spreading our effort across two ways of doing this (at least at this point).

I would rather never extend the move-model unless you want to work over the bytecode or want to do an inter-procedural analysis, so if you want one mechanism, use the ASTs.
(I hope this will also be easier for @awelc, as he will need the ASTs for any vs code work)

I think we need both. The idea was to have a separate linting tool that could have different "severity" levels. The "Move type system extensions" linting should probably be directly integrated with the system so that these checks cannot be bypassed.

Already doable in the source languages diagnostics, you can even have non-skippable errors that don't stop compilation

@sblackshear
Copy link
Collaborator Author

I would rather never extend the move-model

This does not answer my question of what would go wrong--I would like to understand why you are so opposed to extending it :).

@tnowacki
Copy link
Contributor

(to be clear by extending I mean building a thing on top of it)

I oppose it because I just could not imagine a linter framework that does not want the source expressions. Thats almost literally all of the linter cases (at least for clippy), "You had XYZ expression. ABC expression is cleaner". The move-model does not contain information about source language expressions outside of whats in the source map (mapping from bytecode instruction to source file location)

Similarly for that reason, it is literally impossible to use the move-model for IDE features like hover types or type-based autocomplete (which will be a thing when we eventually have receiver style syntax)


For the Sui type system cases:

  • "key" rules
  • Global storage banning
  • entry points
  • ID immutable
  • ID leak

The only one I think would produce equally readable errors are the "key" rules and maybe global storage banning.

For entry points, the type checker and the typed AST has a lot of built up infra for good errors when comparing types or for checking ability constraints (as you might imagine lol).

For the ID checks, you will be able to do these checks at the bytecode level, but you will lose a lot of structured information from the source language.
I couldn't even imagine how it would look for something control flow sensitive like the ID leak. It is hard enough getting good error messages for the abs int based checks (just because of all the control flow paths).
One really small example for the ID immutable one is the difference between: x.f.g.id = new_id or let ref = &mut x.f.g.id.
(0) You will want to case these separately for error message verbiage
(1) The former syntax doesn't even really exist at the bytecode level, and its going to generate a ton of temporary variables in order to compile that. I don't know how/if that will affect the mapping. If I had to guess it shouldn't be too bad, but it might make it just oddly partial
(2) For good error messages you don't just want to say "dont do this right here", especially in a language in with type inference. You want to say "don't do this", "this is why I think this field has this type", and "this is where this field was declared with this restriction"


As a further side note, Brian and I talked about trying to fix the move-model a bit before he left. Obviously we never started on it in much detail, but part of the problem is that the move-model is a lot of code for not doing much

  • Gather meta data needed for specs
  • Intern a bunch of data

So much of the complexity in the move-model was maybe needed when it was written, but is outdated now.
The compiler already now interns all of the strings. Most of the metadata it tries to gather is also gathered by the compiler (the move-model just splits off shortly after parsing the AST). and other bits of the meta data could rely on the source map a bit more heavily.

In short, something like the move-model probably should exist, but hopefully something that doesn't have its own custom interning or location data that is at best out of step with, or at worst duplicates, the tooling in the package system, compiler, IR, unit tester, etc.

@sblackshear
Copy link
Collaborator Author

Got it + makes sense. The goal of this task is to do better than raising an opaque verifier error for the Sui-specific passes, and it sounds like we won't be able to do much better unless we work inside the compiler.

I would still be hesitant to go too far down the path of providing a generic linter framework inside the compiler when it will be limited w.r.t the kinds of analyses it can do compared to move-model (even though move-model will inevitably fall short in the error message department for some cases). But it doesn't sound like that is the plan (we're focusing on trying to make the Sui verifiers understandable at the source level in the best way possible!), so I think we're aligned.

@tnowacki
Copy link
Contributor

tnowacki commented Jun 3, 2022

Adding to this, we should also consider linting against useless transfer wrappers. But this might be a bit tricky to do

@bholc646 bholc646 added the move label Jul 25, 2022
mwtian pushed a commit that referenced this issue Sep 12, 2022
More benchmark results
mwtian pushed a commit to mwtian/sui that referenced this issue Sep 29, 2022
brson pushed a commit to brson/sui that referenced this issue Apr 30, 2024
Tests can now add compute_unit_limit as a config.

Followups:

    Remove DEFAULT_EXECUTION_BOUND_SOLANA_STDLIB_TEST added in MystenLabs#18
    Parse GAS_BUDGET_HEAP_SIZE
    Parse GAS_BUDGET_MAX_CALL_DEPTH

Fixes: MystenLabs#20
brson pushed a commit to brson/sui that referenced this issue Apr 30, 2024
Tests can now add compute_unit_limit as a config.

Followups
- Remove `DEFAULT_EXECUTION_BOUND_SOLANA_STDLIB_TEST` added in MystenLabs#18
- Parse GAS_BUDGET_HEAP_SIZE
- Parse GAS_BUDGET_MAX_CALL_DEPTH

Fixes: MystenLabs#20
brson pushed a commit to brson/sui that referenced this issue Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
move Priority: Medium A nice to have feature or annoying bug, non-blocking and no delays expected if we punt on it Triaged Issue is associated with a milestone and has at least 1 label of each: status, type, & priority Type: Enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants