[Move devex] Move source code linters to explain language subset #20

sblackshear · 2021-12-02T19:51:52Z

FastX needs a number of custom bytecode verifier passes (see #18, #19, #16, #15) to keep users in the no-shared-memory subset of Move. This is sufficient for on-chain safety, but for an end-user trying to write code in the Move source language, finding out that you've strayed from the subset via a bytecode verifier error at module publish time is not a great experience.

We should address this is by adding custom linters that enforce the subset rules at the source level. See this for some examples in the existing compiler. The code for this linters should live in the fastNFT repo. We will probably need to extend the upstream compiler to make it easy to plug in custom linters.

awelc · 2022-02-17T23:23:43Z

@sblackshear - this looks like a useful thing to have for GDC and Move seems to have some support for (custom?) linting already? I don't want to spend too much time looking into it if you think it's too much to chew for me at this point, which is very likely (I doubt I could write a Move's version of clippy from scratch including Move-side support for custom linting by the GDC deadline :-) ). Still, perhaps you can share some opinion and/or give some advice here?

awelc · 2022-04-12T00:11:45Z

Here are some ideas on how to get this to work, after having some conversations on the topic with @tnowacki.

The main goal is to (mostly) provide support for writing linters to blockchain (and Move infrastructure) developers, rather than to the end users (programmers targeting a given blockchain and its Move flavor). This is mostly pragmatic, as the latter would be preferred, but considering resources and time required to create a linting framework supporting some kind of declarative linter specification is a bit beyond our reach right now (and probably unnecessary).

The idea the is to provide a rather simple set of primitives over the existing compiler that would support writing linting checks. We could provide a fairly "light" linting CLI that would offer the following:

expose one of the existing compiler ASTs for traversal - a given lint check would have to "manually" traverse the AST (we could provide some abstraction there, such as visitor of some kind) and check for a given property (this would be roughly equivalent to writing linters for Go using go/parser and go/ast packages as described here)
create a "driver" that would abstract a linter as struct that embeds a function pointer implementing a given AST traversal (plus some metadata if necessary), takes as an input a vector of such structs, and runs these in parallel (this would be roughly equivalent to writing linters for Go using go/analysis package as described here)
integrate diagnostics implementation from the Move repo to simplify generation of linter messages during traversal (these should also be generated at different levels, starting with, say, deny and warn).

Most of this code would live in the Move repo, including linters for core Move. A linter "client" (e.g. Sui) would use the driver implementation, optionally some/all core Move linters, and build a custom CLI tool with additional "abstracted" linters that would specify customized checks for a given client.

tnowacki · 2022-04-12T01:42:37Z

I think most of the linting can be done over the typed AST, but I just remembered that there is this ObjectID escape analysis that is super general. It will need the abstract interpreter. Luckily exposing just that pass should be really really easy, as the abstract interpreter is already setup with a very general trait system

tnowacki · 2022-04-12T01:44:04Z

I think an expression visitor could be a helpful utility, and would be helpful for writing the check that bans global storage operations. Writing that visitor is non-trivial due to the recursive structure.

Everything else for the typed AST should be easy enough

tnowacki · 2022-04-12T01:44:57Z

integrate diagnostics implementation from the Move repo to simplify generation of linter messages during traversal (these should also be generated at different levels, starting with, say, deny and warn).

This one will be the most annoying for the compiler in its current architecture. There is this trait though for the diagnostic codes I set up, maybe we could make one just for sui things

awelc · 2022-04-12T04:01:23Z

I think most of the linting can be done over the typed AST, but I just remembered that there is this ObjectID escape analysis that is super general. It will need the abstract interpreter. Luckily exposing just that pass should be really really easy, as the abstract interpreter is already setup with a very general trait system

We should talk about this in more detail as I am not sure if ObjectID escape analysis comes before or after the typed AST is produced, nor why we necessarily have to care about it...

awelc · 2022-04-12T04:03:39Z

This one will be the most annoying for the compiler in its current architecture. There is this trait though for the diagnostic codes I set up, maybe we could make one just for sui things

I thought it would be pretty non-controversial in that we could re-use the diagnostics facility to easily generate error messages. Something like instantiating the Diagnostics struct with the location, error message etc. and re-using the existing facility to format and display it. If it's problematic, we could adopt a different (simpler?) way of reporting linting errors.

sblackshear · 2022-04-18T18:44:03Z

I think we should consider using move-model for linters. It already exists, has hooks for reporting source diagnostics, and has an IR that has been used to write several linter-like tools (e.g., the escape analysis for the robust safety paper, some lints on prover specs) and is pretty nice to work with. Importantly, it supports interprocedural analysis, which I think will be important for a number of linters, but will be tougher to do inside the compiler because it heavily leverages modularity (as it should).

tnowacki · 2022-04-18T19:37:13Z

The move model might be okay for some passes but really lacks for anything detailed.

The biggest thing here is it does not have any source level expressions for function definitions, just compiled bytecode.
This is going to lead to less than idea error messages in most places.

Importantly, it supports interprocedural analysis, which I think will be important for a number of linters

Interprocedural analysis something we have actively avoided for the source language. These sorts of passes are slow and brittle, and tend to lead to rather confusing error messages. They are really good for static analysis type checks, since if it is confusing or inaccurate, they can be ignored (which might certainly be true for certain "linter" style pases).

But if you want to use this for anything that is more a "type system" style pass, you really want the compiler and all of its tooling.

tnowacki · 2022-04-18T19:43:41Z

@awelc, sorry I missed your comments

We should talk about this in more detail as I am not sure if ObjectID escape analysis comes before or after the typed AST is produced, nor why we necessarily have to care about it...

The current rules in the Sui verifier I believe are control flow sensitive. So you will need to use the CFGIR AST, and the abstract interpreter.
Alternatively, we can come up with a rule that is stricter than the verifier and use that.

I thought it would be pretty non-controversial in that we could re-use the diagnostics facility to easily generate error messages. Something like instantiating the Diagnostics struct with the location, error message etc. and re-using the existing facility to format and display it. If it's problematic, we could adopt a different (simpler?) way of reporting linting errors.

Now that I think about it, it might not be that bad? We just have the Sui linter have a diagnostic code that implements DiagnosticCode
The only blocker right now is the Category for the diagnostic is an enum. We will need to make it just some other trait+associated type or something so the Sui checks can have one

sblackshear · 2022-04-18T21:02:22Z

The biggest thing here is it does not have any source level expressions for function definitions, just compiled bytecode.
This is going to lead to less than idea error messages in most places.

I believe Move models produced from source have very detailed source context information. I have used this feature in diagnostics, anyway. I'm sure there are cases where it falls short of what you can do in the compiler, but my expectation is that the error messages we can produce will still be very high quality.

sblackshear · 2022-04-18T21:05:08Z

Interprocedural analysis something we have actively avoided for the source language. These sorts of passes are slow and brittle, and tend to lead to rather confusing error messages. They are really good for static analysis type checks, since if it is confusing or inaccurate, they can be ignored (which might certainly be true for certain "linter" style pases).

Definitely agree with avoiding this in the core analyses performed by the compiler. But for linters (as we're discussing in this task), I think we want to give the linter writer as much power/flexibility as we can. As you say, linters can be ignored, so there's no problem if someone writes a slow or flaky one.

tnowacki · 2022-04-18T21:17:16Z

This isn't for "general" linters though. This is things like "do my key structs have ID fields?" and "Do my entry points have the correct signature?" These can't be ignored because the Sui bytecode verifier will yell at you.

I would consider them more "Move type system extensions" rather than the sort of checks you would find in a tool like clippy

tnowacki · 2022-04-18T21:18:02Z

I believe Move models produced from source have very detailed source context information

And I would hope so as they are strictly a subset of the information in the AST ;)

sblackshear · 2022-04-18T21:21:09Z

I guess my question is: what would go wrong if we implemented our Sui linters as move-model-powered linters? I ask this because I would rather have one mechanism for extending the Move type system/analyzing Move code that we put a lot of work into instead of spreading our effort across two ways of doing this (at least at this point).

awelc · 2022-04-18T21:23:24Z

This isn't for "general" linters though. This is things like "do my key structs have ID fields?" and "Do my entry points have the correct signature?" These can't be ignored because the Sui bytecode verifier will yell at you.

I think we need both. The idea was to have a separate linting tool that could have different "severity" levels.

The "Move type system extensions" linting should probably be directly integrated with the system so that these checks cannot be bypassed.

sblackshear · 2022-04-18T21:24:46Z

I am certainly not opposed to having both. I think this is just a sequencing/prioritization question.

awelc · 2022-04-18T21:27:08Z

I am certainly not opposed to having both. I think this is just a sequencing/prioritization question.

Agreed. They could also reuse at least part of the common infrastructure, so it would be nice (though not strictly necessary) to agree on the infrastructure that could work for both linter types.

tnowacki · 2022-04-18T21:28:07Z

I guess my question is: what would go wrong if we implemented our Sui linters as move-model-powered linters? I ask this because I would rather have one mechanism for extending the Move type system/analyzing Move code that we put a lot of work into instead of spreading our effort across two ways of doing this (at least at this point).

I would rather never extend the move-model unless you want to work over the bytecode or want to do an inter-procedural analysis, so if you want one mechanism, use the ASTs.
(I hope this will also be easier for @awelc, as he will need the ASTs for any vs code work)

I think we need both. The idea was to have a separate linting tool that could have different "severity" levels. The "Move type system extensions" linting should probably be directly integrated with the system so that these checks cannot be bypassed.

Already doable in the source languages diagnostics, you can even have non-skippable errors that don't stop compilation

sblackshear · 2022-04-18T21:34:22Z

I would rather never extend the move-model

This does not answer my question of what would go wrong--I would like to understand why you are so opposed to extending it :).

tnowacki · 2022-04-18T22:14:54Z

(to be clear by extending I mean building a thing on top of it)

I oppose it because I just could not imagine a linter framework that does not want the source expressions. Thats almost literally all of the linter cases (at least for clippy), "You had XYZ expression. ABC expression is cleaner". The move-model does not contain information about source language expressions outside of whats in the source map (mapping from bytecode instruction to source file location)

Similarly for that reason, it is literally impossible to use the move-model for IDE features like hover types or type-based autocomplete (which will be a thing when we eventually have receiver style syntax)

For the Sui type system cases:

"key" rules
Global storage banning
entry points
ID immutable
ID leak

The only one I think would produce equally readable errors are the "key" rules and maybe global storage banning.

For entry points, the type checker and the typed AST has a lot of built up infra for good errors when comparing types or for checking ability constraints (as you might imagine lol).

For the ID checks, you will be able to do these checks at the bytecode level, but you will lose a lot of structured information from the source language.
I couldn't even imagine how it would look for something control flow sensitive like the ID leak. It is hard enough getting good error messages for the abs int based checks (just because of all the control flow paths).
One really small example for the ID immutable one is the difference between: x.f.g.id = new_id or let ref = &mut x.f.g.id.
(0) You will want to case these separately for error message verbiage
(1) The former syntax doesn't even really exist at the bytecode level, and its going to generate a ton of temporary variables in order to compile that. I don't know how/if that will affect the mapping. If I had to guess it shouldn't be too bad, but it might make it just oddly partial
(2) For good error messages you don't just want to say "dont do this right here", especially in a language in with type inference. You want to say "don't do this", "this is why I think this field has this type", and "this is where this field was declared with this restriction"

As a further side note, Brian and I talked about trying to fix the move-model a bit before he left. Obviously we never started on it in much detail, but part of the problem is that the move-model is a lot of code for not doing much

Gather meta data needed for specs
Intern a bunch of data

So much of the complexity in the move-model was maybe needed when it was written, but is outdated now.
The compiler already now interns all of the strings. Most of the metadata it tries to gather is also gathered by the compiler (the move-model just splits off shortly after parsing the AST). and other bits of the meta data could rely on the source map a bit more heavily.

In short, something like the move-model probably should exist, but hopefully something that doesn't have its own custom interning or location data that is at best out of step with, or at worst duplicates, the tooling in the package system, compiler, IR, unit tester, etc.

sblackshear · 2022-04-18T22:21:57Z

Got it + makes sense. The goal of this task is to do better than raising an opaque verifier error for the Sui-specific passes, and it sounds like we won't be able to do much better unless we work inside the compiler.

I would still be hesitant to go too far down the path of providing a generic linter framework inside the compiler when it will be limited w.r.t the kinds of analyses it can do compared to move-model (even though move-model will inevitably fall short in the error message department for some cases). But it doesn't sound like that is the plan (we're focusing on trying to make the Sui verifiers understandable at the source level in the best way possible!), so I think we're aligned.

tnowacki · 2022-06-03T21:43:54Z

Adding to this, we should also consider linting against useless transfer wrappers. But this might be a bit tricky to do

More benchmark results

Tests can now add compute_unit_limit as a config. Followups: Remove DEFAULT_EXECUTION_BOUND_SOLANA_STDLIB_TEST added in MystenLabs#18 Parse GAS_BUDGET_HEAP_SIZE Parse GAS_BUDGET_MAX_CALL_DEPTH Fixes: MystenLabs#20

Tests can now add compute_unit_limit as a config. Followups - Remove `DEFAULT_EXECUTION_BOUND_SOLANA_STDLIB_TEST` added in MystenLabs#18 - Parse GAS_BUDGET_HEAP_SIZE - Parse GAS_BUDGET_MAX_CALL_DEPTH Fixes: MystenLabs#20

…own gas budget parameters (MystenLabs#26) Follow up from MystenLabs#24 Fixes: MystenLabs#20

lxfind added this to the GDC milestone Jan 21, 2022

awelc added the move label Mar 28, 2022

awelc changed the title ~~[fastx devex] Move source code linters to explain language subset~~ [Move devex] Move source code linters to explain language subset Mar 28, 2022

bholc646 removed this from the GDC milestone Apr 6, 2022

bholc646 removed the move label Apr 6, 2022

awelc added Type: Enhancement New feature or request Priority: Medium A nice to have feature or annoying bug, non-blocking and no delays expected if we punt on it Triaged Issue is associated with a milestone and has at least 1 label of each: status, type, & priority labels Apr 14, 2022

awelc added this to the Mainnet milestone Apr 14, 2022

bholc646 added the move label Jul 25, 2022

mwtian pushed a commit that referenced this issue Sep 12, 2022

Merge pull request #20 from asonnino/master

b569e17

More benchmark results

mwtian pushed a commit to mwtian/sui that referenced this issue Sep 29, 2022

Merge pull request MystenLabs#20 from asonnino/master

8ead047

More benchmark results

georgescharlesbrain mentioned this issue Jan 30, 2023

Sui doc content request #7250

Closed

brson pushed a commit to brson/sui that referenced this issue Apr 30, 2024

Remove hard coded DEFAULT_EXECUTION_BOUND as tests can provide their …

855fd31

…own gas budget parameters (MystenLabs#26) Follow up from MystenLabs#24 Fixes: MystenLabs#20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Move devex] Move source code linters to explain language subset #20

[Move devex] Move source code linters to explain language subset #20

sblackshear commented Dec 2, 2021

awelc commented Feb 17, 2022

awelc commented Apr 12, 2022

tnowacki commented Apr 12, 2022

tnowacki commented Apr 12, 2022

tnowacki commented Apr 12, 2022

awelc commented Apr 12, 2022

awelc commented Apr 12, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

awelc commented Apr 18, 2022

sblackshear commented Apr 18, 2022

awelc commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Jun 3, 2022

[Move devex] Move source code linters to explain language subset #20

[Move devex] Move source code linters to explain language subset #20

Comments

sblackshear commented Dec 2, 2021

awelc commented Feb 17, 2022

awelc commented Apr 12, 2022

tnowacki commented Apr 12, 2022

tnowacki commented Apr 12, 2022

tnowacki commented Apr 12, 2022

awelc commented Apr 12, 2022

awelc commented Apr 12, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

awelc commented Apr 18, 2022

sblackshear commented Apr 18, 2022

awelc commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Apr 18, 2022

sblackshear commented Apr 18, 2022

tnowacki commented Jun 3, 2022