Rework oxc_prettier #5068

Open
Boshen opened this issue Aug 22, 2024 · 35 comments
Labels: A-prettier (Area - Prettier), C-enhancement (Category - New feature or request), E-Help Wanted (Experience level - For the experienced collaborators)

@Boshen
Member

Boshen commented Aug 22, 2024

Note

@leaysgur is currently examining the following options:

  • 1: Implement the existing Prettier-based code
  • 2: Reuse the Biome code and start from scratch

For option 1, see #5068 (comment) for the rest of the details and feel free to contribute.

For option 2, stay tuned for more details. 🚧

Original description

crates/oxc_prettier was my attempt at the prettier bounty.

I thought I could finish it in time, but I rushed in without looking at all the requirements ... By the time I got blocked by printing comments, it was too late.

In order to rework oxc_prettier, we need to understand at least:

As for the infrastructure, we already have most of the code:

Feel free to remove everything and start from scratch, and copy over the format code https://github.com/oxc-project/oxc/tree/main/crates/oxc_prettier/src/format

Boshen added the C-enhancement (Category - New feature or request), A-prettier (Area - Prettier), and E-Help Wanted (Experience level - For the experienced collaborators) labels on Aug 22, 2024

@Boshen
Member Author

Boshen commented Sep 4, 2024

@leaysgur is writing a series of articles in preparation for this task:


I'm also working on comment attachments to unblock prettier.

Boshen pinned this issue on Sep 4, 2024

@IWANABETHATGUY
Contributor

IWANABETHATGUY commented Sep 7, 2024

For those interested in the algorithms under the hood, Prettier is based on Wadler's paper "A prettier printer": https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf
See also https://prettier.io/docs/en/technical-details
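
As a rough illustration of the idea in the paper linked above (Wadler's "A prettier printer"), here is a minimal sketch of a document IR and a printer that keeps a group on one line if it fits and breaks it otherwise. The `Doc` variants, the fixed indent of 2, and the simplified fit check (it only measures the group itself, not the text that follows it) are assumptions made for this sketch; this is not Prettier's or oxc_prettier's actual implementation.

```rust
/// A tiny document IR in the spirit of Wadler-style pretty printers.
#[derive(Debug, Clone)]
enum Doc {
    /// Literal text without newlines.
    Text(String),
    /// A space in flat mode, a newline plus indentation in break mode.
    Line,
    /// Children are printed with two more columns of indentation.
    Indent(Vec<Doc>),
    /// Printed flat if the whole group fits, otherwise broken.
    Group(Vec<Doc>),
}

#[derive(Clone, Copy, PartialEq)]
enum Mode {
    Flat,
    Break,
}

/// Width of `docs` if everything were printed on a single line.
fn flat_width(docs: &[Doc]) -> usize {
    docs.iter()
        .map(|d| match d {
            Doc::Text(s) => s.chars().count(),
            Doc::Line => 1,
            Doc::Indent(c) | Doc::Group(c) => flat_width(c),
        })
        .sum()
}

/// Print `doc`, trying to stay within `width` columns.
fn print(doc: &Doc, width: usize) -> String {
    let mut out = String::new();
    let mut col = 0usize;
    // Work stack of (indentation, mode, doc); children are pushed in reverse
    // so that they are popped in source order.
    let mut stack: Vec<(usize, Mode, &Doc)> = vec![(0, Mode::Break, doc)];
    while let Some((indent, mode, doc)) = stack.pop() {
        match doc {
            Doc::Text(s) => {
                out.push_str(s);
                col += s.chars().count();
            }
            Doc::Line => match mode {
                Mode::Flat => {
                    out.push(' ');
                    col += 1;
                }
                Mode::Break => {
                    out.push('\n');
                    out.push_str(&" ".repeat(indent));
                    col = indent;
                }
            },
            Doc::Indent(children) => {
                for c in children.iter().rev() {
                    stack.push((indent + 2, mode, c));
                }
            }
            Doc::Group(children) => {
                // Simplification: only the group's own flat width is measured.
                let fits = mode == Mode::Flat || col + flat_width(children) <= width;
                let next = if fits { Mode::Flat } else { Mode::Break };
                for c in children.iter().rev() {
                    stack.push((indent, next, c));
                }
            }
        }
    }
    out
}
```

With a group such as `[`, an indented `a,` and `b` separated by `Line`s, and `]`, a width of 80 keeps everything on one line, while a width of 5 puts each element on its own indented line.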

@leaysgur
Contributor

It has been 3 weeks since I started reading the Prettier source code.
It's still far from being complete, but I'd like to leave some progress and summary here.

How to debug Prettier

There are 3 ways:

  • Playground
    • Enable "Debug" options in the sidebar
  • CLI
    • With --debug-* args
  • Node.js API
    • Under __debug exports

https://leaysgur.github.io/posts/2024/09/03/111109/

It is written in Japanese, but it is all code, so you can understand it. 😉

I also recommend running the code with node --inspect-brk, with debugger statements in place, and inspecting it from Chrome DevTools.

How to handle comments

I will post some topics for discussion in a few days.

@leaysgur
Contributor

How to handle comments

As you may know, Prettier's formatting process consists of roughly 3 phases:

  • P1: original text > AST
  • P2: AST > Doc
  • P3: Doc > formatted text

Comments are collected in P1 and used in P2.

In P1:

  • It simply parses to the AST with comments in the output
    • However, there are also some adjustments made to the AST nodes
    • Some parsers, such as Babel, already attach comments to nodes at this stage
      • However, Prettier does not use them
      • The reason is to support parsers other than Babel
  • As the final step of P1 (more technically, the beginning of P2), comments are classified and attached to nodes
    • First, it starts by finding nearby nodes for each comment
    • Based on that, it determines the placement (ownLine, endOfLine, remaining) from the lines before and after each comment
    • Then, it handles about 30 (!) known special patterns for each placement
    • Finally, it finishes with its own tie-breaking algorithm

As a result, some AST nodes get a comments property: an array of Comment objects extended with leading, trailing, and a few more props.
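
To make the first two steps above more concrete, here is a minimal sketch of classifying a comment's placement from the surrounding text and of picking the neighboring nodes. All names (`Placement`, `CommentSpan`, `classify`, `neighbors`) are hypothetical, the neighbor search only looks at a flat list of sibling spans, and none of the roughly 30 special patterns or the tie-breaking step are covered. It is a simplification, not Prettier's actual code.

```rust
/// Where a comment sits relative to the code around it.
#[derive(Debug, PartialEq, Eq)]
enum Placement {
    /// Only whitespace between the previous newline and the comment.
    OwnLine,
    /// Code before the comment, and only whitespace until the next newline.
    EndOfLine,
    /// Code on both sides, e.g. `f(/* here */ x)`.
    Remaining,
}

/// Byte range of a comment in the source text (hypothetical type).
struct CommentSpan {
    start: usize,
    end: usize,
}

fn is_blank(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Classify a comment by looking at the text right before and right after it.
fn classify(source: &str, comment: &CommentSpan) -> Placement {
    let before = source[..comment.start].trim_end_matches(is_blank);
    if before.ends_with('\n') {
        return Placement::OwnLine;
    }
    let after = source[comment.end..].trim_start_matches(is_blank);
    if after.starts_with('\n') || after.starts_with("\r\n") {
        return Placement::EndOfLine;
    }
    Placement::Remaining
}

/// Find the indices of the preceding and following nodes for a comment.
/// `nodes` holds (start, end) byte ranges of sibling nodes, sorted by position.
fn neighbors(
    nodes: &[(usize, usize)],
    comment: &CommentSpan,
) -> (Option<usize>, Option<usize>) {
    let preceding = nodes.iter().rposition(|&(_, end)| end <= comment.start);
    let following = nodes.iter().position(|&(start, _)| start >= comment.end);
    (preceding, following)
}
```

The placement and the neighbor indices would then be the inputs to the pattern-specific handlers that decide whether a comment becomes leading, trailing, or dangling.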

In P2 (I haven’t read the code in detail here yet),

  • When each node is converted into a Doc, comments are also converted into Docs
    • Therefore, how they are output seems to have already been decided in P1

In OXC, part of the necessary information is already implemented and can be obtained (see #5785).

However, just like with Babel, that information may be different from what Prettier requires...


So, I think I’ve generally understood "what" Prettier is doing.

However, as for "why" Prettier does it that way, I can only say it’s because that’s Prettier’s opinion.

Incidentally, there seem to be around 120 open bug issues related to JS/TS at the moment:

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug

About 50 of them are related to comments, and some have remained unresolved since as far back as 2017:

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug+label%3Aarea%3Acomments

@magic-akari
Contributor

> For the long run, I envision a more configurable and less opinionated formatter.

Does this mean that oxc_prettier will provide a wide range of configurable options, and offer a preset called prettier to match with prettier?

@leaysgur
Contributor

How to handle comments

(Follow-up to #5068 (comment))

As I posted above, comments are collected and attached to AST nodes in P1.
Then in P2, comments are printed as Doc along with other AST nodes.

Most comments are printed with their attached nodes like:

[leadingCommentsDoc, nodeDoc]
// or
[nodeDoc, trailingCommentsDoc]

https://github.com/prettier/prettier/blob/e3b8a16c7fa247db02c483fb86fc2049d45763b8/src/main/ast-to-doc.js#L128-L138

But the rest of the comments are handled as needed.

  • Dangling comments
  • Not dangling (leading or trailing), but needing special treatment
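
The default case described above can be sketched as follows: leading comments go before the node's Doc, trailing comments after it, and dangling comments are handed back so the node's own printer can place them. The `Doc`, `Attachment`, and `Comment` types and the `print_with_comments` function are hypothetical, and the separators (a hard line after each leading comment, a space before each trailing one) are a simplification of what Prettier decides case by case.

```rust
/// Minimal Doc type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Doc {
    Text(String),
    HardLine,
    Concat(Vec<Doc>),
}

/// How a comment was attached to its node in the earlier phase.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Attachment {
    Leading,
    Trailing,
    Dangling,
}

struct Comment {
    text: String,
    attachment: Attachment,
}

/// Wrap a node's Doc as [leading comments, node, trailing comments].
/// Dangling comments are returned untouched for the caller to print.
fn print_with_comments<'a>(
    node_doc: Doc,
    comments: &'a [Comment],
) -> (Doc, Vec<&'a Comment>) {
    let mut parts = Vec::new();
    for c in comments.iter().filter(|c| c.attachment == Attachment::Leading) {
        parts.push(Doc::Text(c.text.clone()));
        parts.push(Doc::HardLine);
    }
    parts.push(node_doc);
    for c in comments.iter().filter(|c| c.attachment == Attachment::Trailing) {
        parts.push(Doc::Text(" ".to_string()));
        parts.push(Doc::Text(c.text.clone()));
    }
    let dangling = comments
        .iter()
        .filter(|c| c.attachment == Attachment::Dangling)
        .collect();
    (Doc::Concat(parts), dangling)
}
```

Returning the dangling comments separately mirrors the split described above: the generic wrapper handles the common case, and node-specific printers deal with the rest.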

There are about 40 files for printing ESTree AST to Doc.

https://github.com/prettier/prettier/tree/main/src/language-js/print

And 19 of them print comments on demand:

❯ rg 'print(Dangling)?Comments(Separately)?' -l src/language-js/print
estree.js
class.js
type-annotation.js
function-parameters.js
mapped-type.js
component.js
module.js
function.js
ternary.js
array.js
binaryish.js
property.js
call-arguments.js
block.js
jsx.js
arrow-function.js
object.js
member-chain.js
type-parameters.js

@Boshen
Member Author

Boshen commented Nov 25, 2024

@leaysgur will lead this project, and is free to make any changes to the oxc_prettier crate.

Boshen pushed a commit that referenced this issue Jan 23, 2025
Part of #5068 

- ExportAllDeclaration
- ExportNamedDeclaration
- ExportDefaultDeclaration
- (ExportNamespaceSpecifier = ExportAllDeclaration+exported)
Boshen pushed a commit that referenced this issue Jan 28, 2025
Part of #5068 

Mostly trivial this time. 😪
Boshen pushed a commit that referenced this issue Jan 31, 2025
Part of #5068 

- MemberExpression
  - Computed
  - Static, PrivateField
Boshen pushed a commit that referenced this issue Feb 3, 2025
Part of #5068 

- ObjectExpression, ObjectPattern
- ObjectProperty
- ObjectMethod
Boshen pushed a commit that referenced this issue Feb 4, 2025
Part of #5068 

Now all AST nodes have been verified once. ✌🏻
Boshen pushed a commit that referenced this issue Feb 5, 2025
Part of #5068 

- VariableDeclaration
- AwaitExpression
- SequenceExpression
- (ParenthesizedExpression)
Boshen pushed a commit that referenced this issue Feb 6, 2025
Part of #5068 

Verified and completed `print/array.rs`, except for comment handling.
Boshen pushed a commit that referenced this issue Feb 6, 2025
Part of #5068 

Cosmetic changes only.
Boshen pushed a commit that referenced this issue Feb 10, 2025
Part of #5068 

Update `doc.to_string()` output to a Doc AST JSON format compatible with
`Prettier.__debug.formatDoc()`.

```sh
# Use case
cargo run -p oxc_prettier --example prettier --quiet -- --debug | jq .

# Advanced
cargo run -p oxc_prettier --example prettier --quiet -- --debug | pbcopy
# Open Prettier playground, select doc-explorer as parser option, then paste as input!
```
Boshen pushed a commit that referenced this issue Feb 12, 2025
Part of #5068 

Full rewrite of `print/object`, slight improvement. 😇
Boshen pushed a commit that referenced this issue Feb 14, 2025
Part of #5068 

Support `objectWrap` option added in v3.5.0.
Boshen pushed a commit that referenced this issue Feb 27, 2025
Part of #5068 

Verify and refactor `print/*`.

- Verify the main printing logic w/ refactoring
  - The `propagate_breaks()` needs to be reworked soon
- Test coverage drops slightly because of this, but the root cause is
the original `Doc` structure
- Export `print_doc_to_string` as a function
- Some functions in `format/*` require the printed result to determine
formatting...
- Update the `Doc` macro comments
- Properly escape strings in `Doc` JSON
@MichaReiser

MichaReiser commented Feb 27, 2025

I'm probably a bit late here, but what we did for ruff (Python formatter) was to vendor biome_formatter and adjust it to use our own AST (you don't want to convert between different representations for best performance). We, obviously, couldn't reuse any of the formatting logic because it's Python ;)

I also think that building/vendoring is the right choice. It's a fundamental piece of your formatter and you want to have control over it.

@leaysgur
Contributor

leaysgur commented Feb 28, 2025

@MichaReiser Thanks for the very useful information!

It's not too late at all. I'm still asking myself every day whether to base on Prettier or biome_formatter. 😅

So far, I have mainly checked whether biome_formatter can be used directly as an external crate instead of vendoring (forking) it, and concluded that it does not seem possible.

Given that, the options are:

  • fork biome_formatter and start from scratch
  • or continue to improve the existing Prettier-based code

To determine which direction is more promising, I have spent time diffing the current code against Prettier and examining what is missing, while picking low-hanging fruit. (And it's been two months already...!)

Even so, if I were to base it on biome_formatter, just getting it to work would likely take a long time given my limited time and knowledge. 🙈

The situation might be different for JS(+JSX+TS) and Python, but do you think it would also be better for OXC to be based on biome_formatter like ruff? Do you have any advice? 👀

@jycouet

jycouet commented Feb 28, 2025

Would love to try the prototype from npm 😇

@MichaReiser

> It's not too late at all. I'm still asking myself every day whether to base on Prettier or biome_formatter. 😅

Yeah, it's a hard decision!

> The situation might be different for JS(+JSX+TS) and Python, but do you think it would also be better for OXC to be based on biome_formatter like ruff? Do you have any advice? 👀

I can tell you why we decided to use biome_formatter. Ultimately, that's a decision you have to make for your project and I'm probably biased because I wrote most of biomejs_formatter.

  • Using biomejs_formatter gave us a huge head start because it gave us most of the infrastructure for free. The only thing we had to do was fix the compile errors (because of CST dependencies).
  • I authored most of biomejs_formatter. I'm obviously happy with most of its design decisions and understand it well. You might not agree with some of my design decisions or have ideas to make the formatter even faster.
  • biomejs_formatter reimplements Prettier very closely (the semantics are 99.99% the same, it's mainly that I named things differently and that it's Rusty)

Our approach was to copy over biomejs_formatter and then rip out/replace everything that didn't fit well into our architecture. That worked fairly well. We ultimately also ended up introducing new IR elements that allow us to format some common Python syntax more performantly.

You could also take a middle ground and copy code more incrementally. That could also be a good learning experience. Start with the Printer and the IR (which is, by the way, the most important foundation already). Then go ahead and change the IR however you like or make changes to the Printer. The advantage of this is that you already get most of the IR semantics right (it took me a long time to reverse engineer all the semantics!). It also gives you the freedom to disagree with all my naming decisions and rename the IR elements, or even decide to use an entirely different representation (struct of arrays?). You could even decide that the Printer is just not what you want, in which case you have your decision: don't use biomejs_formatter.

From there, you can move on and vendor more of biomejs (or decide to go different ways). Vendor the Format trait or decide that you want to use standalone functions with the signature fn format_node(node: &Node, context: &Context) -> FormatResult. Rip out the invalid-syntax error if that's something you don't want to support... You can also decide that you aren't interested in having two separate crates (biomejs_formatter and biomejs_javascript_formatter). This removes the need for all those FormatWith traits because the orphan rules no longer apply (https://github.com/astral-sh/ruff/blob/0945803427ef971724ce29565cffb58ffc06166c/crates/ruff_python_formatter/orphan_rules_in_the_formatter.svg#L16)
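
To illustrate the two shapes mentioned above, here is a minimal sketch that uses the `fn format_node(node: &Node, context: &Context) -> FormatResult` signature as a standalone function and then wraps it in a trait. The `Node`, `Context`, `FormatError`, and `FormatResult` definitions are hypothetical stand-ins invented for this sketch (the result is a plain `String` rather than an IR, and strings are not escaped); they are not Biome's, ruff's, or OXC's actual types.

```rust
/// Hypothetical error type; the `SyntaxError` variant stands in for the
/// invalid-syntax error that could be ripped out if it is not wanted.
#[derive(Debug, PartialEq)]
enum FormatError {
    SyntaxError,
}

/// Hypothetical result type: this sketch produces a String directly.
type FormatResult = Result<String, FormatError>;

/// Hypothetical formatting options.
struct Context {
    single_quote: bool,
}

/// Hypothetical, tiny AST.
enum Node {
    Str(String),
    Array(Vec<Node>),
    Invalid,
}

/// Standalone-function style: one free function that matches on the node kind.
fn format_node(node: &Node, context: &Context) -> FormatResult {
    match node {
        Node::Str(value) => {
            let quote = if context.single_quote { '\'' } else { '"' };
            Ok(format!("{}{}{}", quote, value, quote))
        }
        Node::Array(items) => {
            let parts = items
                .iter()
                .map(|item| format_node(item, context))
                .collect::<Result<Vec<_>, _>>()?;
            Ok(format!("[{}]", parts.join(", ")))
        }
        Node::Invalid => Err(FormatError::SyntaxError),
    }
}

/// Trait style. Because the trait and `Node` live in the same crate here,
/// the impl is allowed directly and no wrapper types are needed to satisfy
/// the orphan rules.
trait Format {
    fn format(&self, context: &Context) -> FormatResult;
}

impl Format for Node {
    fn format(&self, context: &Context) -> FormatResult {
        format_node(self, context)
    }
}
```

In a two-crate layout, where the trait is defined in one crate and the AST in another, the `impl Format for Node` line would have to live in whichever of the two crates is willing to depend on the other, which is the constraint the single-crate approach sidesteps.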

You can then move on to start investigating comments. Biome comes with a CommentsMap, but maybe that's not needed for OXC or there's a better representation for OXC.
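
One possible shape for such a side table, as a minimal sketch: comments are stored outside the AST, keyed by node, in leading, dangling, and trailing buckets. Every name here (`Span`, `NodeComments`, `CommentsMap`, and its methods) is made up for illustration and is not Biome's actual API. Keying by span is an assumption that breaks down when two nodes share the same span, so a real implementation would need a better key.

```rust
use std::collections::HashMap;

/// Hypothetical node key: the node's byte range in the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Span {
    start: u32,
    end: u32,
}

/// The three buckets of comments attached to one node.
#[derive(Debug, Default)]
struct NodeComments {
    leading: Vec<String>,
    dangling: Vec<String>,
    trailing: Vec<String>,
}

/// Side table from node key to its comments, kept outside the AST.
#[derive(Debug, Default)]
struct CommentsMap {
    by_node: HashMap<Span, NodeComments>,
}

impl CommentsMap {
    /// Mutable access to a node's buckets, creating them on first use.
    fn entry(&mut self, node: Span) -> &mut NodeComments {
        self.by_node.entry(node).or_default()
    }

    /// Leading comments of `node`, or an empty slice if it has none.
    fn leading(&self, node: Span) -> &[String] {
        self.by_node.get(&node).map(|c| c.leading.as_slice()).unwrap_or(&[])
    }

    /// Dangling comments of `node`, or an empty slice if it has none.
    fn dangling(&self, node: Span) -> &[String] {
        self.by_node.get(&node).map(|c| c.dangling.as_slice()).unwrap_or(&[])
    }

    /// Trailing comments of `node`, or an empty slice if it has none.
    fn trailing(&self, node: Span) -> &[String] {
        self.by_node.get(&node).map(|c| c.trailing.as_slice()).unwrap_or(&[])
    }
}
```

The attachment phase would fill the map through `entry`, and the printing phase would only read from it, so the AST itself stays immutable.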

I hope you find some of this helpful. Overall, I encourage you to take what you find useful and throw away/rewrite everything that you don't. You can also take a look at ruff's formatter to see how we changed biomejs_formatter to work with an AST instead of CST. Let me know if you have any questions (here or in the discord formatter channel).

@leaysgur
Contributor

Thank you so much, senpai! 🥹 Your help is extremely valuable, and I really appreciate it.

It was a nice discovery for me that there is an example of using the biome_formatter code for a custom AST rather than one based on the Biome CST.

Now that I have a general idea of the approach based on Prettier, I will try to start by copying the biome_formatter code next time.

Thanks again~!
