Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal] Don't download Go third-party dependencies multiple times (Cherry-pick of #13352) #13378

Merged
merged 1 commit into from
Oct 27, 2021

Conversation

Eric-Arellano
Copy link
Contributor

Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

  1. determining GoModInfo (once per go.mod)
  2. AllDownloadedModules (once per go.mod)
  3. Determining metadata for each third-party package (once per third-party module)
  4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per go.mod.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop cd-ing into its downloaded directory and running go list directly in it, using its own go.mod and go.sum. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is run go list '...' to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant .go files from all of the downloaded GOPATH, i.e. all the downloaded modules. For compilation, all we need is the .go files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole GOPATH into the chroot. (This is really slow! We need something like #12716 to fix this.)

Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs

Fixing only third-party analysis:

❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs

Fixing everything:

❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs

[ci skip-rust]
[ci skip-build-wheels]

…antsbuild#13352)

Turns out that pantsbuild#13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3.  Determining metadata for each third-party package (once per third-party module)
4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like pantsbuild#13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like pantsbuild#12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano merged commit 56c575a into pantsbuild:2.8.x Oct 27, 2021
@Eric-Arellano Eric-Arellano deleted the cp-go-third-party branch October 27, 2021 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants