Implement support for out-of-process compilation #1536
Conversation
Excellent work, I am super excited about this approach!
My gut feeling would be that optimizing here is not really required: we shouldn't be running many compilations concurrently anyway. I am more worried about depending on Tokio and the rest of the async ecosystem: it certainly is a good choice for getting things up and running, but, longer term, I think we are unfortunately buying into a lot of complexity here. An alternative would be to roll our own blocking json-per-line API on top of mkfifo/windows-named-pipe.
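For concreteness, here is a minimal sketch of what such a blocking, newline-delimited JSON protocol could look like, assuming `serde`-based message types; the `Request`/`Response` shapes and the `call` helper are invented for illustration and are not an API proposed in this thread:

```rust
use std::io::{BufRead, BufReader, Read, Write};

use serde::{Deserialize, Serialize};

// Hypothetical wire types: one JSON object per line in each direction.
#[derive(Serialize, Deserialize)]
struct Request {
    id: u64,
    method: String,
    path: String,
}

#[derive(Serialize, Deserialize)]
struct Response {
    id: u64,
    contents: Option<String>,
}

// Write one request as a single line, then block until one response line
// arrives. A real implementation would keep a persistent BufReader per
// connection instead of creating one per call.
fn call<S: Read + Write>(stream: &mut S, req: &Request) -> std::io::Result<Response> {
    let mut line = serde_json::to_string(req)?;
    line.push('\n');
    stream.write_all(line.as_bytes())?;
    stream.flush()?;

    let mut reader = BufReader::new(stream);
    let mut buf = String::new();
    reader.read_line(&mut buf)?;
    Ok(serde_json::from_str(&buf)?)
}
```

The stream could be a FIFO created with mkfifo and opened as a `File` on Unix, or a named pipe on Windows; no async runtime is involved, each call simply blocks until the peer writes back a full line.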
That's perhaps a naive question, but why can't we do …
Oh, this looks interesting:
https://users.rust-lang.org/t/recommended-way-of-ipc-in-rust/31116/9?u=matklad
On Wednesday, 7 August 2019, Igor Matuszewski wrote:
This is quite a lengthy patch, but the gist of it is as follows:
- `rls-ipc` crate is introduced which acts as the IPC interface along with a server/client implementation
- `rls-rustc` is enhanced with optional support for the IPC
- RLS can optionally support it via setting the `RLS_OUT_OF_PROCESS` env var (like `rls-rustc`, it needs to be compiled with the `ipc` feature)

The IPC is async JSON-RPC running on Tokio using `parity-tokio-ipc` (UDS on unices and named pipes on Windows):
- Tokio because I wanted to more or less easily come up with a PoC
- RPC because I needed a request->response model for VFS IPC function calls (see the sketch below this list)
- uds/pipes because it's somewhat cross-platform and we don't have to worry about `rustc` potentially polluting stdio (maybe just capturing the output in `run_compiler` would be enough?)
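To make the request->response shape concrete, here is a rough sketch of the kind of VFS calls the compiler side needs to make back to the RLS; the method names and payloads are invented for illustration and are not the actual `rls-ipc` interface:

```rust
use std::path::PathBuf;

use serde::{Deserialize, Serialize};

// Hypothetical requests the out-of-process rustc sends to the RLS so that
// file reads go through the editor's in-memory VFS instead of the disk.
#[derive(Serialize, Deserialize)]
#[serde(tag = "method", content = "params")]
enum VfsRequest {
    // Read a file, preferring unsaved editor contents over the on-disk copy.
    GetFileContents { path: PathBuf },
    // Report metadata gathered during compilation, e.g. file -> edition.
    FilesEditions { files: Vec<(PathBuf, String)> },
}

#[derive(Serialize, Deserialize)]
enum VfsResponse {
    FileContents(String),
    Ack,
}
```

Each such request needs a matching response before compilation can proceed, which is why a plain fire-and-forget byte pipe is not quite enough and an RPC layer sits on top of the transport.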
However, the implementation is far from efficient - it currently starts a thread per requested compilation, which in turn starts a single-threaded async runtime to drive the IPC server for a given compilation.
I imagine we could instead initiate the runtime globally, spawn the servers on it, and drive them to completion on each compilation to reduce the thread spawn/coordination overhead.
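For illustration, the difference between the two schemes might look roughly like this; the sketch uses today's Tokio 1.x API rather than the Tokio version used in this patch, and `run_ipc_server` is a placeholder:

```rust
use std::thread;

// Hypothetical placeholder for driving one compilation's IPC server.
async fn run_ipc_server() { /* ... */ }

// Current approach: one OS thread plus one single-threaded runtime per compilation.
fn compile_once_per_thread() {
    let _handle = thread::spawn(|| {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build runtime");
        rt.block_on(run_ipc_server());
    });
}

// Possible alternative: one global runtime, each compilation just spawns a task on it.
fn compile_on_shared_runtime(rt: &tokio::runtime::Runtime) -> tokio::task::JoinHandle<()> {
    rt.spawn(run_ipc_server())
}
```

With a shared runtime, each compilation only pays for spawning a task rather than a thread plus a fresh runtime.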
While this gets rid of the global environment lock on each (previously) in-process crate compilation, what still needs to be addressed is the [sequential compilation](https://github.com/rust-lang/rls/blob/35eba227650eee482bedac7d691a69a8487b2135/rls/src/build/plan.rs#L122-L124) of the cached build plan for this implementation to truly benefit from the unlocked parallelization potential.
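Roughly, the goal would be to dispatch every unit of the cached build plan whose dependencies have already finished instead of walking the plan strictly in order. A simplified sketch, with an invented `Unit` type standing in for the real build-plan structures and `rls-rustc` as the spawned binary:

```rust
use std::collections::HashSet;
use std::process::Command;

// Hypothetical, simplified build-plan unit.
struct Unit {
    id: usize,
    deps: Vec<usize>,
    rustc_args: Vec<String>,
}

// Repeatedly spawn every unit whose dependencies are already built, wait for
// that wave to finish, and continue until the whole plan is done.
fn run_plan(units: &[Unit]) -> std::io::Result<()> {
    let mut done: HashSet<usize> = HashSet::new();
    while done.len() < units.len() {
        let ready: Vec<&Unit> = units
            .iter()
            .filter(|u| !done.contains(&u.id) && u.deps.iter().all(|d| done.contains(d)))
            .collect();
        // Guard against dependency cycles or unknown dependencies in the plan.
        assert!(!ready.is_empty(), "no dispatchable unit left");

        let children: Vec<_> = ready
            .iter()
            .map(|u| Command::new("rls-rustc").args(&u.rustc_args).spawn())
            .collect::<Result<_, _>>()?;

        for (unit, mut child) in ready.iter().zip(children) {
            // A real implementation would surface non-zero exit codes as errors.
            child.wait()?;
            done.insert(unit.id);
        }
    }
    Ok(())
}
```

This wave-by-wave version is deliberately naive; a real scheduler would start new units as soon as their own dependencies complete and cap concurrency, e.g. via the jobserver.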
I did some rough test runs (~5) and on a warm cache had the following results:
- integration test suite (release): 3.6 ± 0.2s (in-process) vs 3.8 ± 0.3s (out-of-process)
- rustfmt master whitespace change (release): 6.4 ± 0.2s (in-process) vs 6.6 ± 0.3s (out-of-process)

which at least initially confirms that the performance overhead is somewhat negligible if we can really parallelize the work and leverage process isolation for increased stability.
cc #1307
(I'll squash the commits in the final merge, 30+ commits is a tad too much 😅)
If possible I'd like to get more eyes on the patch to see if it's a good approach and what might be directly improved:
- @matklad for potentially shared rustc-with-patched-filesystem
- @alexheretic for the RLS/implementation itself
- @alexcrichton @nrc do you have thoughts on whether we can share the parallel graph compilation logic with Cargo somehow? For now we just rolled our own linear queue here because we didn't need much more, but maybe it might be worthwhile to extract the pure execution bits somehow?
This can be done but it'll likely be somewhat tricky. There's a lot to handle here.
FWIW though I don't have a good understanding of what this PR is doing, so I'm not quite sure exactly how hard this would be! In any case, if you're interested, we could try to see if the logic could be extracted to crates on crates.io or something like that?
Force-pushed from e293bb8 to 4fb6446
☔ The latest upstream changes (presumably #1541) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from ccb979d to a6e0304
I'm looking forward to the improvements compiling out-of-process can bring. This implementation looks quite clean. I wonder if we could make this simpler, though. On that note, I wouldn't have bothered with the "ipc" feature separation; it adds complexity. Is this just to avoid compiling tokio when not used? Although having both in-process & out-of-process with a runtime switch makes sense. I'd like to have diagnostic streaming instead of after-compile reporting, and for active processes to be killed/cancelled when a new compile is required. But I'm getting ahead of myself a bit.
☔ The latest upstream changes (presumably #1545) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from a6e0304 to 3bcfa0f
Yeah, that's fair - ideally I'd like not to depend on all of it just for the sake of IPC. FWIW we do use Tokio already in the development setting for the integration tests, and right now the IPC is conditionally compiled under a feature flag, so for now I'd continue with this approach and polish it as we go, in order to ship it in the RLS for other users.
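As a rough illustration of how that compile-time feature gate and run-time opt-in can fit together (the function names here are hypothetical, not the actual RLS code):

```rust
// Compile-time gate: the IPC path (and its Tokio dependency tree) only
// exists when the crate is built with `--features ipc`.
#[cfg(feature = "ipc")]
fn compile_out_of_process(_args: &[String]) {
    // spawn rls-rustc and drive the IPC server for it
}

fn compile_in_process(_args: &[String]) {
    // existing in-process rustc path
}

fn compile(args: &[String]) {
    // Run-time gate: even an ipc-enabled build stays in-process unless the
    // user opts in via the environment variable.
    #[cfg(feature = "ipc")]
    {
        if std::env::var("RLS_OUT_OF_PROCESS").is_ok() {
            return compile_out_of_process(args);
        }
    }
    compile_in_process(args);
}
```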
That's the hottest operation right now in the RLS, so I'd like to cut down the overhead as much as possible. I'd prefer not to re-run the …
Time to shave some more yaks it seems 😈
That'd be interesting to pursue! In general it'd be great if we could just throw a computation DAG/build plan at an 'execution engine' of sorts and handle the output. As I understand it, Rayon (which also has support for jobserver) is meant to be used to process data within a single process rather than to coordinate multiple processes? At first glance this sounds like a useful thing to have in the ecosystem, so maybe it wouldn't be a waste to expose it only for the sake of the RLS?
Yep! (see above)
That's a great idea! Let me write that on a to-do list - we should definitely explore it further. Since the implementation looks okay, as you're saying, I'll merge this and we'll hopefully iterate and polish it further as we go. Thanks for the review ❤️
@bors r+
📌 Commit 3bcfa0f has been approved by
☀️ Test successful - checks-azure