Support PGO in the compiler's build system #79562
Comments
I don't think we need to optimize for full collection + use; I'm not even sure how many people would use it if it were available. It's just not something I expect to be common. One question: for the Rust part of the PGO, I imagine we don't want to collect or use the PGO data for the first round of compilation, right? That round will have different codegen from the second round (potentially even using a different LLVM version). That's fine, since rustbuild can handle that for us, but it's something to keep in mind. If we don't support simultaneously using and generating PGO data, we should probably check that at startup, but I'm not otherwise all that worried.
Hm, I wonder if it would be possible to make stage 1 instrumented, use it to collect profile data, and then make stage 2 use that data. The profdata format looks rather robust, so that might even work across different LLVM versions, but I would not want to rely on it. Are our release artifacts stage 2 builds? If so, the instrumented compiler should also be a stage 2 build.
Right, stage 2 is basically the artifacts we'd be using for this, though they're built as "stage 1 artifacts", i.e. the second rustc binary that gets produced. I'm not too worried about getting the language precise here; I think we agree on what we need to profile :) I expect that we can build stage 0 artifacts as normal, with no special PGO handling, though eventually the beta they're built with will itself be PGO'd, which should give a nice speedup. Then the second round of compilation (stage 1 artifacts) is built instrumented, we run our workloads to gather data, and then we rebuild the second round using that data. In terms of overhead, this costs a single extra compiler build. Along those lines, perhaps the right thing is for rustbuild to always do this, i.e., the pgo-build option you suggest is not actually necessary. Specifically, rustbuild will collect data if the profile-generate options are given and use data if profile-use is given: profile-use alone will only use existing data, and both together will do the full cycle.
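The rule described in this comment (profile-generate alone: instrument and collect; profile-use alone: optimize with existing data; both: the full cycle) can be expressed as a small planning function. This is a hypothetical sketch, not rustbuild code: the Step enum and plan function are invented names. The merge step reflects the fact that the raw .profraw files written by an instrumented binary have to be combined with llvm-profdata merge into a .profdata file before -Cprofile-use can read them.

```rust
/// Hypothetical build steps; these names are illustrative, not rustbuild's.
#[derive(Debug, PartialEq)]
enum Step {
    BuildInstrumented, // build the stage 1 compiler with -Cprofile-generate
    CollectProfiles,   // run the workloads with the instrumented compiler
    MergeProfiles,     // combine the .profraw files with `llvm-profdata merge`
    BuildOptimized,    // rebuild the stage 1 compiler with -Cprofile-use
}

/// Decide which steps to run from the two (hypothetical) config switches.
fn plan(profile_generate: bool, profile_use: bool) -> Vec<Step> {
    let mut steps = Vec::new();
    if profile_generate {
        // Generating data means instrumenting, running, and merging.
        steps.push(Step::BuildInstrumented);
        steps.push(Step::CollectProfiles);
        steps.push(Step::MergeProfiles);
    }
    if profile_use {
        // Using data means one more build of the same compiler stage.
        steps.push(Step::BuildOptimized);
    }
    steps
}

fn main() {
    // Both options set: the full cycle, costing one extra compiler build.
    println!("{:?}", plan(true, true));
    // Only profile-use: reuse existing profile data.
    println!("{:?}", plan(false, true));
}
```

With both switches set, the instrumented and optimized builds target the same stage, which is where the "single compiler rebuild" overhead comes from.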
If this works on a technical level with the profile data format, then this sounds like a pretty great idea. You've already got a multi-phase build, so reusing one of those phases to generate the profile data would be a clever trick! Per my comment elsewhere, this would impact the reproducibility of the builds. To preserve reproducibility, it would be nice to publish the profile data as a build artifact and provide rustbuild support for producing an optimized build from an existing set of profile data.
Let me see if I understand this correctly: Right now we have:
that would become:
So, PGO would mean one additional build, right?
I'm wondering if that isn't actually too complicated, since we'd never use that build in CI. What I suggested originally would look like:

    # case 1: generate profile data
    beta --> stage0 --> stage1 =run=> profdata (for uploading)
                        ^^^^^^
                        dist (instrumented)

    # case 2: use profile data (downloaded, from some previous build)
    beta --> stage0 --> stage1
                        ^^^^^^
                        dist (profile-use)

Cases (1) and (2) would always run in parallel, and case (2) would use profile data from a previous run of case (1). The advantage is that we don't have to worry about building LLVM twice.
Another wrinkle here that I just realized is that LLVM is built only once -- so for the profile-generate case we would work with an instrumented LLVM already when building the instrumented |
I think your layout makes sense for the case where someone sets the profile-use config but not profile-generate. If both are set, I think it makes sense to do the layout in #79562 (comment). If sccache does not properly handle PGO flags for caching, then we'll need to leave LLVM out of scope for now; otherwise I expect there should be no problem with building LLVM twice for the immediate-reuse case. I am still a bit skeptical about using old profile data with respect to perf.rlo, but I think we can tackle that problem when we get to it.
sccache should handle clang's PGO flags fine since Firefox builds use them. However, it won't cache the optimized
Interestingly, reading the clang documentation reminded me that LLVM added support for using the output of a sampling profiler as the profiling input for PGO. This is likely out of scope, as it's not interoperable with instrumentation-based profile data and thus wouldn't fit neatly into what's already been implemented in rustc, but it's an interesting thing to consider since it doesn't require building instrumented binaries.
I don't think we should use the PGOed compiler for perf.rlo. It would just be confusing otherwise.
Oh, good to know! Yes, support for that would have to be added to sccache. The
I think I'd prefer if we just didn't support doing a PGOed build with a single
While working on #80033, I ran into
I owe you a longer-form writeup, but for Firefox we had perf testing on both PGO and non-PGO builds, and it was possible to compare the results between them. This was really useful because the PGO builds were what we were actually shipping to users, but being able to compare the two was good and also allowed for quicker iteration when the PGO perf difference didn't matter. (IIRC enabling PGO builds was opt-in for Firefox try pushes.)
@luser Yes, it would be great if we could benchmark both PGO and non-PGO builds. If we can afford that (which is something @Mark-Simulacrum would know better than me), then all the better. If we had to choose one of the two, I'd opt for benchmarking just non-PGO builds, since they would not contain the "random" effects of partially outdated profiling data.
This is what I've been able to find out so far: "value profiling" allows recording some of the actual values that a variable takes during execution. To do that, the profiling runtime has to reserve some space in the profdata. Apparently one does not need many of these counters in the common case: It looks like an overflow here can safely be ignored. It might just lead to less accurate profile data: But it also looks like increasing the number of counters via
Ok, useful to know. I don't feel like it's critical that we avoid the warnings, but there seems to be no significant disadvantage to bumping the number slightly, so I'll leave that in the code for now.
@luser What's the best way of getting support for
Opening an issue would be a good start, but the project is a bit under-maintained right now, so it's likely that doing the work yourself would be the quickest path to success. I'd be happy to give you pointers on how to implement it. I'm on the Rust Discord as
FYI: I've made a PR to sccache that implements the missing functionality. I haven't heard back from the maintainers yet.
Any update on this issue?
The required sccache changes have not been merged yet.
We ended up forgoing the caching for the time being; LLVM and rustc are both PGO'd on x86_64-unknown-linux-gnu as of #88069. The sccache changes would still be helpful, though.
The sccache changes have been merged.
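One of the first steps in bringing PGO to rustc is adding support for it to the compiler's build system. Concretely this means

- building an instrumented compiler via -Cprofile-generate (the "profile-generate" phase), and
- building an optimized compiler via -Cprofile-use (the "profile-use" phase).

Optionally, the build system could also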
In rust-lang/cargo#7618 (comment), @luser suggests that the three phases should be operable separately. I think that makes a lot of sense since on CI we'll probably want to either only run the "profile-use" phase with downloaded profile data, or only the "profile-generate" phase that uploads the profile data.
I suggest adding the following new config.toml settings:

    llvm.profile-generate = ( "<path>" | "default-path" )
    llvm.profile-use = ( "<path>" | "default-path" | "download-profdata" )
    rust.profile-generate = ( "<path>" | "default-path" )
    rust.profile-use = ( "<path>" | "default-path" | "download-profdata" )
    rust.profile-collect = (true | false)
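The value grammar of the settings listed above could be modeled as a small enum with a parser. This is a hypothetical sketch, assuming values arrive as plain strings: ProfileSource and parse_profile_source are invented names, and rejecting download-profdata for the profile-generate settings follows from the grammar above, where that value only appears for profile-use.

```rust
use std::path::PathBuf;

/// Hypothetical model of the proposed profile-generate/profile-use values.
#[derive(Debug, PartialEq)]
enum ProfileSource {
    Path(PathBuf),    // an explicit "<path>"
    DefaultPath,      // "default-path": a known location in the build dir
    DownloadProfdata, // "download-profdata": only allowed for profile-use
}

/// Parse one setting value; pass allow_download = false for profile-generate.
fn parse_profile_source(value: &str, allow_download: bool) -> Result<ProfileSource, String> {
    match value {
        "default-path" => Ok(ProfileSource::DefaultPath),
        "download-profdata" if allow_download => Ok(ProfileSource::DownloadProfdata),
        "download-profdata" => Err("download-profdata is only valid for profile-use".to_string()),
        "" => Err("empty profile path".to_string()),
        // Anything else is treated as an explicit path.
        other => Ok(ProfileSource::Path(PathBuf::from(other))),
    }
}

fn main() {
    // e.g. rust.profile-use = "download-profdata"
    println!("{:?}", parse_profile_source("download-profdata", true));
    // e.g. rust.profile-generate = "/tmp/pgo"
    println!("{:?}", parse_profile_source("/tmp/pgo", false));
    // e.g. llvm.profile-generate = "download-profdata" (rejected)
    println!("{:?}", parse_profile_source("download-profdata", false));
}
```

Encoding the "only for profile-use" restriction in the parser means an invalid config.toml fails at startup rather than partway through a long build.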
The paths specify the argument to the -Cprofile-generate and -Cprofile-use flags. default-path means some known location within the build directory. download-profdata means that the build system will try to find and download the most recent profile data available for the given commit. If rust.profile-collect is true, the build system will run a standard data collection scenario with the instrumented compiler (or will error if neither profile-generate setting is set).

Note that this setup has the following implications:
    ./configure --llvm.profile-generate=default-path --rust.profile-generate=default-path --rust.profile-collect=true
    ./x.py build
    ./x.py clean # including LLVM?
    ./configure --llvm.profile-use=default-path --rust.profile-use=default-path
    ./x.py build
./configure && ./x.py build though.

I'd like to hear if somebody can come up with a different, better setup. Maybe something that allows the entire three-phase build with a single ./x.py pgo-build command?
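In the proposal, each path setting becomes the argument of a rustc flag, and profile-collect is an error when no profile-generate setting exists. A minimal sketch of both rules, assuming the settings have already been resolved to paths; PgoSettings, Phase, validate, rustc_pgo_flag, and the "pgo-profiles" default directory are hypothetical names, not rustbuild's. It keeps the two flags in separate phases because rustc rejects -Cprofile-generate and -Cprofile-use in the same invocation.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolved settings ("<path>"/"default-path" already turned into paths).
struct PgoSettings {
    llvm_profile_generate: Option<PathBuf>,
    rust_profile_generate: Option<PathBuf>,
    rust_profile_use: Option<PathBuf>,
    rust_profile_collect: bool,
}

/// Which compiler build is being produced.
enum Phase {
    Instrumented,
    Optimized,
}

/// Hypothetical "default-path": a fixed directory inside the build directory.
fn default_profile_dir(build_dir: &Path) -> PathBuf {
    build_dir.join("pgo-profiles")
}

/// profile-collect needs an instrumented build to run, so reject it otherwise.
fn validate(s: &PgoSettings) -> Result<(), String> {
    if s.rust_profile_collect
        && s.llvm_profile_generate.is_none()
        && s.rust_profile_generate.is_none()
    {
        return Err("profile-collect requires a profile-generate setting".to_string());
    }
    Ok(())
}

/// The rustc flag for one phase; each phase gets at most one of the two flags.
fn rustc_pgo_flag(s: &PgoSettings, phase: Phase) -> Option<String> {
    match phase {
        // Directory where the instrumented compiler writes .profraw files.
        Phase::Instrumented => s
            .rust_profile_generate
            .as_ref()
            .map(|dir| format!("-Cprofile-generate={}", dir.display())),
        // Merged .profdata file (produced by `llvm-profdata merge`).
        Phase::Optimized => s
            .rust_profile_use
            .as_ref()
            .map(|file| format!("-Cprofile-use={}", file.display())),
    }
}

fn main() {
    let dir = default_profile_dir(Path::new("build"));
    let settings = PgoSettings {
        llvm_profile_generate: None,
        rust_profile_generate: Some(dir.clone()),
        rust_profile_use: Some(dir.join("merged.profdata")),
        rust_profile_collect: true,
    };
    assert!(validate(&settings).is_ok());
    println!("{:?}", rustc_pgo_flag(&settings, Phase::Instrumented));
    println!("{:?}", rustc_pgo_flag(&settings, Phase::Optimized));
}
```

The LLVM settings only take part in validation here, since LLVM itself would be instrumented through clang's flags rather than rustc's.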