Samply reports different results based on perf.data compared to perf or Inferno #290
Comments
https://splichal.eu/tmp/perf.data.zst gives me an "Access forbidden!" error.
Likely what's happening is that samply is just giving you much more detailed information, broken down by thread. In the samply flamegraph, at the bottom you can see […]. The SVG flame graph says there were 3598 samples inside […].
Please try it now! I've used […].
Ok, I've just tried to select one thread and filter for […]. But it seems to suffer from the very same problem as Inferno: even for a simple thread, the number of call stacks leading to the filtered function is enormous and one gets a gazillion vertical lines, each representing its own stack. What's also interesting is that if I select a single thread, I can't see any value in the Self column. Is that correct? Based on the files I provided, can you open the perf profile on your box?
The flame graph and call tree visualizations aren't a great fit for workloads which make heavy use of recursion. Rayon uses recursion extensively. I agree it makes for a very messy flame graph. A better view for these types of profiles would be a "top functions" view (firefox-devtools/profiler#15 is filed about this).
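For intuition, here is a hypothetical Rust sketch (invented for illustration, not code from debuginfod-rs) of why Rayon-heavy workloads produce so many distinct stacks: divide-and-conquer via `rayon::join` recurses, and work stealing means the same leaf work is reached through many different call paths and depths.

```rust
// Hypothetical divide-and-conquer sum using the rayon crate.
// Each recursion level adds another `par_sum` frame, and stolen tasks resume
// on top of a worker thread's scheduling frames, so a sampling profiler sees
// many distinct stacks that all end in the same leaf work.
fn par_sum(slice: &[u64]) -> u64 {
    if slice.len() < 1024 {
        return slice.iter().sum();
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    let (l, r) = rayon::join(|| par_sum(left), || par_sum(right));
    l + r
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    println!("sum = {}", par_sum(&data));
}
```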
Ah, I see your use of `--skip-after`.
Yes, it is correct. If you drill down deeper into the stack you will eventually end up at call nodes with non-zero self times. See https://profiler.firefox.com/docs/#/./guide-stack-samples-and-call-trees?id=self-time-in-the-call-tree for a more detailed explanation.
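To make the self-time point concrete, here is a minimal hypothetical sketch (the function names are invented, not from debuginfod-rs): samples count toward a function's self time only while that function is on top of the stack, so a frame that merely delegates to a callee shows ~0 self time even though its running time spans the whole program.

```rust
fn leaf_work() -> u64 {
    // Nearly every sample is taken while `leaf_work` is on top of the stack,
    // so the self time accumulates here.
    let mut acc = 0u64;
    for i in 0..500_000_000u64 {
        acc = acc.wrapping_add(std::hint::black_box(i));
    }
    acc
}

fn wrapper() -> u64 {
    // `wrapper` only delegates, so its self time is ~0 in the call tree,
    // while its total (running) time covers essentially the whole program.
    leaf_work()
}

fn main() {
    println!("{}", wrapper());
}
```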
Everything's working as intended, closing. Feel free to keep asking questions here, I'm just marking this as "no work needs to be done".
I couldn't resist looking at your profile and had some ideas for optimizations. These are all about RPM parsing inside the rpm crate: […]
On the debuginfod-rs side, I think it's worth thinking about a way to kick off the RPM parsing before all the upfront directory traversal is done. This would get you improved parallelism. Also, unrelated to performance, I noticed that […]
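A minimal sketch of that traversal/parsing overlap, assuming the `walkdir` and `rayon` crates (the root path and the `analyze_rpm` helper are hypothetical placeholders, not debuginfod-rs code): `par_bridge()` hands directory entries to the Rayon pool as the walker produces them, so RPM parsing starts before the traversal has finished.

```rust
use rayon::iter::{ParallelBridge, ParallelIterator};
use std::path::Path;
use walkdir::WalkDir;

// Hypothetical stand-in for the real per-RPM work done by debuginfod-rs
// (parse the RPM, extract build-ids, update the index).
fn analyze_rpm(path: &Path) {
    let _ = path;
}

fn main() {
    // par_bridge() feeds directory entries to the Rayon thread pool as the
    // walker produces them, so RPM parsing overlaps with the traversal
    // instead of waiting for a complete file list.
    WalkDir::new("/srv/rpms") // hypothetical root directory
        .into_iter()
        .filter_map(Result::ok)
        .filter(|entry| entry.file_type().is_file())
        .par_bridge()
        .for_each(|entry| analyze_rpm(entry.path()));
}
```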
Really appreciate your observations - they are all good learning material, and I've just copied your reply to the RPM project so that the maintainer is aware of it. When it comes to debuginfod-rs: it's just a hobby project (one that is ~30x faster than the official elfutils implementation even with the aforementioned issues), and the bottleneck would be the disk drive. Thank you!
That's pretty impressive!
I would have expected this too, but that's not what your profile indicated. The threads were busy at 100% CPU usage, and they weren't spending any time in the kernel handling page faults. It looked really quite CPU-bound.
Oh yeah! The profile I presented was actually collected after my page cache had been filled by the disk reads. A cold start takes about 4x longer, and that's with an extremely fast SSD (~10 GB/s).
That makes sense. Thanks for clearing that up :)
Original issue description:

I've got an application that scans RPM files and builds a build-id debuginfod index from them. If I run perf, I get the following profile (the results are expected based on what the program does):
If I run Inferno, I get the following flame graph (note that I collapse the call stacks, as the application uses Rayon):
perf script | inferno-collapse-perf --skip-after=debuginfod_rs::Server::analyze_file | inferno-flamegraph --minwidth=1
However, if I try Samply, I get the following flame graph:

Most of the profile is taken by functions (e.g. `std::sys::pal::unix::fs::stat` at 22%) which I cannot see in the perf report or the Inferno flame graph. Can you please take a look?
Needed files:
https://splichal.eu/tmp/debuginfod-rs.zst
https://splichal.eu/tmp/perf.data.zst