
Implement Method Table view #4227

Conversation

parttimenerd
Contributor

@parttimenerd parttimenerd commented Sep 9, 2022

This PR implements a CallTree based view that presents all methods with their total and self times:

[screenshot: the new Method Table view]

It implements a simplified version of the view first suggested in #15 (Add a "top functions" call tree view) and reiterated in #4205.

It supports all the amenities of the CallTree view like the side bar or highlighting the samples in the timeline.

Sample profile

Deployment preview

As in other PRs: this PR does not yet have any tests, but it has been tested manually, and I'll add the tests once I'm sure what the final implementation is. The tests should be really similar to the tests for the CallTree view itself, as it is essentially the same view, just with a different source of data.


@parttimenerd parttimenerd requested a review from a team as a code owner September 9, 2022 08:22
@codecov

codecov bot commented Sep 9, 2022

Codecov Report

Attention: Patch coverage is 34.22819% with 196 lines in your changes missing coverage. Please review.

Project coverage is 87.91%. Comparing base (64cdd7e) to head (c098615).
Report is 1742 commits behind head on main.

Files with missing lines Patch % Lines
src/profile-logic/call-tree.js 3.84% 108 Missing and 17 partials ⚠️
src/profile-logic/profile-data.js 56.57% 26 Missing and 7 partials ⚠️
src/selectors/per-thread/index.js 50.00% 11 Missing and 3 partials ⚠️
src/selectors/per-thread/stack-sample.js 64.51% 10 Missing and 1 partial ⚠️
src/reducers/profile-view.js 12.50% 6 Missing and 1 partial ⚠️
...rc/components/calltree/ProfileFunctionTableView.js 40.00% 3 Missing ⚠️
src/actions/profile-view.js 0.00% 2 Missing ⚠️
src/components/app/BottomBox.js 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4227      +/-   ##
==========================================
- Coverage   88.56%   87.91%   -0.65%     
==========================================
  Files         282      283       +1     
  Lines       25333    25619     +286     
  Branches     6817     6886      +69     
==========================================
+ Hits        22435    22523      +88     
- Misses       2696     2865     +169     
- Partials      202      231      +29     


@mstange
Contributor

mstange commented Sep 13, 2022

Very nice! It seems to work correctly.

I have a few comments, about the naming and about the implementation.

First off, using the word "method" in "Method Table" is a bit Java-centric. In other languages, not every piece of executable code is a method. So I would propose a different name instead: "Function List".

Furthermore, I think the implementation should not make use of the CallNodeInfo type. Because we're dealing with a flat list here, I think it's better to avoid all use of "tree" and "node" terminology. Instead, you could put your information into a new FuncListInfo struct.

Also, the current implementation does a lot of work when the preview selection changes, because it iterates over all samples and walks the entire stack for each sample (doing repeated work if the same stack is present in multiple samples), and it allocates a fresh Set object for every sample.
See my feedback in this comment: #2388 (review)

Independent of the samples, compute the "does stack N contain func M" cache so that it can be memoized and reused for different preview selections. And then have a selector that combines the preview selection filtered samples with the cache and returns the times per function for the current selection.

I'm not sure what's the best way to represent the "does stack N contain func M" cache. It feels like a sparse boolean matrix. I guess you could use an array of sets, i.e. one set for each stack, where the set contains the funcs in that stack. But maybe there's a more efficient data structure for this.
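A minimal sketch of the array-of-sets variant (hypothetical, column-oriented table shapes similar to the profiler's, but not its actual API): each stack's set is derived from its parent's set, so "does stack N contain func M" becomes an O(1) lookup.

```javascript
// Build one Set of func indexes per stack, reusing the ancestor's set.
// `stackTable` and `frameTable` are illustrative column-oriented tables:
// stackTable.prefix[i] is the parent stack (or null), stackTable.frame[i]
// is the leaf frame, and frameTable.func[f] is that frame's function.
function computeFuncsPerStack(stackTable, frameTable) {
  const funcsPerStack = [];
  for (let stackIndex = 0; stackIndex < stackTable.length; stackIndex++) {
    const prefix = stackTable.prefix[stackIndex];
    // Copy the ancestor's set and add this stack's own function.
    const funcs = prefix === null ? new Set() : new Set(funcsPerStack[prefix]);
    funcs.add(frameTable.func[stackTable.frame[stackIndex]]);
    funcsPerStack.push(funcs);
  }
  return funcsPerStack;
}

function stackContainsFunc(funcsPerStack, stackIndex, funcIndex) {
  return funcsPerStack[stackIndex].has(funcIndex);
}
```

Note the memory cost mentioned below: one retained Set per stack, whether or not any sample uses that stack.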

@mstange
Contributor

mstange commented Sep 13, 2022

I have thought about this more and have come to the conclusion that my advice about the cache was probably misguided. Computing a set per stack upfront is not necessarily going to be better, in fact it could even lead to wasted work because there are many stacks which aren't directly used by samples and only exist as ancestor nodes. And computing a set per stack would mean that all those sets need to be retained, whereas your solution only needs one set at a given time.
So, in conclusion, your _getSummaryPerFunction approach seems fine. We can revisit this if it ever shows up as a bottleneck.

@parttimenerd
Contributor Author

First off, using the word "method" in "Method Table" is a bit Java-centric. In other languages, not every piece of executable code is a method. So I would propose a different name instead: "Function List".

I totally forgot about that, thanks for catching it.

Furthermore, I think the implementation should not make use of the CallNodeInfo type. Because we're dealing with a flat list here, I think it's better to avoid all use of "tree" and "node" terminology. Instead, you could put your information into a new FuncListInfo struct.

That would really complicate the implementation. My implementation uses the CallTree and Sidebar components (and therefore has to use the CallNodeInfo type) to reuse most of the existing functionality.

So replacing the type is only really possible by using an alias for CallNodeInfo in all FunctionList-specific places. That should clear up most of the confusion.

@mstange
Contributor

mstange commented Sep 14, 2022

It supports all the amenities of the CallTree view like the side bar or highlighting the samples in the timeline.

Oh, I forgot to comment on this.

One thing I noticed that isn't supported yet is the context menu.

As for the timeline highlighting, it doesn't seem to be highlighting based on the selection in the function list. The highlight seems to be based on the selected node in the call tree tab, i.e. based on a selection that's invisible when the function list is shown. And clicking the graph in the timeline does not change which row in the function list is selected.
However, that's fine for now, because it's unclear to me how it should work in the ideal case. So this can be worked on as a follow-up. I just wouldn't call it "supported" in the current state.

  • Which function should be highlighted when the graph is clicked?
    • Each pixel in the graph belongs to a sample, which has a stack, which consists of many functions. Which of the functions in the stack should we select? Maybe the leaf function?
  • Which parts of the graph should be highlighted when a function is selected?
    • We should highlight the pixels in the graph which belong to samples whose stack contains the selected function. However, this is tricky to achieve with the current graph drawing code: For each category fill, the graph currently only supports one "selected" area. That's because, in the call tree, the selected subtree of the call tree always corresponds to a "connected area" in the graph, due to how stacks in the graph are sorted - they're sorted in call tree DFS traversal order. But in the function list, when a function is selected, this function can be present in many stacks which are in disjoint subtrees of the call tree. So the "selected samples" will not form one connected area in the graph. So the graph drawing needs to support multiple disjoint selected areas within a single category fill.

@parttimenerd
Contributor Author

I agree with you that all of this is too tricky for this small PR, and I fear that these discussions will prevent this PR from ever getting merged.

The side bar works nonetheless, which is really important.

@parttimenerd parttimenerd force-pushed the parttimenerd_method_table_view branch from 3b870ef to ee9e16a Compare September 15, 2022 15:04
@parttimenerd
Contributor Author

parttimenerd commented Sep 21, 2022

@mstange I've addressed all your suggestions.

@mstange
Contributor

mstange commented Sep 27, 2022

I hope to look at this later this week.

While testing this, I found a profile which shows sluggish preview selection handling:
https://deploy-preview-4227--perf-html.netlify.app/public/kva8z3a82gvp50t0xw49re353y90atzs6bka2dr/function-table/?globalTrackOrder=0&hiddenLocalTracksByPid=14432-01&localTrackOrderByPid=14432-01&thread=0&v=7

Maybe this example can help us optimize this view. Dragging the mouse in the timeline is much smoother in the Call Tree tab than it is in the Function Table tab.

@parttimenerd
Contributor Author

Thanks for the sample. I have a few ideas for how to solve this (and will try to test them tomorrow); they are also based on caching information so that less is computed on every change. I think the lag for small selections (< 3 s or so) is quite acceptable, but everything above that is problematic. Two ideas come to mind:

  1. Only compute the function table for the newly selected portion of the time range (since the last refresh).
  2. Precompute the method table for every 1000 (or so) samples and only recompute the start and the end of the selection.

These two ideas combined should result in larger memory usage in exchange for a drastically reduced number of (expensive) method table computations from samples.

I'll try to implement these and check with the profiler whether it worked. But I'm hopeful, and it is far less complicated than my previous ideas.

(One remaining problem could be the sheer number of methods. But that could be solved by storing the functions with only one or two usages more efficiently.)
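A rough sketch of idea 2 above (every name here is hypothetical, not actual profiler code): precompute per-function weight totals for fixed-size chunks of samples, then answer a range query by merging whole chunks and walking only the partial chunks at either end of the selection.

```javascript
// Precompute one Map (funcIndex -> total weight) per chunk of samples.
function precomputeChunks(sampleFuncs, sampleWeights, chunkSize) {
  const chunks = [];
  for (let start = 0; start < sampleFuncs.length; start += chunkSize) {
    const totals = new Map();
    const end = Math.min(start + chunkSize, sampleFuncs.length);
    for (let i = start; i < end; i++) {
      totals.set(sampleFuncs[i], (totals.get(sampleFuncs[i]) || 0) + sampleWeights[i]);
    }
    chunks.push(totals);
  }
  return chunks;
}

// Summarize the half-open sample range [start, end).
function summarizeRange(sampleFuncs, sampleWeights, chunks, chunkSize, start, end) {
  const totals = new Map();
  const add = (f, w) => totals.set(f, (totals.get(f) || 0) + w);
  let i = start;
  // Walk the partial chunk at the start of the range sample-by-sample.
  while (i < end && i % chunkSize !== 0) {
    add(sampleFuncs[i], sampleWeights[i]);
    i++;
  }
  // Merge whole precomputed chunks in the middle.
  while (i + chunkSize <= end) {
    for (const [f, w] of chunks[i / chunkSize]) {
      add(f, w);
    }
    i += chunkSize;
  }
  // Walk the partial chunk at the end of the range.
  while (i < end) {
    add(sampleFuncs[i], sampleWeights[i]);
    i++;
  }
  return totals;
}
```

With a chunk size of 1000, dragging the selection edge then touches at most ~2000 individual samples plus a handful of chunk merges, instead of every sample in the selection.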

@parttimenerd
Contributor Author

My new version of the computation uses a multi-level cache and trades memory for performance: the Firefox Profiler uses around 10 to 15% more memory with this change, but the parameters of the cache can be configured.

The cache is generic and could also be used to speed up the computation of callees and callers in a yet-to-be-proposed feature.

@mstange
Contributor

mstange commented Oct 4, 2022

Still haven't found time to look at this in detail, hoping for this week.

In the new deploy preview, dragging a selection now works very fast, nice! I noticed that it's still sluggish if the mouse is over the marker track while it's being moved, but that's completely unrelated to this PR. I wonder if the sluggish behavior I saw previously was due to the same issue - I hope I didn't accidentally blame this PR for a sluggish marker track.

With the cache, do you mean selectedFunctionTableNodeSelectors?

It looks like graph highlighting is working nicely now, though I don't really understand how it can work - how did you address this case I mentioned in a previous comment?

So the "selected samples" will not form one connected area in the graph.

One bug I found is that double-clicking a function will not scroll the source view to the clicked function. For example, opening the source view on js::GCMarker::traverse(JSString*) displays the markAndTraverse method instead. My hunch is that the "selected call node" is interpreted in the wrong way.

I also noticed that the search filter's current behavior is not very intuitive for the function table view. I would expect it to filter out all functions that don't match the search string. At the moment, it still displays functions that are present elsewhere in the matching stacks. But this is a bigger change and should be deferred for a separate PR.

@parttimenerd
Contributor Author

In the new deploy preview, dragging a selection now works very fast, nice! I noticed that it's still sluggish if the mouse is over the marker track while it's being moved, but that's completely unrelated to this PR. I wonder if the sluggish behavior I saw previously was due to the same issue - I hope I didn't accidentally blame this PR for a sluggish marker track.

The remaining sluggishness comes from a too-primitive implementation in the call tree, combined with using React events, which aren't triggered all that regularly.

With the cache, do you mean selectedFunctionTableNodeSelectors?

I mean the SummaryCacheTreeNode class and related (see https://github.com/firefox-devtools/profiler/pull/4227/files#diff-f59d84bde204d9e3751ca53e860796d9da5b948173c561109fa4337f88b6a238R617). This is used to build a multilevel cache.

It looks like graph highlighting is working nicely now, though I don't really understand how it can work - how did you address this case I mentioned in a previous comment?

It was not that hard: I just walk over all stacks and look for the selected function where I previously just looked at the top frame. See https://github.com/firefox-devtools/profiler/pull/4227/files#diff-7a963fec8fbae1d005d4cbb324ff7399f707c667f19fc79fef0ceada360a4d1fR531
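An illustrative sketch of that stack walk (not the PR's actual code, and the table shapes are assumptions): instead of checking only the leaf frame, walk every ancestor of the sample's stack to decide whether the sample belongs to the selected function.

```javascript
// Returns true if any frame in the sample's stack belongs to selectedFunc.
// stackTable.prefix[s] is the parent stack (or null); frameTable.func maps
// a frame index to its function index. Names are illustrative only.
function sampleContainsFunc(stackTable, frameTable, stackIndex, selectedFunc) {
  for (let s = stackIndex; s !== null; s = stackTable.prefix[s]) {
    if (frameTable.func[stackTable.frame[s]] === selectedFunc) {
      return true;
    }
  }
  return false;
}
```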

I also noticed that the search filter's current behavior is not very intuitive for the function table view. I would expect it to filter out all functions that don't match the search string. At the moment, it still displays functions that are present elsewhere in the matching stacks. But this is a bigger change and should be deferred for a separate PR.

This is the same for the call tree. I find searching pretty unintuitive in the whole profiler UI. I will tackle this in an upcoming proposal: I want to be able to specify more detailed filters based on glob patterns, like @function toString, which matches all toString functions.

@parttimenerd
Contributor Author

One bug I found is that double-clicking a function will not scroll the source view to the clicked function. For example, opening the source view on js::GCMarker::traverse(JSString*) displays the markAndTraverse method instead. My hunch is that the "selected call node" is interpreted in the wrong way.

It works correctly for me in almost all examples, so I'd rather fix it in a later PR. I don't think your particular case has anything to do with my implementation.

Contributor

@mstange mstange left a comment


I've made some progress on the review, but I didn't dig into the actual meaty pieces. I hope to do so tomorrow. I'm posting what I have so far.

@mstange
Contributor

mstange commented Oct 11, 2022

It looks like graph highlighting is working nicely now, though I don't really understand how it can work - how did you address this case I mentioned in a previous comment?

It was not that hard: I just walk over all stacks and look for the selected function where I previously just looked at the top frame.

Oh, you took the easy way out and made it highlight the same areas for different samples, by always sorting "selected" before "unselected". Here's an example profile where you can compare the highlighting of the functions "Left" and "Right" between the Call Tree and the Function Table tabs. In your current implementation, both "Left" and "Right" highlight the same areas, even though they correspond to different samples.
That's ok for now, don't worry about it. But I wanted to make sure that you understood the issue that I'm referring to.

@mstange
Contributor

mstange commented Oct 17, 2022

One bug I found is that double-clicking a function will not scroll the source view to the clicked function. For example, opening the source view on js::GCMarker::traverse(JSString*) displays the markAndTraverse method instead. My hunch is that the "selected call node" is interpreted in the wrong way.

I think this is because BottomBox isn't adjusting its selectedCallNodeLineTimings for the function table. To fix this bug, the BottomBox would need to use selectedFunctionTableNodeSelectors.getSourceViewLineTimings when the function table is visible. It currently always uses selectedNodeSelectors.getSourceViewLineTimings:

    selectedCallNodeLineTimings:
      selectedNodeSelectors.getSourceViewLineTimings(state),

@mstange
Contributor

mstange commented Oct 17, 2022

I still need to read the "summary" code. I'm getting there, slowly.

I have one more general request: Please use JavaScript arrays for the new arrays you're adding, not Typed Arrays such as Int32Array. The typed arrays may have slightly better performance in some cases, but they're more complicated to deal with because they can't contain null and because you can't specify the Flow type of the array element. We still have a fair amount of them in the profiler code but we've been slowly phasing them out and are using regular JS arrays in new code. We haven't observed any adverse performance from them.

So please change your new arrays into regular JS arrays and use null instead of -1 in them.
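A minimal illustration of the requested change (hypothetical field names): a typed array needs a sentinel like -1 for "no value", while a plain JS array can hold null directly, which Flow can express as `Array<number | null>`.

```javascript
// Typed-array style: -1 is an out-of-band sentinel that the reader has
// to know about, and Flow cannot encode it in the element type.
const prefixTyped = new Int32Array([-1, 0, 1]);
// Plain-array style: null is self-describing and Flow-typable.
const prefixPlain = [null, 0, 1];

function hasPrefixTyped(i) {
  return prefixTyped[i] !== -1;
}
function hasPrefixPlain(i) {
  return prefixPlain[i] !== null;
}
```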

@parttimenerd
Contributor Author

Good to know, I'll change them soon.

@parttimenerd
Contributor Author

So please change your new arrays into regular JS arrays and use null instead of -1 in them.

The problem with this is that my new code is intertwined with lots of old code which uses these typed arrays and passes them into my code or expects them from it. I'm not too inclined to do a major refactoring of the other code in this PR, but I'm open to doing it in another PR if need be.

@parttimenerd
Contributor Author

I'm happy for any feedback.

@mstange
Contributor

mstange commented Oct 31, 2022

I have read the summary cache tree code now. Let me try to summarize what it does: it reduces the time taken to compute function timings for a sub-range of the samples by re-using previous calculations over other sub-ranges of samples. This way, when the user drags a preview selection, we ideally only need to iterate over the newly added samples (and add their timings to the timings of existing cached "summary nodes") instead of iterating over all samples in the selection when computing the timings.
It's a neat idea, and I hadn't thought of it before. But I think it's a bit complicated, and it also has a certain memory cost.
I think it would be better to find other ways to reduce the cost of the calculation, so that iterating over all samples is cheap enough.

For now, can you remove the summary cache tree again? I think performance is good enough without it. And then, as a follow-up, we can improve the performance in other ways.

Here's how I would reduce the cost of what's currently happening in _getSummaryPerFunctionSlice, as follow-ups:

  • Optimize the case where the same stack is used in multiple samples, by aggregating the weight per stack index first.
  • Now there are three sources of slowness in the computation:
    1. Duplicated work from "similar but different" stacks, i.e. stacks which consist of the same functions but which have different stack indexes.
    2. Accessing the function index still goes through the frame table.
    3. For each stack, to de-duplicate functions in the stack, we have to build up a set.
  • To optimize 1 and 2, we could use the call node info from the regular call tree. It has a tree structure where "stacks which consist of the same functions in the same order" have already been collapsed into a single call node. So the same-stack optimization would get a better hit rate. The call node info also has a direct access to each node's function index, saving the indirection via the frame table.
  • To optimize 3, we could build a transformed call node info which eliminates call nodes for functions which are already present in the stack. Specifically, any node whose function is already present in an ancestor of this node would get "merged-away" into its parent node. Building this transformed (de-duplicated) call node info would probably be somewhat expensive - I don't think there's a way to do it without creating a set of functions per node - but we'd only need to do it once. And then during the sample computation we could rely on the fact that this call node info will never show the same function twice in the same stack, so we can just "walk the stack and add the weight" without building up a set.

The last optimization idea sounds quite complicated and is probably not worth doing, but I wanted to throw it out there.
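The first follow-up idea, aggregating the weight per stack index before walking any stacks, could be sketched like this (illustrative sample-table shape, not the actual profiler code):

```javascript
// Fold the samples' weights into a per-stack total first, so that each
// distinct stack index is walked only once afterwards, even if it
// appears in many samples. `samples` is an illustrative column-oriented
// table: samples.stack[i] is the sample's stack, samples.weight[i] its weight.
function aggregateWeightPerStack(samples) {
  const weightPerStack = new Map();
  for (let i = 0; i < samples.stack.length; i++) {
    const s = samples.stack[i];
    weightPerStack.set(s, (weightPerStack.get(s) || 0) + samples.weight[i]);
  }
  return weightPerStack;
}
```

The expensive per-stack work (walking ancestors, de-duplicating functions) then runs over `weightPerStack.entries()` instead of over every sample.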

@parttimenerd
Contributor Author

parttimenerd commented Oct 31, 2022

It's a neat idea, and I hadn't thought of it before. But I think it's a bit complicated, and it also has a certain memory cost.
I think it would be better to find other ways to reduce the cost of the calculation, so that iterating over all samples is cheap enough

How about using this version and implementing the other optimizations as another PR, so that this PR is not stalled? I'm willing to commit to further improve the Function Table going forward.

For now, can you remove the summary cache tree again? I think performance is good enough without it. And then, as a follow-up, we can improve the performance in other ways.

The performance is almost unusable for larger profiles (and my use case has these a lot)...

@mstange
Contributor

mstange commented Oct 31, 2022

It's a neat idea, and I hadn't thought of it before. But I think it's a bit complicated, and it also has a certain memory cost.
I think it would be better to find other ways to reduce the cost of the calculation, so that iterating over all samples is cheap enough

How about using this version and implementing the other optimizations as another PR, so that this PR is not stalled?

Yes of course, that's what I meant with "as follow-ups".

For now, can you remove the summary cache tree again? I think performance is good enough without it. And then, as a follow-up, we can improve the performance in other ways.

The performance is almost unusable for larger profiles (and my use case has these a lot)...

Oh, really? I hadn't realized that. That's unfortunate.

Ok, then let's keep the tree for now, but please document it. The comments for the cache tree should say which use case is being optimized, and roughly how it works. Furthermore, the invariants of the tree fields should be documented, i.e. what's the sum of what, which sample index ranges are nested / overlapping / adjacent, what it means for children to be null, etc.

@parttimenerd
Contributor Author

I've added the requested documentation (the field invariants are stated in the documentation of each field).

Is this now ready for adding tests? Any recommendations on what tests I should add?

@julienw julienw requested a review from mstange November 17, 2022 12:37
@julienw
Contributor

julienw commented Nov 17, 2022

Hey @mstange, do you think you would have the chance to look at this again?

Copy link
Contributor

@mstange mstange left a comment


Posting a few more comments about things I noticed while reading. I still haven't completely read the summary cache implementation.

As for tests, let's start with a few basic tests which exercise the code in call-tree.js, and a few which exercise the actions and reducers.

For the call tree tests, make a simple example profile and then check that the counts of each function are correct, including the recursion case where the function must only be counted once.

@mstange
Contributor

mstange commented Nov 29, 2022

Oh, Jeff noticed a bug when testing the deploy preview: When a preview selection is applied, the percentages are currently of the non-preview-filtered thread, which is inconsistent with the call tree tab.

@parttimenerd parttimenerd force-pushed the parttimenerd_method_table_view branch from d97cb29 to aa05d53 Compare December 21, 2022 14:04
@parttimenerd
Contributor Author

I rebased it on the current main branch.

Squashed all previous commits
@parttimenerd parttimenerd force-pushed the parttimenerd_method_table_view branch from aa05d53 to a2e8359 Compare December 21, 2022 16:48
@parttimenerd
Contributor Author

Oh, Jeff noticed a bug when testing the deploy preview: When a preview selection is applied, the percentages are currently of the non-preview-filtered thread, which is inconsistent with the call tree tab.

I fixed it, it was probably just forgotten.

@julienw
Contributor

julienw commented Jan 16, 2023

I noticed another issue: the transform shortcuts work in the function table, but they don't target the right node index. Probably you need to change handleCallNodeTransformShortcut. I can see that you could split it into two parts, so that one part is specific to panels (finding the funcIndex, etc.) and one part is generic (handling the actual keys).

But taking a step back, I feel like you're trying a bit too hard to reuse existing code instead of starting from scratch for this panel. But I think I need to think about it more. We don't want to copy-paste everything either, especially the call-tree part. I don't much like that you also call the new structure a call node table; I think this confuses things.

I need to think a bit more.

@flodolo flodolo removed the request for review from a team August 2, 2023 10:32
@flodolo
Contributor

flodolo commented Aug 2, 2023

(taking this out of my queue for now)

@parttimenerd
Contributor Author

This PR seems to have stalled.

@krk

krk commented Sep 12, 2024

Hi! This feature seems interesting and useful, is the PR abandoned?

@xoofx

xoofx commented Nov 26, 2024

Hey, I'm also looking for this feature that would make the profiler UI more usable to investigate such things.

@mstange
Contributor

mstange commented Nov 27, 2024

Reading this PR again, my main concern during the review was the complexity of the implementation, to the degree that it was hard for me as a reviewer to understand what was going on. At the time I also didn't have a lot of time to think of an approach I'd prefer.

Now that I've worked on speeding up the inverted call tree, I have a much better understanding of what I'd like the implementation to look like. I've also changed the infrastructure around the call tree and the computation of call tree timings in a way that should make a function table easier to plug in, so that the code reuse with the existing call tree happens at the right abstraction boundary and we don't need to create a fake call node table.

I've posted the recommended implementation in this comment: #15 (comment)

I'll close this PR for now. The new implementation would look different enough that it should happen in a new PR.

@mstange mstange closed this Nov 27, 2024