
Refactor the NGEN tracking fix to be more performant #6598

Merged
tonyredondo merged 5 commits into master from tony/ngen-performance-fix
Jan 28, 2025


tonyredondo
Member

@tonyredondo tonyredondo commented Jan 27, 2025

Summary of changes

This PR refactors the NGEN tracking fix to remove its dedicated `lock` and `unordered_set`. Instead, it reuses the existing `module_ids` lock together with a vector and three `mdMethodDef` fields.
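For context, a dedicated-lock design of the kind being removed might look like the sketch below. It is a hypothetical illustration, not the code from #6588: `LockedNgenTracker`, `MethodKey`, and `MethodKeyHash` are invented names, and `ModuleID`/`mdMethodDef` are stand-in aliases for the CLR profiling API types. The point it shows is that every lookup, hit or miss, takes a mutex and hashes a key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_set>

// Stand-ins for the CLR profiling API types (assumption: pointer-sized
// module ID, 32-bit metadata token).
using ModuleID = std::uintptr_t;
using mdMethodDef = std::uint32_t;

// Method tokens are only unique within a module, so the key needs both.
struct MethodKey {
    ModuleID module;
    mdMethodDef method;
    bool operator==(const MethodKey& other) const {
        return module == other.module && method == other.method;
    }
};

struct MethodKeyHash {
    std::size_t operator()(const MethodKey& key) const {
        // Boost-style hash combine; illustrative only.
        std::size_t h = std::hash<ModuleID>{}(key.module);
        return h ^ (std::hash<mdMethodDef>{}(key.method) + 0x9e3779b9 + (h << 6) + (h >> 2));
    }
};

// Hypothetical "before" pattern: its own mutex plus a hash set.
class LockedNgenTracker {
public:
    void Add(ModuleID module, mdMethodDef method) {
        assert(module != 0);  // 0 treated as an invalid module ID here (assumption)
        std::lock_guard<std::mutex> guard(lock_);
        set_.insert(MethodKey{module, method});
    }

    // Every call, including misses, pays for the lock and a hash probe.
    bool Contains(ModuleID module, mdMethodDef method) {
        std::lock_guard<std::mutex> guard(lock_);
        return set_.count(MethodKey{module, method}) != 0;
    }

private:
    std::mutex lock_;
    std::unordered_set<MethodKey, MethodKeyHash> set_;
};
```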

Reason for change

After #6588 was merged, @andrewlock observed a performance regression in the execution-time benchmarks.

Implementation details

Replaces a lock plus an `unordered_set` lookup with a comparison against three fields, followed by a vector `Contains` check only when one of those fields matches.

This should mainly speed up the miss path, since a miss now costs only a comparison against three fields.
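The check described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the tracer's actual code: the caller is assumed to already hold the `module_ids` lock, the three tracked method tokens are assumed to be known up front, and `ModuleState`, `TrackLocked`, `IsTrackedLocked`, and the field names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Stand-ins for the CLR profiling API types (assumption: pointer-sized
// module ID, 32-bit metadata token).
using ModuleID = std::uintptr_t;
using mdMethodDef = std::uint32_t;

// Hypothetical state object. In the PR, the reused lock is the one guarding
// module_ids; that list itself is not modeled here.
struct ModuleState {
    std::mutex module_ids_lock;

    // The three method tokens that need tracking (assumed fixed up front).
    const mdMethodDef tracked_method_a;
    const mdMethodDef tracked_method_b;
    const mdMethodDef tracked_method_c;

    // Exact (module, method) pairs; expected to stay small.
    std::vector<std::pair<ModuleID, mdMethodDef>> tracked;

    ModuleState(mdMethodDef a, mdMethodDef b, mdMethodDef c)
        : tracked_method_a(a), tracked_method_b(b), tracked_method_c(c) {}

    bool IsCandidate(mdMethodDef method) const {
        return method == tracked_method_a || method == tracked_method_b ||
               method == tracked_method_c;
    }

    // Caller must hold module_ids_lock.
    void TrackLocked(ModuleID module, mdMethodDef method) {
        assert(IsCandidate(method));
        tracked.emplace_back(module, method);
    }

    // Caller must hold module_ids_lock.
    bool IsTrackedLocked(ModuleID module, mdMethodDef method) const {
        // Fast path: a miss costs three integer comparisons, no hashing.
        if (!IsCandidate(method)) return false;
        // Slow path: only for matching tokens, scan the small vector.
        return std::find(tracked.begin(), tracked.end(),
                         std::make_pair(module, method)) != tracked.end();
    }
};
```

The design trade-off is that the linear vector scan is only cheap while the vector stays small; the common miss path never reaches it and takes no extra lock.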

Test coverage

Other details

@tonyredondo tonyredondo self-assigned this Jan 27, 2025
@tonyredondo tonyredondo marked this pull request as ready for review January 27, 2025 18:03
@tonyredondo tonyredondo requested a review from a team as a code owner January 27, 2025 18:03
@lucaspimentel lucaspimentel added area:native-library Automatic instrumentation native C++ code (Datadog.Trace.ClrProfiler.Native) type:performance Performance, speed, latency, resource usage (CPU, memory) labels Jan 27, 2025
@DataDog DataDog deleted a comment from andrewlock Jan 28, 2025
@DataDog DataDog deleted a comment from andrewlock Jan 28, 2025
@DataDog DataDog deleted a comment from datadog-ddstaging bot Jan 28, 2025
@andrewlock
Member

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the total time it takes to execute a program and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.
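As a rough illustration of these two criteria, the sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom, plus the 5% / 5 ms practical-difference filter. The function names are hypothetical and the dashboard's own implementation is not shown here. Turning t into a p-value needs the Student's t CDF, which the C++ standard library does not provide, so that step is omitted. The filter assumes the percentage is taken relative to master.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Result of Welch's unequal-variances t-test (statistic only, no p-value).
struct WelchResult {
    double t;   // Welch's t statistic
    double df;  // Welch–Satterthwaite degrees of freedom
};

double Mean(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Unbiased sample variance (divides by n - 1).
double SampleVariance(const std::vector<double>& x, double mean) {
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return ss / static_cast<double>(x.size() - 1);
}

WelchResult WelchTest(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() >= 2 && b.size() >= 2);
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    const double ma = Mean(a), mb = Mean(b);
    // Squared standard error of each mean.
    const double sa = SampleVariance(a, ma) / na;
    const double sb = SampleVariance(b, mb) / nb;
    const double t = (ma - mb) / std::sqrt(sa + sb);
    const double df = (sa + sb) * (sa + sb) / (sa * sa / (na - 1) + sb * sb / (nb - 1));
    return {t, df};
}

// Practical-difference filter: the change must exceed both 5% of the
// master mean and 5 ms (assumption: percentage relative to master).
bool ExceedsExecutionThresholds(double master_ms, double pr_ms) {
    const double diff = std::fabs(pr_ms - master_ms);
    return diff > 0.05 * master_ms && diff > 5.0;
}
```

With the CallTarget+Inlining+NGEN means reported for FakeDbCommand, the .NET Core 3.1 change (728 ms to 672 ms, about 7.7%) passes this filter, while the .NET Framework 4.6.2 change (1,026 ms to 980 ms, about 4.5%) does not.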

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Execution times (ms) are shown as the mean of each run, with the p99 interval (based on the mean and StdDev of the test run) in parentheses.

| Sample | Runtime | Configuration | This PR (6598) | master |
|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.6.2 | Baseline | 69 (66-72) | 69 (67-72) |
| FakeDbCommand | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 980 (954-1,005) | 1,026 (1,003-1,049) |
| FakeDbCommand | .NET Core 3.1 | Baseline | 107 (105-109) | 108 (106-109) |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 672 (656-689) | 728 (704-752) |
| FakeDbCommand | .NET 6 | Baseline | 91 (89-94) | 92 (91-94) |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 630 (610-649) | 674 (656-692) |
| HttpMessageHandler | .NET Framework 4.6.2 | Baseline | 190 (185-195) | 190 (184-195) |
| HttpMessageHandler | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,087 (1,059-1,115) | 1,126 (1,097-1,155) |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 276 (271-281) | 276 (271-281) |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 859 (828-890) | 912 (884-940) |
| HttpMessageHandler | .NET 6 | Baseline | 264 (260-268) | 264 (261-268) |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 842 (807-876) | 886 (857-916) |

@andrewlock
Member

Benchmarks Report for tracer 🐌

Benchmarks for #6598 compared to master:

  • 2 benchmarks are faster, with geometric mean 1.140
  • 2 benchmarks are slower, with geometric mean 1.132
  • All benchmarks have the same allocations
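The summary ratios can be reproduced from the medians in the detail tables: "Faster" entries use base/diff, "Slower" entries use diff/base, and each group is summarized by a geometric mean. The sketch assumes the report aggregates exactly this way; `SpeedupRatio` and `GeometricMean` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Ratio used in the "Faster" tables (base/diff). The "Slower" tables use
// diff/base instead, so both kinds of ratio are >= 1.
double SpeedupRatio(double base_median_ns, double diff_median_ns) {
    return base_median_ns / diff_median_ns;
}

// Geometric mean computed as exp(mean of logs) for numerical stability.
double GeometricMean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

Feeding in the two Faster medians (467.58/404.94 ns and 1,162.75/1,033.38 ns) gives about 1.140, and the two Slower medians (1,394.93/1,220.11 ns and 636.70/568.30 ns) give about 1.132, in line with the summary figures.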

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.
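For readers unfamiliar with the test, here is a sketch of the Mann–Whitney U statistic using average ranks for ties, a two-sided p-value from the normal approximation (without tie correction), and the 10% / 0.3 ns filter. It is a generic textbook version under those assumptions, not the benchmark tooling's own code; the function names are hypothetical, and the filter assumes the percentage is relative to master.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// U statistic for sample a versus sample b, with tie-averaged ranks.
double MannWhitneyU(const std::vector<double>& a, const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());
    // Pool both samples, tagging each value with its group (0 = a, 1 = b).
    std::vector<std::pair<double, int>> pooled;
    for (double v : a) pooled.emplace_back(v, 0);
    for (double v : b) pooled.emplace_back(v, 1);
    std::sort(pooled.begin(), pooled.end());

    // Sum the 1-based ranks of sample a; tied values share their average rank.
    double rank_sum_a = 0.0;
    std::size_t i = 0;
    while (i < pooled.size()) {
        std::size_t j = i;
        while (j < pooled.size() && pooled[j].first == pooled[i].first) ++j;
        const double avg_rank = (i + 1 + j) / 2.0;  // average of ranks i+1 .. j
        for (std::size_t k = i; k < j; ++k)
            if (pooled[k].second == 0) rank_sum_a += avg_rank;
        i = j;
    }
    const double na = static_cast<double>(a.size());
    return rank_sum_a - na * (na + 1) / 2.0;
}

// Two-sided p-value from the normal approximation (no tie correction).
double MannWhitneyPValue(double u, std::size_t n_a, std::size_t n_b) {
    const double na = static_cast<double>(n_a);
    const double nb = static_cast<double>(n_b);
    const double mean = na * nb / 2.0;
    const double sd = std::sqrt(na * nb * (na + nb + 1) / 12.0);
    const double z = (u - mean) / sd;
    return std::erfc(std::fabs(z) / std::sqrt(2.0));
}

// Practical-difference filter: more than 10% of the base and more than 0.3 ns.
bool ExceedsBenchmarkThresholds(double base_ns, double diff_ns) {
    const double d = std::fabs(diff_ns - base_ns);
    return d > 0.10 * base_ns && d > 0.3;
}
```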

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartStopWithChild | net6.0 | 8.02μs | 44.5ns | 281ns | 0.0153 | 0.00765 | 0 | 5.61 KB |
| master | StartStopWithChild | netcoreapp3.1 | 10.1μs | 55.9ns | 366ns | 0.0145 | 0.00485 | 0 | 5.8 KB |
| master | StartStopWithChild | net472 | 16.1μs | 41.5ns | 161ns | 1.05 | 0.315 | 0.102 | 6.21 KB |
| #6598 | StartStopWithChild | net6.0 | 7.8μs | 44.9ns | 336ns | 0.0114 | 0.0038 | 0 | 5.61 KB |
| #6598 | StartStopWithChild | netcoreapp3.1 | 9.87μs | 51.8ns | 279ns | 0.0203 | 0.0101 | 0 | 5.8 KB |
| #6598 | StartStopWithChild | net472 | 16.2μs | 60ns | 232ns | 1.06 | 0.331 | 0.105 | 6.21 KB |

Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | WriteAndFlushEnrichedTraces | net6.0 | 493μs | 201ns | 751ns | 0 | 0 | 0 | 2.7 KB |
| master | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 688μs | 340ns | 1.32μs | 0 | 0 | 0 | 2.7 KB |
| master | WriteAndFlushEnrichedTraces | net472 | 867μs | 541ns | 2.09μs | 0.431 | 0 | 0 | 3.3 KB |
| #6598 | WriteAndFlushEnrichedTraces | net6.0 | 477μs | 225ns | 810ns | 0 | 0 | 0 | 2.7 KB |
| #6598 | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 647μs | 291ns | 1.13μs | 0 | 0 | 0 | 2.7 KB |
| #6598 | WriteAndFlushEnrichedTraces | net472 | 860μs | 447ns | 1.61μs | 0.428 | 0 | 0 | 3.3 KB |

Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendRequest | net6.0 | 125μs | 381ns | 1.48μs | 0.188 | 0 | 0 | 14.47 KB |
| master | SendRequest | netcoreapp3.1 | 145μs | 330ns | 1.19μs | 0.217 | 0 | 0 | 17.27 KB |
| master | SendRequest | net472 | 0.00469ns | 0.00166ns | 0.00644ns | 0 | 0 | 0 | 0 b |
| #6598 | SendRequest | net6.0 | 129μs | 499ns | 1.93μs | 0.191 | 0 | 0 | 14.47 KB |
| #6598 | SendRequest | netcoreapp3.1 | 146μs | 414ns | 1.6μs | 0.148 | 0 | 0 | 17.27 KB |
| #6598 | SendRequest | net472 | 0.0061ns | 0.00227ns | 0.00881ns | 0 | 0 | 0 | 0 b |

Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | WriteAndFlushEnrichedTraces | net6.0 | 553μs | 2.7μs | 14.3μs | 0.584 | 0 | 0 | 41.49 KB |
| master | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 660μs | 3.43μs | 16.4μs | 0.342 | 0 | 0 | 41.71 KB |
| master | WriteAndFlushEnrichedTraces | net472 | 846μs | 2.88μs | 10.8μs | 8.45 | 2.53 | 0.422 | 53.31 KB |
| #6598 | WriteAndFlushEnrichedTraces | net6.0 | 550μs | 2.96μs | 15.7μs | 0.553 | 0 | 0 | 41.56 KB |
| #6598 | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 666μs | 3.55μs | 20.1μs | 0.324 | 0 | 0 | 41.72 KB |
| #6598 | WriteAndFlushEnrichedTraces | net472 | 849μs | 3.9μs | 15.6μs | 8.28 | 2.48 | 0.414 | 53.3 KB |

Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteNonQuery | net6.0 | 1.3μs | 1.31ns | 5.05ns | 0.0143 | 0 | 0 | 1.02 KB |
| master | ExecuteNonQuery | netcoreapp3.1 | 1.82μs | 1.81ns | 7.01ns | 0.0135 | 0 | 0 | 1.02 KB |
| master | ExecuteNonQuery | net472 | 2.1μs | 1.94ns | 7.49ns | 0.157 | 0.00105 | 0 | 987 B |
| #6598 | ExecuteNonQuery | net6.0 | 1.26μs | 0.835ns | 3.23ns | 0.0145 | 0 | 0 | 1.02 KB |
| #6598 | ExecuteNonQuery | netcoreapp3.1 | 1.71μs | 3.54ns | 13.2ns | 0.0134 | 0 | 0 | 1.02 KB |
| #6598 | ExecuteNonQuery | net472 | 2.11μs | 1.4ns | 5.24ns | 0.156 | 0.00105 | 0 | 987 B |

Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | CallElasticsearch | net6.0 | 1.31μs | 1ns | 3.62ns | 0.0137 | 0 | 0 | 976 B |
| master | CallElasticsearch | netcoreapp3.1 | 1.59μs | 0.948ns | 3.29ns | 0.0128 | 0 | 0 | 976 B |
| master | CallElasticsearch | net472 | 2.57μs | 1.95ns | 7.56ns | 0.158 | 0 | 0 | 995 B |
| master | CallElasticsearchAsync | net6.0 | 1.3μs | 1.12ns | 4.35ns | 0.0131 | 0 | 0 | 952 B |
| master | CallElasticsearchAsync | netcoreapp3.1 | 1.63μs | 0.51ns | 1.84ns | 0.0138 | 0 | 0 | 1.02 KB |
| master | CallElasticsearchAsync | net472 | 2.61μs | 0.779ns | 2.81ns | 0.167 | 0 | 0 | 1.05 KB |
| #6598 | CallElasticsearch | net6.0 | 1.29μs | 0.73ns | 2.83ns | 0.0137 | 0 | 0 | 976 B |
| #6598 | CallElasticsearch | netcoreapp3.1 | 1.6μs | 1.03ns | 3.87ns | 0.0132 | 0 | 0 | 976 B |
| #6598 | CallElasticsearch | net472 | 2.55μs | 1.87ns | 7.23ns | 0.158 | 0 | 0 | 995 B |
| #6598 | CallElasticsearchAsync | net6.0 | 1.36μs | 0.418ns | 1.57ns | 0.013 | 0 | 0 | 952 B |
| #6598 | CallElasticsearchAsync | netcoreapp3.1 | 1.66μs | 0.834ns | 3.23ns | 0.0133 | 0 | 0 | 1.02 KB |
| #6598 | CallElasticsearchAsync | net472 | 2.67μs | 1.81ns | 6.79ns | 0.167 | 0 | 0 | 1.05 KB |

Benchmarks.Trace.GraphQLBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6598

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync-net6.0 | 1.143 | 1,220.11 | 1,394.93 | |

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteAsync | net6.0 | 1.22μs | 0.596ns | 2.15ns | 0.0136 | 0 | 0 | 952 B |
| master | ExecuteAsync | netcoreapp3.1 | 1.7μs | 0.68ns | 2.45ns | 0.0128 | 0 | 0 | 952 B |
| master | ExecuteAsync | net472 | 1.78μs | 0.39ns | 1.51ns | 0.145 | 0 | 0 | 915 B |
| #6598 | ExecuteAsync | net6.0 | 1.39μs | 0.534ns | 2ns | 0.0133 | 0 | 0 | 952 B |
| #6598 | ExecuteAsync | netcoreapp3.1 | 1.6μs | 1.16ns | 4.2ns | 0.0129 | 0 | 0 | 952 B |
| #6598 | ExecuteAsync | net472 | 1.81μs | 0.419ns | 1.57ns | 0.145 | 0 | 0 | 915 B |

Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendAsync | net6.0 | 4.45μs | 3.11ns | 11.7ns | 0.0314 | 0 | 0 | 2.31 KB |
| master | SendAsync | netcoreapp3.1 | 5.49μs | 2.45ns | 9.48ns | 0.0384 | 0 | 0 | 2.85 KB |
| master | SendAsync | net472 | 7.48μs | 1.92ns | 7.43ns | 0.494 | 0 | 0 | 3.12 KB |
| #6598 | SendAsync | net6.0 | 4.51μs | 2.16ns | 8.09ns | 0.0316 | 0 | 0 | 2.31 KB |
| #6598 | SendAsync | netcoreapp3.1 | 5.31μs | 1.77ns | 6.85ns | 0.0369 | 0 | 0 | 2.85 KB |
| #6598 | SendAsync | net472 | 7.33μs | 1.49ns | 5.76ns | 0.496 | 0 | 0 | 3.12 KB |

Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 1.56μs | 2.29ns | 8.85ns | 0.0227 | 0 | 0 | 1.64 KB |
| master | EnrichedLog | netcoreapp3.1 | 2.3μs | 0.916ns | 3.55ns | 0.0219 | 0 | 0 | 1.64 KB |
| master | EnrichedLog | net472 | 2.54μs | 1.62ns | 6.08ns | 0.249 | 0 | 0 | 1.57 KB |
| #6598 | EnrichedLog | net6.0 | 1.47μs | 0.562ns | 2.1ns | 0.0227 | 0 | 0 | 1.64 KB |
| #6598 | EnrichedLog | netcoreapp3.1 | 2.24μs | 6.03ns | 22.6ns | 0.0223 | 0 | 0 | 1.64 KB |
| #6598 | EnrichedLog | net472 | 2.59μs | 0.834ns | 3.23ns | 0.25 | 0 | 0 | 1.57 KB |

Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 115μs | 99.2ns | 384ns | 0 | 0 | 0 | 4.28 KB |
| master | EnrichedLog | netcoreapp3.1 | 121μs | 118ns | 455ns | 0 | 0 | 0 | 4.28 KB |
| master | EnrichedLog | net472 | 151μs | 96.7ns | 375ns | 0.675 | 0.225 | 0 | 4.46 KB |
| #6598 | EnrichedLog | net6.0 | 117μs | 233ns | 901ns | 0.0586 | 0 | 0 | 4.28 KB |
| #6598 | EnrichedLog | netcoreapp3.1 | 121μs | 246ns | 953ns | 0 | 0 | 0 | 4.28 KB |
| #6598 | EnrichedLog | net472 | 152μs | 124ns | 480ns | 0.68 | 0.227 | 0 | 4.46 KB |

Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 3.15μs | 1.63ns | 6.31ns | 0.03 | 0 | 0 | 2.2 KB |
| master | EnrichedLog | netcoreapp3.1 | 4.19μs | 1.38ns | 5.17ns | 0.0293 | 0 | 0 | 2.2 KB |
| master | EnrichedLog | net472 | 4.82μs | 1.36ns | 5.28ns | 0.32 | 0 | 0 | 2.02 KB |
| #6598 | EnrichedLog | net6.0 | 3.14μs | 1.12ns | 4.32ns | 0.0309 | 0 | 0 | 2.2 KB |
| #6598 | EnrichedLog | netcoreapp3.1 | 4.18μs | 1.27ns | 4.77ns | 0.0293 | 0 | 0 | 2.2 KB |
| #6598 | EnrichedLog | net472 | 4.75μs | 1.54ns | 5.95ns | 0.319 | 0 | 0 | 2.02 KB |

Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendReceive | net6.0 | 1.36μs | 1.03ns | 3.98ns | 0.0164 | 0 | 0 | 1.14 KB |
| master | SendReceive | netcoreapp3.1 | 1.72μs | 1.1ns | 4.27ns | 0.0155 | 0 | 0 | 1.14 KB |
| master | SendReceive | net472 | 2.06μs | 0.636ns | 2.29ns | 0.183 | 0 | 0 | 1.16 KB |
| #6598 | SendReceive | net6.0 | 1.4μs | 1.22ns | 4.71ns | 0.0162 | 0 | 0 | 1.14 KB |
| #6598 | SendReceive | netcoreapp3.1 | 1.77μs | 0.445ns | 1.72ns | 0.015 | 0 | 0 | 1.14 KB |
| #6598 | SendReceive | net472 | 2.12μs | 0.938ns | 3.63ns | 0.183 | 0 | 0 | 1.16 KB |

Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 2.72μs | 1.11ns | 4.29ns | 0.022 | 0 | 0 | 1.6 KB |
| master | EnrichedLog | netcoreapp3.1 | 4.05μs | 1.33ns | 5.15ns | 0.0222 | 0 | 0 | 1.65 KB |
| master | EnrichedLog | net472 | 4.43μs | 3.37ns | 13.1ns | 0.322 | 0 | 0 | 2.04 KB |
| #6598 | EnrichedLog | net6.0 | 2.69μs | 0.658ns | 2.46ns | 0.0216 | 0 | 0 | 1.6 KB |
| #6598 | EnrichedLog | netcoreapp3.1 | 4.04μs | 1.19ns | 4.6ns | 0.0223 | 0 | 0 | 1.65 KB |
| #6598 | EnrichedLog | net472 | 4.45μs | 2.64ns | 9.89ns | 0.323 | 0 | 0 | 2.04 KB |

Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6598

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan-netcoreapp3.1 | 1.120 | 568.30 | 636.70 | |

Faster 🎉 in #6598

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan-net6.0 | 1.155 | 467.58 | 404.94 | |

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 468ns | 0.77ns | 2.98ns | 0.00806 | 0 | 0 | 576 B |
| master | StartFinishSpan | netcoreapp3.1 | 568ns | 1.34ns | 5.19ns | 0.00771 | 0 | 0 | 576 B |
| master | StartFinishSpan | net472 | 613ns | 1.99ns | 7.7ns | 0.0917 | 0 | 0 | 578 B |
| master | StartFinishScope | net6.0 | 483ns | 0.78ns | 3.02ns | 0.00968 | 0 | 0 | 696 B |
| master | StartFinishScope | netcoreapp3.1 | 702ns | 0.895ns | 3.47ns | 0.00946 | 0 | 0 | 696 B |
| master | StartFinishScope | net472 | 854ns | 1.09ns | 4.21ns | 0.104 | 0 | 0 | 658 B |
| #6598 | StartFinishSpan | net6.0 | 404ns | 0.493ns | 1.84ns | 0.00804 | 0 | 0 | 576 B |
| #6598 | StartFinishSpan | netcoreapp3.1 | 636ns | 0.956ns | 3.58ns | 0.00758 | 0 | 0 | 576 B |
| #6598 | StartFinishSpan | net472 | 621ns | 1.7ns | 6.13ns | 0.0918 | 0 | 0 | 578 B |
| #6598 | StartFinishScope | net6.0 | 495ns | 0.475ns | 1.84ns | 0.00986 | 0 | 0 | 696 B |
| #6598 | StartFinishScope | netcoreapp3.1 | 704ns | 1.13ns | 4.39ns | 0.0096 | 0 | 0 | 696 B |
| #6598 | StartFinishScope | net472 | 769ns | 1.62ns | 6.29ns | 0.104 | 0 | 0 | 658 B |

Benchmarks.Trace.TraceAnnotationsBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #6598

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin-net472 | 1.125 | 1,162.75 | 1,033.38 | |

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 636ns | 1.18ns | 4.56ns | 0.0097 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 897ns | 2.51ns | 9.71ns | 0.00938 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | net472 | 1.16μs | 2.07ns | 8.01ns | 0.104 | 0 | 0 | 658 B |
| #6598 | RunOnMethodBegin | net6.0 | 650ns | 1.16ns | 4.5ns | 0.00978 | 0 | 0 | 696 B |
| #6598 | RunOnMethodBegin | netcoreapp3.1 | 884ns | 1.16ns | 4.49ns | 0.00925 | 0 | 0 | 696 B |
| #6598 | RunOnMethodBegin | net472 | 1.04μs | 1.71ns | 6.62ns | 0.104 | 0 | 0 | 658 B |

Collaborator

@gleocadie gleocadie left a comment


LGTM

Member

@andrewlock andrewlock left a comment


Nice 👍

@tonyredondo tonyredondo merged commit e14cb11 into master Jan 28, 2025
136 of 141 checks passed
@tonyredondo tonyredondo deleted the tony/ngen-performance-fix branch January 28, 2025 18:11
@github-actions github-actions bot added this to the vNext-v3 milestone Jan 28, 2025
andrewlock pushed a commit that referenced this pull request Jan 28, 2025