Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix APM Disablement V1 (ASM Standalone) #4479

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

vpellan
Copy link
Contributor

@vpellan vpellan commented Mar 11, 2025

What does this PR do?

This PR fixes multiple bugs on APM Disablement V1 (aka ASM Standalone) when it was activated:

  • When HTTP frameworks (Net::HTTP, RestClient, Excon, Faraday...) were instrumented, the sampling_priority was set to MANUAL_REJECT if tags contained _dd.p.appsec (which was used to check for upstream propagated appsec events), then not sampled later due to a change in the sampler in this PR This was fixed by making this check during the building step of a trace from a digest (which also means that it now works even if an HTTP framework is not instrumented)
  • Distributed tracing was skipped only for Net::HTTP (in case of upstream or local security event). It now skips it on every HTTP frameworks supported, but also Sidekiq and GRPC.
  • The APM Tracing disabled tag (_dd.apm.enabled=0) was not added when appsec was disabled. It is now fixed and the code related to it was moved to Tracing
  • Added tests that actually tests APM disablement heartbeat traces (ASM Standalone too) and cleaned related tests

Motivation:

When I tried to activate SCA Standalone system-tests, I discovered that disabling AppSec and disabling APM Tracing would result in heartbeat traces not sending dd.apm.enabled=0 tag. During my investigation, I discovered more bugs (listed above).
This is also a step towards APM Disablement V2. For now, the variables are still called after appsec standalone, but this will change to APM disablement in the future and lose all reference to asm standalone, as it is in fact an APM feature.

Change log entry

None.

Additional Notes:

This PR should be followed soon by APM Disablement V2, which will replace the env var by DD_APM_TRACING_ENABLED, and every config/variable referencing asm standalone to apm disablement.

I understand that there might be some concerns about AppSec stuff being in APM territory but this is mostly a matter of naming that will be fixed when APM Disablement V2 will be released. In the V2, I will replace any instance of Datadog.configuration.appsec.standalone.enabled by Datadog.configuration.tracing.apm.disabled, Datadog::AppSec::Ext::TAG_DISTRIBUTED_APPSEC_EVENT by something like Datadog::Tracing::Ext::TAG_DISTRIBUTED_TRACE_SOURCE (_dd.p.ts, which will be a bitmask). The only exception might be in the propagate_sampling_priority? method, in tracer.rb, where we check if AppSec is activated.

How to test the change?

CI and system-tests

@vpellan vpellan requested review from a team as code owners March 11, 2025 15:59
@vpellan vpellan self-assigned this Mar 11, 2025
@github-actions github-actions bot added integrations Involves tracing integrations appsec Application Security monitoring product tracing labels Mar 11, 2025
@vpellan vpellan added this to the 2.13.0 milestone Mar 11, 2025
@codecov-commenter
Copy link

codecov-commenter commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 99.71751% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.69%. Comparing base (2f427f8) to head (def4da4).
Report is 65 commits behind head on master.

Files with missing lines Patch % Lines
lib/datadog/tracing/distributed/circuit_breaker.rb 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4479      +/-   ##
==========================================
- Coverage   97.70%   97.69%   -0.01%     
==========================================
  Files        1375     1382       +7     
  Lines       83812    83987     +175     
  Branches     4251     4249       -2     
==========================================
+ Hits        81887    82054     +167     
- Misses       1925     1933       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Mar 11, 2025

Datadog Report

Branch report: vpellan/fix-sca-standalone
Commit report: def4da4
Test service: dd-trace-rb

✅ 0 Failed, 20565 Passed, 1370 Skipped, 3m 18.38s Total Time

@pr-commenter
Copy link

pr-commenter bot commented Mar 11, 2025

Benchmarks

Benchmark execution time: 2025-03-12 14:46:19

Comparing candidate commit def4da4 in PR branch vpellan/fix-sca-standalone with baseline commit 2f427f8 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 31 metrics, 2 unstable metrics.

Copy link
Contributor

@TonyCTHsu TonyCTHsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Currently, there is a karafka integration under review which contains distributed tracing mechanism


# Skips distributed tracing if disabled for this instrumentation
# or if APM is disabled unless there is an AppSec event (from upstream distributed trace or local)
def should_skip_distributed_tracing?(trace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I am seeing this piece of logic repeatedly several times, perhaps it make sense to encapsulate this from an AppSec module and be reused at multiple places?

The method can be stateless or depends on configuration and the contract is basically returning a boolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will investigate that, although I don't think it should be in an AppSec module, as in the V2, this will not contain any reference to AppSec (appsec.standalone will be replaced by tracing.apm and APPSEC_EVENT will be replaced by TRACE_SOURCE)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 91f2f68

@vpellan
Copy link
Contributor Author

vpellan commented Mar 12, 2025

FYI: Currently, there is a karafka integration under review which contains distributed tracing mechanism

What is the process for this kind of situation ? If I merge to master first, should I myself add a commit on the Karafka integration to add APM Disablement capabilities ? (Of course if the other way around happens I could just rebase and add it here)

@TonyCTHsu
Copy link
Contributor

What is the process for this kind of situation ?

I would suggest not to wait on it. However, it would be much welcomed to encapsulate the logic so that the pattern can be adopted easily.

# Skips distributed tracing if disabled for this instrumentation
# or if APM is disabled unless there is an AppSec event (from upstream distributed trace or local)
def should_skip_distributed_tracing?(client_config: nil, datadog_config: nil, trace: nil)
if ::Datadog.configuration.appsec.standalone.enabled &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the method takes configuration as arguments, why is this line reading global config?

The method would be much more understandable if the types of parameters were listed.

@@ -545,10 +550,30 @@ def skip_trace(name)
end
end

# Decide whether upstream sampling priority should be propagated, by taking into account
# the upstream tags and the configuration.
# We should always propagate if there APM is not disabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"there APM" is a typo or broken grammar?

def propagate_sampling_priority?(upstream_tags:)
return true unless apm_tracing_disabled

if upstream_tags&.key?(Datadog::AppSec::Ext::TAG_DISTRIBUTED_APPSEC_EVENT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is hard to understand not knowing the type of upstream_tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product integrations Involves tracing integrations tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants