-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix APM Disablement V1 (ASM Standalone) #4479
base: master
Are you sure you want to change the base?
Conversation
…t the rate limiter)
…no effect on them)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4479 +/- ##
==========================================
- Coverage 97.70% 97.69% -0.01%
==========================================
Files 1375 1382 +7
Lines 83812 83987 +175
Branches 4251 4249 -2
==========================================
+ Hits 81887 82054 +167
- Misses 1925 1933 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Datadog ReportBranch report: ✅ 0 Failed, 20565 Passed, 1370 Skipped, 3m 18.38s Total Time |
BenchmarksBenchmark execution time: 2025-03-12 14:46:19 Comparing candidate commit def4da4 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 31 metrics, 2 unstable metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Currently, there is a karafka integration under review which contains distributed tracing mechanism
|
||
# Skips distributed tracing if disabled for this instrumentation | ||
# or if APM is disabled unless there is an AppSec event (from upstream distributed trace or local) | ||
def should_skip_distributed_tracing?(trace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I am seeing this piece of logic repeatedly several times, perhaps it make sense to encapsulate this from an AppSec module and be reused at multiple places?
The method can be stateless or depends on configuration and the contract is basically returning a boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will investigate that, although I don't think it should be in an AppSec module, as in the V2, this will not contain any reference to AppSec (appsec.standalone
will be replaced by tracing.apm
and APPSEC_EVENT
will be replaced by TRACE_SOURCE
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 91f2f68
What is the process for this kind of situation ? If I merge to master first, should I myself add a commit on the Karafka integration to add APM Disablement capabilities ? (Of course if the other way around happens I could just rebase and add it here) |
I would suggest not to wait on it. However, it would be much welcomed to encapsulate the logic so that the pattern can be adopted easily. |
# Skips distributed tracing if disabled for this instrumentation | ||
# or if APM is disabled unless there is an AppSec event (from upstream distributed trace or local) | ||
def should_skip_distributed_tracing?(client_config: nil, datadog_config: nil, trace: nil) | ||
if ::Datadog.configuration.appsec.standalone.enabled && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the method takes configuration as arguments, why is this line reading global config?
The method would be much more understandable if the types of parameters were listed.
@@ -545,10 +550,30 @@ def skip_trace(name) | |||
end | |||
end | |||
|
|||
# Decide whether upstream sampling priority should be propagated, by taking into account | |||
# the upstream tags and the configuration. | |||
# We should always propagate if there APM is not disabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"there APM" is a typo or broken grammar?
def propagate_sampling_priority?(upstream_tags:) | ||
return true unless apm_tracing_disabled | ||
|
||
if upstream_tags&.key?(Datadog::AppSec::Ext::TAG_DISTRIBUTED_APPSEC_EVENT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is hard to understand not knowing the type of upstream_tags
.
What does this PR do?
This PR fixes multiple bugs on APM Disablement V1 (aka ASM Standalone) when it was activated:
Motivation:
When I tried to activate SCA Standalone system-tests, I discovered that disabling AppSec and disabling APM Tracing would result in heartbeat traces not sending dd.apm.enabled=0 tag. During my investigation, I discovered more bugs (listed above).
This is also a step towards APM Disablement V2. For now, the variables are still called after appsec standalone, but this will change to APM disablement in the future and lose all reference to asm standalone, as it is in fact an APM feature.
Change log entry
None.
Additional Notes:
This PR should be followed soon by APM Disablement V2, which will replace the env var by DD_APM_TRACING_ENABLED, and every config/variable referencing asm standalone to apm disablement.
I understand that there might be some concerns about AppSec stuff being in APM territory but this is mostly a matter of naming that will be fixed when APM Disablement V2 will be released. In the V2, I will replace any instance of
Datadog.configuration.appsec.standalone.enabled
byDatadog.configuration.tracing.apm.disabled
,Datadog::AppSec::Ext::TAG_DISTRIBUTED_APPSEC_EVENT
by something likeDatadog::Tracing::Ext::TAG_DISTRIBUTED_TRACE_SOURCE
(_dd.p.ts
, which will be a bitmask). The only exception might be in thepropagate_sampling_priority?
method, in tracer.rb, where we check if AppSec is activated.How to test the change?
CI and system-tests