1
- # Propagate head trace sampling probability
1
+ # Propagate parent sampling probability
2
2
3
- Use the W3C trace context to convey consistent head trace sampling probability.
3
+ Use the W3C trace context to convey consistent parent sampling probability.
4
4
5
5
## Motivation
6
6
7
- The head trace sampling probability is the probability associated with
7
+ The parent sampling probability is the probability associated with
8
8
the start of a trace context that was used to determine whether the
9
9
W3C ` sampled ` flag is set, which determines whether child contexts
10
10
will be sampled by a ` ParentBased ` Sampler. It is useful to know the
11
- head trace sampling probability associated with a context in order to
11
+ parent sampling probability associated with a context in order to
12
12
build span-to-metrics pipelines when the built-in ` ParentBased `
13
13
Sampler is used. Further motivation for supporting span-to-metrics
14
14
pipelines is presented in [ OTEP
@@ -30,10 +30,9 @@ itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852)
30
30
31
31
## Explanation
32
32
33
- Two pieces of information are needed to convey consistent head trace
34
- sampling probability:
33
+ Two pieces of information are needed to convey consistent parent sampling probability:
35
34
36
- 1. p-value representing the head trace sampling probability.
35
+ 1. p-value representing the parent sampling probability.
37
36
2. r-value representing the "randomness" as the source of consistent sampling decisions.
38
37
39
38
This proposal uses 6 bits of information to propagate each of these
@@ -42,11 +41,28 @@ sufficiently specified for probability sampling at this time. This
42
41
proposal closely follows [ research by Otmar
43
42
Ertl](https://arxiv.org/pdf/2107.07703.pdf).
44
43
44
+ ### Adjusted count
45
+
46
+ The concept of adjusted count is introduced in [ OTEP
47
+ 170](./0170-sampling_probability.md). Briefly, adjusted count is defined
48
+ in terms of the sampling probability, where:
49
+
50
+ | Sampling probability | Adjusted count | Notes |
51
+ | -- | -- | -- |
52
+ | ` probability ` != 0 | ` adjusted_count ` = ` 1/probability ` | For spans selected with non-zero probability, adjusted count is the inverse of their sampling probability. |
53
+ | ` probability ` == 0 | ` adjusted_count ` = 0 | For spans that were not selected by a probability sampler, adjusted count is zero. |
54
+
55
+ The term is used to convey the representivity of an item that was (or
56
+ was not) selected by a probability sampler. Items that are not
57
+ selected by a probability sampler are logically assigned zero adjusted
58
+ count, such that if they are recorded for any other reason they do not
59
+ introduce bias in the estimated count of the total span population.
60
+
45
61
### p-value
46
62
47
63
To limit the cost of this extension and for statistical reasons
48
- documented below, we propose to limit head trace sampling probability
49
- to powers of two. This limits the available head trace sampling
64
+ documented below, we propose to limit parent sampling probability
65
+ to powers of two. This limits the available parent sampling
50
66
probabilities to 1/2, 1/4, 1/8, and so on. We can compactly encode
51
67
these probabilities as small integer values using the base-2 logarithm
52
68
of the adjusted count.
@@ -60,7 +76,7 @@ When propagated, the "p-value" as it is known will be interpreted as
60
76
shown in the following table. The p-value for known sampling
61
77
probabilities is the negative base-2 logarithm of the probability:
62
78
63
- | p-value | Head Probability |
79
+ | p-value | Parent Probability |
64
80
| ----- | ----------- |
65
81
| 0 | 1 |
66
82
| 1 | 1/2 |
@@ -74,20 +90,20 @@ probabilities is the negative base-2 logarithm of the probability:
74
90
75
91
[ As specified in OTEP 170 for the Trace data
76
92
model](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md),
77
- head sampling probability can be stored in exported Span data to
93
+ parent sampling probability can be stored in exported Span data to
78
94
enable span-to-metrics pipelines to be built. Because ` tracestate ` is
79
95
already encoded in the OpenTelemetry Span, this proposal requires
80
96
no changes to the Span protocol. Accepting this proposal means the
81
- p-value can be derived from ` tracesstate ` when the head sampling
97
+ p-value can be derived from ` tracestate ` when the parent sampling
82
98
probability is known.
83
99
84
100
An unknown value for ` p ` cannot be propagated using ` tracestate `
85
- explicitly, simply omitting ` p ` conveys an unknown head sampling
101
+ explicitly; simply omitting `p` conveys an unknown parent sampling
86
102
probability.
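
As a rough illustration of the encoding described above, the following Go sketch maps a propagated p-value to its adjusted count. The names `AdjustedCount` and `Describe` and the `known` flag are hypothetical and not part of this proposal; the sketch only assumes what the text states: `p` in [0, 62] means probability `2**-p`, `p=63` means zero probability, and an omitted `p` means unknown.

```go
package main

import "fmt"

// pZero is the p-value this proposal reserves for zero probability.
const pZero = 63

// AdjustedCount returns the adjusted count implied by a p-value.
// The known flag models an omitted `p` field (unknown probability);
// it is an illustrative assumption, not part of the proposal.
func AdjustedCount(p uint8, known bool) (count uint64, ok bool) {
	switch {
	case !known || p > pZero:
		return 0, false // unknown or out of range: no adjusted count
	case p == pZero:
		return 0, true // zero probability means zero adjusted count
	default:
		return uint64(1) << p, true // probability 2**-p, adjusted count 2**p
	}
}

// Describe renders the sampling probability for a p-value as text.
func Describe(p uint8, known bool) string {
	count, ok := AdjustedCount(p, known)
	switch {
	case !ok:
		return "unknown"
	case count == 0:
		return "0"
	case count == 1:
		return "1"
	default:
		return fmt.Sprintf("1/%d", count)
	}
}
```

Under these assumptions, `Describe(3, true)` yields `1/8`, matching the 1-in-8 row of the table.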
87
103
88
104
### r-value
89
105
90
- With head trace sampling probabilities limited to powers of two, the
106
+ With parent sampling probabilities limited to powers of two, the
91
107
amount of randomness needed per trace context is limited. A
92
108
consistent sampling decision is accomplished by propagating a specific
93
109
random variable known as the r-value.
@@ -145,7 +161,7 @@ import (
145
161
146
162
 func nextRValueLeading() int {
 	x := uint64(rand.Int63()) // 63 least-significant bits are random
-	y := x<<1 | 0x7 // 61 most-significant bits are random
+	y := x<<1 | 0x3 // 62 most-significant bits are random
 	return bits.LeadingZeros64(y)
 }
151
167
```
@@ -160,13 +176,13 @@ import (
160
176
161
177
 func nextRValueTrailing() int {
 	x := uint64(rand.Int63())
-	for r := 0; r < 61; r++ {
+	for r := 0; r < 62; r++ {
 		if x&0x1 == 0x1 {
 			return r
 		}
 		x = x >> 1
 	}
-	return 61
+	return 62
 }
171
187
```
172
188
@@ -178,7 +194,7 @@ but not at probabilities 1-in-16 and smaller.
178
194
179
195
### Proposed ` tracestate ` syntax
180
196
181
- The consistent sampling r-value (` r ` ) and and head sampling
197
+ The consistent sampling r-value (` r ` ) and the parent sampling
182
198
probability p-value (` p ` ) will be propagated using two bytes of base16
183
199
content for each of the two fields, as follows:
184
200
@@ -206,7 +222,7 @@ tracestate: ot=r:0a;p:03
206
222
and translates to
207
223
208
224
```
209
- base16(p-value) = 03 // 1-in-8 head probability
225
+ base16(p-value) = 03 // 1-in-8 parent sampling probability
210
226
base16(r-value) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
211
227
```
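
To make the syntax above concrete, here is a minimal Go sketch that parses the value of the `ot` entry (for example `r:0a;p:03`) into its two fields. `OTState` and `ParseOT` are hypothetical names introduced for illustration. The sketch assumes exactly two base16 characters per field as described above, and it does not handle the rest of the `tracestate` header.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// OTState holds the fields carried in the `ot` tracestate entry.
// HasR and HasP record presence, because an omitted `p` conveys
// unknown probability rather than a numeric value.
type OTState struct {
	R, P       uint8
	HasR, HasP bool
}

// ParseOT parses the value of the `ot` key, e.g. "r:0a;p:03".
func ParseOT(value string) (OTState, error) {
	var st OTState
	for _, field := range strings.Split(value, ";") {
		key, hex, found := strings.Cut(field, ":")
		if !found || len(hex) != 2 {
			return OTState{}, errors.New("malformed ot field: " + field)
		}
		n, err := strconv.ParseUint(hex, 16, 8) // two base16 characters
		if err != nil {
			return OTState{}, err
		}
		switch key {
		case "r":
			st.R, st.HasR = uint8(n), true
		case "p":
			st.P, st.HasP = uint8(n), true
		default:
			// Unrecognized sub-keys are skipped in this sketch.
		}
	}
	// Ranges follow the text: r in [0, 62], p in [0, 63].
	if st.HasR && st.R > 62 {
		return OTState{}, errors.New("r-value out of range")
	}
	if st.HasP && st.P > 63 {
		return OTState{}, errors.New("p-value out of range")
	}
	return st, nil
}
```

Parsing `r:0a;p:03` with this sketch gives `R == 10` and `P == 3`, which corresponds to the translation shown above.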
212
228
@@ -215,7 +231,7 @@ A `ParentBased` Sampler will include `ot=r:0a;p:03` in the stored
215
231
count of 8 spans. The ` sampled=true ` flag remains set.
216
232
217
233
A `TraceIDRatioBased` Sampler configured with probability 2**-10 or
218
- greater will enable ` sampled=true ` and convey a new head sampling
234
+ greater will enable ` sampled=true ` and convey a new parent sampling
219
235
probability via ` tracestate: ot=r:0a;p:0a ` .
220
236
221
237
A `TraceIDRatioBased` Sampler configured with probability 2**-11 or
@@ -226,7 +242,7 @@ setting `tracestate: ot=r:0a`.
226
242
227
243
The reasoning behind restricting the set of sampling rates is that it:
228
244
229
- - Lowers the cost of propagating head sampling probability
245
+ - Lowers the cost of propagating parent sampling probability
230
246
- Limits the number of random bits required
231
247
- Avoids floating-point to integer rounding errors
232
248
- Makes math involving partial traces tractable.
@@ -238,22 +254,23 @@ explains how to work with a limited number of power-of-2 sampling rates.
238
254
### Behavior of the ` TraceIDRatioBased ` Sampler
239
255
240
256
The Sampler MUST be configured with a power-of-two probability
241
- expressed as ` 2**-s ` with s being an integer in the range [ 0, 61]
242
- except for the special case of zero probability.
257
+ expressed as ` 2**-s ` with s being an integer in the range [ 0, 62]
258
+ except for the special case of zero probability (in which case ` p=63 `
259
+ is used).
243
260
244
261
If the context is a new root, the initial ` tracestate ` must be created
245
- with randomness value ` r ` , as described above, in the range [ 0, 61 ] .
262
+ with randomness value ` r ` , as described above, in the range [ 0, 62 ] .
246
263
If the context is not a new root, output a new ` tracestate ` with the
247
264
same ` r ` value as the parent context.
248
265
266
+ In both cases, set the sampled bit if the outgoing ` p ` is less than or
267
+ equal to the outgoing ` r ` (i.e., ` p <= r ` ).
268
+
249
269
When sampled, in both cases, the context's p-value ` p ` is set to the
250
270
value of ` s ` in the range [ 0, 62] . If the sampling probability is
251
271
zero (the special case where `s` is undefined), use `p=63`, the
252
272
specified value for zero probability.
253
273
254
- In both cases, set the sampled bit if the outgoing ` p ` is less than or
255
- equal to the outgoing ` r ` (i.e., ` p <= r ` ).
256
-
257
274
If the context is not a new root and the incoming context's r-value
258
275
is not set, the implementation SHOULD notify the user of an error
259
276
condition and follow the incoming context's ` sampled ` flag.
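
The decision rule in this section can be sketched in Go as follows. `Decision` and `Decide` are hypothetical names; the caller is assumed to supply `r` (freshly generated for a new root, copied from the parent otherwise) and to pass `s = 63` for the zero-probability case. Leaving `p` unset when the span is not sampled is an assumption drawn from the `tracestate: ot=r:0a` example earlier in the document.

```go
package main

// Decision is the outcome of the sampler sketch. HasP is false when
// `p` is left out of the outgoing tracestate.
type Decision struct {
	Sampled bool
	R, P    uint8
	HasP    bool
}

// Decide applies a power-of-two sampler configured with probability
// 2**-s to a context whose randomness value is r.
// Pass s = 63 to represent zero probability; since r is at most 62,
// that configuration never samples.
func Decide(s, r uint8) Decision {
	d := Decision{R: r} // r always passes through unchanged
	d.Sampled = s <= r  // the rule p <= r, with p = s
	if d.Sampled {
		d.P, d.HasP = s, true
	}
	return d
}
```

With `r = 10`, a sampler configured with `s = 10` samples and sets `p = 10`, while `s = 11` does not sample and leaves `p` unset.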
@@ -262,12 +279,12 @@ condition and follow the incoming context's `sampled` flag.
262
279
263
280
The ` ParentBased ` sampler is unmodified by this proposal. It honors
264
281
the W3C ` sampled ` flag and copies the incoming ` tracestate ` keys to
265
- the child context. If the incoming context has known head sampling
282
+ the child context. If the incoming context has known parent sampling
266
283
probability, so does the Span.
267
284
268
- The span's head probability is known when both ` p ` and ` r ` are defined
269
- are defined in the ` ot ` sub-key of ` tracestate ` . When ` r ` or ` p `
270
- areis not defined, the span's head sampling probability is unknown.
285
+ The span's parent sampling probability is known when both ` p ` and ` r `
286
+ are defined in the `ot` sub-key of `tracestate`. When `r` or `p` is
287
+ not defined, the span's parent sampling probability is unknown.
271
288
272
289
### Behavior of the ` AlwaysOn ` Sampler
273
290
@@ -298,10 +315,11 @@ Values of `p` are interpreted as follows:
298
315
| 6 | 64 |
299
316
| 7 | 0 |
300
317
301
- Note there are only 6 non-zero, non-unknown values for the adjusted
302
- count. Thus there are six defined values of ` r ` and ` s ` . The
303
- following table shows ` r ` and the corresponding selection probability,
304
- along with the calculated adjusted count for each ` s ` :
318
+ Note there are only seven known non-zero values for the adjusted count
319
+ (` p ` ) ranging from 1 to 64. Thus there are seven defined values of ` r `
320
+ and ` s ` . The following table shows ` r ` and the corresponding
321
+ selection probability, along with the calculated adjusted count for
322
+ each ` s ` :
305
323
306
324
| ` r ` value | probability of ` r ` | ` s=0 ` | ` s=1 ` | ` s=2 ` | ` s=3 ` | ` s=4 ` | ` s=5 ` | ` s=6 ` |
307
325
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
@@ -315,12 +333,11 @@ along with the calculated adjusted count for each `s`:
315
333
316
334
Notice that the sum of ` r ` probability times adjusted count in each of
317
335
the ` s=* ` columns equals 1. For example, in the ` s=4 ` column we have
318
- `0* 1/2 + 0* 1/4 + 0* 1/8 + 0* 1/16 + 16* 1/32 + 16* 1/64 + 16* 1/64 =
319
- 16/32 + 16/64 + 16/64 = 1` . In the ` s=2` column we have ` 0* 1/2 +
320
- 0* 1/4 + 4* 1/8 + 4* 1/16 + 4* 1/32 + 4* 1/64 + 4* 1/64 = 4/8 + 4/16 +
321
- 4/32 + 4/64 + 4/64 = 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1`. We conclude
322
- that when ` r ` is chosen with the given probabilities, any choice of
323
- ` s ` produces one expected span.
336
+ `0* 1/2 + 0* 1/4 + 0* 1/8 + 0* 1/16 + 16* 1/32 + 16* 1/64 + 16* 1/64 = 1/2 +
337
+ 1/4 + 1/4 = 1` . In the ` s=2` column we have ` 0* 1/2 + 0* 1/4 + 4* 1/8 +
338
+ 4* 1/16 + 4* 1/32 + 4* 1/64 + 4* 1/64 = 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1`.
339
+ We conclude that when ` r ` is chosen with the given probabilities,
340
+ any choice of ` s ` produces one expected span.
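
The column sums above can be checked mechanically. The Go sketch below uses the `r` probabilities that appear in the worked sums (1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/64), expressed in 64ths so that the arithmetic stays exact. `rWeights` and `ExpectedSpans64` are names introduced here for illustration only.

```go
package main

// rWeights[r] is the probability of each r value in units of 1/64:
// 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/64.
var rWeights = [7]uint64{32, 16, 8, 4, 2, 1, 1}

// ExpectedSpans64 returns, in units of 1/64, the expected adjusted
// count for a sampler configured with s in [0, 6]. A span is selected
// when s <= r and then carries adjusted count 2**s.
func ExpectedSpans64(s int) uint64 {
	var total uint64
	for r, w := range rWeights {
		if s <= r {
			total += w << uint(s) // probability of r times 2**s
		}
	}
	return total
}
```

For every `s` from 0 through 6 the function returns 64, that is 64/64, or one expected span.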
324
341
325
342
## Invariant checking
326
343
@@ -334,7 +351,7 @@ respect to the incoming and outgoing values for `p`, `r`, and
334
351
| TraceIDRatio(Non-Root) | used | unused | ignored | checked and passed through | set to ` s ` | set to ` p <= r ` |
335
352
| TraceIDRatio(Root) | n/a | n/a | n/a | random variable | set to ` s ` | set to ` p <= r ` |
336
353
337
- There are several cases where the resulting span's head sampling
354
+ There are several cases where the resulting span's parent sampling
338
355
probability is unknown:
339
356
340
357
| Sampler | Unknown condition |
@@ -360,18 +377,18 @@ as discussed below.
360
377
361
378
The violation is always addressed by honoring the ` sampled ` flag and
362
379
correcting ` p ` to either 63 (for zero adjusted count) or unset (for
363
- unknown adjusted count ).
380
+ unknown parent sampling probability ).
364
381
365
382
If ` sampled ` is false and the invariant is violated, drop ` p ` from the
366
- outgoing context to convey unknown head probability.
383
+ outgoing context to convey unknown parent sampling probability.
367
384
368
385
The case where ` sampled ` is true with ` p=63 ` indicating 0% probability
369
386
may be regarded as a special case to allow zero adjusted count
370
387
sampling, which permits non-probabilistic sampling to take place in
371
388
the presence of probability sampling. Set ` p ` to 63.
372
389
373
390
If ` sampled ` is true with ` p<63 ` (but ` p>r ` ), drop ` p ` from the
374
- outgoing context to convey unknown head probability.
391
+ outgoing context to convey unknown parent sampling probability.
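
The correction rules above can be summarized in a short Go sketch. `CorrectP` is a hypothetical helper, and the sketch assumes the invariant being checked is `sampled == (p <= r)`; that definition is not restated in this excerpt, so treat it as an assumption.

```go
package main

// CorrectP returns the outgoing p-value and whether it is set,
// honoring the incoming `sampled` flag as described in the text.
// Assumption: the invariant is sampled == (p <= r).
func CorrectP(sampled bool, p, r uint8) (outP uint8, hasP bool) {
	switch {
	case sampled == (p <= r):
		return p, true // invariant holds: pass p through
	case sampled && p == 63:
		return 63, true // zero adjusted count sampling: keep p=63
	default:
		return 0, false // violation: drop p to convey unknown probability
	}
}
```

For example, `sampled=true` with `p=5` and `r=3` violates the assumed invariant, so `p` is dropped from the outgoing context.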
375
392
376
393
## Prototype
377
394
@@ -399,9 +416,9 @@ way with respect to the bits of the TraceID.
399
416
400
417
### Not using TraceID randomness
401
418
402
- It would be possible, if TraceID were specified to have at least 61
419
+ It would be possible, if TraceID were specified to have at least 62
403
420
uniform random bits, to compute the randomness value described above
404
- as the number of leading zeros among those 61 random bits.
421
+ as the number of leading zeros among those 62 random bits.
405
422
406
423
However, this would require modifying the W3C traceparent specification,
407
424
therefore we do not propose to use bits of the TraceID.
@@ -422,15 +439,15 @@ data to avoid the computational cost of hashing TraceIDs.
422
439
423
440
### Restriction to power-of-two
424
441
425
- Restricting head sampling rates to powers of two does not limit tail
442
+ Restricting parent sampling probabilities to powers of two does not limit tail
426
443
Samplers from using arbitrary probabilities. The companion [ OTEP
427
444
170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md) has discussed
428
445
the use of a ` sampler.adjusted_count ` attribute that would not be
429
446
limited to power-of-two values. Discussion about how to represent the
430
447
effective adjusted count for tail-sampled Spans belongs in [ OTEP
431
448
170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), not this OTEP.
432
449
433
- Restricting head sampling rates to powers of two does not limit
450
+ Restricting parent sampling probabilities to powers of two does not limit
434
451
Samplers from using arbitrary effective probabilities over a period of
435
452
time. For example, a typical trace sampling rate of 5% (i.e., 1 in
436
453
20 ) can be accomplished by choosing 1/16 sampling 60% of the time and