Commit c861c17

jmacd authored and carlosalberto committed
Probability sampler composition rules (open-telemetry#175)
1 parent c35c26d commit c861c17

2 files changed, +362 -151 lines changed

text/trace/0168-sampling-propagation.md

+66-49
@@ -1,14 +1,14 @@
1-
# Propagate head trace sampling probability
1+
# Propagate parent sampling probability
22

3-
Use the W3C trace context to convey consistent head trace sampling probability.
3+
Use the W3C trace context to convey consistent parent sampling probability.
44

55
## Motivation
66

7-
The head trace sampling probability is the probability associated with
7+
The parent sampling probability is the probability associated with
88
the start of a trace context that was used to determine whether the
99
W3C `sampled` flag is set, which determines whether child contexts
1010
will be sampled by a `ParentBased` Sampler. It is useful to know the
11-
head trace sampling probability associated with a context in order to
11+
parent sampling probability associated with a context in order to
1212
build span-to-metrics pipelines when the built-in `ParentBased`
1313
Sampler is used. Further motivation for supporting span-to-metrics
1414
pipelines is presented in [OTEP
@@ -30,10 +30,9 @@ itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852)
3030

3131
## Explanation
3232

33-
Two pieces of information are needed to convey consistent head trace
34-
sampling probability:
33+
Two pieces of information are needed to convey consistent parent sampling probability:
3534

36-
1. p-value representing the head trace sampling probability.
35+
1. p-value representing the parent sampling probability.
3736
2. r-value representing the "randomness" as the source of consistent sampling decisions.
3837

3938
This proposal uses 6 bits of information to propagate each of these
@@ -42,11 +41,28 @@ sufficiently specified for probability sampling at this time. This
4241
proposal closely follows [research by Otmar
4342
Ertl](https://arxiv.org/pdf/2107.07703.pdf).
4443

44+
### Adjusted count
45+
46+
The concept of adjusted count is introduced in [OTEP
47+
170](./0170-sampling_probability.md). Briefly, adjusted count is defined
48+
in terms of the sampling probability, where:
49+
50+
| Sampling probability | Adjusted count | Notes |
51+
| -- | -- | -- |
52+
| `probability` != 0 | `adjusted_count` = `1/probability` | For spans selected with non-zero probability, adjusted count is the inverse of their sampling probability. |
53+
| `probability` == 0 | `adjusted_count` = 0 | For spans that were not selected by a probability sampler, adjusted count is zero. |
54+
55+
The term is used to convey the representivity of an item that was (or
56+
was not) selected by a probability sampler. Items that are not
57+
selected by a probability sampler are logically assigned zero adjusted
58+
count, such that if they are recorded for any other reason they do not
59+
introduce bias in the estimated count of the total span population.
60+
4561
### p-value
4662

4763
To limit the cost of this extension and for statistical reasons
48-
documented below, we propose to limit head trace sampling probability
49-
to powers of two. This limits the available head trace sampling
64+
documented below, we propose to limit parent sampling probability
65+
to powers of two. This limits the available parent sampling
5066
probabilities to 1/2, 1/4, 1/8, and so on. We can compactly encode
5167
these probabilities as small integer values using the base-2 logarithm
5268
of the adjusted count.
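
As a rough illustration of this encoding, the Go sketch below maps a propagated p-value to its sampling probability and adjusted count. The function name `adjustedCount` is hypothetical and not part of the proposal. It assumes p-values 0 through 62 encode probability `2**-p` and that 63 denotes zero probability, as stated elsewhere in this OTEP.

```go
package main

import "math"

// adjustedCount is a hypothetical helper (not defined by the OTEP).
// It returns the sampling probability and adjusted count implied by p.
// ok is false when p is outside [0, 63].
func adjustedCount(p int) (probability, count float64, ok bool) {
	switch {
	case p < 0 || p > 63:
		return 0, 0, false
	case p == 63:
		// Zero probability: the adjusted count is zero as well.
		return 0, 0, true
	default:
		// Probability 2**-p; the adjusted count is its inverse, 2**p.
		return math.Ldexp(1, -p), math.Ldexp(1, p), true
	}
}
```

For example, `adjustedCount(3)` corresponds to 1-in-8 sampling, so each sampled span represents 8 spans.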
@@ -60,7 +76,7 @@ When propagated, the "p-value" as it is known will be interpreted as
6076
shown in the following table. The p-value for known sampling
6177
probabilities is the negative base-2 logarithm of the probability:
6278

63-
| p-value | Head Probability |
79+
| p-value | Parent Probability |
6480
| ----- | ----------- |
6581
| 0 | 1 |
6682
| 1 | 1/2 |
@@ -74,20 +90,20 @@ probabilities is the negative base-2 logarithm of the probability:
7490

7591
[As specified in OTEP 170 for the Trace data
7692
model](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md),
77-
head sampling probability can be stored in exported Span data to
93+
parent sampling probability can be stored in exported Span data to
7894
enable span-to-metrics pipelines to be built. Because `tracestate` is
7995
already encoded in the OpenTelemetry Span, this proposal requires
8096
no changes to the Span protocol. Accepting this proposal means the
81-
p-value can be derived from `tracesstate` when the head sampling
97+
p-value can be derived from `tracestate` when the parent sampling
8298
probability is known.
8399

84100
An unknown value for `p` cannot be propagated using `tracestate`
85-
explicitly, simply omitting `p` conveys an unknown head sampling
101+
explicitly, simply omitting `p` conveys an unknown parent sampling
86102
probability.
87103

88104
### r-value
89105

90-
With head trace sampling probabilities limited to powers of two, the
106+
With parent sampling probabilities limited to powers of two, the
91107
amount of randomness needed per trace context is limited. A
92108
consistent sampling decision is accomplished by propagating a specific
93109
random variable known as the r-value.
@@ -145,7 +161,7 @@ import (
145161

146162
func nextRValueLeading() int {
147163
x := uint64(rand.Int63()) // 63 least-significant bits are random
148-
y := x << 1 | 0x7 // 61 most-significant bits are random
164+
y := x << 1 | 0x3 // 62 most-significant bits are random
149165
return bits.LeadingZeros64(y)
150166
}
151167
```
@@ -160,13 +176,13 @@ import (
160176

161177
func nextRValueTrailing() int {
162178
x := uint64(rand.Int63())
163-
for r := 0; r < 61; r++ {
179+
for r := 0; r < 62; r++ {
164180
if x & 0x1 == 0x1 {
165181
return r
166182
}
167183
x = x >> 1
168184
}
169-
return 61
185+
return 62
170186
}
171187
```
172188

@@ -178,7 +194,7 @@ but not at probabilities 1-in-16 and smaller.
178194

179195
### Proposed `tracestate` syntax
180196

181-
The consistent sampling r-value (`r`) and and head sampling
197+
The consistent sampling r-value (`r`) and the parent sampling
182198
probability p-value (`p`) will be propagated using two bytes of base16
183199
content for each of the two fields, as follows:
184200

@@ -206,7 +222,7 @@ tracestate: ot=r:0a;p:03
206222
and translates to
207223

208224
```
209-
base16(p-value) = 03 // 1-in-8 head probability
225+
base16(p-value) = 03 // 1-in-8 parent sampling probability
210226
base16(r-value) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
211227
```
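
The translation above can be sketched in Go as follows. The parser is illustrative only: the function name `parseOT` is hypothetical, and because the full syntax definition is not shown in this excerpt, the sketch assumes the `ot` value is a `;`-separated list of `key:value` fields with two base16 digits per value, as in the example `r:0a;p:03`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOT is a hypothetical helper that extracts r and p from the value of
// the `ot` tracestate entry, e.g. "r:0a;p:03". A field that is absent is
// reported as -1 (unknown). Keys other than r and p are ignored.
func parseOT(value string) (int, int, error) {
	r, p := -1, -1
	for _, field := range strings.Split(value, ";") {
		key, hex, found := strings.Cut(field, ":")
		if !found || len(hex) != 2 {
			return -1, -1, fmt.Errorf("malformed field %q", field)
		}
		n, perr := strconv.ParseUint(hex, 16, 8)
		if perr != nil {
			return -1, -1, fmt.Errorf("field %q is not base16: %w", field, perr)
		}
		switch key {
		case "r":
			r = int(n)
		case "p":
			p = int(n)
		}
	}
	return r, p, nil
}
```

Under these assumptions, `parseOT("r:0a;p:03")` yields `r = 10` and `p = 3`, while `parseOT("r:0a")` reports `p` as unknown.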
212228

@@ -215,7 +231,7 @@ A `ParentBased` Sampler will include `ot=r:0a;p:03` in the stored
215231
count of 8 spans. The `sampled=true` flag remains set.
216232

217233
A `TraceIDRatioBased` Sampler configured with probability 2**-10 or
218-
greater will enable `sampled=true` and convey a new head sampling
234+
greater will enable `sampled=true` and convey a new parent sampling
219235
probability via `tracestate: ot=r:0a;p:0a`.
220236

221237
A `TraceIDRatioBased` Sampler configured with probability 2**-11 or
@@ -226,7 +242,7 @@ setting `tracestate: ot=r:0a`.
226242

227243
The reasoning behind restricting the set of sampling rates is that it:
228244

229-
- Lowers the cost of propagating head sampling probability
245+
- Lowers the cost of propagating parent sampling probability
230246
- Limits the number of random bits required
231247
- Avoids floating-point to integer rounding errors
232248
- Makes math involving partial traces tractable.
@@ -238,22 +254,23 @@ explains how to work with a limited number of power-of-2 sampling rates.
238254
### Behavior of the `TraceIDRatioBased` Sampler
239255

240256
The Sampler MUST be configured with a power-of-two probability
241-
expressed as `2**-s` with s being an integer in the range [0, 61]
242-
except for the special case of zero probability.
257+
expressed as `2**-s` with s being an integer in the range [0, 62]
258+
except for the special case of zero probability (in which case `p=63`
259+
is used).
243260

244261
If the context is a new root, the initial `tracestate` must be created
245-
with randomness value `r`, as described above, in the range [0, 61].
262+
with randomness value `r`, as described above, in the range [0, 62].
246263
If the context is not a new root, output a new `tracestate` with the
247264
same `r` value as the parent context.
248265

266+
In both cases, set the sampled bit if the outgoing `p` is less than or
267+
equal to the outgoing `r` (i.e., `p <= r`).
268+
249269
When sampled, in both cases, the context's p-value `p` is set to the
250270
value of `s` in the range [0, 62]. If the sampling probability is
251271
zero (the special case where `s` is undefined), use `p=63` the
252272
specified value for zero probability.
253273

254-
In both cases, set the sampled bit if the outgoing `p` is less than or
255-
equal to the outgoing `r` (i.e., `p <= r`).
256-
257274
If the context is not a new root and the incoming context's r-value
258275
is not set, the implementation SHOULD notify the user of an error
259276
condition and follow the incoming context's `sampled` flag.
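
The rules above can be sketched in Go as follows. This is a minimal illustration, not the prototype implementation: the names `ratioDecision` and `zeroProbability` are hypothetical, and the caller is assumed to supply `r` (freshly generated for a root, copied from the parent otherwise). The missing r-value error case is not modeled.

```go
package main

// zeroProbability is the p-value the text reserves for zero probability.
const zeroProbability = 63

// ratioDecision is a hypothetical sketch of the decision described above.
// s is the configured exponent (probability 2**-s, 0 <= s <= 62); pass
// zero=true for the zero-probability special case, where s is ignored.
// r is the context's r-value in [0, 62].
func ratioDecision(s int, zero bool, r int) (p int, sampled bool) {
	p = s
	if zero {
		p = zeroProbability
	}
	// The sampled bit is set when the outgoing p is <= the outgoing r.
	return p, p <= r
}
```

With `r = 10`, as in the `tracestate` example, `ratioDecision(10, false, 10)` samples with `p = 10`, while `ratioDecision(11, false, 10)` does not sample.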
@@ -262,12 +279,12 @@ condition and follow the incoming context's `sampled` flag.
262279

263280
The `ParentBased` sampler is unmodified by this proposal. It honors
264281
the W3C `sampled` flag and copies the incoming `tracestate` keys to
265-
the child context. If the incoming context has known head sampling
282+
the child context. If the incoming context has known parent sampling
266283
probability, so does the Span.
267284

268-
The span's head probability is known when both `p` and `r` are defined
269-
are defined in the `ot` sub-key of `tracestate`. When `r` or `p`
270-
areis not defined, the span's head sampling probability is unknown.
285+
The span's parent sampling probability is known when both `p` and `r`
286+
are defined in the `ot` sub-key of `tracestate`. When `r` or `p` are
287+
not defined, the span's parent sampling probability is unknown.
271288

272289
### Behavior of the `AlwaysOn` Sampler
273290

@@ -298,10 +315,11 @@ Values of `p` are interpreted as follows:
298315
| 6 | 64 |
299316
| 7 | 0 |
300317

301-
Note there are only 6 non-zero, non-unknown values for the adjusted
302-
count. Thus there are six defined values of `r` and `s`. The
303-
following table shows `r` and the corresponding selection probability,
304-
along with the calculated adjusted count for each `s`:
318+
Note there are only seven known non-zero values for the adjusted count
319+
(`p`) ranging from 1 to 64. Thus there are seven defined values of `r`
320+
and `s`. The following table shows `r` and the corresponding
321+
selection probability, along with the calculated adjusted count for
322+
each `s`:
305323

306324
| `r` value | probability of `r` | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` | `s=6` |
307325
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
@@ -315,12 +333,11 @@ along with the calculated adjusted count for each `s`:
315333

316334
Notice that the sum of `r` probability times adjusted count in each of
317335
the `s=*` columns equals 1. For example, in the `s=4` column we have
318-
`0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/64 + 16*1/64 =
319-
16/32 + 16/64 + 16/64 = 1`. In the `s=2` column we have `0*1/2 +
320-
0*1/4 + 4*1/8 + 4*1/16 + 4*1/32 + 4*1/64 + 4*1/64 = 4/8 + 4/16 +
321-
4/32 + 4/64 + 4/64 = 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1`. We conclude
322-
that when `r` is chosen with the given probabilities, any choice of
323-
`s` produces one expected span.
336+
`0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/64 + 16*1/64 = 1/2 +
337+
1/4 + 1/4 = 1`. In the `s=2` column we have `0*1/2 + 0*1/4 + 4*1/8 +
338+
4*1/16 + 4*1/32 + 4*1/64 + 4*1/64 = 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1`.
339+
We conclude that when `r` is chosen with the given probabilities,
340+
any choice of `s` produces one expected span.
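
This arithmetic can be checked mechanically. The Go sketch below is illustrative only (the names `rWeights` and `expectedCount64` are hypothetical). It uses the `r` probabilities that appear in the sums above, scaled by 64 so the computation stays in integers, and assumes a span carries adjusted count `2**s` when `s <= r` and contributes zero otherwise.

```go
package main

// rWeights[r] is 64 times the probability of each r value in the example:
// 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/64.
var rWeights = [7]int{32, 16, 8, 4, 2, 1, 1}

// expectedCount64 returns 64 times the expected adjusted count for a given s
// in [0, 6], so a result of 64 means one expected span.
func expectedCount64(s int) int {
	total := 0
	for r, w := range rWeights {
		if s <= r {
			// Adjusted count 2**s, weighted by the probability of r.
			total += (1 << uint(s)) * w
		}
	}
	return total
}
```

Every `s` from 0 through 6 should yield 64 under these assumptions, matching the conclusion that each column sums to 1.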
324341

325342
## Invariant checking
326343

@@ -334,7 +351,7 @@ respect to the incoming and outgoing values for `p`, `r`, and
334351
| TraceIDRatio(Non-Root) | used | unused | ignored | checked and passed through | set to `s` | set to `p <= r` |
335352
| TraceIDRatio(Root) | n/a | n/a | n/a | random variable | set to `s` | set to `p <= r` |
336353

337-
There are several cases where the resulting span's head sampling
354+
There are several cases where the resulting span's parent sampling
338355
probability is unknown:
339356

340357
| Sampler | Unknown condition |
@@ -360,18 +377,18 @@ as discussed below.
360377

361378
The violation is always addressed by honoring the `sampled` flag and
362379
correcting `p` to either 63 (for zero adjusted count) or unset (for
363-
unknown adjusted count).
380+
unknown parent sampling probability).
364381

365382
If `sampled` is false and the invariant is violated, drop `p` from the
366-
outgoing context to convey unknown head probability.
383+
outgoing context to convey unknown parent sampling probability.
367384

368385
The case where `sampled` is true with `p=63` indicating 0% probability
369386
may be regarded as a special case to allow zero adjusted count
370387
sampling, which permits non-probabilistic sampling to take place in
371388
the presence of probability sampling. Set `p` to 63.
372389

373390
If `sampled` is true with `p<63` (but `p>r`), drop `p` from the
374-
outgoing context to convey unknown head probability.
391+
outgoing context to convey unknown parent sampling probability.
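
The correction rules can be summarized in a short Go sketch. It is an illustration under assumptions rather than the prototype's code: the name `correctP` is hypothetical, and the invariant is taken to be that `sampled` is true exactly when `p <= r`, which is how the cases above read.

```go
package main

// correctP is a hypothetical helper applying the corrections described above.
// It always honors the sampled flag and returns the outgoing p; known is
// false when p must be dropped to convey unknown probability.
func correctP(sampled bool, p, r int) (outP int, known bool) {
	if sampled == (p <= r) {
		return p, true // invariant holds, nothing to correct
	}
	if sampled && p == 63 {
		return 63, true // zero adjusted count: non-probabilistic sampling
	}
	return 0, false // drop p from the outgoing context
}
```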
375392

376393
## Prototype
377394

@@ -399,9 +416,9 @@ way with respect to the bits of the TraceID.
399416

400417
### Not using TraceID randomness
401418

402-
It would be possible, if TraceID were specified to have at least 61
419+
It would be possible, if TraceID were specified to have at least 62
403420
uniform random bits, to compute the randomness value described above
404-
as the number of leading zeros among those 61 random bits.
421+
as the number of leading zeros among those 62 random bits.
405422

406423
However, this would require modifying the W3C traceparent specification,
407424
therefore we do not propose to use bits of the TraceID.
@@ -422,15 +439,15 @@ data to avoid the computational cost of hashing TraceIDs.
422439

423440
### Restriction to power-of-two
424441

425-
Restricting head sampling rates to powers of two does not limit tail
442+
Restricting parent sampling probabilities to powers of two does not limit tail
426443
Samplers from using arbitrary probabilities. The companion [OTEP
427444
170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md) has discussed
428445
the use of a `sampler.adjusted_count` attribute that would not be
429446
limited to power-of-two values. Discussion about how to represent the
430447
effective adjusted count for tail-sampled Spans belongs in [OTEP
431448
170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), not this OTEP.
432449

433-
Restricting head sampling rates to powers of two does not limit
450+
Restricting parent sampling probabilities to powers of two does not limit
434451
Samplers from using arbitrary effective probabilities over a period of
435452
time. For example, a typical trace sampling rate of 5% (i.e., 1 in
436453
20) can be accomplished by choosing 1/16 sampling 60% of the time and
