Randomness requirements following W3C Trace Context level 2 #4162

jmacd · 2024-07-25T15:00:31Z

Changes

Updates Trace SDK and Propagator specifications with:

- List W3C propagator requirements (e.g., should propagate tracestate)
- Introduce W3C Trace Context Level 2 w/ the Random flag
- Define explicit randomness feature from OTEP 235, OTEP 261
- Trace SDK default ID generator should include 56 bits of randomness in the correct location
- Trace SDK for root spans: either set the random flag to confirm the above, or use an explicit randomness value.

Part of #1413.
Part of #3602.

Product of the Sampling SIG members @kentquirk @kalyanaj @oertl @PeterF778 and myself.

specification/trace/sdk.md

marcalff

See nit comment about text formatting, for diff

specification/trace/sdk.md

marcalff

LGTM.


github-actions · 2024-08-15T03:17:25Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.


jmacd · 2024-08-15T14:52:50Z

@open-telemetry/specs-trace-approvers @open-telemetry/specs-approvers @open-telemetry/technical-committee this PR has reached consensus in the Sampling SIG, we have multiple prototypes implemented, and we are looking for final approvals.

kentquirk

The Sampling SIG seems to agree that this is ready now.

github-actions · 2024-08-28T03:17:27Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.


jmacd · 2024-08-29T14:30:03Z

@open-telemetry/specs-trace-approvers @open-telemetry/specs-approvers @open-telemetry/technical-committee this PR has reached consensus in the Sampling SIG, we have multiple prototypes implemented, and we are looking for final approvals.

github-actions · 2024-09-06T03:17:33Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

dyladan

LGTM overall. Seems to work around the missing traceflag propagation nicely.

One question I have for you that I would take back to the w3c: with this in place, is the randomness trace flag actually providing any value? Seems like tracestate is being set all the time anyway, so the flag isn't serving the original purpose it was meant to serve.

dyladan · 2024-09-06T16:58:03Z

specification/context/api-propagators.md

@@ -355,6 +356,17 @@ Additional `Propagator`s implementing vendor-specific protocols such as AWS
 X-Ray trace header protocol MUST NOT be maintained or distributed as part of
 the Core OpenTelemetry repositories.

+### W3C Trace Context Requirements
+
+A W3C Trace Context propagator MUST parse and set the `traceparent` and `tracestate` HTTP headers as specified in [W3C Trace Context Level 2](https://www.w3.org/TR/trace-context-2/).


This seems to imply we always want to set tracestate regardless of the sampling strategy used. Is that the intention?

Revised in 56d8e26. I want to say that both headers are validated, and that tracestate only propagates when it is not empty. Note, however, that the larger proposal is for AlwaysOn samplers to set `ot=th:0` on all spans that have 100% sampling; samplers that do not set a tracestate shouldn't cause tracestate to propagate.
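A minimal sketch of the rule described in this reply, assuming a hypothetical `injectHeaders` helper (this is not the OpenTelemetry propagator API): `traceparent` is always written, while `tracestate` is written only when it is non-empty.

```go
package main

import "fmt"

// injectHeaders is a hypothetical helper used only for illustration.
// It always sets traceparent and sets tracestate only when it is non-empty.
func injectHeaders(traceparent, tracestate string) map[string]string {
	headers := map[string]string{"traceparent": traceparent}
	if tracestate != "" {
		headers["tracestate"] = tracestate
	}
	return headers
}

func main() {
	// Example traceparent value; flags 03 = Sampled and Random both set.
	tp := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-03"

	// A sampler that sets no tracestate: only traceparent propagates.
	fmt.Println(injectHeaders(tp, ""))

	// An AlwaysOn sampler that sets ot=th:0: both headers propagate.
	fmt.Println(injectHeaders(tp, "ot=th:0"))
}
```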

dyladan · 2024-09-06T16:59:24Z

specification/trace/api.md

+- [Sampled](https://www.w3.org/TR/trace-context-2/#sampled-flag)
+- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)
+
+`TraceState` carries vendor-specific trace identification data, represented as a list of key-value pairs.


nit: we (the w3c) are trying to move away from "vendor" terminology to "tracing system" or similar

dyladan · 2024-09-06T17:02:39Z

specification/trace/sdk.md

@@ -466,6 +482,51 @@ The following configuration properties should be available when creating the sam
 [jaeger-remote-sampling-api]: https://www.jaegertracing.io/docs/1.41/apis/#remote-sampling-configuration-stable
 [jaeger-adaptive-sampling]: https://www.jaegertracing.io/docs/1.41/sampling/#adaptive-sampling

+### Sampling Requirements
+
+The [W3C Trace Context Level 2][W3CCONTEXTMAIN] Candidate Recommendation includes [a Random trace flag][W3CCONTEXTRANDOMFLAG] for indicating that the TraceID contains 56 random bits, specified for statistical purposes.


Someone raised to my attention that 56 bits is actually a problem when using 64-bit floating-point numbers, as the max safe integer is 2^53 - 1. I've been meaning to raise this with the w3c. It's not really an issue here because you're following the CR correctly, but I thought it was worth pointing out. Does this SIG have a take on that particular issue?
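The precision concern can be illustrated with a short sketch. The helper name `roundTripsThroughFloat64` is made up for this example; it only checks whether an integer survives a conversion to a 64-bit float and back.

```go
package main

import "fmt"

// roundTripsThroughFloat64 reports whether v is unchanged after being
// converted to float64 and back. Hypothetical helper for illustration.
func roundTripsThroughFloat64(v uint64) bool {
	return uint64(float64(v)) == v
}

func main() {
	const maxSafe = uint64(1)<<53 - 1 // 2^53 - 1, the largest "safe" integer
	const max56 = uint64(1)<<56 - 1   // the largest 56-bit value

	fmt.Println(roundTripsThroughFloat64(maxSafe)) // exactly representable
	fmt.Println(roundTripsThroughFloat64(max56))   // rounds up to 2^56
}
```

A float64 has a 53-bit significand, so 56-bit randomness values generally lose their low bits unless they are kept as integers, bytes, or strings.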

Have you raised this with the w3c yet? Is this an issue because JavaScript uses floats for all numbers? Seems like a pretty important patch to be made so that we are all consistent. I had recently been trying to track down why in Erlang we were using 2^63 - 1, which was what some others were doing, but some were doing 2^64.

It'd be great to not only be consistent across languages but to document why a value is chosen so we can answer these questions.

And JavaScript is a pretty big user base :)

@dyladan Is this issue specific to JavaScript? If so, for such use cases (that need to look at the last 56 bits of the traceid as an integer), couldn't something like BigInt be used?

I believe this should not be a problem. I assume the JS libraries have some way to create the randomness required for TraceIDs. It is possible to treat the random strings used for sampling as either bytes or hexadecimal strings and use lexicographical comparison as opposed to numerical comparison.

In an older prototype of the OTel Collector Contrib Sampling package I made a side-by-side comparison of the numerical vs lexicographical method, and IMO the numerical methods are a bit easier to read and performed a little better, so I kept the numerical form and removed the lexicographical implementation. I am having trouble finding that implementation, but there are comments left in pkg/sampling/encoding_test.go with a bit of history:

    // There were two benchmarks used to choose the implementation for the
    // Threshold type in this package. The results indicate that it is
    // faster to compare a 56-bit number than to compare as 7 element
    // []byte.
    // The current implementation, using unsigned:
    //
    // BenchmarkThresholdCompareAsUint64-10 1000000000 0.4515 ns/op 0 B/op 0 allocs/op
    //
    // vs the tested and rejected, using bytes:
    //
    // BenchmarkThresholdCompareAsBytes-10 528679580 2.288 ns/op 0 B/op 0 allocs/op

The ability to perform lexicographical comparison is the reason the reviewers have insisted that we specify lower-case hexadecimal for the threshold and randomness fields (the same is true in W3C Trace Context, where traceparent requires lowercase hex too): you can use 100% string manipulation and comparison to implement this sampling logic.
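The equivalence of the two approaches can be sketched as follows. This is an illustration under stated assumptions, not the Collector Contrib implementation: the function names are hypothetical, the randomness value is assumed to be 14 lowercase hex digits, the threshold is assumed to be written with trailing zeros removed (as `th:0` suggests), and a span is assumed to be sampled when randomness is greater than or equal to the threshold, following OTEP 235.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// padThreshold restores assumed-omitted trailing zeros so that the
// threshold has 14 hex digits (56 bits).
func padThreshold(th string) string {
	if len(th) >= 14 {
		return th
	}
	return th + strings.Repeat("0", 14-len(th))
}

// sampledNumeric parses both fields as 56-bit integers and compares them.
func sampledNumeric(rv, th string) (bool, error) {
	r, err := strconv.ParseUint(rv, 16, 64)
	if err != nil {
		return false, err
	}
	t, err := strconv.ParseUint(padThreshold(th), 16, 64)
	if err != nil {
		return false, err
	}
	return r >= t, nil
}

// sampledLexical compares the same fields as strings. For equal-length
// lowercase hex, byte-wise string order matches numeric order.
func sampledLexical(rv, th string) bool {
	return rv >= padThreshold(th)
}

func main() {
	cases := []struct{ rv, th string }{
		{"d0000000000001", "c"},
		{"0fffffffffffff", "8"},
		{"80000000000000", "8"},
	}
	for _, c := range cases {
		n, err := sampledNumeric(c.rv, c.th)
		if err != nil {
			panic(err)
		}
		fmt.Println(c.rv, c.th, n, sampledLexical(c.rv, c.th))
	}
}
```

The string form needs no integer type wider than the platform supports, which is why it sidesteps the floating-point question raised earlier in this thread.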

dyladan · 2024-09-06T17:03:46Z

specification/trace/sdk.md

+This flag indicates that [the least-significant ("rightmost") 7 bytes or 56 bits of the TraceID are random][W3CCONTEXTTRACEID].
+
+Note the Random flag does not propagate through [Trace Context Level 1][W3CCONTEXTLEVEL1] implementations, which do not recognize the flag.
+Therefore, this flag is considered meaningful only when it is set on a root span context.


The flag should be considered meaningful any time it is received. Level 1 will only set it to 0 so if you see 1, you know the sender is Level 2+

I replaced this sentence with "When this flag is 1, it is considered meaningful. When this flag is 0, it may be due to a non-random TraceID or because a Trace Context Level 1 propagator was used.".
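A sketch of how a receiver might apply the revised wording, assuming hypothetical helper names and the W3C Trace Context Level 2 bit assignments (Sampled is `0x01`, Random is `0x02`): randomness is read from the rightmost 7 bytes of the TraceID only when the Random flag is 1.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	flagSampled byte = 0x01 // W3C Sampled flag
	flagRandom  byte = 0x02 // W3C Trace Context Level 2 Random flag
)

// hasRandomFlag reports whether the Random flag is set.
func hasRandomFlag(flags byte) bool {
	return flags&flagRandom != 0
}

// randomness56 returns the least-significant 7 bytes (56 bits) of the
// TraceID as an unsigned integer. Hypothetical helper for illustration.
func randomness56(traceID [16]byte) uint64 {
	var buf [8]byte
	copy(buf[1:], traceID[9:]) // bytes 9..15 are the rightmost 7 bytes
	return binary.BigEndian.Uint64(buf[:])
}

func main() {
	traceID := [16]byte{
		0x4b, 0xf9, 0x2f, 0x35, 0x77, 0xb3, 0x4d, 0xa6,
		0xa3, 0xce, 0x92, 0x9d, 0x0e, 0x0e, 0x47, 0x36,
	}
	for _, flags := range []byte{flagSampled, flagSampled | flagRandom} {
		if hasRandomFlag(flags) {
			fmt.Printf("flags=%02x: randomness=%014x\n", flags, randomness56(traceID))
		} else {
			// Either the TraceID is not random or a Level 1 propagator
			// dropped the flag; the two cases cannot be told apart.
			fmt.Printf("flags=%02x: randomness not asserted\n", flags)
		}
	}
}
```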


yuanyuanzhao3 · 2024-09-18T18:21:00Z

specification/trace/sdk.md

@@ -316,6 +324,14 @@ When asked to create a Span, the SDK MUST act as if doing the following in order
   `Span` is created without an SDK installed or as described in
   [wrapping a SpanContext in a Span](api.md#wrapping-a-spancontext-in-a-span).

+#### Span flags
+
+The OTLP representation for Span and Span Link include a 32-bit field declared as Span Flags.


Nit: "includes"?

yuanyuanzhao3 · 2024-09-19T01:03:02Z

specification/trace/sdk.md

@@ -316,6 +324,14 @@ When asked to create a Span, the SDK MUST act as if doing the following in order
   `Span` is created without an SDK installed or as described in
   [wrapping a SpanContext in a Span](api.md#wrapping-a-spancontext-in-a-span).


For list item (2):
(Note that the [built-in `ParentBasedSampler`](#parentbased) can be used to use the sampling decision of the parent, translating a set SampledFlag to RECORD and an unset one to DROP)

Does the built-in ParentBasedSampler translate a set SampledFlag to RECORD or to RECORD_AND_SAMPLE?

jpkrohling

Overall, LGTM, only a question about being more explicit in how SDKs can accept the randomness values set by API users.

jpkrohling · 2024-09-19T06:46:06Z

spec-compliance-matrix.md

@@ -87,6 +87,8 @@ formats is required. Implementing more than one format is optional.
 | [Built-in `SpanProcessor`s implement `ForceFlush` spec](specification/trace/sdk.md#forceflush-1) |          |     | +    |     | +      | +    | +      | +   | +    | +   | +    |       |
 | [Attribute Limits](specification/common/README.md#attribute-limits)                              | X        |     | +    |     | +      | +    | +      | +   |      |     |      |       |
 | Fetch InstrumentationScope from ReadableSpan                                                     |          |     | +    |     | +      |      |        | +   |      |     |      |       |
+| TraceID generator implements W3C Trace Context Level 2 randomness                                |          |     |      |     |        |      |        |     |      |     |      |       |


should there be a couple of implementations before this is marked as required?

jpkrohling · 2024-09-19T12:04:30Z

specification/trace/api.md

+The current version of the specification supports two flags:
+
+- [Sampled](https://www.w3.org/TR/trace-context-2/#sampled-flag)
+- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)


Sorry if this has been asked before, but are we comfortable recommending the draft? Or are we merging this only once it's marked as recommended?

jpkrohling · 2024-09-19T12:10:06Z

specification/trace/sdk.md

+
+#### TraceID randomness
+
+For root span contexts, the SDK SHOULD implement the TraceID randomness requirements of the [W3C Trace Context Level 2][W3CCONTEXTTRACEID] Candidate Recommendation when generating TraceID values.


I guess this answers my previous question on whether we are OK using the CR :-)

jpkrohling · 2024-09-19T12:13:11Z

specification/trace/sdk.md

+
+#### Random trace flag
+
+For root span contexts, the SDK SHOULD set the `Random` flag in the trace flags when it generates TraceIDs that meet the [W3C Trace Context Level 2 randomness requirements][W3CCONTEXTTRACEID].


I can't think of any, but is there a disadvantage of using level 2 vs. level 1? Would I ever want to use level 1 when I'm aware of level 2?
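For reference, the two requirements quoted above (random TraceID bits and the Random flag on root span contexts) can be sketched together. This is an assumption-laden illustration, not SDK code: `newRootTraceparent` is a hypothetical name, and the whole TraceID is filled from `crypto/rand`, which covers the 56 bits that Level 2 requires.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

const (
	flagSampled byte = 0x01 // W3C Sampled flag
	flagRandom  byte = 0x02 // W3C Trace Context Level 2 Random flag
)

// newRootTraceparent builds a traceparent value for a root span context.
// All 16 TraceID bytes are random, so the rightmost 7 bytes are random
// and the Random flag may be set. An all-zero ID is invalid but is
// ignored here because it is vanishingly unlikely.
func newRootTraceparent(sampled bool) (string, error) {
	var traceID [16]byte
	var spanID [8]byte
	if _, err := rand.Read(traceID[:]); err != nil {
		return "", err
	}
	if _, err := rand.Read(spanID[:]); err != nil {
		return "", err
	}
	flags := flagRandom
	if sampled {
		flags |= flagSampled
	}
	return fmt.Sprintf("00-%s-%s-%02x",
		hex.EncodeToString(traceID[:]),
		hex.EncodeToString(spanID[:]),
		flags), nil
}

func main() {
	tp, err := newRootTraceparent(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(tp)
}
```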

jpkrohling · 2024-09-19T12:14:12Z

specification/trace/sdk.md

+
+#### User-defined explicit trace randomness
+
+Trace SDKs MAY permit users to setup explicit randomness by entering it into the [`rv` sub-key of the OpenTelemetry TraceState][OTELRVALUE] of the context before creating a root span.  This lets users have consistent sampling across traces.


Should this spec define how this is accomplished, so that it's consistent across SDKs?
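One possible shape for this, offered only as a sketch: the function names are hypothetical, the value is assumed to be 56 bits written as 14 lowercase hex digits, and the `ot=rv:<value>` layout is inferred from the `ot=th:0` form mentioned elsewhere in this review.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newRandomnessValue returns 56 random bits as 14 lowercase hex digits.
func newRandomnessValue() (string, error) {
	var b [7]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil // EncodeToString emits lowercase
}

// validRandomnessValue checks the assumed format: exactly 14 lowercase
// hexadecimal digits.
func validRandomnessValue(rv string) bool {
	if len(rv) != 14 {
		return false
	}
	for i := 0; i < len(rv); i++ {
		c := rv[i]
		isDigit := c >= '0' && c <= '9'
		isLowerHex := c >= 'a' && c <= 'f'
		if !isDigit && !isLowerHex {
			return false
		}
	}
	return true
}

func main() {
	rv, err := newRandomnessValue()
	if err != nil {
		panic(err)
	}
	// The same tracestate entry could be placed in the context of several
	// root spans so that their sampling decisions agree.
	tracestate := "ot=rv:" + rv
	fmt.Println(tracestate, validRandomnessValue(rv))
}
```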

jpkrohling · 2024-09-19T12:17:50Z

specification/trace/tracestate-handling.md

+hexdigit = DIGIT ; a-f
+```
+
+The explicit randomness value is meant to be used instead of extracting randomness from TraceIDs, therefore it contains the same number of bits as a W3C Trace Context Level 2 recommends for TraceIDs.


It might not belong to this doc, and I think I asked this already on another PR, but why would I do this? What are the use-cases behind this? I understand this is so that end-users can control the decisions, but do you have a specific use-case in mind?


jmacd changed the title ~~Draft rules for span context in Trace Context level 2~~ Randomness requirements for W3C Trace Context level 2 Jul 26, 2024

jmacd changed the title ~~Randomness requirements for W3C Trace Context level 2~~ Randomness requirements following W3C Trace Context level 2 Jul 26, 2024

This was referenced Jul 29, 2024

OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166

Open

Prototype for W3C Trace Context Level 2 support in TraceIDRatioBased sampler open-telemetry/opentelemetry-go#5645

Draft

rebase

71750cd

jmacd force-pushed the jmacd/sampling_new branch from fa9ec4c to 71750cd Compare July 29, 2024 23:22

jmacd marked this pull request as ready for review July 29, 2024 23:24

jmacd requested review from a team July 29, 2024 23:24

github-actions bot assigned jack-berg Jul 29, 2024

marcalff reviewed Jul 30, 2024

specification/trace/sdk.md

line breaks

3c4b8fa

marcalff approved these changes Jul 30, 2024

trask reviewed Jul 30, 2024

specification/trace/sdk.md

kentquirk reviewed Aug 1, 2024

specification/context/api-propagators.md

specification/trace/tracestate-handling.md

oertl reviewed Aug 1, 2024

specification/trace/tracestate-handling.md

pellared reviewed Aug 1, 2024

specification/context/api-propagators.md

jmacd and others added 7 commits August 7, 2024 07:47

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sampling_new

a73e378

revise trace randomness requirements for clarity

c174e36

Apply suggestions from code review

93dcd7a

Thanks Co-authored-by: Robert Pająk <pellared@hotmail.com> Co-authored-by: Kent Quirk <kentquirk@gmail.com>

clarify which contexts for each requirement

76c6c71

Merge branch 'jmacd/sampling_new' of github.com:jmacd/opentelemetry-specification into jmacd/sampling_new

d1784fa

user-defined explicit randomness: may

b9ec958

lowercase hex only

6a2d6c5

github-actions bot added the Stale label Aug 15, 2024

jmacd removed the Stale label Aug 15, 2024

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sampling_new

01d5aac

kalyanaj approved these changes Aug 15, 2024

oertl approved these changes Aug 17, 2024

kentquirk approved these changes Aug 20, 2024

github-actions bot added the Stale label Aug 28, 2024

jmacd removed the Stale label Aug 29, 2024

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sampling_new

6d8f30d

github-actions bot added the Stale label Sep 6, 2024

jmacd removed the Stale label Sep 6, 2024

dyladan approved these changes Sep 6, 2024

tsloughter requested changes Sep 10, 2024

specification/trace/sdk.md

jmacd added 6 commits September 12, 2024 10:00

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sampling_new

2fbd8b2

Clarify when tracestate and traceparent are propagated

56d8e26

tracing-system-specific, not vendor

7489914

idgenerator randomness

6ad9842

marker interface

47d3132

changelog fix

bfa4ca2

jpkrohling self-requested a review September 18, 2024 07:50

yuanyuanzhao3 approved these changes Sep 19, 2024

jpkrohling reviewed Sep 19, 2024
