Change default trace-id format to be similar to AWS X-Ray (use timestamp) #1947

Open
bogdandrutu opened this issue Sep 21, 2021 · 11 comments
Labels
area:sampling Related to trace sampling area:sdk Related to the SDK spec:trace Related to the specification/trace directory

Comments

@bogdandrutu
Member

What are you trying to achieve?

The motivation for this change is to help backends that process trace data. If we put a "timestamp" in the first 32 bits of the generated trace-id, trace data (full traces) will be generated (sent to the backend) in a "pseudo" order. The order cannot be guaranteed, but it still lets different backends (stores) write events in a more deterministic order than a completely random one.

What did you expect to see?

A change to the default trace-id generator specifying that the first 32 bits of the trace-id should be a timestamp.

Additional context.

  1. This will be a backwards-compatible change, since the trace-id is currently considered opaque.
  2. For backends that do not work well with this, a simple deterministic "hash" of the trace-id (shuffling bits is enough) can be used to avoid ordering. Unfortunately the opposite cannot be achieved, so it helps to include the timestamp from the source.
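To illustrate point 2, here is a minimal sketch of one possible deterministic "shuffle". It simply reverses the byte order so the random tail leads the key. The function name `scramble` and the choice of byte reversal are assumptions for illustration, not something the proposal prescribes.

```python
def scramble(trace_id: bytes) -> bytes:
    """Reverse the byte order of a 16-byte trace-id.

    Assumption: the timestamp sits in the first 4 bytes, so after reversal
    the key starts with random bytes and no longer sorts by time.
    The operation is its own inverse, so the original id is recoverable.
    """
    return trace_id[::-1]


# Two ids whose timestamp prefixes are one second apart.
earlier = bytes.fromhex("61492080" + "ff" * 12)
later = bytes.fromhex("61492081" + "00" * 12)

print(earlier < later)                      # ordered by timestamp prefix
print(scramble(earlier) < scramble(later))  # order now depends on random bytes
```

A backend that wants evenly spread keys could apply such a function before writing; one that wants time locality would use the id as-is.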

Add any other context about the problem here. If you followed an existing documentation, please share the link to it.

See AWS X-ray definition of the trace-id:
https://docs.aws.amazon.com/xray/latest/devguide/xray-api-sendingdata.html

The suggestion is not to follow that model exactly, but just to encode the same "timestamp" as the leftmost 8 hex characters (4 bytes) of the trace-id.
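As a rough sketch of what such a generator could look like (an illustration only, not the SDK's implementation): the first 4 bytes hold Unix epoch seconds in big-endian order and the remaining 12 bytes are random. The names `generate_trace_id` and `timestamp_of` are hypothetical.

```python
import secrets
import time


def generate_trace_id(now=None) -> bytes:
    """Return a 16-byte trace-id: 4-byte timestamp prefix + 12 random bytes.

    `now` is Unix epoch seconds; it is a parameter only to make the
    sketch testable. The mask keeps the value within 32 bits.
    """
    seconds = int(time.time() if now is None else now) & 0xFFFFFFFF
    return seconds.to_bytes(4, "big") + secrets.token_bytes(12)


def timestamp_of(trace_id: bytes) -> int:
    """Read back the 32-bit timestamp prefix (assumes the layout above)."""
    return int.from_bytes(trace_id[:4], "big")


tid = generate_trace_id(now=1632182400)  # 2021-09-21T00:00:00Z
print(tid.hex())  # 32 hex characters; the first 8 are 61492080
```

The result is still a plain 16-byte value, so anything that treats the trace-id as opaque keeps working.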

@bogdandrutu bogdandrutu added the spec:trace Related to the specification/trace directory label Sep 21, 2021
@Oberon00
Member

I think the important thing is that we are on the same page on whether the left or right bytes are random or if it does not matter, for use with probability sampling. For example I think an old UUID v1 has the timestamp as first bytes and the last bytes are a static node ID. https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.2 Not sure if these are still relevant though.

I think it should not matter much for the "precision" of probability sampling whether we use a 32 or 64 bit number as input or maybe we decided to use some hash function anyway (CC @oertl @jmacd).

@oertl

oertl commented Sep 21, 2021

@Oberon00 The smallest sampling rate that you can achieve with X random bits is 1/2^X. For 32 bits we have 1/2^32 = 2.3E-10. I am not 100% sure whether this limit is future-proof.

@Oberon00
Member

Sorry, wrong number! If we are talking about the trace ID and only 32 bit timestamp within that, we would still have 128 - 32 = 96 random bits within that, and for sampling the 64 bit span ID is more relevant anyway which would still be fully random with this suggestion?
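For reference, the arithmetic from the two comments above can be checked quickly. This is just the 1/2^X formula evaluated for 32, 64, and 96 random bits; the helper name `min_sampling_rate` is made up for the example.

```python
def min_sampling_rate(random_bits: int) -> float:
    """Smallest sampling probability expressible with `random_bits` random bits."""
    return 2.0 ** -random_bits


# 32 bits alone, a 64-bit span ID, and the 96 bits left after a 32-bit timestamp.
for bits in (32, 64, 96):
    print(f"{bits} random bits -> {min_sampling_rate(bits):.1e}")
```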

@bogdandrutu
Member Author

Also, this is only how OpenTelemetry will generate IDs; backends should not expect all IDs to have this structure, but naturally, if most of the IDs are generated by OTel, they can benefit.

@Oberon00 Oberon00 added area:sampling Related to trace sampling area:sdk Related to the SDK labels Sep 21, 2021
@yurishkuro
Member

I think this is a major feature that requires an OTEP that should go into details on motivation and trade-offs. The current two-sentence motivation raises way more questions than it answers, like:

  • what exactly are the backends supposed to do with the timestamp portion, why is it beneficial
  • what happens if trace-id is generated on end-user device with timestamp being 10yrs off

@bogdandrutu
Member Author

bogdandrutu commented Sep 21, 2021

@yurishkuro happy to do an OTEP if there is interest in doing this.

what exactly are the backends supposed to do with the timestamp portion, why is it beneficial

As mentioned in the issue, it is hard to guarantee that the timestamp will be present and correctly set. This is a small optimization for backends that store the trace-id as a key in a backend that does better when data are "kind of sorted" like Cassandra, HBase, etc.

Later if we want to guarantee the timestamp is present we can propose a "bit" in the trace-flags that when set the first 32 bits are guaranteed to be a timestamp.
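A sketch of how a consumer might use such a bit, purely as an assumption: W3C Trace Context defines trace-flags as an 8-bit field with bit 0 (0x01) meaning "sampled", but no timestamp bit exists today. The value `FLAG_TIMESTAMP_PREFIX = 0x80` below is hypothetical and chosen only for the example.

```python
FLAG_SAMPLED = 0x01           # defined by W3C Trace Context
FLAG_TIMESTAMP_PREFIX = 0x80  # HYPOTHETICAL bit, not part of any spec


def timestamp_if_flagged(trace_id: bytes, trace_flags: int):
    """Return the 32-bit timestamp prefix only when the hypothetical flag is set.

    Without the flag the id must be treated as opaque, so None is returned.
    """
    if trace_flags & FLAG_TIMESTAMP_PREFIX:
        return int.from_bytes(trace_id[:4], "big")
    return None


example_id = bytes.fromhex("61492080" + "ab" * 12)
print(timestamp_if_flagged(example_id, FLAG_SAMPLED))
print(timestamp_if_flagged(example_id, FLAG_SAMPLED | FLAG_TIMESTAMP_PREFIX))
```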

what happens if trace-id is generated on end-user device with timestamp being 10yrs off

I don't want to enter into the backend design, but things like dropping old "traces" can be implemented once we know for sure that the timestamp is present.
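For illustration, a backend-side check along those lines might look like the sketch below. It only makes sense when the timestamp is known to be present, and the retention window, the skew tolerance, and the function names are all hypothetical choices.

```python
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # hypothetical retention window: 7 days
MAX_FUTURE_SKEW = 300            # hypothetical tolerance for fast clocks


def embedded_timestamp(trace_id: bytes) -> int:
    """Read the first 4 bytes as big-endian Unix seconds (assumed layout)."""
    return int.from_bytes(trace_id[:4], "big")


def is_plausible(trace_id: bytes, now=None) -> bool:
    """True when the embedded timestamp falls inside the accepted window."""
    now = int(time.time()) if now is None else now
    ts = embedded_timestamp(trace_id)
    return now - MAX_AGE_SECONDS <= ts <= now + MAX_FUTURE_SKEW


now = 1632182400
fresh = (now - 60).to_bytes(4, "big") + bytes(12)
ten_years_off = (now - 10 * 365 * 86400).to_bytes(4, "big") + bytes(12)
print(is_plausible(fresh, now), is_plausible(ten_years_off, now))
```

What to do with an implausible id (drop, re-key, store unsorted) is a backend design decision that this sketch leaves open.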

@yurishkuro
Member

This is a small optimization for backends that store the trace-id as a key in a backend that does better when data are "kind of sorted" like Cassandra, HBase, etc.

I would prefer clear description / open spec for those optimizations in the OTEP. Cf. sampling OTEPs from @jmacd which honestly discuss costs/benefits, without references/implications of vendor-only "secret sauce".

@tigrannajaryan
Member

FYI, ULIDs are an open standard for 128-bit globally unique IDs, with the first 48 bits being a timestamp followed by 80 bits of randomness (ULIDs are implemented in most languages, so they should be easy to use in our SDKs):

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits
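The layout in the diagram can be sketched in a few lines. This is a simplified illustration (no monotonic counter within the same millisecond, no decoding), and the function names are hypothetical; a real SDK would more likely use an existing ULID library.

```python
import secrets
import time

# Crockford base32 alphabet used by ULID text encoding (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_bytes(now_ms=None) -> bytes:
    """16 bytes: 48-bit Unix time in milliseconds + 80 random bits."""
    ms = int(time.time() * 1000) if now_ms is None else now_ms
    return (ms & (2**48 - 1)).to_bytes(6, "big") + secrets.token_bytes(10)


def encode(ulid: bytes) -> str:
    """Encode 128 bits as 26 base32 characters, most significant first."""
    n = int.from_bytes(ulid, "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[n & 31])  # take the lowest 5 bits
        n >>= 5
    return "".join(reversed(chars))


text = encode(ulid_bytes())
print(text[:10], text[10:])  # timestamp part, randomness part
```

Because the raw value is 16 bytes, it has the same size as a trace-id; the 26-character text form is only one way to display it.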

@tedsuo
Contributor

tedsuo commented Sep 21, 2021

Given that spans are guaranteed to include a timestamp field, start_time_unix_nano , what is the advantage of repeating this information in the trace ID? Why not configure your DB to sort by start time? This is guaranteed to work even when the trace ID is fully random, padded with zeros, etc.

I assume this is a technical limitation in Cassandra, etc, but it would be helpful to have it spelled out in the proposal.

One thing I will point out: this will be a very permanent change, and we give up the possibility of ~128-bits of randomness in the process. So when deciding this please consider what people will want in ten years, not just today.

@anuraaga
Contributor

Why not configure your DB to sort by start time?

This requires the DB to support secondary indexes and there is cost in maintaining those. Having a timestamp in the ID can provide some more flexibility for backends, especially to allow non-relational DBs.

For example, AFAIK object store providers generally provide a way to list objects after a certain prefix. A dirt-simple ingestion backend could probably be implemented in just a few lines of code: save spans to a file traceid/spanid.json. If the trace IDs were sortable, this would be surprisingly usable, since you'd be able to query for a time window by converting the time window to a prefix query for listing files, such as with the startOffset parameter of Cloud Storage's list objects API.
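A small sketch of the time-window-to-prefix idea, under the assumption that keys look like traceid/spanid.json with the trace-id written as lowercase hex and a 32-bit epoch-seconds prefix. A sorted in-memory list stands in for the object store listing; no real storage API is called, and the function names are hypothetical.

```python
import bisect


def window_bounds(start_s: int, end_s: int):
    """Convert an inclusive time window (epoch seconds) to key bounds.

    Fixed-width lowercase hex sorts lexicographically in numeric order.
    Assumes end_s + 1 still fits in 32 bits.
    """
    return f"{start_s:08x}", f"{end_s + 1:08x}"


def list_window(sorted_keys, start_s: int, end_s: int):
    """Return keys whose trace-id timestamp prefix falls inside the window."""
    lo, hi = window_bounds(start_s, end_s)
    i = bisect.bisect_left(sorted_keys, lo)  # first key >= lower bound
    j = bisect.bisect_left(sorted_keys, hi)  # first key >= exclusive upper bound
    return sorted_keys[i:j]


keys = sorted([
    "61492080aa/s1.json",
    "61492081bb/s1.json",
    "6149208fcc/s1.json",
    "61492090dd/s1.json",
])
print(list_window(keys, 0x61492081, 0x6149208F))
```

In an object store, the lower bound would play the role of the listing's start offset, and the client would stop reading once keys pass the upper bound.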

Just throwing out there what could open up when the primary ID used during injection is sortable by time.

@bogdandrutu
Member Author

bogdandrutu commented Sep 22, 2021

@tigrannajaryan that format sounds very good, would be interested to compare both options (Xray format vs ULIDs). I would prefer to not reinvent the wheel when it comes to formats, so I think these are the two options on the table for the moment.

@tedsuo:

Given that spans are guaranteed to include a timestamp field, start_time_unix_nano , what is the advantage of repeating this information in the trace ID? Why not configure your DB to sort by start time? This is guaranteed to work even when the trace ID is fully random, padded with zeros, etc.

This is not about a single span. The trace-id will include the "start_time" of the root span, not the start_time of individual spans, which makes a big difference for indexing and querying. This is also very similar to the reason @jmacd proposes propagating the "p" value; a similar argument can be made there: if every span records its own "p" value, the propagation is unnecessary.

I assume this is a technical limitation in Cassandra, etc, but it would be helpful to have it spelled out in the proposal.

Did I not say that?

One thing I will point out: this will be a very permanent change, and we give up the possibility of ~128-bits of randomness in the process. So when deciding this please consider what people will want in ten years, not just today.

No, it is not. There is no guarantee (unless we mark some trace flag, or trace state, or other place) that the ID has any specific format (opaque value). OpenTelemetry still has the ability to change the ID generator, so any "backend/vendor" either has to support any opaque value or needs to ask customers to ensure all IDs follow a specific format.

This proposal does not ask to change "trace-id" format which is still a 16-byte array (opaque value), this proposal asks only to change the default "implementation" of the Id Generators to follow a format where we have some sort of ordering.

Since the TraceID is still an "opaque" 16-byte array, this change can be considered backwards compatible. The only property required by the W3C spec is that it be "globally" unique, which this change does not affect.


7 participants