Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace http.target with http.path and http.query? #2056

Closed
trask opened this issue Oct 25, 2021 · 8 comments
Closed

Replace http.target with http.path and http.query? #2056

trask opened this issue Oct 25, 2021 · 8 comments
Assignees
Labels
area:semantic-conventions Related to semantic conventions semconv:HTTP spec:trace Related to the specification/trace directory

Comments

@trask
Copy link
Member

trask commented Oct 25, 2021

What are you trying to achieve?

Less confusion for end users who are writing/configuring samplers and processors by replacing the confusing http.target with the much more commonly understood http.path and http.query.

Additional context.

http.target is confusing to end users because "target" is not a commonly understood part of the url (for example it's not mentioned anywhere in https://en.wikipedia.org/wiki/URL or https://datatracker.ietf.org/doc/html/rfc2616#section-3.2.2).

At the same time, it's one of the most important attributes for end users who are writing (or configuring) samplers and processors (e.g. the common health check suppressing sampler).

More additional context.

Most (but not all) Java server libraries already expose http.path + http.query separately instead of as the combined http.target (and in the couple of cases where http.target is exposed directly, it's - also confusingly - called "uri" or "path", not "target").

I'm not sure what the situation is for other languages, but in cases where the full http.target is exposed, at least parsing to http.path + http.query is not complex, since you only need to look for the first ? and split.

@trask trask added the spec:trace Related to the specification/trace directory label Oct 25, 2021
@Oberon00 Oberon00 added the area:semantic-conventions Related to semantic conventions label Oct 25, 2021
@Oberon00
Copy link
Member

The current HTTP conventions give you many options, in order to make writing instrumentations easy, and with the understanding that normalization to one of these choices can be done on the backend just as well and is sometimes lossy (or subtle).
In that spirit, you could offer these two attributes as additional choices that can optionally replace the target where it is required.

One point where the conversion is subtle: A target ending with ? and nothing after. You could represent that with query not set vs empty string.

One point where the conversion is lossy: The target can be a full URL. In that case, you lose the information that it was a full URL, and you have to decide for one http.host or store everything in a single http.url (which, by the way, can always be used instead of target)

@pyohannes
Copy link
Contributor

pyohannes commented Oct 26, 2021

I'm not sure what the situation is for other languages, but in cases where the full http.target is exposed, at least parsing to http.path + http.query is not complex, since you only need to look for the first ? and split.

According to the example in the current spec, http.target also includes the fragment. So for parsing out the query, one would need to look at the first ?, and the first # after that, taking url encoding into account.

/path/12314/?q=ddds#123

It might make sense to discuss that in the HTTP semantic convention workgroup. I don't have a strong opinion about splitting up http.target, but I definitely think we can improve on wording in the spec. If we decide to stay with http.target we should define it more precise, not as "full request target" (which is ambiguous, because I'm not sure if that necessarily contains the fragment), but rather as path + query + fragment (using industry standard terms).

@Oberon00
Copy link
Member

Oberon00 commented Oct 26, 2021

Oh, I just noticed we indeed lost the definition of http.target during some editiorial rewrite. See this old version: https://github.com/open-telemetry/opentelemetry-specification/blob/bdf39cb935a6f48badc687daeb78a66dc352761f/specification/trace/semantic_conventions/http.md#common-attributes

The http.target is well-defined in an RFC here: https://datatracker.ietf.org/doc/html/rfc7230#section-5.3 and we definitely need to add that link back, that's a spec defect.

Note also the asterisk form: OPTIONS * HTTP/1.1 would result in a http.target of * which is not really a path.

@pyohannes
Copy link
Contributor

The http.target is well-defined in an RFC here: https://datatracker.ietf.org/doc/html/rfc7230#section-5.3 and we definitely need to add that link back, that's a spec defect.

According to the RFC, a request target never includes a fragment (that's at least how I read it). But the example for http.target in the semantic conventions does. A link to the RFC from the semantic conventions would definitely be helpful.

@Oberon00
Copy link
Member

Oberon00 commented Oct 27, 2021

The fragment example is atypical. I recall there being text saying something to the effect of "the fragment is not usually included in the target, but if it is, you should keep it in the http.target_attribute i.e. not sanitize it"

@anuraaga
Copy link
Contributor

anuraaga commented Oct 28, 2021

The current HTTP conventions give you many options, in order to make writing instrumentations easy

Interestingly, we've found that the many options make writing instrumentation harder for us. There is a tendency to fill as many as possible, under the impression that it could provide more robust data. But it's wasteful performance-wise, but the alternative means picking one of the tuples, a decision-making process that adds effort to instrumentation authors.

In Java we've decided to make the decision for the instrumentation authors in the instrumentation API - client instrumentation always only populates URL, since it's generally more common, though not overwhelmingly so, to have access to a full URL in client instrumentation, and in server instrumentation always populate target / host / scheme, since it's overwhelmingly more common to not have access to a full URL in server libraries. Yet we've found, that we also rarely have access to a "target", the libraries almost always only provide path and query.

With yet two more attributes, we could make the decision in the instrumentation API again to expect path and query, not target, so the experience for instrumentation authors wouldn't be that bad. But I think it makes the spec even harder to follow with yet more options.

@ericsampson
Copy link

I landed on this issue because I read through the http semantic conventions document and was unable to find the standard name for query params.
So you can treat this as another user experience report :)

@trask
Copy link
Member Author

trask commented Feb 9, 2023

Closing, this will either happen in #3181 as part of ECS alignment or not at all (due to high bar for breaking changes at this point).

@trask trask closed this as completed Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions semconv:HTTP spec:trace Related to the specification/trace directory
Projects
Development

No branches or pull requests

6 participants