Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to detect virtual nodes in the servicegraph processor #2365

Merged
merged 12 commits into from
May 12, 2023

Conversation

mapno
Copy link
Member

@mapno mapno commented Apr 25, 2023

What this PR does:

Adds the ability to detect virtual nodes in the servicegraph processor.

Virtual nodes are nodes that form part of the lifecycle of a trace, but spans for them are not being collected because they're outside of the user's reach (eg. an external service for payment processing) or are not instrumented (eg. a frontend application).

Virtual nodes can be detected in two different ways

  • The root span has span.kind set to server. This indicates that the request has initiated by an external system that's not instrumented, like a frontend application or an engineer via curl
  • A client span does not have its matching server span, but has a peer attribute present. In this case we make the assumption that a call was made to an external service, for which Tempo won't receive spans.
    • The default peer attributes are peer.service, net.peer.name, net.sock.peer.name, rpc.service.key, net.sock.peer.addr, http.url, http.target.
    • The order of the attributes is important, as the first one that is present will be used as the virtual node name.

PRs in the OTel collector implementation

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mapno mapno marked this pull request as ready for review April 26, 2023 09:34
Copy link
Contributor

@mdisibio mdisibio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - A few small q's and comments.

func (p *Processor) upsertPeerNode(e *store.Edge, spanAttr []*v1_common.KeyValue) {
for _, peerKey := range p.Cfg.PeerAttributes {
if v, ok := processor_util.FindAttributeValue(peerKey, spanAttr); ok {
e.PeerNode = v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this logic - is it correct to say that the order of PeerAttributes is their priority? If net.sock.peer.addr exists it will always be used first. If so let's add a note to the documentation snippet in service_graphs.md

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the order of peerAttributes is the priority, I would suggest that names be preferred over addresses in the default list. Also, does the code look at peer.service anywhere? There is an otel way to map host name or ip address to peer.service: https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#peer-service-name and see https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/span-general/#general-remote-service-attributes as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestions. Applied both.

@@ -258,6 +278,17 @@ func (p *Processor) onComplete(e *store.Edge) {

func (p *Processor) onExpire(e *store.Edge) {
p.metricExpiredEdges.Inc()

e.ConnectionType = store.VirtualNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic is detecting an unmatched edge and registering the virtual node if conditions are right. Could we add a comment? Should e.ConnectionType only be set if one of the conditions is true? I think always setting it (even if not needed) is a little unclear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments to clarify what this is doing.

Should e.ConnectionType only be set if one of the conditions is true?

Doesn't really matter, as if it doesn't match one of the two conditions to be considered a virtual node, it won't be collected.

Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc changes look good. Thank you for updating the docs.

@mapno mapno enabled auto-merge (squash) May 12, 2023 10:50
@mapno mapno merged commit 264ab60 into grafana:main May 12, 2023
@mapno mapno deleted the peer-node-service-graph branch May 12, 2023 11:03
@mohammadjizi
Copy link

Hi. Can i know please when this feature will be released?

@joe-elliott
Copy link
Member

This will be in Tempo 2.2. There is no set release date, but due to our normal cadence it will likely be cut late July/early August

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants