
feat: add basic gateway tracing #8595

Merged: 13 commits merged into master on Apr 4, 2022

Conversation

guseggert
Contributor

@guseggert guseggert commented Dec 8, 2021

@lidel's blurb for Release Notes (wip)

see copy text in tracing/doc.go

Added spans

  • The root span for a gateway request is named Gateway.Request
    • http.target carries the "original" request, as sent by the client
    • ResolvedPath carries the immutable IPFS path behind the original request (matters for /ipns/ requests)
  • Each response format type has its own span:
    • Gateway.ServeRawBlock
    • Gateway.ServeCar
    • Gateway.ServeUnixFs
      • Gateway.ServeFile
      • Gateway.ServeDirectory
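
For reference, a minimal sketch (not the actual gateway code in this PR) of how a root Gateway.Request span carrying an http.target attribute could be started with the OpenTelemetry Go API; the handler name and wiring below are illustrative assumptions:

package main

import (
	"fmt"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// serveGateway is a hypothetical handler illustrating the span layout above;
// the real gateway handlers in this PR are more involved.
func serveGateway(w http.ResponseWriter, r *http.Request) {
	ctx, span := otel.Tracer("go-ipfs").Start(r.Context(), "Gateway.Request")
	defer span.End()

	// Record the original request target on the root span.
	span.SetAttributes(attribute.String("http.target", r.URL.RequestURI()))

	// ctx would be passed down so that format-specific child spans
	// (Gateway.ServeRawBlock, Gateway.ServeCar, Gateway.ServeUnixFs, ...)
	// nest under Gateway.Request.
	_ = ctx

	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/", serveGateway)
	_ = http.ListenAndServe(":8080", nil)
}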

How to use Jaeger UI

One can use the jaegertracing/all-in-one Docker image to run a full Jaeger
stack and configure go-ipfs to publish traces to it (here, in an ephemeral
container):

$ docker run --rm -it --name jaeger \
    -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
    -p 5775:5775/udp \
    -p 6831:6831/udp \
    -p 6832:6832/udp \
    -p 5778:5778 \
    -p 16686:16686 \
    -p 14268:14268 \
    -p 14250:14250 \
    -p 9411:9411 \
    jaegertracing/all-in-one

Then, in another terminal, start go-ipfs with Jaeger tracing enabled:

$ IPFS_TRACING=1 IPFS_TRACING_JAEGER=1 ipfs daemon

Finally, the Jaeger UI is available at http://localhost:16686

Demo

(Screenshots of Jaeger traces for the CAR, block, file, and directory response types.)

@guseggert guseggert linked an issue Dec 8, 2021 that may be closed by this pull request
@guseggert guseggert removed a link to an issue Dec 8, 2021
@guseggert guseggert mentioned this pull request Dec 8, 2021
cmd/ipfs/main.go (review thread, outdated, resolved)
core/coreapi/path.go (review thread, outdated, resolved)
core/coreapi/path.go (review thread, outdated, resolved)
@guseggert guseggert force-pushed the feat/tracing branch 4 times, most recently from 5cab996 to 9b9691c on December 16, 2021 at 16:17
@BigLep BigLep linked an issue Dec 16, 2021 that may be closed by this pull request
@guseggert guseggert force-pushed the feat/tracing branch 4 times, most recently from 791c811 to 2a4398d on December 16, 2021 at 20:35
Member

@frrist frrist left a comment

Good stuff!

settings, prefix, err := options.UnixfsAddOptions(opts...)
if err != nil {
	return nil, err
}

span.SetAttributes(
Member

Allocations can be avoided here (and elsewhere when attributes are used) by only setting them when the span is recording, e.g.:

if span.IsRecording() {
   // set stuffs
}
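
For reference, a self-contained sketch of this pattern (the span and attribute names are made up for illustration, not taken from the PR):

package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func addWithTracing(ctx context.Context, files []string) {
	ctx, span := otel.Tracer("go-ipfs").Start(ctx, "CoreAPI.UnixfsAPI.Add")
	defer span.End()

	// Build and set attributes only when the span is actually being recorded,
	// so unsampled requests skip the allocations entirely.
	if span.IsRecording() {
		span.SetAttributes(
			attribute.Int("file.count", len(files)), // illustrative attribute
		)
	}

	_ = ctx // the real work would happen here, using ctx for child spans
}

func main() {
	addWithTracing(context.Background(), []string{"a.txt"})
}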

Member

Admittedly, this is tedious for some of the one-liners that start a span and set attributes together; I wonder if there is an alternative...

Member

@frrist frrist Dec 16, 2021

I also wonder if it's worth adding an Attributes method to some of these types like settings, for example. It may be more appropriate for a follow-on.

Contributor Author

My thought process here was that we already know we have way bigger fish to fry in terms of memory allocations, so I wasn't too concerned with this. Hopefully one day we get to the point where these matter...in the meantime, we can enjoy the simpler code

@guseggert guseggert marked this pull request as ready for review March 26, 2022 14:05
@guseggert guseggert self-assigned this Mar 26, 2022
@guseggert guseggert added the kind/feature (A new feature) label Mar 26, 2022
@guseggert guseggert added this to the go-ipfs 0.13 milestone Mar 26, 2022
@guseggert guseggert requested a review from lidel March 28, 2022 20:54
Contributor

@iand iand left a comment

Looks good and I've been using this branch to study gateway performance characteristics.


// Span starts a new span using the standard IPFS tracing conventions.
func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) {
	return otel.Tracer("go-ipfs").Start(ctx, fmt.Sprintf("%s.%s", componentName, spanName), opts...)
}
Contributor

I think the argument to otel.Tracer should be an empty string to use the default instrumenting library name, as per the documentation. Note the distinction made between Instrumented Library and Instrumenting Library.

Contributor Author

My understanding is that this tracer name ("instrumentation name") is similar to our logging "subsystems": the application can use it to know which component a span comes from, and decide things like whether to record the span at all (see e.g. https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#suppressing-specific-agent-instrumentation and https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/span_processor_filter_example_test.go#L53). E.g. if an application uses this repo as a library, it can decide to drop all spans from go-ipfs using this instrumentation name. If we use an empty string here, the default go.opentelemetry.io/otel/sdk/tracer will be used. See also https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#get-a-tracer .
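
For illustration, a rough sketch (not part of this PR) of a SpanProcessor wrapper that drops every span created by a tracer named "go-ipfs", along the lines of the linked opentelemetry-go example; the exporter wiring is an assumption:

package main

import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// dropGoIPFS wraps another SpanProcessor and suppresses spans whose
// instrumentation (tracer) name is "go-ipfs", forwarding everything else.
// (Newer SDK versions expose the same information via InstrumentationScope.)
type dropGoIPFS struct {
	sdktrace.SpanProcessor
}

func (p dropGoIPFS) OnEnd(s sdktrace.ReadOnlySpan) {
	if s.InstrumentationLibrary().Name == "go-ipfs" {
		return // never exported
	}
	p.SpanProcessor.OnEnd(s)
}

func main() {
	exp, err := stdouttrace.New()
	if err != nil {
		panic(err)
	}
	inner := sdktrace.NewBatchSpanProcessor(exp)
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSpanProcessor(dropGoIPFS{SpanProcessor: inner}),
	)
	otel.SetTracerProvider(tp)
}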

Contributor

I think it is intended that the default tracer is used, since we aren't changing the format of the traces. We already record the name of the system and the software used to record the trace in the Resource section. See, for example, this section of one trace. The otel documentation is not super clear, but I feel that the presence of the Version and SchemaURL fields in the InstrumentationLibrary object implies that this is related to the type of trace that has been produced.

    "Resource": [
        {
            "Key": "service.name",
            "Value": {
                "Type": "STRING",
                "Value": "go-ipfs"
            }
        },
        {
            "Key": "service.version",
            "Value": {
                "Type": "STRING",
                "Value": "0.13.0-dev"
            }
        },
        {
            "Key": "telemetry.sdk.language",
            "Value": {
                "Type": "STRING",
                "Value": "go"
            }
        },
        {
            "Key": "telemetry.sdk.name",
            "Value": {
                "Type": "STRING",
                "Value": "opentelemetry"
            }
        },
        {
            "Key": "telemetry.sdk.version",
            "Value": {
                "Type": "STRING",
                "Value": "1.6.1"
            }
        }
    ],
    "InstrumentationLibrary": {
        "Name": "go-ipfs",
        "Version": "",
        "SchemaURL": ""
    }

Contributor Author

I think the service is more at the "application" scope, so that we can tell from which system the spans of a distributed trace originate.

I'm also new to these concepts, so I've generally been mimicking the provided examples, which seem to use the same distinction: https://github.com/open-telemetry/opentelemetry-go/tree/main/example

Contributor

Fair enough. I don't feel strongly about it. It's possible that the SchemaURL/Version fields are intended to describe the schema of the tags that the library is writing.

Contributor Author

Yeah I'm not sure what those are for. This is all clearly communicated as experimental so that we have some wiggle room to adjust these over time as we learn more.

}

// Span starts a new span using the standard IPFS tracing conventions.
func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) {
Contributor

This is a candidate for moving to a new library package. As we add tracing to more base libraries we will repeat this multiple times and we can add more helpers to keep things consistent and to reduce the number of imports needed to start a span.

I started one here https://github.com/iand/go-ipfs-tracing as part of my experiments with this branch and instrumenting other related libraries.

Contributor Author

Yeah that makes sense, I was waiting to create something like this until I used OpenTelemetry in a few repos to understand our usage patterns better. I'm also hesitant to make a tracing "kitchen sink", since it seems like a place that would accumulate dependencies over time and significantly contribute to our dependency hell.

// Span names follow a convention of <Component>.<Span>, some examples:
//
// - component=Gateway + span=Request -> Gateway.Request
// - component=CoreAPI.PinAPI + span=Verify.CheckPin -> CoreAPI.PinAPI.Verify.CheckPin
Contributor

Can we tighten this convention up? I think we need to define what a component is and what convention we have for naming the span - is it the type name and method, or should it include the package somehow? We'll need this as soon as we start adding tracing to other ipfs libraries.

One convention could be that the component is the repo name (go-ipfs, go-blockservice, go-namesys) and the span name is the function name, prefixed by the type name if it's a method. Additional spans within a function would use a name derived from this base. The convention would lead to names like:

  • go-ipfs.PinAPI.Add
  • go-blockservice.Session.GetBlock
  • go-namesys.ResolveIPNS

We could include the package name, but it would make the names much longer for no great benefit: go-ipfs.PinAPI.Add would become `go-ipfs.core.coreapi.PinAPI.Add` (it would also be an extra maintenance burden to keep names in sync when refactoring).

This convention is not the prettiest but it would work consistently across all our libraries and gives signposts for developers looking for where a particular span was created.
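
For concreteness, a hedged sketch of what calls under this proposed convention could look like, reusing the Span helper shown earlier in this PR (redefined locally so the snippet stands alone; the component/span strings are illustrative):

package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	traceapi "go.opentelemetry.io/otel/trace"
)

// Span mirrors the helper from this PR: span names follow <component>.<span>.
func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) {
	return otel.Tracer("go-ipfs").Start(ctx, fmt.Sprintf("%s.%s", componentName, spanName), opts...)
}

func main() {
	ctx := context.Background()

	// Under the proposed repo-based convention these would surface as
	// go-blockservice.Session.GetBlock and go-namesys.ResolveIPNS.
	ctx, span := Span(ctx, "go-blockservice", "Session.GetBlock")
	defer span.End()

	_, child := Span(ctx, "go-namesys", "ResolveIPNS")
	child.End()
}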

Contributor

This would need to be modified if we keep the library name in the Instrumentation Library field (although I don't know how/if that surfaces in the Jaeger UI for example)

Contributor Author

@guseggert guseggert Apr 1, 2022

I'd like to protect our ability to refactor repos, modules, and packages without breaking tracing consumers, so I'd like to avoid coupling the names to those as much as possible, much like we do with logging subsystems (I toyed with the idea of coupling these with logging subsystems but gave up). The downside is that it makes it harder to find the code origin of traces, but I still don't think that's very hard with a quick GitHub search. Or we could add some middleware that adds an attribute to all spans with the source file + line info of where the span originated?
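
As a rough illustration of that last idea (not something this PR implements), a SpanProcessor could stamp each span with an approximate source location; the caller skip depth below is a guess and would need tuning against the SDK's real call stack:

package tracingsketch

import (
	"context"
	"runtime"

	"go.opentelemetry.io/otel/attribute"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// codeLocation is a hypothetical SpanProcessor wrapper that records the file
// and line of (roughly) the code that started each span.
type codeLocation struct {
	sdktrace.SpanProcessor
}

func (p codeLocation) OnStart(parent context.Context, s sdktrace.ReadWriteSpan) {
	// Skip depth 2 aims at the caller of Tracer.Start; the exact value
	// depends on the SDK's internal frames and inlining.
	if _, file, line, ok := runtime.Caller(2); ok {
		s.SetAttributes(
			attribute.String("code.filepath", file),
			attribute.Int("code.lineno", line),
		)
	}
	p.SpanProcessor.OnStart(parent, s)
}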

Member

@lidel lidel left a comment

Took it for a spin, works as expected 👍

Added more Gateway.* spans in #8841 (needs to be merged into this PR) + small asks below.

tracing/doc.go (review thread, resolved)
test/sharness/t0310-tracing.sh (review thread, outdated, resolved)
@BigLep
Contributor

BigLep commented Apr 1, 2022

2022-04-01: @lidel will incorporate feedback in #8841, then merge it here, review PR as a whole, merge!

Adds more visibility into how long the generic lookup takes vs. producing the specific response type.
@lidel
Member

lidel commented Apr 4, 2022

The only CI test that fails is go-ipfs-http-client, which is unrelated to the changes from this PR and is being fixed in #8838.

Resolved conflicts, merging to avoid doing this again.

Member

@lidel lidel left a comment

LGTM, we can adjust / add more spans in future PRs.

@lidel lidel merged commit f855bfe into master Apr 4, 2022
@lidel lidel deleted the feat/tracing branch April 4, 2022 17:24
Labels
kind/feature A new feature
Development

Successfully merging this pull request may close these issues.

Add comprehensive tracing
6 participants