Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HTTP metrics for Connectors #6067

Open
wants to merge 9 commits into
base: next
Choose a base branch
from

Conversation

pubmodmatt
Copy link

@pubmodmatt pubmodmatt commented Sep 26, 2024

Add standard metrics for connectors HTTP requests and responses:

  • http.client.request.duration
  • http.client.request.body.size
  • http.client.response.body.size

Selectors are defined for:

  • subgraph_name
  • connector_source
  • connector_http_request_header
  • connector_http_response_header
  • connector_http_response_status
  • connector_http_method
  • connector_url_template
  • static

The implementation uses the HTTP client service, as a single call to the connector service can result in multiple HTTP calls. The metrics make more sense at the HTTP client level anyway. However, since the instrumentation types are generic over a single request/response type, this means we'll be unable to have any other instruments at the connector service level that use the same selectors. However, that should be fine as the interesting data is at the HTTP level for connectors.

Since the HTTP client service is not public, this required making the telemetry plugin private. This in turn required some small changes to the plugin test harness, which was previously incompatible with private plugins.

Planning ahead for future Connectors beyond the HTTP REST API connector, the type names include Http where appropriate so that additional types can be created in the future without conflict or confusion. For example, in addition to ConnectorHttpSelector, there might eventually be a ConnectorGrpcSelector. These need to be different types, because the Selector implementation has associated request and response types. These are HttpRequest and HttpResponse for the HTTP implementation but would be something else for gRPC.

Additionally, the configuration settings are under http under connectors, to accommodate future configuration.

Examples

Add a metric for the number of 404 response codes from a particular connector source API:

telemetry:
  instrumentation:
    instruments:
      connector:
        http:
          not.found.count:
            value: unit
            type: counter
            unit: count
            description: "Count of 404 responses from the user API"
            condition:
              all:
                - eq:
                  - 404
                  - connector_http_response_status: code
                - eq:
                  - "user_api"
                  - connector_source: name

Add connector attributes to the http.client.request.duration instrument:

http.client.request.duration:
  attributes:
    subgraph.name: true
    connector.source:
      connector_source: name
    connector.http.method: true
    connector.url.template: true

Create a histogram of the remaining rate limit from an API based on a response header value:

rate.limit:
  value:
    connector_http_response_header: "x-ratelimit-remaining"
  unit: count
  type: histogram
  description: "Rate limit remaining"
  condition:
    eq:
      - "user_api"
      - connector_source: name

It's also now possible to configure events for connectors:

telemetry:
  instrumentation:
    events:
      connector:
        http:
          request: off
          response: off
          error: error
          connector.request.event:
            message: "Connector request"
            level: info
            on: request
            attributes:
              http_method:
                connector_http_method: true
              url_template:
                connector_url_template: true
          connector.response.event:
            message: "Connector response"
            level: info
            on: response
            attributes:
              http_method:
                connector_http_method: true
              url_template:
                connector_url_template: true
              response_status:
                connector_http_response_status: code

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@pubmodmatt pubmodmatt self-assigned this Sep 26, 2024
@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from 6459d7c to 6de78c3 Compare September 26, 2024 20:31
Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of config for instruments:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.duration:
          attributes:
            subgraph.name: true
            connector.source:
              connector_source: name
            subgraph.http.method: true
            subgraph.graphql.operation.type: string
            connector.url.template: true
        specific.metrics.for.a.connector:
          condition:
            eq:
            - service_kind: true
            - connector
          value:
            connector_http_response_header: "x-ratelimit-remaining"
          unit: count
          type: histogram
          description: "my desc"

We also have to think about testing other instrumentation elements like events and spans using the new selectors and attributes

@@ -264,6 +272,78 @@ impl DefaultForLevel for SubgraphAttributes {
}
}

#[derive(Deserialize, JsonSchema, Clone, Default, Debug)]
#[serde(deny_unknown_fields, default)]
pub(crate) struct ConnectorHttpAttributes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put this in subgraph attributes. Let's only keep subgraph attributes for now, and as it's namespaced by connector. It will be enough, I also think we shouldn't have both connector.http.method and subgraph.http.method we should just keep the subgraph one for now. If people wants to know if it's a connector or a subgraph I would suggest adding in SubgraphSelelector something like service_kind which could be connector or subgraph. (I don't think service_kind is the right naming but you got the idea).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that the relationship between subgraph and connector is not 1-to-1. There can be a number of connectors on a given subgraph, and there can be multiple calls to each connector per GraphQL request. So a connector isn't really a "kind" of subgraph - the subgraph service lifecycle stage is not invoked for connectors. The lifecycle is really now like this:

                 ------ subgraph service ------ http client service
                |
fetch service --|
                |                                ------- http client service
                 ------ connector service ------|
                                                 ------- http client service

The fetch service and connector service aren't currently exposed as plugin lifecycle hooks, but likely will be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm indeed, I didn't know we could have several http calls for a connector. Any idea @BrynCooke ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's here is the way forward, and then we add http_client service support to the telemetry plugin.

I know it's a big job, but it feels like the right thing to do.

@@ -81,6 +86,11 @@ pub(crate) struct InstrumentsConfig {
SubgraphInstrumentsConfig,
Instrument<SubgraphAttributes, SubgraphSelector, SubgraphValue>,
>,
/// Connector service instruments. For more information see documentation on Router lifecycle.
pub(crate) connector: Extendable<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I think everything should be in subgraph for now. To keep only subgraph. Because I think usually people will want to have the same attributes even if it's a connector or subgraph (if not they would still be able to know using the selector and a condition to add specific attributes or metrics if it's a connector for example).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue here is that HTTP connectors are only one kind of connector - there will be more. So unlike subgraphs, where every request is HTTP, we may have connector requests that are SQL, gRPC, etc. As we go forward with these, sharing attributes between connectors and subgraph would make less and less sense, since the attributes would never apply to a subgraph.

}
}

pub(crate) fn new_builtin_connector_instruments(&self) -> HashMap<String, StaticInstrument> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is basically the same than subgraph instruments so probably one more reason to merge everything in subgraph

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request and response types limit things here. You can't have attributes or selectors that are for both subgraph request/response and http request/response - they can only be generic over one type of request/response. That limits how things can be arranged.

@@ -3569,4 +3852,203 @@ mod test {
Some("default".into())
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need tests including both subgraph calls and connectors calls to make sure it works properly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test with both connectors and subgraph, let me know if it's sufficient.

apollo-router/src/plugins/telemetry/mod.rs Show resolved Hide resolved
@bnjjj
Copy link
Contributor

bnjjj commented Sep 27, 2024

cc @BrynCooke I need your review here too

Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this is looking good. Can we pull the connectors stuff out into a separate module though rather than adding to the already large attributes.rs instruments.rs and selectors.rs

We should eventually be aiming for a separate module for each router pipeline stage.

Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just needs more tests for instruments but also for events and spans, feel free to ask if you need help or if you don't find these existing tests

Comment on lines 3856 to 3888
const TEST_SUBGRAPH_NAME: &str = "test_subgraph_name";
const TEST_SOURCE_NAME: &str = "test_source_name";
const TEST_URL_TEMPLATE: &str = "/test";
const TEST_HEADER_NAME: &str = "test_header_name";
const TEST_HEADER_VALUE: &str = "test_header_value";
const TEST_STATIC: &str = "test_static";

#[fixture]
fn connector_info() -> ConnectorInfo {
ConnectorInfo {
subgraph_name: TEST_SUBGRAPH_NAME.to_string(),
source_name: Some(TEST_SOURCE_NAME.to_string()),
http_method: HTTPMethod::Get.as_str().to_string(),
url_template: TEST_URL_TEMPLATE.to_string(),
}
}

#[fixture]
fn context(connector_info: ConnectorInfo) -> RouterContext {
let context = RouterContext::default();
context
.insert(CONNECTOR_INFO_CONTEXT_KEY, connector_info)
.unwrap();
context
}

#[fixture]
fn http_request(context: RouterContext) -> HttpRequest {
HttpRequest {
http_request: http::Request::builder().body("".into()).unwrap(),
context,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel confident introducing this technique to debug. We try as much as we can to lower the compilation time and if we introduce macros in our tests it will increase the time. Could you try to use the same kind of tests we did before. If it's something you would like to introduce in the router codebase then I would suggest to talk about this during one of our router arch meeting because we need opinion from the team. Also we have yaml based tests in apollo-router/src/plugins/telemetry/config_new/fixtures Could you add more tests directly there please ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel confident introducing this technique to debug. We try as much as we can to lower the compilation time and if we introduce macros in our tests it will increase the time. Could you try to use the same kind of tests we did before. If it's something you would like to introduce in the router codebase then I would suggest to talk about this during one of our router arch meeting because we need opinion from the team. Also we have yaml based tests in apollo-router/src/plugins/telemetry/config_new/fixtures Could you add more tests directly there please ?

I replaced the use of rstest with just plain unit tests, and added more tests in fixtures.

apollo-router/src/plugins/telemetry/mod.rs Show resolved Hide resolved
@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from ae88b1b to 970b75d Compare October 1, 2024 23:51
@pubmodmatt
Copy link
Author

pubmodmatt commented Oct 1, 2024

It just needs more tests for instruments but also for events and spans, feel free to ask if you need help or if you don't find these existing tests

I've added support for connectors HTTP events, and added tests for that in the places I found existing tests for other events. Please let me know if I didn't find the tests you were looking for.

Spans are another matter - I haven't added support for spans in this PR. The HTTP client service isn't really set up for this - it creates the http_request span internally, so there's no way to add attributes to it at the plugin hook level. This is something we can look at adding in the future, but changes here could impact other uses of that service besides connectors.

@bnjjj - let me know if this sounds ok, and if the tests I added are what you were looking for

@pubmodmatt
Copy link
Author

pubmodmatt commented Oct 2, 2024

In general this is looking good. Can we pull the connectors stuff out into a separate module though rather than adding to the already large attributes.rs instruments.rs and selectors.rs

We should eventually be aiming for a separate module for each router pipeline stage.

@BrynCooke - I moved connectors related things out to a connectors module, and there's an http module under that for these HTTP specific things. This required making some of the common types pub(super).

@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from 970b75d to 0a6ba87 Compare October 2, 2024 22:33
@pubmodmatt pubmodmatt requested a review from a team as a code owner October 3, 2024 19:46
@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Oct 3, 2024

✅ Docs Preview Ready

No new or changed pages found.

Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, could you also open an issue explaining the connector selectors are not already supported for spans link it directly in this PR please ? Also could be great to document that limitation in selectors docs and spans docs probably.

@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from 3801f57 to 994d598 Compare October 4, 2024 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants