Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Integration][Datadog] Add metrics availabity data on service and hosts #867

Merged

Conversation

phalbert
Copy link
Contributor

@phalbert phalbert commented Aug 1, 2024

Description

  • Add new kinds of metric and serviceMetric to get specified metrics

Type of change

Please leave one option from the following and delete the rest:

  • New feature (non-breaking change which adds functionality)

Screenshots

Screenshot 2024-08-13 at 07 59 38 Screenshot 2024-08-13 at 07 54 03 Screenshot 2024-08-13 at 07 54 16 Screenshot 2024-08-13 at 07 54 35

API Documentation

Provide links to the API documentation used for this integration.

@phalbert phalbert self-assigned this Aug 1, 2024
@github-actions github-actions bot added the size/L label Aug 1, 2024
@phalbert phalbert changed the title [Integration][Datadog] datadog add metrics on service and maybe hosts [Integration][Datadog] Add metrics on service and maybe hosts Aug 1, 2024
@phalbert phalbert changed the title [Integration][Datadog] Add metrics on service and maybe hosts [Integration][Datadog] Add metrics on service and hosts Aug 1, 2024
@phalbert phalbert changed the title [Integration][Datadog] Add metrics on service and hosts [Integration][Datadog] Add metrics availabity data on service and hosts Aug 1, 2024
@phalbert
Copy link
Contributor Author

phalbert commented Aug 1, 2024

A few other improvements I could make. cc @PeyGis , @Tankilevitch

  1. Dashboard and Query Data: Implement a caching mechanism to store fetched dashboard definitions and query results for a certain duration.

  2. Time Window: Allow users to configure the time window (currently hardcoded to 10 minutes) for the check_metric_data_exists function. This gives users more control over the metric availability checks.

@phalbert phalbert force-pushed the PORT-9179-datadog-add-metrics-on-service-and-maybe-hosts branch from 5fa6fe1 to dd6dbf9 Compare August 1, 2024 13:32
Copy link
Contributor

@Tankilevitch Tankilevitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation is too specific for the user use-case.

We have multiple resources touched to support that capability. And I think they needs to be separated.

Service
Host
Dashboard
Dashboard-Metrics

IMO we need to approach it by taking advantage of port builder capabilities, relations, mirrors, aggregation properties.

Service -> Dashboard Metrics or Service -> Dashboard -> Metrics
Or
Metrics -> Service

integrations/datadog/client.py Show resolved Hide resolved
@phalbert
Copy link
Contributor Author

phalbert commented Aug 6, 2024

I think the implementation is too specific for the user use-case.

We have multiple resources touched to support that capability. And I think they needs to be separated.

Service Host Dashboard Dashboard-Metrics

IMO we need to approach it by taking advantage of port builder capabilities, relations, mirrors, aggregation properties.

Service -> Dashboard Metrics or Service -> Dashboard -> Metrics Or Metrics -> Service

Screenshot 2024-08-07 at 00 27 23

This change now adds available metrics as relational array of metric ids that are available for a host or service

integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
Copy link
Member

@matan84 matan84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, will re-review once fixed

integrations/datadog/client.py Show resolved Hide resolved
integrations/datadog/main.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
phalbert and others added 5 commits August 12, 2024 15:48
Co-authored-by: Matan <51418643+matan84@users.noreply.github.com>
Co-authored-by: Matan <51418643+matan84@users.noreply.github.com>
Co-authored-by: Matan <51418643+matan84@users.noreply.github.com>
Co-authored-by: Matan <51418643+matan84@users.noreply.github.com>
…mits using a semaphore

This change refactors the `_fetch_metrics_for_services` method to introduce concurrency using `asyncio.gather`. Previously, the fetch operations were performed sequentially, even though the code utilized `asyncio`.

By fetching metrics concurrently, we aim to improve performance by potentially overlapping delays caused by network latency or Datadog API response times. However, we still maintain control over the rate of requests using a semaphore to ensure we stay within Datadog's API rate limits.

Key changes:

* Fetch operations are now performed concurrently within each service iteration using `asyncio.gather`.
* A semaphore is used to control concurrency and prevent exceeding Datadog's rate limits.
* Error handling and further optimizations may be considered in future iterations.
integrations/datadog/.port/resources/port-app-config.yaml Outdated Show resolved Hide resolved
integrations/datadog/.port/spec.yaml Show resolved Hide resolved
integrations/datadog/overrides.py Outdated Show resolved Hide resolved
integrations/datadog/overrides.py Outdated Show resolved Hide resolved
integrations/datadog/overrides.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
integrations/datadog/client.py Outdated Show resolved Hide resolved
Comment on lines 339 to 346
Returns:
The metric name, or None if no metric name is found.

Examples:
- "sum:container.memory.usage" -> "container.memory.usage"
- "avg:system.cpu.used{service:my-service}" -> "system.cpu.used"
"""
match = re.search(r"(?:\w+:)?(\w+\.\w+\.\w+)", query_string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we are doing that? as eventually I do care about the avg in avg:system.cpu.used?
is there a different naming terminology between avg:system.cpu.used to system.cpu.used in datadog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just metadata info. we still have the full metric name in the title of the entity and query property

{
  "identifier": "avg:system.mem.used/service:mayicard-mayicore/env:staging",
  "title": "avg:system.mem.used{service:mayicard-mayicore,env:staging}",
  "icon": null,
  "blueprint": "datadogServiceMetric",
  "team": [],
  "properties": {
    "query": "avg:system.mem.used",
    "series": [],
    "res_type": "time_series",
    "from_date": "2024-08-16T09:15:34Z",
    "to_date": "2024-08-16T09:25:34Z",
    "metric_name": "system.mem.used",
    "env": "staging"
  },
  "relations": {
    "service": "mayicard-mayicore"
  }
}

phalbert and others added 2 commits August 19, 2024 10:02
Co-authored-by: Tom Tankilevitch <59158507+Tankilevitch@users.noreply.github.com>
@phalbert phalbert requested a review from a team as a code owner August 19, 2024 07:03
Copy link
Contributor

@Tankilevitch Tankilevitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Tankilevitch Tankilevitch merged commit 654f554 into main Aug 19, 2024
15 checks passed
@Tankilevitch Tankilevitch deleted the PORT-9179-datadog-add-metrics-on-service-and-maybe-hosts branch August 19, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants