Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standard system metrics and semantic conventions #119

Merged

Conversation

aabmass
Copy link
Member

@aabmass aabmass commented Jun 19, 2020

See open-telemetry/opentelemetry-specification#651. This OTEP proposes some standard system metric names as well as semantic conventions for naming system/runtime metrics. This mostly follows the work done in #108 and the Collector. I left a few TODOs and open questions, the biggest things being standard runtime metrics and process metrics.

Copy link
Member

@james-bebbington james-bebbington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It may also be worth defining the data type (Int64 or Double)

text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
@jmacd jmacd added the metrics Relates to the Metrics API/SDK label Jun 23, 2020
Copy link

@jlegoff jlegoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Julien from New Relic - I work on our infrastructure product and I have a couple of comments / questions. Sorry if they are obvious, I'm still getting up to speed with OTEL!

text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
text/0119-standard-system-metrics.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. I especially like "usage" and "utilization" as standard names.

Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely good enough to be approved as an OTEP and move on to the spec itself.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

dmitryax added a commit to dmitryax/opentelemetry-collector-contrib that referenced this pull request Jul 20, 2020
Update kubeletstats receiver metrics according to open-telemetry/oteps#119 :
- Remove "k8s." from container metrics
- Change "/" metric delimiter to "."
- Change ...network -> ...network.io
- Avoid short forms in metrics: "mem" -> "memory", "fs" -> "filesystem"
- Use "cpu.usage" in [0,1] scale instead of "cpu.utilization" in percents
- Use "cpu.time" in seconds instead of "cpu/cumulative" in nanoseconds

Also introduce additional metrics:  "network.errors" and "filesystem.usage".
bogdandrutu pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Jul 20, 2020
Update kubeletstats receiver metrics according to open-telemetry/oteps#119 :
- Remove "k8s." from container metrics
- Change "/" metric delimiter to "."
- Change ...network -> ...network.io
- Avoid short forms in metrics: "mem" -> "memory", "fs" -> "filesystem"
- Use "cpu.usage" in [0,1] scale instead of "cpu.utilization" in percents
- Use "cpu.time" in seconds instead of "cpu/cumulative" in nanoseconds

Also introduce additional metrics:  "network.errors" and "filesystem.usage".
@jmacd
Copy link
Contributor

jmacd commented Jul 21, 2020

@open-telemetry/specs-metrics-approvers please review.

@james-bebbington
Copy link
Member

james-bebbington commented Jul 31, 2020

I believe this PR is ready to be merged but when writing this up for the specs repo, it would be good to add a convention for process counts (with "state" = running / inactive)

|----------------------|-------|-----------------|----------|---------|-----------------------------------|
|system.cpu.time |seconds|SumObserver |Double |state |idle, user, system, interrupt, etc.|
| | | | |cpu |1 - #cores |
|system.cpu.utilization|1 |UpDownSumObserver|Double |state |idle, user, system, interrupt, etc.|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/UpDownSumObserver/ValueObserver

aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
aabmass added a commit to aabmass/opentelemetry-specification that referenced this pull request Sep 9, 2020
jmacd added a commit to open-telemetry/opentelemetry-specification that referenced this pull request Oct 15, 2020
* System metrics semantic conventions

Conventions from [OTEP
119](open-telemetry/oteps#119)

* change process count to UpDownSumObserver

* fix system.cpu.utilization, use better example

* first several comments

* add description columns, update units to UCUM

* markdown-toc

* clarify OS process level metrics

* clarify load average exapmle

* move general conventions + OTEP 108 into README.md

* renamed swap -> paging

* add addition fs labels

* fix links

* fix link

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* fix tigran comments

* add disk io_time and operation_time

* add descriptions/footnotes for dropped packets and net errors

* lint, more info for net dropped packets/errors

* "dropped_packets" -> "dropped"

* Apply suggestions from James' code review

Co-authored-by: James Bebbington <jbebbington@google.com>

* comments from James' code review

* clarify windows perf counter

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

* reflow text

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: James Bebbington <jbebbington@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
jsuereth pushed a commit to jsuereth/otel-semconv-test that referenced this pull request Apr 19, 2023
* System metrics semantic conventions

Conventions from [OTEP
119](open-telemetry/oteps#119)

* change process count to UpDownSumObserver

* fix system.cpu.utilization, use better example

* first several comments

* add description columns, update units to UCUM

* markdown-toc

* clarify OS process level metrics

* clarify load average exapmle

* move general conventions + OTEP 108 into README.md

* renamed swap -> paging

* add addition fs labels

* fix links

* fix link

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* fix tigran comments

* add disk io_time and operation_time

* add descriptions/footnotes for dropped packets and net errors

* lint, more info for net dropped packets/errors

* "dropped_packets" -> "dropped"

* Apply suggestions from James' code review

Co-authored-by: James Bebbington <jbebbington@google.com>

* comments from James' code review

* clarify windows perf counter

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

* reflow text

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: James Bebbington <jbebbington@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
jsuereth pushed a commit to open-telemetry/semantic-conventions that referenced this pull request May 11, 2023
* System metrics semantic conventions

Conventions from [OTEP
119](open-telemetry/oteps#119)

* change process count to UpDownSumObserver

* fix system.cpu.utilization, use better example

* first several comments

* add description columns, update units to UCUM

* markdown-toc

* clarify OS process level metrics

* clarify load average exapmle

* move general conventions + OTEP 108 into README.md

* renamed swap -> paging

* add addition fs labels

* fix links

* fix link

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* fix tigran comments

* add disk io_time and operation_time

* add descriptions/footnotes for dropped packets and net errors

* lint, more info for net dropped packets/errors

* "dropped_packets" -> "dropped"

* Apply suggestions from James' code review

Co-authored-by: James Bebbington <jbebbington@google.com>

* comments from James' code review

* clarify windows perf counter

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

* reflow text

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: James Bebbington <jbebbington@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
metrics Relates to the Metrics API/SDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants