Autometrics Specification

**This spec is deprecated. Please see the new versioned specs.**

This is a work-in-progress specification for Autometrics.

It aims to describe the full feature set of the Autometrics libraries, but it may have important details missing. We will attempt to update this document to describe the expectations across all of the language implementations.

API

Libraries SHOULD expose a decorator, macro, wrapper function, or use another metaprogramming technique offered by the language to instrument functions and methods in the user's source code. Ideally, the function attribute should simply be called autometrics or Autometrics, but libraries MAY append a suffix to the name if necessary.
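As a rough illustration of the decorator approach, the following Python sketch wraps a function and records a call count and a duration for each invocation. It is not the API of any Autometrics library: the in-memory `CALLS` and `DURATIONS` stores are hypothetical stand-ins for a real OpenTelemetry or Prometheus client, and only a subset of the required labels is recorded.

```python
import functools
import time
from collections import Counter

# Hypothetical in-memory stores standing in for a real metrics client.
CALLS = Counter()   # keyed by (function, module, result)
DURATIONS = []      # (function, module, elapsed_seconds) tuples


def autometrics(func):
    """Wrap func so every call records a result-labelled count and a duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            result = "error"
            raise  # the exception still reaches the caller
        finally:
            elapsed = time.perf_counter() - start  # seconds, not milliseconds
            CALLS[(func.__name__, func.__module__, result)] += 1
            DURATIONS.append((func.__name__, func.__module__, elapsed))
    return wrapper


@autometrics
def divide(a, b):
    return a / b
```

Calling `divide(6, 3)` records one `"ok"` call, while `divide(1, 0)` records one `"error"` call and re-raises the `ZeroDivisionError`.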

Libraries MAY enable the decorator, macro, etc to apply to an entire class definition. If they do, they SHOULD provide an option for users to skip or ignore particular methods.

Libraries MAY need an initialization function.

Libraries MAY expose additional functionality for exporting metrics to Prometheus and/or other metrics collection servers. This MAY include serializing the metrics to the Prometheus text format, OpenMetrics export format, the OpenTelemetry Protocol and/or exposing the metrics on a specific port and HTTP path to be scraped.

Note: there is an open discussion about whether libraries should export metrics on a default port and path. There is another open discussion about support for pushing metrics to a collector.

Service-Level Objectives (SLOs)

Libraries SHOULD expose functionality to create objectives within the source code. Objectives can be "attached" to functions by passing the objective to the Autometrics decorator, macro, etc for one or more functions.

Objectives can relate to functions' success rate and/or latencies.

Success rate objectives add the objective.name and objective.percentile labels to the function.calls metric.

Latency objectives add the objective.name, objective.percentile, and objective.latency_threshold labels to the function.calls.duration metric.
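The label rules above can be sketched as follows. The `Objective` class and its field names are assumptions made for this example; each library is free to model objectives differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Objective:
    """Hypothetical objective definition; field names are illustrative."""
    name: str
    success_percentile: Optional[str] = None   # e.g. "99.9"
    latency_threshold: Optional[str] = None    # in seconds, e.g. "0.25"
    latency_percentile: Optional[str] = None   # e.g. "99"


def success_rate_labels(objective: Objective) -> Dict[str, str]:
    """Extra labels for function.calls when a success rate objective is set."""
    if objective.success_percentile is None:
        return {}
    return {
        "objective.name": objective.name,
        "objective.percentile": objective.success_percentile,
    }


def latency_labels(objective: Objective) -> Dict[str, str]:
    """Extra labels for function.calls.duration when a latency objective is set."""
    if objective.latency_threshold is None or objective.latency_percentile is None:
        return {}
    return {
        "objective.name": objective.name,
        "objective.percentile": objective.latency_percentile,
        "objective.latency_threshold": objective.latency_threshold,
    }


api_slo = Objective(
    name="api",
    success_percentile="99.9",
    latency_threshold="0.25",
    latency_percentile="99",
)
```

In this sketch a single objective can carry both a success rate and a latency target, in which case each metric receives its own set of labels.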

Metric Collection Libraries

Libraries MUST support producing metrics using an OpenTelemetry library. Libraries MAY also support Prometheus client libraries and allow users to configure which one should be used to produce metrics.

Libraries MUST support exporting metrics to Prometheus, or provide documentation for how users can export the metrics from the OpenTelemetry format to the Prometheus exposition format.

Exemplars (Optional)

Autometrics libraries MAY support attaching exemplars to the metrics generated if the underlying metrics library or libraries they use support them. See Grafana's explainer, the OpenMetrics Spec, and the OpenTelemetry Spec for more details about exemplars.

Libraries that support exemplars SHOULD integrate with popular tracing libraries and/or the OpenTelemetry library to extract exemplar fields from the context or span a given function is called within.

Libraries SHOULD support extracting the trace_id field and attaching it as an exemplar label or attribute. Libraries MAY support extracting other fields automatically or provide the user functionality to customize which fields are used.

Metrics

Autometrics uses the OpenTelemetry Metric Semantic Conventions for naming metrics, including using .'s as separators.

When the metrics are exported to Prometheus, all dot (.) separators are replaced by underscores (_). Suffixes are appended where required by Prometheus/OpenMetrics.
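A minimal sketch of this renaming, assuming the library has to apply the suffixes itself. The `prometheus_name` helper and its `kind` and `unit` parameters are hypothetical.

```python
def prometheus_name(otel_name, kind, unit=None):
    """Convert a dot-separated metric name to its Prometheus form.

    kind is "counter", "histogram", or "gauge"; unit is an optional unit
    suffix such as "seconds".
    """
    name = otel_name.replace(".", "_")
    if unit and not name.endswith("_" + unit):
        name += "_" + unit
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name


print(prometheus_name("function.calls", "counter"))
print(prometheus_name("function.calls.duration", "histogram", unit="seconds"))
```

The two calls print `function_calls_total` and `function_calls_duration_seconds`. The `endswith` checks keep the helper from adding a suffix twice when a client library or exporter has already added it.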

function.calls

Prometheus Name: function_calls_total

Required Labels: function, module, service.name, result, caller

Additional Labels (if a success rate objective is attached to the given function): objective.name and objective.percentile

This metric is a 64-bit monotonic counter that tracks the number of times a given function was invoked.

When this metric is exported to Prometheus, its name SHOULD be function_calls_total, because Prometheus/OpenMetrics specifies that counters SHOULD have the _total suffix. Note that library authors may need to append the suffix because not all Prometheus client libraries or exporters will do so.

If possible, libraries SHOULD start this counter off at zero (by incrementing the counter by 0) in order to expose the names of instrumented functions to visualization tools that use the metrics. Libraries SHOULD use as many of the labels as possible for the initial call to increment by zero, including those related to objectives and setting result="ok".
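The zero-initialization step might look like the sketch below. The `function_calls` store and the `register_function` helper are hypothetical, and a `Counter` stands in for a real metrics client; the empty `caller` value is an assumption for the case where no caller is known yet.

```python
from collections import Counter

# Hypothetical stand-in for the function.calls counter.
function_calls = Counter()


def label_key(labels):
    """Turn a label dict into a hashable, order-independent key."""
    return tuple(sorted(labels.items()))


def register_function(function, module, service_name, objective_labels=None):
    """Create the time series at zero so the function is visible before any call."""
    labels = {
        "function": function,
        "module": module,
        "service.name": service_name,
        "result": "ok",
        "caller": "",
    }
    labels.update(objective_labels or {})
    function_calls[label_key(labels)] += 0  # increment by zero creates the series
    return labels


labels = register_function("get_user", "app.users", "api")
```

After `register_function` runs, the series exists with a value of 0, so tools that list instrumented functions can find `get_user` before it has been called.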

function.calls.duration

Prometheus Name: function_calls_duration_seconds

Required Labels: function, module, service.name

Additional labels (if a latency objective is attached to the given function): objective.name, objective.percentile, objective.latency_threshold

This is a 64-bit floating point histogram that tracks the duration or latency of function calls.

It MUST track the duration in seconds (not milliseconds). Libraries using OpenTelemetry SHOULD set the units in the resource metadata.

Libraries SHOULD support the default OpenTelemetry histogram buckets as label values. Libraries MAY allow users to specify custom histogram buckets.

When this metric is exported to Prometheus, its name SHOULD be function_calls_duration_seconds, because Prometheus/OpenMetrics specifies that metrics SHOULD include their units. Note that library authors may need to append the unit suffix because not all Prometheus client libraries or exporters will do so.

build_info

Prometheus Name: build_info

Required Labels: version, commit, branch, service.name

This is a gauge or up/down counter.

It MUST always have the value of 1.0.
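For illustration, the sketch below renders a `build_info` sample as one line of the Prometheus text format. The `build_info_line` helper and the sample values are made up for this example; the `service.name` label key is written as `service_name`, following the dot-to-underscore rule for exported labels.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_info_line(version, commit, branch, service_name):
    """Render build_info with a constant value of 1.0."""
    labels = {
        "version": version,
        "commit": commit,
        "branch": branch,
        "service_name": service_name,
    }
    rendered = ",".join(
        f'{key}="{escape_label_value(value)}"' for key, value in labels.items()
    )
    return f"build_info{{{rendered}}} 1.0"


print(build_info_line("1.2.3", "a1b2c3d4", "main", "api"))
```

This prints `build_info{version="1.2.3",commit="a1b2c3d4",branch="main",service_name="api"} 1.0`. The labels carry all of the information; the constant value exists only so the series can be joined against other metrics.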

function.calls.concurrent

Prometheus Name: function_calls_concurrent

Required Labels: function, module, service.name

This metric is optional. Libraries MAY provide an option to the user for enabling this on a per-function basis.

This is a gauge or "up/down counter" used for tracking concurrent calls to the specific function. When the function is initially called, the gauge is incremented by 1 and when it finishes, the value is decremented by 1.
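The increment-on-entry, decrement-on-exit behaviour can be sketched with a context manager. The `concurrent_calls` dict and the `track_concurrency` helper are hypothetical; a real library would update a gauge or up/down counter from its metrics client instead.

```python
import threading
from contextlib import contextmanager

_lock = threading.Lock()
concurrent_calls = {}  # hypothetical gauge storage keyed by (function, module)


@contextmanager
def track_concurrency(function, module):
    """Increment the gauge on entry and decrement it on exit, even on error."""
    key = (function, module)
    with _lock:
        concurrent_calls[key] = concurrent_calls.get(key, 0) + 1
    try:
        yield
    finally:
        with _lock:
            concurrent_calls[key] -= 1


with track_concurrency("handle_request", "app.http"):
    in_flight = concurrent_calls[("handle_request", "app.http")]
```

Placing the decrement in a `finally` block matters: without it, a function that raises would leave the gauge permanently too high.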

Labels

When the metrics are exported to Prometheus, all dot (.) separators in the label keys are replaced by underscores (_).

Label values MAY contain any Unicode characters.

See the Metrics section for the labels that are valid on each metric.

branch

The Git branch of the user's project. If this information is not available, this label MAY be absent or empty ("").

caller

Note: there is an ongoing discussion about whether this should be replaced with multiple labels such as caller_function and caller_module.

The name of the function that invoked the given function. If the caller is not known, this label MAY be absent or empty ("").

This SHOULD refer to Autometrics-instrumented functions. Therefore, if Function A calls Function B, which calls Function C and only Functions A and C are instrumented but not B, the caller of Function C would be Function A.

Libraries MAY make this label optional (on an opt-out basis) if collecting this information has a non-negligible performance overhead.

commit

The short (8-character) Git commit hash of the user's project. If this information is not available, this label MAY be absent or empty ("").

function

The name of the function or method, exactly as it appears in the source code.

module

The fully-qualified module or file path of the function. The combination of the function and module labels MUST be sufficient to uniquely identify the function within the project's source code. The exact contents of this label value are assumed to be language-specific.

Note: There is an ongoing discussion about whether the class should be added to the module label or if there should be a separate class label.

objective.name

If a function has an SLO attached, this label contains the user-specified name of the objective. If there is no SLO attached, this label MAY be absent or empty ("").

objective.percentile

If a function has an SLO attached, this label specifies either the percentage of requests that should return result="ok" or the percentage of requests that should meet the specified objective.latency_threshold.

The value MUST be expressed as a percentage, so 99.9% would be "99.9" (without the % symbol).

If there is no SLO attached, this label MAY be absent or empty ("").

Libraries SHOULD support the following percentiles: "90", "95", "99", "99.9". Libraries MAY allow users to specify custom percentiles but care should be taken to ensure that users generate separate Prometheus recording rules for the custom percentiles.
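A small validation sketch for this label follows. The `percentile_label` helper and its `allow_custom` flag are hypothetical; stripping a trailing `%` is a convenience added for the example, not a requirement of this specification.

```python
SUPPORTED_PERCENTILES = ("90", "95", "99", "99.9")


def percentile_label(percentile, allow_custom=False):
    """Return the label value for a percentile given as a string such as "99.9"."""
    label = percentile.strip().rstrip("%")
    number = float(label)  # raises ValueError if the input is not numeric
    if not 0 < number < 100:
        raise ValueError(f"percentile out of range: {percentile!r}")
    if label not in SUPPORTED_PERCENTILES and not allow_custom:
        raise ValueError(f"unsupported percentile: {percentile!r}")
    return label


print(percentile_label("99.9%"))
```

This prints `99.9`. A custom value such as `"97.5"` is rejected unless `allow_custom=True` is passed, which gives the library a place to remind users that custom percentiles need their own recording rules.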

objective.latency_threshold

If a function has an SLO attached, this specifies the maximum duration of function calls that are considered meeting the objective.

This MUST be specified in seconds (not milliseconds).

Libraries SHOULD support the default OpenTelemetry histogram buckets as label values. Libraries MAY allow users to specify custom latencies but care should be taken to ensure that the value of this label matches one of the histogram buckets supported by the function.calls.duration metric.
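The bucket-matching check could be sketched as below. The bucket boundaries shown are illustrative values in seconds chosen for this example, not the OpenTelemetry defaults, and `latency_threshold_label` is a hypothetical helper.

```python
import math

# Illustrative bucket boundaries in seconds; a library would use its own list.
EXAMPLE_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0)


def latency_threshold_label(threshold_seconds, buckets=EXAMPLE_BUCKETS):
    """Return the label value if the threshold matches a bucket, else raise."""
    for bound in buckets:
        if math.isclose(threshold_seconds, bound, rel_tol=1e-9):
            return str(bound)
    raise ValueError(
        f"latency threshold {threshold_seconds} does not match any histogram bucket"
    )


print(latency_threshold_label(0.25))
```

This prints `0.25`. Rejecting a threshold such as `0.3` up front avoids an objective that can never be evaluated, because no bucket boundary exists at that value.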

result

Whether the function executed successfully or errored. An error MAY either mean that the function returned an error or that it threw an exception.

The value of this label MUST be either "ok" or "error".

Libraries MAY offer users the ability to override the default behavior for determining whether the result label should be "ok" or "error", for example to allow users to treat client-side errors as "ok".
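Such an override might take the form of a user-supplied predicate, as in this sketch. The `classify_result` function, its `ok_if` parameter, and the `ClientError` class are all hypothetical.

```python
class ClientError(Exception):
    """Hypothetical error type representing a client-side mistake."""


def classify_result(error=None, ok_if=None):
    """Return the value of the result label for a finished call.

    error is the exception raised by the call, or None if it succeeded.
    ok_if is an optional predicate that marks selected errors as "ok".
    """
    if error is None:
        return "ok"
    if ok_if is not None and ok_if(error):
        return "ok"
    return "error"


def treat_client_errors_as_ok(error):
    return isinstance(error, ClientError)


print(classify_result(ClientError("bad input"), ok_if=treat_client_errors_as_ok))
print(classify_result(RuntimeError("database down"), ok_if=treat_client_errors_as_ok))
```

The first call prints `ok` and the second prints `error`. Without an override, every raised error is labelled `"error"`.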

service.name

The logical name of a service. This matches the OpenTelemetry Service specification.

All metrics produced by a library from a given instance SHOULD use a single service.name. All instances of a horizontally scaled service SHOULD also use the same service.name.

Libraries SHOULD support setting the service.name using environment variables (AUTOMETRICS_SERVICE_NAME and OTEL_SERVICE_NAME, with the first taking precedence if both are set). Libraries MAY also support configuring this value in an initialization function.
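The environment variable precedence can be sketched as follows. The `resolve_service_name` helper and its `fallback` parameter are hypothetical, and treating an empty variable as unset is an assumption of this example rather than something this specification states.

```python
import os


def resolve_service_name(environ=None, fallback=""):
    """Pick the service name, preferring AUTOMETRICS_SERVICE_NAME."""
    environ = os.environ if environ is None else environ
    for variable in ("AUTOMETRICS_SERVICE_NAME", "OTEL_SERVICE_NAME"):
        value = environ.get(variable)
        if value:  # assumption: an empty value counts as unset
            return value
    return fallback


print(resolve_service_name({
    "AUTOMETRICS_SERVICE_NAME": "checkout",
    "OTEL_SERVICE_NAME": "checkout-otel",
}))
```

This prints `checkout`, because `AUTOMETRICS_SERVICE_NAME` takes precedence when both variables are set. Passing a plain dict instead of reading `os.environ` directly keeps the function easy to test.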

version

The version of the user's project, ideally using Semantic Versioning. It SHOULD only contain the version number and SHOULD NOT start with a v.