Memory leak when using more than one MetricReader #4115

Closed
skadefro opened this issue Sep 4, 2023 · 4 comments · Fixed by #4163

skadefro commented Sep 4, 2023

What happened?

Steps to Reproduce

Add two MetricReaders to a MeterProvider via addMetricReader and watch memory usage explode.
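For reference, a minimal sketch of the kind of setup that triggers this (only an illustration; ConsoleMetricExporter stands in for the OTLP exporters, and the full setup code is further below):

import { MeterProvider, PeriodicExportingMetricReader, ConsoleMetricExporter } from '@opentelemetry/sdk-metrics';

const meterProvider = new MeterProvider();

// Registering a second PeriodicExportingMetricReader is what makes memory usage climb.
meterProvider.addMetricReader(new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
    exportIntervalMillis: 3000,
}));
meterProvider.addMetricReader(new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
    exportIntervalMillis: 3000,
}));

// Record something periodically so the pipeline has data to collect on every export cycle.
const counter = meterProvider.getMeter('repro').createCounter('ticks');
setInterval(() => counter.add(1), 100);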

Expected Result

A reasonable amount of memory usage, preferably the same as with only one MetricReader.

Actual Result

A very fast growth of RAM usage.

Additional Details

Using @opentelemetry/api 1.4.1, I began seeing massive spikes in memory usage, but only on some systems. After some digging around, I found it only happened on systems with more than one OpenTelemetry collector.
That had not been an issue before, so I downgraded to

"@opentelemetry/api-metrics": "^0.33.0",
"@opentelemetry/exporter-metrics-otlp-grpc": "^0.34.0",
"@opentelemetry/host-metrics": "^0.33.1",
"@opentelemetry/sdk-metrics-base": "^0.31.0",
"@opentelemetry/sdk-node": "^0.34.0",

That cut the memory usage in half, but it is still leaking memory.
The graph below shows memory usage for three runs:
The first run is with the newest package versions.
The second run is with the older package versions.
The last run is with only one MetricReader.
[image: memory usage graph for the three runs]

While gathering data and graphs for this issue, I had the idea of removing AsyncHooksContextManager, since I'm not really using it right now.
I tried adding two MetricReaders and updating to the latest version of the packages, but with AsyncHooksContextManager removed. That seems to reduce the memory usage drastically, but I also see memory usage slowly creeping up over time, so it did not solve the issue. (Here compared with the graphs above.)
[image: memory usage graph with AsyncHooksContextManager removed, compared with the graphs above]

The project this code is running in is super basic. It starts one or two new processes and restarts them if they die, so it should not really be using much RAM or CPU.

OpenTelemetry Setup Code

import { api, node, tracing } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { Context, context, ROOT_CONTEXT, SpanContext, TraceFlags, trace } from '@opentelemetry/api';
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { HostMetrics } from '@opentelemetry/host-metrics';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { AsyncHooksContextManager } from '@opentelemetry/context-async-hooks'

// static functions on instrumentation class
    public static addMeterURL(url: string) {
        const statMetricReader = new PeriodicExportingMetricReader({
            exporter: new OTLPMetricExporter({
                url: url
            }),
            exportIntervalMillis: 3000,
        });
        instrumentation.meterProvider.addMetricReader(statMetricReader);
    }
    public static addTraceURL(url: string) {
        const stateTraceExporter = new OTLPTraceExporter({ url });
        instrumentation.traceProvider.addSpanProcessor(new tracing.BatchSpanProcessor(stateTraceExporter, { maxQueueSize: instrumentation.maxQueueSize }))
    }


// init in instrumentation class
        const contextManager = new AsyncHooksContextManager().enable();
        context.setGlobalContextManager(contextManager)

        instrumentation.traceProvider = new node.NodeTracerProvider({
            resource: instrumentation.resource
        })
        api.trace.setGlobalTracerProvider(instrumentation.traceProvider)
        instrumentation.meterProvider = new MeterProvider({
            resource: instrumentation.resource
        });
        api.metrics.setGlobalMeterProvider(instrumentation.meterProvider);
        if (process.env.enable_analytics != "false") {
            instrumentation.addMeterURL(instrumentation.baseurl)
            instrumentation.addTraceURL(instrumentation.baseurl)
        }
        if (process.env.otel_metric_url != null && process.env.otel_metric_url != "") {
            instrumentation.addMeterURL(process.env.otel_metric_url);
        }
        if (process.env.otel_trace_url != null && process.env.otel_trace_url != "") {
            instrumentation.addTraceURL(process.env.otel_trace_url);
        }
        const hostMetrics = new HostMetrics({
            meterProvider: instrumentation.meterProvider,
            name: "nodeagent"
        });
        if (process.env.otel_log_level != null && process.env.otel_log_level != "") {
            console.log("Starting HostMetrics");
        }
        hostMetrics.start();
        instrumentation.creatememorymeters()
        process.on('SIGINT', async () => {
            try {
                if (instrumentation.traceProvider != null) {
                    await instrumentation.traceProvider.shutdown();
                    instrumentation.traceProvider = null;
                }
                if (instrumentation.meterProvider != null) {
                    await instrumentation.meterProvider.shutdown();
                    instrumentation.meterProvider = null;
                }
            } catch (error) {
                console.error(error);
            } finally {
            }
        });

package.json

"@opentelemetry/api": "^1.4.1",
"@opentelemetry/auto-instrumentations-node": "^0.39.2",
"@opentelemetry/exporter-metrics-otlp-grpc": "^0.41.2",
"@opentelemetry/host-metrics": "^0.33.1",

Relevant log output

No response

skadefro added the bug and triage labels on Sep 4, 2023

skadefro commented Sep 4, 2023

I could not for the life of me remember why I never had this issue before, so I had a look at some old code I wrote a few years back that also implemented OTel. Back then I wrote my own MultiSpanExporter and MultiMetricExporter classes, so I was not registering multiple exporters; I simply handled exporting to multiple exporters in a wrapper class.
I guess I will just copy these classes over and do that again, since I see others have reported this issue and it does not look like something that is going to be fixed any time soon.
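A rough sketch of what such a wrapper can look like for metrics (this is only an illustration built on the PushMetricExporter interface from @opentelemetry/sdk-metrics, not the original classes):

import { ExportResult, ExportResultCode } from '@opentelemetry/core';
import { PushMetricExporter, ResourceMetrics } from '@opentelemetry/sdk-metrics';

// Fans a single export out to several exporters, so only one MetricReader
// needs to be registered on the MeterProvider.
class MultiMetricExporter implements PushMetricExporter {
    constructor(private exporters: PushMetricExporter[]) { }

    export(metrics: ResourceMetrics, resultCallback: (result: ExportResult) => void): void {
        let pending = this.exporters.length;
        let failed = false;
        if (pending === 0) {
            resultCallback({ code: ExportResultCode.SUCCESS });
            return;
        }
        for (const exporter of this.exporters) {
            // Report success only once every delegate has answered.
            exporter.export(metrics, (result) => {
                if (result.code !== ExportResultCode.SUCCESS) failed = true;
                if (--pending === 0) {
                    resultCallback({ code: failed ? ExportResultCode.FAILED : ExportResultCode.SUCCESS });
                }
            });
        }
    }

    async forceFlush(): Promise<void> {
        await Promise.all(this.exporters.map((e) => e.forceFlush()));
    }

    async shutdown(): Promise<void> {
        await Promise.all(this.exporters.map((e) => e.shutdown()));
    }
}

The wrapper would then be handed to a single PeriodicExportingMetricReader instead of registering one reader per collector.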

pichlermarc added the priority:p1 label and removed the triage label on Sep 5, 2023

pichlermarc commented Sep 5, 2023

I created a minimal reproducer repository for it.
https://github.com/pichlermarc/repro-4115

It looks like the memory leak is in the metric collection/export pipeline; memory leaks faster when the collection interval is shorter.

pichlermarc added the has:reproducer label on Sep 5, 2023
pichlermarc self-assigned this on Sep 25, 2023
@pichlermarc

Self-assigning this because I think I've finally gotten to the bottom of this.

It looks like we're keeping track of unreported accumulations in TemporalMetricProcessor even if the storage is only intended for one collector (see here, where a storage is returned only for one collector). This causes the _unreportedAccumulations map to keep track of every change for n-1 collectors without ever resetting.

I've tried just collecting with one MetricCollectorHandle, which fixes the memory leaks, but that messes with Views and drops data from the first collection cycle for some readers. Fixing this will likely be non-trivial.

I'll see about having a PR ready for this by the end of this week or early next week.
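To illustrate the pattern described above (a simplified sketch, not the actual SDK code): accumulations get stashed for every known collector on each collection, but only the collector that owns the storage ever drains its entry, so the lists kept for the other n-1 collectors grow without bound.

// Simplified illustration of the leak pattern, not the SDK implementation.
type CollectorHandle = object;
type Accumulation = unknown;

class UnreportedAccumulationStash {
    private _unreported = new Map<CollectorHandle, Accumulation[]>();

    // Called on every collection: stash the latest delta for all known collectors.
    stash(collectors: CollectorHandle[], delta: Accumulation): void {
        for (const collector of collectors) {
            const list = this._unreported.get(collector) ?? [];
            list.push(delta);
            this._unreported.set(collector, list);
        }
    }

    // Only the collector the storage belongs to ever calls this, so the
    // entries stashed for the other n-1 collectors are never cleared.
    drain(collector: CollectorHandle): Accumulation[] {
        const list = this._unreported.get(collector) ?? [];
        this._unreported.set(collector, []);
        return list;
    }
}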


skadefro commented Sep 26, 2023

For anyone affected by this, here is a very crude workaround ...
