
Metrics support for Netty allocators and event executors #522

Closed
zeagord opened this issue Mar 27, 2018 · 15 comments · Fixed by #3742
Labels: enhancement (A general enhancement)

zeagord commented Mar 27, 2018

It would be nice to see metrics added around Netty. For example:

  1. Number of threads in the event loop
  2. Processing time in the event loop
  3. Throughput
jkschneider added the "help wanted" label (An issue that a contributor can help us with) on Apr 3, 2018
tkp1n (Contributor) commented Mar 6, 2019

I recently added the following Netty/Micrometer glue code to a Spring Boot project to monitor memory usage by the default PooledByteBufAllocator.
Although not in your list, this may be useful to some...

package x.y.z;

import io.micrometer.core.instrument.Metrics;
import io.micrometer.core.instrument.Tags;
import io.netty.buffer.*;
import org.springframework.context.annotation.Configuration;

import javax.annotation.PostConstruct;

@Configuration
public class NettyMetricsConfig {
    private static final String NETTY = "netty";
    private static final String ALLOC = "alloc";
    private static final String POOLED = "pooled";
    private static final String UNPOOLED = "unpooled";

    private static final String MEMORY = "memory";
    private static final String DIRECT = "direct";
    private static final String HEAP = "heap";

    private static final String ARENA = "arena";

    private static final String CHUNK = "chunk";

    private static final String THREAD = "thread";
    private static final String LOCAL = "local";

    private static final String CACHE = "cache";
    private static final String SIZE = "size";
    private static final String TINY = "tiny";
    private static final String SMALL = "small";
    private static final String NORMAL = "normal";
    private static final String HUGE = "huge";

    private static final String SUBPAGE = "subpage";
    private static final String CHUNKLIST = "chunklist";
    private static final String NUMBER = "number";

    private static final String USED = "used";
    private static final String COUNT = "count";

    private static final String ALLOCATION = "allocation";
    private static final String DEALLOCATION = "deallocation";
    private static final String ACTIVE = "active";
    private static final String BYTE = "byte";

    private static final String ELEMENT = "element";
    private static final String MAX = "max";
    private static final String MIN = "min";
    private static final String AVAILABLE = "available";
    private static final String PAGE = "page";
    private static final String USAGE = "usage";

    @PostConstruct
    public void configureNettyMetrics() {
        PooledByteBufAllocatorMetric pooledMetric = PooledByteBufAllocator.DEFAULT.metric();

        Tags pooled = Tags.of(ALLOC, POOLED);

        Metrics.gauge(dot(NETTY, ALLOC, MEMORY, USED), pooled.and(MEMORY, DIRECT), pooledMetric, ByteBufAllocatorMetric::usedDirectMemory);
        Metrics.gauge(dot(NETTY, ALLOC, MEMORY, USED), pooled.and(MEMORY, HEAP), pooledMetric, ByteBufAllocatorMetric::usedHeapMemory);

        Metrics.gauge(dot(NETTY, ALLOC, ARENA, COUNT), pooled.and(MEMORY, DIRECT), pooledMetric, PooledByteBufAllocatorMetric::numDirectArenas);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, COUNT), pooled.and(MEMORY, HEAP), pooledMetric, PooledByteBufAllocatorMetric::numHeapArenas);
        Metrics.gauge(dot(NETTY, ALLOC, THREAD, LOCAL, CACHE, COUNT), pooled, pooledMetric, PooledByteBufAllocatorMetric::numThreadLocalCaches);

        Metrics.gauge(dot(NETTY, ALLOC, CACHE, SIZE), pooled.and(CACHE, TINY), pooledMetric, PooledByteBufAllocatorMetric::tinyCacheSize);
        Metrics.gauge(dot(NETTY, ALLOC, CACHE, SIZE), pooled.and(CACHE, SMALL), pooledMetric, PooledByteBufAllocatorMetric::smallCacheSize);
        Metrics.gauge(dot(NETTY, ALLOC, CACHE, SIZE), pooled.and(CACHE, NORMAL), pooledMetric, PooledByteBufAllocatorMetric::normalCacheSize);
        Metrics.gauge(dot(NETTY, ALLOC, CHUNK, SIZE), pooled, pooledMetric, PooledByteBufAllocatorMetric::chunkSize);

        for (int i = 0; i < pooledMetric.directArenas().size(); i++) {
            Tags tags = Tags.of(MEMORY, DIRECT)
                    .and(dot(ARENA, NUMBER), Integer.toString(i));

            meterPoolArena(tags, pooledMetric.directArenas().get(i));
        }

        for (int i = 0; i < pooledMetric.heapArenas().size(); i++) {
            Tags tags = Tags.of(MEMORY, HEAP)
                    .and(dot(ARENA, NUMBER), Integer.toString(i));

            meterPoolArena(tags, pooledMetric.heapArenas().get(i));
        }

        ByteBufAllocatorMetric unpooledMetric = UnpooledByteBufAllocator.DEFAULT.metric();
        Tags unpooled = Tags.of(ALLOC, UNPOOLED);

        Metrics.gauge(dot(NETTY, ALLOC, MEMORY, USED), unpooled.and(MEMORY, DIRECT), unpooledMetric, ByteBufAllocatorMetric::usedDirectMemory);
        Metrics.gauge(dot(NETTY, ALLOC, MEMORY, USED), unpooled.and(MEMORY, HEAP), unpooledMetric, ByteBufAllocatorMetric::usedHeapMemory);
    }

    private void meterPoolArena(Tags tags, PoolArenaMetric pam) {
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, THREAD, CACHE, COUNT), tags, pam, PoolArenaMetric::numThreadCaches);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, SUBPAGE, COUNT), tags.and(SUBPAGE, TINY), pam, PoolArenaMetric::numTinySubpages);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, SUBPAGE, COUNT), tags.and(SUBPAGE, SMALL), pam, PoolArenaMetric::numSmallSubpages);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, CHUNKLIST, COUNT), tags, pam, PoolArenaMetric::numChunkLists);

        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, COUNT), tags, pam, PoolArenaMetric::numAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, COUNT), tags.and(SIZE, TINY), pam, PoolArenaMetric::numTinyAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, COUNT), tags.and(SIZE, SMALL), pam, PoolArenaMetric::numSmallAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, COUNT), tags.and(SIZE, NORMAL), pam, PoolArenaMetric::numNormalAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, COUNT), tags.and(SIZE, HUGE), pam, PoolArenaMetric::numHugeAllocations);


        Metrics.gauge(dot(NETTY, ALLOC, ARENA, DEALLOCATION, COUNT), tags, pam, PoolArenaMetric::numDeallocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, DEALLOCATION, COUNT), tags.and(SIZE, TINY), pam, PoolArenaMetric::numTinyDeallocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, DEALLOCATION, COUNT), tags.and(SIZE, SMALL), pam, PoolArenaMetric::numSmallDeallocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, DEALLOCATION, COUNT), tags.and(SIZE, NORMAL), pam, PoolArenaMetric::numNormalDeallocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, DEALLOCATION, COUNT), tags.and(SIZE, HUGE), pam, PoolArenaMetric::numHugeDeallocations);

        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, ACTIVE, COUNT), tags, pam, PoolArenaMetric::numActiveAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, ACTIVE, COUNT), tags.and(SIZE, TINY), pam, PoolArenaMetric::numActiveTinyAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, ACTIVE, COUNT), tags.and(SIZE, SMALL), pam, PoolArenaMetric::numActiveSmallAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, ACTIVE, COUNT), tags.and(SIZE, NORMAL), pam, PoolArenaMetric::numActiveNormalAllocations);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ALLOCATION, ACTIVE, COUNT), tags.and(SIZE, HUGE), pam, PoolArenaMetric::numActiveHugeAllocations);

        // Active bytes currently held by this arena.
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, ACTIVE, BYTE, COUNT), tags, pam, PoolArenaMetric::numActiveBytes);

        for (int i = 0; i < pam.tinySubpages().size(); i++) {
            Tags tinySubpage = tags.and(SUBPAGE, TINY)
                    .and(dot(SUBPAGE, NUMBER), Integer.toString(i));

            meterSubpage(tinySubpage, pam.tinySubpages().get(i));
        }

        for (int i = 0; i < pam.smallSubpages().size(); i++) {
            Tags smallSubpage = tags.and(SUBPAGE, SMALL)
                    .and(dot(SUBPAGE, NUMBER), Integer.toString(i));

            meterSubpage(smallSubpage, pam.smallSubpages().get(i));
        }

        for (int i = 0; i < pam.chunkLists().size(); i++) {
            Tags chunkList = tags.and(dot(CHUNKLIST, NUMBER), Integer.toString(i));

            meterChunkList(chunkList, pam.chunkLists().get(i));
        }
    }

    private void meterSubpage(Tags tags, PoolSubpageMetric psm) {
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, SUBPAGE, ELEMENT, MAX), tags, psm, PoolSubpageMetric::maxNumElements);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, SUBPAGE, AVAILABLE, COUNT), tags, psm, PoolSubpageMetric::numAvailable);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, SUBPAGE, ELEMENT, SIZE), tags, psm, PoolSubpageMetric::elementSize);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, SUBPAGE, PAGE, SIZE), tags, psm, PoolSubpageMetric::pageSize);
    }

    private void meterChunkList(Tags tags, PoolChunkListMetric pclm) {
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, CHUNKLIST, USAGE, MIN), tags, pclm, PoolChunkListMetric::minUsage);
        Metrics.gauge(dot(NETTY, ALLOC, ARENA, CHUNKLIST, USAGE, MAX), tags, pclm, PoolChunkListMetric::maxUsage);
    }

    private String dot(String... strings) {
        return String.join(".", strings);
    }
}

shakuzen added the "enhancement" label (A general enhancement) on Apr 10, 2019
shakuzen added this to the 1.x milestone on Apr 10, 2019
franz1981 commented Jul 30, 2020

Hi, I've just implemented a Netty pooled allocator MeterBinder for https://github.com/apache/activemq-artemis/: right now I'm pushing it in a PR for Artemis, but I would be more than happy to send a PR here as well... anyone interested? :)

tokuhirom commented
I hope this feature gets implemented!

@franz1981 Can you send a PR to Micrometer?

shakuzen (Member) commented
We would certainly take a look at a pull request, given the popularity of this enhancement request. I think there are a lot of areas of potential interest in Netty metrics, and different people may be interested in different metrics. I'm not an expert in Netty, though, so it may be harder for me to review what is important and makes sense. Has anyone asked the Netty team about having Micrometer metrics maintained in the Netty project itself? It could be an optional dependency or a separate module, but it would help keep the metrics in sync with any changes in Netty and have the best Netty experts available to maintain it.

franz1981 commented
Thanks @tokuhirom and @shakuzen

Let me bring @normanmaurer in, although I can anticipate (just my 2c as a Netty committer, not its project lead) that I don't see Netty as the right place for this: Micrometer isn't the only way to publish such metrics, and they are technically already "exposed" through a lower-level API.
In short: adding a Micrometer dependency to Netty, which is central to many projects that probably already implement their own way of exposing metrics, won't benefit the Netty project itself, other than by adding an additional maintenance burden.
There's an old issue in which a comment by @trustin makes the Netty team's position at that time clear: netty/netty#8546 (comment)

So it probably makes sense to do the opposite: add this to the Micrometer project instead, if the community (as @tokuhirom has summarized with enthusiasm :)) thinks it's worth it.

franz1981 commented
FYI this is the MeterBinder: https://github.com/apache/activemq-artemis/blob/58e59ef67989f06675c488e16d4418f766e46549/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/metrics/NettyPooledAllocatorMetrics.java

I believe this could be improved both from a Netty point of view (I can probably ask @trustin for some help checking whether the collected metrics make sense) and from a Micrometer one as well (happy to receive reviews in case of a contribution).
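
For readers who haven't used the MeterBinder approach, here is a minimal sketch under assumed names; the "netty.pooled.*" meter names and the NettyPooledAllocatorMetrics class are illustrative only and are not the Artemis binder linked above.

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.MeterBinder;
import io.netty.buffer.ByteBufAllocatorMetric;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

public class NettyPooledAllocatorMetrics implements MeterBinder {

    private final PooledByteBufAllocatorMetric metric;

    public NettyPooledAllocatorMetrics(PooledByteBufAllocator allocator) {
        this.metric = allocator.metric();
    }

    @Override
    public void bindTo(MeterRegistry registry) {
        // Used memory, split by memory type (hypothetical meter names).
        Gauge.builder("netty.pooled.memory.used", metric, ByteBufAllocatorMetric::usedDirectMemory)
                .tag("memory", "direct")
                .baseUnit("bytes")
                .register(registry);
        Gauge.builder("netty.pooled.memory.used", metric, ByteBufAllocatorMetric::usedHeapMemory)
                .tag("memory", "heap")
                .baseUnit("bytes")
                .register(registry);

        // Arena and thread-local cache counts.
        Gauge.builder("netty.pooled.arenas", metric, PooledByteBufAllocatorMetric::numDirectArenas)
                .tag("memory", "direct")
                .register(registry);
        Gauge.builder("netty.pooled.arenas", metric, PooledByteBufAllocatorMetric::numHeapArenas)
                .tag("memory", "heap")
                .register(registry);
        Gauge.builder("netty.pooled.threadlocal.caches", metric, PooledByteBufAllocatorMetric::numThreadLocalCaches)
                .register(registry);
    }
}

A library would then call new NettyPooledAllocatorMetrics(PooledByteBufAllocator.DEFAULT).bindTo(registry) at startup. Micrometer gauges keep only a weak reference to the metric object, which is fine here because the allocator itself keeps it alive.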

shakuzen (Member) commented
Thanks for the comments, @franz1981.

> In short: adding a Micrometer dependency to Netty

As an optional dependency, it would not be passed on to Netty users. They would have to include a Micrometer dependency themselves to use the classes containing Micrometer code. There are examples of other projects doing this like HikariCP. The downside for Netty users that don't use Micrometer would be a few more classes in the JAR that they won't ever use.

The other option I mentioned, a separate module, also avoids adding a dependency for Netty users that don't want it. We used to maintain Hibernate metrics in the Micrometer repository, but it proved difficult to support a wide range of versions due to API changes and the need to use incubating APIs to get some metrics. These problems are solved now that the Micrometer instrumentation lives in the Hibernate repository. It's not a problem to use an incubating API there, because the instrumentation will be updated when the API is updated and versions will be aligned. We cannot do that while maintaining the code here.

> adding an additional maintenance burden.

Maintenance of netty metrics needs to be done somewhere. I'm suggesting that the burden would be less if metrics instrumentation were closer to the code and experts on what is being instrumented.

I'm happy to review a pull request for netty metrics in this repository because I think it would be beneficial for a lot of users. I also wanted to start the conversation about something in Netty in parallel.

franz1981 commented Jan 26, 2022

Thanks @tokuhirom for the prompt answers.

> There are examples of other projects doing this like HikariCP

I see HikariCP itself as a good example of the additional maintenance "cost" I mentioned, which is not present in any form in Netty at the moment: see https://github.com/brettwooldridge/HikariCP/tree/dev/src/main/java/com/zaxxer/hikari/metrics

It ships micrometer/prometheus/other(?) metrics providers, and I suppose that whenever a new framework exposing metrics comes out, it means managing/considering that in Netty as well (if it is popular enough), given that there is no reason to accept one framework and not the others. That means dealing with new versions, API changes, testing, security issues... for all these different potential providers.

> I'm suggesting that the burden would be less if metrics instrumentation were closer to the code and experts on what is being instrumented.

This is a valid point, I agree.
But Micrometer isn't the only metrics provider out there, so I can't see why binding it into Netty would be beneficial for a single exposed metric.
Consider that Netty's internals change so little over time (and metrics are considered public, so they won't really change at all, except after a deprecation process) that whatever metrics provider is going to expose them would just pay the initial cognitive cost (and I can help there, with some help from @trustin :P) to get it "right".
This would keep the cost decentralized (every metrics provider can implement its own Netty plugin, if its community wishes and needs it) and tie its maintenance to the framework that changes more often (Micrometer can build a whole new version or a new way to expose/configure its metrics without any impact on the many libraries it is going to "observe").

I think about this the same way the Linux kernel exposes some of its observation points, now in the form of eBPF or other raw, low-level composable observability hooks: monitoring tools depend on the kernel (which changes over time at its own pace), and the kernel keeps its low-level hooks compatible and semantically meaningful (or documented otherwise) over time to make life easier for such tools... but none of those tools is embedded in the kernel.
I hope I haven't gone too far off-track with this, but it summarizes my point of view: I see Netty as a skeletal/core/low-level framework that higher-level products can use in so many ways that it's their choice how and whether to make use of the low-level metrics. There are projects that track Netty metrics without any metrics provider à la Micrometer (as we've done in AMQ Artemis for a long time).

> I'm happy to review a pull request for netty metrics in this repository because I think it would be beneficial for a lot of users.

Many thanks! I can work on that, but as I said, I need some help to make it right: I'm not a Micrometer user myself.

> I also wanted to start the conversation about something in Netty in parallel.

Feel free to open a discussion on these channels (maybe in a separate issue) or on the Netty ones as well, and I'll try to answer with what I know.

bclozel (Contributor) commented Mar 22, 2023

This is obviously a highly requested feature and we are considering it for the next minor version.
I've had a look at what Netty exposes and at what libraries already instrument; I think Micrometer could offer basic instrumentation for common cases, but transport-specific instrumentation might belong in the libraries themselves.

EventLoop metrics

Reactor Netty already offers metrics around the EventLoop, namely "reactor.netty.eventloop.pending.tasks". This gauge counts the number of pending tasks per event loop and is tagged with the event loop name. This is done for all SingleThreadEventExecutor instances.

Armeria seems to be on the path of instrumenting the EventLoopGroup with the total number of workers (SingleThreadEventExecutor or other) with "event.loops.num.workers" and the number of pending tasks with "event.loops.pending.tasks".

I believe we could contribute an instrumentation that's very similar to Reactor's (a rough sketch follows after this list), with:

  • "netty.eventloop.tasks.pending" gauges, tagged by loop name
  • I don't know if the number of workers is a metric most applications would need, but "netty.eventloop.workers.current" could be a candidate.
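
As an illustration only (not the instrumentation that eventually shipped in #3742), the pending-tasks gauge could iterate an application-supplied EventLoopGroup and register one gauge per SingleThreadEventExecutor; the metric and tag names below are assumptions.

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.netty.channel.EventLoopGroup;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.SingleThreadEventExecutor;

public final class EventLoopMetrics {

    private EventLoopMetrics() {
    }

    /** Registers one "netty.eventloop.tasks.pending" gauge per single-threaded executor in the group. */
    public static void monitor(MeterRegistry registry, EventLoopGroup group, String groupName) {
        int index = 0;
        for (EventExecutor executor : group) {
            if (executor instanceof SingleThreadEventExecutor) {
                SingleThreadEventExecutor loop = (SingleThreadEventExecutor) executor;
                Gauge.builder("netty.eventloop.tasks.pending", loop, SingleThreadEventExecutor::pendingTasks)
                        .tag("group", groupName)
                        .tag("name", groupName + "-" + index) // simple index-based name tag
                        .register(registry);
            }
            index++;
        }
    }
}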

Memory allocation metrics

This is another candidate for metrics shared by frameworks and libraries. Both Reactor Netty and Artemis provide such metrics.

For both memory allocation and event loop metrics, we could allow libraries to use a custom prefix, as they probably want to expose such metrics in the same namespace as additional metrics that they provide. Overall this does not represent a large amount of code and we could implement this in micrometer-core directly, if:

  • this would be used by frameworks and libraries in place of their direct instrumentation
  • this would be useful to other direct Netty users

We would like to get your opinion on this, @violetagg @franz1981 @ikhoon!

violetagg (Contributor) commented
@bclozel Are you also considering Netty 5, or is the target only Netty 4.1?

bclozel (Contributor) commented Mar 23, 2023

@violetagg we could do both, as they live in different packages.

ikhoon (Contributor) commented Mar 31, 2023

As we all know, Netty is a widely used network library. It would be very helpful to provide standard Netty metrics for both regular users and library authors.

ikhoon (Contributor) commented Mar 31, 2023

In addition to EventLoop and memory allocation metrics, DNS metrics would be useful.
Armeria records DNS metrics using DnsQueryLifecycleObserver:
https://github.com/line/armeria/blob/5b384fbe27e7e6f9225d6db91cbb684d09dfbb5e/core/src/main/java/com/linecorp/armeria/client/DefaultDnsQueryLifecycleObserver.java#L41
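
A hedged sketch of counting DNS query outcomes with Netty's DnsQueryLifecycleObserver follows; the counter name "netty.dns.queries" is made up for illustration, and such an observer would presumably be wired in via a DnsQueryLifecycleObserverFactory on the resolver builder. This is a minimal example, not the Armeria observer linked above.

import java.net.InetSocketAddress;
import java.util.List;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.netty.channel.ChannelFuture;
import io.netty.handler.codec.dns.DnsQuestion;
import io.netty.handler.codec.dns.DnsResponseCode;
import io.netty.resolver.dns.DnsQueryLifecycleObserver;

public class MicrometerDnsQueryObserver implements DnsQueryLifecycleObserver {

    private final Counter success;
    private final Counter failure;

    public MicrometerDnsQueryObserver(MeterRegistry registry) {
        this.success = registry.counter("netty.dns.queries", "result", "success");
        this.failure = registry.counter("netty.dns.queries", "result", "failure");
    }

    @Override
    public void queryWritten(InetSocketAddress dnsServerAddress, ChannelFuture future) {
        // No-op: only terminal events are counted in this sketch.
    }

    @Override
    public void queryCancelled(int queriesRemaining) {
        // Cancelled queries are lumped in with failures here.
        failure.increment();
    }

    @Override
    public DnsQueryLifecycleObserver queryRedirected(List<InetSocketAddress> nameServers) {
        return this;
    }

    @Override
    public DnsQueryLifecycleObserver queryCNAMEd(DnsQuestion cnameQuestion) {
        return this;
    }

    @Override
    public DnsQueryLifecycleObserver queryNoAnswer(DnsResponseCode code) {
        return this;
    }

    @Override
    public void queryFailed(Throwable cause) {
        failure.increment();
    }

    @Override
    public void querySucceeded() {
        success.increment();
    }
}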

franz1981 commented Mar 31, 2023

Please @violetagg @bclozel @ikhoon let me know if I can help: I've already shared the pooled allocator stuff I built for AMQ, but it needs to be reviewed again, because:

  • the allocator has moved to a new jemalloc version and some of the metrics are not needed anymore (they are no-ops in practice)
  • "I was so young and naive": a few of the metric collections can slow down the allocator and should be removed
  • new metrics have been added to Netty, e.g. pinned memory

It's important to remember that if cleaners are used, Netty uses the standard mechanism to allocate off-heap memory, so JMX/MBeans can track direct memory usage, BUT (there's always one!) that won't help distinguish which allocations belong to Netty and which to the application itself, in case there are other "direct" memory allocations outside of Netty's pooled arenas/TLABs. In short, it would be better to have a simple Netty metric exposed (IIRC Reactor Netty does this already? Maybe it could be integrated into the work we will do here in the Micrometer repo?)

Re the event loop, I opened a GSoC proposal a few years ago without getting much interest (io_uring was the star at that time :) ), but we can use some of the JCTools capabilities to expose a few metrics out of the loops (e.g. submitted vs. consumed tasks), plus a timed singleton Runnable sentinel (a single instance per event loop) that can be submitted, measure how long it takes to be executed, and then be resubmitted. This would give some metric of "event loop busyness", but I have no idea how to name it.
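
A sketch of that "timed sentinel" idea is shown below: a probe task is periodically submitted to an event loop and the delay until it actually runs is recorded. The timer name "netty.eventloop.lag" and the one-second probe interval are illustrative choices, not part of any proposal here.

import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.netty.channel.EventLoop;

public class EventLoopLagProbe implements Runnable {

    private final EventLoop eventLoop;
    private final Timer lagTimer;
    private volatile long submittedAtNanos;

    public EventLoopLagProbe(EventLoop eventLoop, MeterRegistry registry, String loopName) {
        this.eventLoop = eventLoop;
        this.lagTimer = Timer.builder("netty.eventloop.lag")
                .tag("name", loopName)
                .register(registry);
    }

    /** Submits the first probe; subsequent probes reschedule themselves. */
    public void start() {
        submit();
    }

    private void submit() {
        submittedAtNanos = System.nanoTime();
        eventLoop.execute(this);
    }

    @Override
    public void run() {
        // Time spent waiting behind other tasks approximates how busy the loop is.
        lagTimer.record(System.nanoTime() - submittedAtNanos, TimeUnit.NANOSECONDS);
        // Wait a bit before probing again so the probe itself stays cheap.
        eventLoop.schedule(this::submit, 1, TimeUnit.SECONDS);
    }
}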

bclozel (Contributor) commented Apr 4, 2023

I've submitted a PR for review in #3742; I'll expand there on the choices I've made, and we can discuss improvements. Note that this is time-sensitive if we want to include it in the upcoming 1.11 release.

bclozel changed the title from "[FeatureRequest] Netty Support" to "Metrics support for Netty allocators and event executors" on Apr 6, 2023
bclozel modified the milestone from 1.x to 1.11.0-RC1 on Apr 6, 2023
bclozel closed this as completed in d985e62 on Apr 6, 2023
bclozel removed the "help wanted" label on Apr 6, 2023