Carbon-relay-ng never releases RAM when there is a lot of traffic, memory leak #221

Closed
obersoldat opened this issue Sep 18, 2017 · 4 comments

Comments

@obersoldat

In my current setup, metrics are sent to a keepalived server, which forwards them to an haproxy server that balances them (layer 4) across two carbon-relay-ng servers in an active-active setup. Before I had load balancing configured, one relay would have issues because it received most of the traffic: RAM usage piled up on that relay, while the other was fine with no sign of a memory leak. Both relays have 4 cores and 16 GB of RAM each. What I have noticed is that up to about 4 GB the usage is fine and RAM is released, but beyond that the process simply hoards all the RAM it can get. Once consumed, that RAM is never released, until the OOM killer kills the process and it is restarted, at which point I lose the metrics that were queued at the time. I was hoping this was fixed in carbon-relay-ng-0.9.2_2_g295c204-1.x86_64, but no luck.
# grep -i oom /var/log/messages
Sep 18 08:02:27 carbon-relay-ng-1 kernel: kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Sep 18 08:02:27 carbon-relay-ng-1 kernel: [<ffffffff81184cfe>] oom_kill_process+0x24e/0x3c0
Sep 18 08:02:27 carbon-relay-ng-1 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name

@Dieterbe
Contributor

There are 2 known memory leaks:

Furthermore, you should know that the Go runtime holds on to memory: once it has been requested from the OS, it is never returned. For better or worse, this is a deliberate design characteristic of the Go runtime, though the Go team has been discussing changing it.
A measurement of memory obtained from the OS is therefore fairly incidental and not that telling about how much the Go runtime has actually allocated, whether it is failing to properly release objects, and so on. There are some good articles out there that explain this in more detail.
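For illustration, here is a minimal Go sketch (not carbon-relay-ng code) of reading both numbers via runtime.MemStats; the field names come from the standard library, the rest is just an example:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Sys: total bytes the runtime has obtained from the OS. This number
	// tends to only grow, and is roughly what OS-level monitoring (and the
	// OOM killer) sees.
	fmt.Printf("obtained from OS: %d MiB\n", m.Sys/1024/1024)

	// HeapAlloc: bytes of heap objects currently allocated. This is the
	// number that reflects what the application is actually holding on to.
	fmt.Printf("in-use heap:      %d MiB\n", m.HeapAlloc/1024/1024)
}
```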

I have just committed c33700e, which adds measurement of actual RAM usage by the heap / allocated objects.
Can you run the latest code and import the latest dashboard from this repo? You'll then see both memory obtained from the system and memory used by the heap, which should clarify what is going on.

Finally, the big question is of course why it is allocating memory in the first place. This will depend on your configuration, in particular the bufSize settings, and on whether it needs to use the buffer.
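As a rough illustration of why bufSize matters (a generic sketch under the assumption that each destination keeps a bounded in-memory queue; this is not the actual carbon-relay-ng implementation):

```go
package main

import "fmt"

// Destination holds a bounded queue of metric lines waiting to be sent.
// The channel capacity plays the role of a bufSize setting here.
type Destination struct {
	in chan []byte
}

func NewDestination(bufSize int) *Destination {
	return &Destination{in: make(chan []byte, bufSize)}
}

// Enqueue buffers a metric for the destination. While the destination keeps
// up, the queue stays near empty and little memory is held. When the
// destination falls behind, up to bufSize metric lines accumulate here; that
// is the allocation you would see grow.
func (d *Destination) Enqueue(metric []byte) bool {
	select {
	case d.in <- metric:
		return true
	default:
		return false // buffer full: drop (or spool) rather than grow unbounded
	}
}

func main() {
	d := NewDestination(100000) // e.g. a buffer of 100k metric lines
	ok := d.Enqueue([]byte("some.metric 1 1505721600"))
	fmt.Println("enqueued:", ok)
}
```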

Can you reproduce this with the latest code and post a snapshot using the latest dashboard from this repo?

@bzed

bzed commented Dec 20, 2017

@obersoldat You could check #222 and pull request #248 to see if they fix your issue. There is a memory leak on each re-connection to a destination.
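For context, one common way a per-reconnection leak arises in Go services like this (a generic illustration only, not necessarily the actual bug addressed by #248): a new writer goroutine is started on every reconnect, and if the previous one is never stopped, each stranded goroutine keeps its buffers alive forever. A hedged sketch:

```go
package main

import (
	"net"
	"time"
)

// writer owns a connection and a goroutine that drains a queue into it.
type writer struct {
	conn net.Conn
	quit chan struct{}
}

func newWriter(conn net.Conn) *writer {
	w := &writer{conn: conn, quit: make(chan struct{})}
	go func() {
		buf := make([]byte, 1<<20) // per-writer buffer
		_ = buf
		for {
			select {
			case <-w.quit:
				return // goroutine and its buffer become collectable
			default:
				time.Sleep(100 * time.Millisecond) // placeholder for the real write loop
			}
		}
	}()
	return w
}

// stop must be called before replacing the writer on reconnect; if it is
// skipped, every reconnect strands one goroutine plus its buffer, memory
// that is never released.
func (w *writer) stop() {
	close(w.quit)
	w.conn.Close()
}

func main() {
	conn, err := net.DialTimeout("tcp", "127.0.0.1:2003", 5*time.Second)
	if err != nil {
		return
	}
	w := newWriter(conn)
	defer w.stop()
}
```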

@obersoldat
Author

@bzed I can no longer report on this. My Graphite cluster monitors production and I can't afford issues like these, so I switched to carbon-c-relay a couple of weeks after opening this thread and have had absolutely no issues with it, at a fraction of the resource usage.

@Dieterbe
Contributor

Nothing actionable here; we needed more info.
