link: add maxactive for kretprobe #755
Thank you for your PR!
Can you explain your use case for maxactive and why implementing your own sampling in BPF isn't sufficient?
As it stands, the PR is too special cased for me:
- doesn't work with pmu probes
- only supported for retprobes (?)
Kprobes have a large API, and supporting all of it is not possible. So we need a very compelling reason to add something like your proposed change.
fd4973a
to
49a907d
Compare
If the traced function can sleep, the kretprobe stops firing once the number of processes inside the function exceeds the number of CPUs, because the kernel defaults maxactive to NR_CPU. When we trace kernel network functions such as inet_accept under very high connection pressure, many kretprobe hits are missed. When writing to kprobe_events through tracefs, you can set maxactive to a maximum of 2048, which is enough. The pmu probe implementation in the kernel calls create_local_trace_kprobe, in which maxactive is hardcoded to 0.
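To illustrate the tracefs route mentioned in this comment, here is a minimal Go sketch that builds a kprobe_events definition line with an optional maxactive value. The helper name `retprobeDefinition` and its parameters are hypothetical and are not part of this PR; the line layout follows the `r[MAXACTIVE]:GROUP/EVENT SYMBOL` form that the kernel's kprobe tracing interface accepts.

```go
package main

import "fmt"

// retprobeDefinition is a hypothetical helper (not from the PR). It builds
// the line written to tracefs kprobe_events to create a kretprobe.
// A maxActive of 0 omits the value, so the kernel picks its own default.
func retprobeDefinition(group, event, symbol string, maxActive int) string {
	if maxActive > 0 {
		return fmt.Sprintf("r%d:%s/%s %s", maxActive, group, event, symbol)
	}
	return fmt.Sprintf("r:%s/%s %s", group, event, symbol)
}

func main() {
	// Group and event names here are placeholders chosen for illustration.
	fmt.Println(retprobeDefinition("ebpf_example", "inet_accept", "inet_accept", 1024))
	fmt.Println(retprobeDefinition("ebpf_example", "inet_accept", "inet_accept", 0))
}
```

The sketch only formats the string; actually writing it to kprobe_events requires root and a mounted tracefs.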
Thanks for the background, that's interesting. So what is the behaviour of a pmu retprobe in the same scenario? It drops retprobe calls, without being able to work around the limitation? Asked another way, what does it mean that maxactive is 0 in the pmu case? What problem does maxactive guard against in the first place? Maybe we could just hardcode it to 2048 in the tracefs case? Cc @mmat11 |
I believe this is the case
I found an explanation here iovisor/bcc#1072 (comment)
From this commit torvalds/linux@696ced4 it seems the maximum is 4096. I guess there's some overhead (see https://elixir.bootlin.com/linux/v5.19/source/kernel/kprobes.c#L2202) in having a big number, else it could have been hardcoded in the kernel itself (?). I wonder why this parameter isn't configurable via |
hi, @mmat11
Two related commits (not merged):
They seem to be discussing a more elegant way to implement this.
Agree, I also expect this feature. |
Thank you for your reply!
Agree
Oh my mistake, the maximum is 4096.
As @nashuiliang said, the feature is under discussion now.
Yes, I agree. I will add this and commit again. |
@alahaiyo I'm not sure yet what the best approach is here, so there is a risk that your work might be wasted. I'll read through all the info here and try to reach an opinion today or tomorrow. |
I'm really looking forward to your opinion. |
From the kernel side we have two separate APIs to create a kretprobe:
On the user space side, maxactive is exposed in Let's consider the implications of the
All in all, @alahaiyo in your PR description you say that you want to "miss as little as possible". Would it be acceptable to you if you instead had a way to see how many misses you had, and scaling your result accordingly? If not, what value would you set If just misses is not enough, we could add a higher level option like |
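The "count the misses and scale" idea raised in this comment could look roughly like the Go sketch below. It assumes the miss count is read from the tracefs kprobe_profile file, whose rows list the event name, the number of hits, and the number of misses; the type `probeStats` and the functions `parseProfileLine` and `scaleByMisses` are hypothetical names, and the sample row is made up for illustration. Linear scaling is itself an assumption: it only makes sense if missed calls resemble the observed ones.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// probeStats holds one row of kprobe_profile: event name, hits, misses.
type probeStats struct {
	Event  string
	Hits   uint64
	Misses uint64
}

// parseProfileLine parses a single whitespace-separated kprobe_profile row.
func parseProfileLine(line string) (probeStats, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return probeStats{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	hits, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return probeStats{}, fmt.Errorf("parse hits: %w", err)
	}
	misses, err := strconv.ParseUint(fields[2], 10, 64)
	if err != nil {
		return probeStats{}, fmt.Errorf("parse misses: %w", err)
	}
	return probeStats{Event: fields[0], Hits: hits, Misses: misses}, nil
}

// scaleByMisses extrapolates an observed total to account for missed calls,
// assuming missed calls behave like the ones that were recorded.
func scaleByMisses(observed float64, s probeStats) float64 {
	if s.Hits == 0 {
		return observed // nothing to extrapolate from
	}
	return observed * float64(s.Hits+s.Misses) / float64(s.Hits)
}

func main() {
	// Sample row invented for illustration only.
	s, err := parseProfileLine("  r_inet_accept_0   900   100")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(scaleByMisses(90, s))
}
```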
@alahaiyo do you notice the missed events on specific machines, or is your goal to reduce misses outright? If there is a problem with a specific machine, is CONFIG_PREEMPT enabled? What is the value of |
From iovisor/bcc#2224 (comment):
The issue is that < 4.12 kernels will treat the |
It's not on specific machines. CONFIG_PREEMPT is not set and the value of |
Yes, that is a good reference. |
We can handle that.
In the interface of
Even setting
In fact, I don't know how to set maxactive. In our case it was only found that there were kretprobe events missing and this was not acceptable. The solution to this problem is to increase
But I still think exposing |
Sorry for the long delay, wrapping my head around this issue has been tricky. I was hoping to find a better solution than exposing maxactive, but I'm drawing a blank.
I've left some comments for what I'd like you to change, mostly documentation and making sure we're refusing maxactive in as many cases as possible. Please also add tests that cover the maxactive API.
a2a73bf
to
118ae4a
Compare
I'm glad that this PR can continue to push forward. I have made changes to the code based on the comments. @lmb |
@alahaiyo I think the issue I pointed out in #755 (comment) is not addressed yet? |
2642713
to
a253f5f
Compare
This commit exposes kretprobe's maxactive parameter to the user. You can modify the limit on the number of concurrent events with maxactive. This feature is supported from kernel version 4.12 onward. Using maxactive on kernels older than 4.12 results in events being created with names that do not match expectations.
Signed-off-by: daikunhai <daikunhai@didiglobal.com>
a253f5f
to
4dd0416
Compare
I found that the code already queries the event ID after the event is created, so I only need to determine whether the event has to be recreated. My new commit is pushed. @lmb
I like how you integrated the check for unsupported maxactive! In the interests of getting this merged soon I'll push a couple of commits that contain changes I would've asked you to make. There is only one functional change: I'm dropping the fallback to disable maxactive.
Please take a look at my commits and let me know if you are OK with the modifications. Please also update the tests as mentioned.
link/kprobe.go
Outdated
pe := fmt.Sprintf("-:%s/%s", group, sanitizeSymbol(symbol))
var pe string
if lowKernel {
	pe = fmt.Sprintf("-:kprobes/r_%s_0", symbol)
Can you explain how the transform from r4096:ebpf_XYZ/sym into kprobes/r_sym_0 works? I'd expect that the 4096 is interpreted as a part of the event name and so would show up in the mangled event name.
Why is this not sanitizing the symbol?
In the old kernel, if the second character is not a ':' (https://elixir.bootlin.com/linux/v4.10/source/kernel/trace/trace_kprobe.c#L638), the user-supplied event name is not parsed at that point. Then here (https://elixir.bootlin.com/linux/v4.10/source/kernel/trace/trace_kprobe.c#L713), the kernel creates an event named r_symbol_offset.
Why is this not sanitizing the symbol?
So I think we can't sanitize the symbol, because the kernel uses the full symbol name to create the event.
There is an example:
root@hecs-352404:~# echo "r1024:ebpf_abcdefg/_ib_umem_release __ib_umem_release" >> /sys/kernel/debug/tracing/kprobe_events
root@hecs-352404:~# cat /sys/kernel/debug/tracing/kprobe_events
r:kprobes/r___ib_umem_release_0 __ib_umem_release
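The naming behaviour shown in this example can be sketched in Go as follows. The helpers `legacyRetprobeEvent` and `legacyRemoveCommand` are hypothetical names, not the PR's code; they assume, as described in this thread, that a kernel without maxactive support ignores the requested group and event and falls back to `kprobes/r_<symbol>_<offset>` with the unsanitized symbol.

```go
package main

import "fmt"

// legacyRetprobeEvent is a hypothetical helper. It returns the group/event
// name that, per this thread, a pre-4.12 kernel generates when it does not
// understand the maxactive prefix: the "kprobes" group and an event named
// r_<symbol>_<offset>, using the full, unsanitized symbol.
func legacyRetprobeEvent(symbol string, offset uint64) string {
	return fmt.Sprintf("kprobes/r_%s_%d", symbol, offset)
}

// legacyRemoveCommand builds the kprobe_events line that deletes such an
// event again.
func legacyRemoveCommand(symbol string, offset uint64) string {
	return "-:" + legacyRetprobeEvent(symbol, offset)
}

func main() {
	fmt.Println(legacyRetprobeEvent("__ib_umem_release", 0))
	fmt.Println(legacyRemoveCommand("__ib_umem_release", 0))
}
```

Because the kernel builds the name from the raw symbol, a sanitized symbol would not necessarily match the event that was actually created, which is why the removal line has to use the full name.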
I updated the comment to reflect this.
6b48b81
to
9d835bb
Compare
I'm totally OK with that. I have pushed my commits, which deal with the tests and a small bugfix. |
Move the code that deals with various kernel quirks such as not getting ErrExist and maxactive not being supported into the function used to create a probe event.
We modified the logic in createTraceFSProbeEvent so that, on any kernel, an event cannot be created when it already exists.
When the kernel creates strange event names because it does not support maxactive, you need to use the full symbol name to remove them.
d8bbf99
to
5041050
Compare
5041050
to
40c2c7a
Compare
// args.retprobeMaxActive is used on non kprobe types. Returns ErrNotSupported if
// the kernel is too old to support kretprobe maxactive.
func createTraceFSProbeEvent(typ probeType, args probeArgs) (uint64, error) {
	// Before attempting to create a trace event through tracefs,
@mmat11 does my refactor here make sense? I think it's nicer if createTraceFSProbeEvent encapsulates all the ugly bits of the tracefs interface.
lgtm!
A good catch. I've opted to return |
@alahaiyo thanks for pushing this through, sorry it took so long!
In some cases we want to miss as few events as possible, and
it is helpful to have this configuration option.