Laurel does not aggregate all EXECVE events #178

Open

SolitudePy opened this issue Dec 12, 2023 · 20 comments

Comments

@SolitudePy

Hi, while doing our work we noticed what is probably a minor bug in Laurel: for some events it generates a JSON record without the EXECVE/PROCTITLE keys.
We checked /var/log/audit and filtered by msg, and we saw complete events for execve (SYSCALL, EXECVE, PROCTITLE, CWD, etc.); we then checked the matching Laurel JSON event (in /var/log/laurel, by ID) and it only had a SYSCALL key, missing the EXECVE key.
We checked and it happens on multiple servers, without any correlation to event size or high buffering.
Our current auditd configuration is not verbose for the other syscall types, so we have only encountered this for EXECVE.

I can't provide a sample of the events that have this bug.
I would appreciate it if you could help in some way, and I will help as much as I can. Thanks!

@hillu
Collaborator

hillu commented Dec 12, 2023

Without sample data, there is not much I can do.

What version of Laurel are you using? Did laurel log anything unusual to syslog?

@hillu
Collaborator

hillu commented Dec 27, 2023

@SolitudePy Can you provide data or instructions on how to reproduce the issue?

@hillu
Collaborator

hillu commented Dec 27, 2023

@SolitudePy Incidentally, I stumbled upon a bug today that affected EXECVE events for very long command lines (> 2^16 arguments). (This has been fixed in d89c80c.)
Does this look like the symptom you observed?

@SolitudePy
Author

SolitudePy commented Dec 27, 2023

@hillu Hello, we are using Laurel v0.5.3, and I did not see anything peculiar in what Laurel logged.
The command line definitely wasn't that long. Also, from what I experienced, the EXECVE field was dropped entirely from the Laurel log even though SYSCALL.syscall is indeed execve.
I am not sure how you could reproduce that yourself, but you could try ingesting a lot of logs into a solution like Splunk and then, for example, search for events where SYSCALL.syscall equals execve but EXECVE is null.
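For example, the same check could also be done locally with jq, assuming the Laurel log is line-delimited JSON (the log path and field names below are assumptions and may need adjusting to the actual schema):

# one JSON object per line; path and field names are assumptions, adjust as needed
jq -c 'select(.SYSCALL.syscall == "execve" and .EXECVE == null)' /var/log/laurel/audit.log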

@hillu
Collaborator

hillu commented Jan 11, 2024

@SolitudePy Does Laurel or auditd log anything strange or meaningful around the time where you are missing data in the log?

@SolitudePy
Author

Yes, I forgot to mention that we checked on multiple servers, and the correlated event seems to come from auditd: dispatch err (pipe full) event lost

@hillu
Collaborator

hillu commented Jan 11, 2024

dispatch err (pipe full) event lost

This basically means that auditd (or audispd if you are using auditd < 3.0) is trying to write lines faster than Laurel consumes them.

The file descriptor that gets passed to Laurel as STDIN is actually one end of an AF_LOCAL socket, so there's an associated buffer whose size can be increased (SO_SNDBUF). IIRC, there's no setting for this in auditd, though.

Reducing the number of events generated using a tweaked audit ruleset should help.
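For example (an illustrative snippet only; the uid is a placeholder for whatever account generates the bulk of the noise), a "never" rule placed in front of the execve rule filters out a chatty service account before its events ever reach the dispatcher:

# illustrative audit.rules snippet; "never" rules must precede the matching "always" rules
# 993 is a placeholder uid for a noisy service account
-a never,exit -F arch=b64 -S execve -F auid=993
-a always,exit -F arch=b64 -S execve -k exec
-a always,exit -F arch=b32 -S execve -k exec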

@SolitudePy
Author

@hillu Yes, I thought so. It's quite surprising that a flood of events causes the dispatcher to drop whole EXECVE lines and thereby makes Laurel miss them. Also, as I stated before, our ruleset is quite basic and we planned to make it more verbose; it would be sad if Laurel could not handle that, since the original audit.log does log all of the events :\

@hillu
Collaborator

hillu commented Jan 11, 2024

I'm sorry; as far as I know there isn't anything Laurel can do here until we put reading from input into a separate thread.

If we do the equivalent of a

setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &newsize, sizeof(newsize))

on Laurel's stdin, it would change the size of the wrong buffer. According to unix(7):

The SO_SNDBUF socket option does have an effect for UNIX domain sockets, but the SO_RCVBUF option does not.

Do you think you might be able to run a patched version of auditd?
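To illustrate, here is a minimal sketch (not actual auditd code) of what such a patch would do on the sending side; fd stands for the socket whose other end becomes the dispatcher's (Laurel's) stdin, and the buffer size is an arbitrary example:

/* Sketch only: enlarge the send buffer of the socket whose other end
 * is handed to the dispatcher (Laurel) as stdin. Note that the kernel
 * caps the value at net.core.wmem_max unless SO_SNDBUFFORCE is used. */
#include <stdio.h>
#include <sys/socket.h>

static int enlarge_sndbuf(int fd)
{
    int newsize = 1 << 20;   /* 1 MiB, arbitrary example value */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &newsize, sizeof(newsize)) < 0) {
        perror("setsockopt(SO_SNDBUF)");
        return -1;
    }
    return 0;
}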

@SolitudePy
Author

No, I'm sorry. Are you saying there can't be a fix in Laurel? Also, if that speculation is correct, I should see more events per second during that gap than on servers that do not have this bug, right?

@hillu
Collaborator

hillu commented Jan 11, 2024

Are you saying there can't be a fix in Laurel?

Not quite. The communication between auditd and Laurel is buffered, and the cause of lines getting lost is most likely intermittent bursts of lines overflowing that buffer before Laurel can catch up. The natural solution would be to increase the size of that buffer, but that can only be done on the sending side, i.e. not by Laurel.

Another solution would be to switch input handling on Laurel's side to a separate thread. I am open to pursuing this path, but this won't be done by the end of the week and I'd need to rely on you to test stuff for me.
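To make the idea concrete, here is a rough sketch, in C rather than Laurel's actual Rust code, of what such a dedicated input thread would do: drain stdin into an in-memory queue as fast as possible so that the kernel socket buffer shared with auditd does not fill up while the slower parsing and log writing happens elsewhere.

/* build: cc -pthread reader_sketch.c
 * Sketch only, not Laurel's implementation: one thread drains stdin into an
 * unbounded user-space queue; the main thread consumes it at its own pace. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 65536

struct node { struct node *next; size_t len; char data[]; };

static struct node *head, *tail;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Reader thread: move bytes from stdin into the queue as quickly as possible. */
static void *reader(void *arg)
{
    (void)arg;
    char buf[CHUNK];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        struct node *nd = malloc(sizeof *nd + (size_t)n);
        if (!nd) break;
        nd->next = NULL;
        nd->len = (size_t)n;
        memcpy(nd->data, buf, (size_t)n);
        pthread_mutex_lock(&lock);
        if (tail) tail->next = nd; else head = nd;
        tail = nd;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, reader, NULL);
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!head)
            pthread_cond_wait(&nonempty, &lock);
        struct node *nd = head;
        head = nd->next;
        if (!head) tail = NULL;
        pthread_mutex_unlock(&lock);
        /* Stand-in for the slow work: reassembling lines, parsing,
         * enriching and writing out the log. */
        fwrite(nd->data, 1, nd->len, stdout);
        free(nd);
    }
}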

We don't observe this problem often enough ourselves to consider it an enormous problem.

Can you give me a ballpark number of events (unique message IDs) per second? What kind of hardware are you running on?

@hillu
Collaborator

hillu commented Jan 11, 2024

No, I'm sorry. Are you saying there can't be a fix in Laurel? Also, if that speculation is correct, I should see more events per second during that gap than on servers that do not have this bug, right?

Yes, pretty much. Another explanation would be that something is considerably slowing down Laurel while it processes or writes its log files.

@SolitudePy
Author

SolitudePy commented Jan 12, 2024

@hillu We are also seeing SELinux messages about Laurel, for example about it trying to get RPM info for many random files; it doesn't seem to affect Laurel, though... I will be able to give you the exact numbers next week.

@hillu
Collaborator

hillu commented Jan 13, 2024

We are also seeing SELinux messages about Laurel, for example about it trying to get RPM info for many random files

Those are AVC messages, right? It would be really helpful if you could post some of those.
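For reference, one way to pull recent AVC denials related to the Laurel process could be the following (assuming the process's comm name is laurel; adjust the time range as needed):

# -m avc: AVC records; -ts recent: last 10 minutes; -c: filter by command name
ausearch -m avc -ts recent -c laurel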

@SolitudePy
Author

Yes, they appear as AVC messages and also in the SELinux troubleshooter; I will post them next week.

@SolitudePy
Author

Hello @hillu, we looked into changing the q_depth of audispd (RHEL 7), and it might fix the "pipe full" error, but afterwards we still encountered Laurel logs where SYSCALL.syscall = execve and the EXECVE record does not exist. About SELinux:
There are a lot of denials that we logged in permissive mode; some of them were:

denied write
denied unlink
denied sys_ptrace for /proc//environ,stat
denied getattr for many files, such as /usr/bin/rpm, /etc/passwd, /usr/bin/dash, and many more

In general, it seems Laurel only works if its SELinux type is permissive.
Thanks for your help

@hillu
Collaborator

hillu commented Jan 16, 2024

In general, it seems Laurel only works if its SELinux type is permissive.

oh… are you not using the SELinux policy from contrib/selinux?

Regarding q_depth and other settings … I think I found a way to add an I/O thread that may fix the problem, but I'd need somebody to test it before releasing it. Could you do that?

@SolitudePy
Author

I'll come back to you with an answer. Regarding q_depth: doesn't it fix the buffer size issue you mentioned before?

@SolitudePy
Author

I am using the SELinux policy from the git repo; the permissive declaration is included there with a comment saying to remove it only if there are no AVCs.
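For what it's worth, a quick way to check the current state before touching that declaration could be the following (laurel_t is an assumed domain name, use whatever the policy actually declares; if the permissive statement lives in the module source, it has to be removed there and the module rebuilt and reinstalled rather than toggled at runtime):

# laurel_t is an assumption; list domains currently marked permissive
semanage permissive -l | grep -i laurel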

@hillu
Collaborator

hillu commented Jan 16, 2024

Regarding q_depth: doesn't it fix the buffer size issue you mentioned before?

Apparently, q_depth means that messages are buffered in user-space. Yes, this should help!
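For reference, on RHEL 7 that setting lives in the dispatcher configuration; an illustrative snippet (the value is an arbitrary example, and with audit >= 3.0 the setting moved into auditd.conf):

# /etc/audisp/audispd.conf (audit < 3.0)
q_depth = 2000
# if supported by your audit version, log overflows instead of dropping silently
overflow_action = syslog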
