[DISCUSS] input plugins codec thread safety #8830
Comments
Decided to go ahead and make the line codec threadsafe logstash-plugins/logstash-codec-line/pull/13 |
new threadsafe line codec plugin version 3.0.6 pushed. |
After discussing this further - this line codec patch logstash-plugins/logstash-codec-line#13 does fix the problem but it might not be the optimal solution here WRT the UDP input plugin.
One potential problem I can think of with this strategy: if the kernel's UDP receive and buffering is not atomic, a socket read could return a partially copied UDP packet that would only be completed by the subsequent read, in which case the above patch would be required. WDYT @jordansissel? Another problem I can think of (and this is a big if): if someone wanted to implement some higher-level protocol in a codec, that codec would probably be better off being shared across threads in the UDP input plugin. |
It seems that BSD-based implementations use lists of mbufs, and each mbuf holds a single UDP packet's data, so in that respect socket read operations would always return a full UDP packet. I believe this should also be true of Linux implementations. |
Interesting related question on SO that talks about truncation: https://stackoverflow.com/questions/3069204/reading-partially-from-sockets but we are providing sensible option defaults to deal with that
|
So it seems we do not need to guard against partial socket reads. |
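To illustrate the conclusion above, here is a minimal Ruby sketch (not taken from the plugin) that sends two datagrams over loopback and reads them back. The payloads, the 2-second timeout and the 65,536-byte read size are illustrative assumptions. Each `recvfrom` call is expected to hand back one whole datagram rather than a fragment or a concatenation.

```ruby
require "socket"

# Bind a receiver on an ephemeral loopback port (port 0 lets the OS pick one).
receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0)
port = receiver.addr[1]

sender = UDPSocket.new
sender.send("first line\n", 0, "127.0.0.1", port)
sender.send("second line\n", 0, "127.0.0.1", port)

# Each recvfrom call returns the payload of a single datagram, even though
# the requested read size is far larger than either payload.
payloads = 2.times.map do
  # Illustrative guard so the sketch cannot block forever if a datagram is lost.
  raise "timed out waiting for datagram" unless IO.select([receiver], nil, nil, 2)
  data, _sender_addr = receiver.recvfrom(65_536)
  data
end

sender.close
receiver.close
```

If reads could return partial packets, `payloads` would contain fragments or merged data instead of the two original messages.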
@colinsurprenant yeah, I don't think we would get partial reads. The maximum UDP packet size is 65535 bytes (the IP packet length field is a 2-byte value). The network stack should deliver datagrams one-at-a-time, so no
+1 I enjoyed reading the summary of your research on this topic! :) |
I enjoyed reading the research too. Some thoughts:
|
I added the option for receive_buffer_bytes. There isn't one correct value. OS default could be 64K while max allowed could be set to 1024K. |
@IrlJidel sure, my question is what's the use case for setting it lower? You could always truncate the message later in the filter section of logstash. |
@andrewvc receive_buffer_bytes is used when setting up the socket to set the buffer the OS network stack uses to store data to be read by the application. Honestly, not sure if most users will need to set this, and I'm open to exploring this setting being omitted. It exists to set the buffer size of unread bytes before the OS starts dropping packets and is useful in cases where a burst of data comes in temporarily faster than Logstash can process. What you describe "you could always truncate" is about the confusingly-named |
Codecs aren't bound to threads, they are bound to streams. TCP input, for example, one connection is one stream. For UDP, one packet is one stream. For File input, one file is one stream. For Kafka, one message is one stream. For HTTP, one request body is one stream. |
receive_buffer_bytes is the socket receive buffer size, which holds multiple packets. It's an important setting for us to handle microbursts. main udp is always the bottleneck on our system, and if the buffer is set too low it results in UDP drops.
|
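As a rough sketch of what a setting like receive_buffer_bytes controls at the socket level (this is not the plugin's code, and the 1 MiB figure is only an example value), the receive buffer can be requested with `SO_RCVBUF` and then read back, since the OS may clamp or adjust the requested size:

```ruby
require "socket"

# Example value only; the thread notes there is no single correct size.
requested_bytes = 1024 * 1024

socket = UDPSocket.new
# Ask the OS for a larger buffer of unread datagrams for this socket.
socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF, requested_bytes)

# Read the value back: what the OS actually granted may differ from the request.
effective_bytes = socket.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF).int
puts "requested=#{requested_bytes} effective=#{effective_bytes}"

socket.close
```

Comparing the requested and effective values is a quick way to check whether an OS-level maximum is capping the buffer.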
A few notes about buffer sizes:
|
Ok thanks for keeping it ☺ Sorry if this is off topic but the main reason for udp bottleneck is due to locks as we're only doing a recvmsg call. Ruby doesn't support recvmmsg but I wonder what perf would be like if we used java just like the tcp input. |
@andrewvc not sure I am following. As @jordansissel mentioned, a codec needs to be bound to a stream. This whole issue was raised because the UDP input plugin uses multiple worker threads but these were sharing a single codec instance. In the case of the line codec this created a problem because the line codec is not threadsafe. My first idea was to fix the line codec thread safety issue (which in fact solves the issue), but I forgot that UDP cannot form a stream since UDP packet order is not guaranteed. It turns out that each UDP packet (resulting from the |
Oh, then never mind Colin, we're on the same page, just a misunderstanding on my part :)
…On Thu, Dec 14, 2017 at 12:39 PM, Colin Surprenant wrote:
In what scenario do we actually want a threadsafe codec? If the right answer is to clone the codec for the local thread then isn't that encouraging bad behavior? In a multi-threaded situation that creates a choke point.
@andrewvc not sure I am following. As @jordansissel mentioned, a codec needs to be bound to a stream. This whole issue was raised because the UDP input plugin uses multiple worker threads but these were sharing a single codec instance. In the case of the line codec this created a problem because the line codec is not threadsafe. My first idea was to fix the line codec thread safety issue (which in fact solves the issue), but I forgot that UDP cannot form a stream since UDP packet order is not guaranteed. It turns out that each UDP packet (resulting from the @***@***.***_size) call should be considered a complete stream in itself so each input worker can just safely clone the configured codec. So in that respect I updated the udp input for this logstash-plugins/logstash-input-udp#32 and reverted my line codec thread safety fix logstash-plugins/logstash-codec-line#14.
|
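The approach described above, where each input worker clones the configured codec and every packet is treated as a complete stream, could look roughly like the sketch below. `ToyLineCodec` is a hypothetical stand-in and not the real line codec class, and flushing after every packet is an assumption made here to model "one packet is one stream".

```ruby
# Hypothetical stateful codec used only for illustration.
class ToyLineCodec
  def initialize
    @buffer = String.new
  end

  # A cloned codec must not share the original's buffer.
  def initialize_copy(source)
    super
    @buffer = String.new
  end

  # Yields each complete line; keeps a trailing partial line buffered.
  def decode(data)
    @buffer << data
    while (index = @buffer.index("\n"))
      yield @buffer.slice!(0..index).chomp
    end
  end

  # Yields whatever is left in the buffer and empties it.
  def flush
    yield @buffer.slice!(0..-1) unless @buffer.empty?
  end
end

configured_codec = ToyLineCodec.new

packets = Queue.new
["a\nb", "c\n", "d\ne\n", "f"].each { |payload| packets << payload }
packets.close # pop returns nil once the queue is drained

events = Queue.new
workers = 2.times.map do
  Thread.new do
    codec = configured_codec.clone # one private codec per worker, no locking needed
    while (payload = packets.pop)
      codec.decode(payload) { |line| events << line }
      codec.flush { |line| events << line } # the packet boundary ends the stream
    end
  end
end
workers.each(&:join)

lines = []
lines << events.pop until events.empty?
```

Because no codec instance is shared, the workers never contend on a lock, and a partial line left by one packet cannot leak into the next.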
Closing - this is now resolved with latest udp input and line codec releases. |
In logstash-plugins/logstash-input-udp/issues/4 (and related https://discuss.elastic.co/t/udp-input-plugin-error-typeerror-cant-convert-nil-into-string/110565) I realized that the udp input plugin uses multiple input workers to parallelize the decoding of the input data.
The problem here is that, in this specific case, the line codec is not thread safe (more precisely, the usage of the FileWatch::BufferedTokenizer is not threadsafe).
Now, this can be fixed in multiple ways:
- Add thread safety in the udp input to protect the codec decoding execution
- Add thread safety in the line codec to protect the usage of the FileWatch::BufferedTokenizer.
Observations/Questions:
WDYT?
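Both options listed above amount to serializing access to the shared tokenizer state. A minimal sketch of that idea follows; `ToyTokenizer` and `LockedDecoder` are hypothetical names and do not reproduce the real FileWatch::BufferedTokenizer or the plugin code.

```ruby
# Hypothetical tokenizer that keeps a partial token between calls.
class ToyTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = String.new
  end

  # Returns the complete tokens; any trailing partial token stays buffered.
  def extract(data)
    @buffer << data
    tokens = @buffer.split(@delimiter, -1)
    @buffer = tokens.pop || String.new
    tokens
  end
end

# Wraps the shared tokenizer so only one thread mutates its buffer at a time.
class LockedDecoder
  def initialize(tokenizer)
    @tokenizer = tokenizer
    @lock = Mutex.new
  end

  def decode(data)
    @lock.synchronize { @tokenizer.extract(data) }
  end
end

decoder = LockedDecoder.new(ToyTokenizer.new)
results = Queue.new

threads = 4.times.map do |worker|
  Thread.new do
    100.times do |i|
      decoder.decode("w#{worker}-#{i}\n").each { |token| results << token }
    end
  end
end
threads.each(&:join)
```

The lock keeps the buffer consistent, but every worker now waits on the same mutex while decoding, which is the trade-off to weigh against giving each worker its own codec.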