Fix cohttp EOF read #1035

gabrielmoise17 · 2024-06-11T07:54:45Z

This work has been done together by the following people:
@savvadia
@vect0r-vicall
@picojulien
@johnyob
@raphael-proust

Motivation

We have discovered that there is a resource leak for particular cases. Here are two possible scenarios:

If the callback never resolves, EOF is never detected, leading to a resource leak.
If the callback is very slow and resource-intensive, EOF is not detected until the callback resolves. Consequently, client resources are wasted even after the connection has been closed by the peer.

Solution

A new wait_eof_or_closed function is introduced, which repeatedly peeks (using MSG_PEEK) into the data stream for an EOF, in which case it closes the connection. MSG_PEEK used for this does not disturb reading from the stream by the handle_client. If the connection is closed locally before this happens, the function stops waiting for EOF.

If provided to the wait_eof_or_closed, the sleep_fn argument will be used to dictate the periodicity of the checks for EOF from the peer, yielding control periodically. If sleep_fn is not provided, the server keeps its legacy behaviour.

PR format

The format of the PR is as follows:

The first [WIP] commit adds a test_leak.ml test, which showcases the EOF leak; the [WIP] tag highlights that this test is not something that we want to be added necessarily, but we would be happy if you find it desirable or helpful and choose to incorporate it
The second commit fixes the leak showcased in the first commit

Testing

You can observe the initial leaking behaviour by cherry picking only the first commit:

$ g checkout v5-backports

$ g cherry-pick <first_commit>

$ make

$ dune exec ./_build/default/cohttp-lwt-unix/test/test_leak.exe
Server running on port 8080

Now, the server is listening, so you can call on a separate console curl -s 'localhost:8080/sleep' and observe that the server keeps printing messages of I slept. Now, when you close the curl request with CTRL+C, the server does not stop.

To solve this problem, add the second commit as well, and re-do this test, and after stopping the curl request, you will see this:

$ dune exec ./_build/default/cohttp-lwt-unix/test/test_leak.exe
Server running on port 8080           
Cohttp connection on 1
I slept 
I slept 
I slept 
I slept 
Cohttp connection closed

So, the leak is now fixed.

One can also monitor the resources used by the executable thanks to a tool such as lsof:

$ lsof -c test_leak.exe | grep localhost

gabrielmoise17 · 2024-06-11T07:55:40Z

cohttp-mirage/src/io.ml

@@ -76,4 +76,6 @@ module Make (Channel : Mirage_channel.S) = struct
      | Read_exn e -> Lwt.return_error (Read_error e)
      | Write_exn e -> Lwt.return_error (Write_error e)
      | ex -> Lwt.fail ex)
+
+  let wait_eof_or_closed _conn _ic _sleep_fn = assert false


Here we ask for the advice of the maintainers of the library, as we advocate that this function will never be called from this library, but we want to receive a second opinion.

raphael-proust · 2024-06-11T11:37:35Z

Ping @rgrinberg and @mefyl who might be able to review.

mefyl · 2024-07-15T13:23:45Z

Thank you for the detailed report and reproduction case!

TLDR: this is IMO more a design choice than a bug. But it's a good design question.

There is a preliminary question to all this: is it a good idea to kill the server request handler if the client disconnects? I'm not really sure it is, as one could have some heavy processing server side after sending the response headers, and abruptly killing this processing if the client disconnects before the end of the body is a very opinionated choice. In the test example, the "resource leak" mentioned exists because the handler never ever returns, so the handler is a user-created resource leak in itself.

Now I do reckon that your point has merit, and stopping processing of a request when the client disconnects is a desirable feature in some if not most cases - it's just that failing to do so is a sub-optimal while doing so inappropriately is a straight bug. With that in mind I think it would be better to provide the expert request handler with the possibility to monitor client disconnection, and maybe provide some simpler default behavior in the simple interface. Happy to discuss this.

gabrielmoise17 · 2024-07-24T09:35:49Z

Thank you for the detailed report and reproduction case!

TLDR: this is IMO more a design choice than a bug. But it's a good design question.

There is a preliminary question to all this: is it a good idea to kill the server request handler if the client disconnects? I'm not really sure it is, as one could have some heavy processing server side after sending the response headers, and abruptly killing this processing if the client disconnects before the end of the body is a very opinionated choice. In the test example, the "resource leak" mentioned exists because the handler never ever returns, so the handler is a user-created resource leak in itself.

Now I do reckon that your point has merit, and stopping processing of a request when the client disconnects is a desirable feature in some if not most cases - it's just that failing to do so is a sub-optimal while doing so inappropriately is a straight bug. With that in mind I think it would be better to provide the expert request handler with the possibility to monitor client disconnection, and maybe provide some simpler default behavior in the simple interface. Happy to discuss this.

Thank you for your response!
We agree that there is motivation behind not defaulting to closing the connection automatically when the client disconnects and EOF is received. We therefore propose to have one more optional callback to configure this behaviour.
In cohttp-lwt/src/server.ml we proposed to add sleep_fn optional callback to check for EOF periodically.
We can add another one, callback_on_eof which will be triggered if EOF is detected. The default behaviour (if sleep_fn is provided and thus cohttp is configured to look for EOF) is to cancel the promise.
So the whole solution will be the following:

if the client wants to wait for EOF while handling the request, sleep_fn is provided
- if the client wants to interrupt request handling on EOF, callback_on_eof is not provided and cohttp uses Lwt.pick waiting for EOF
- if the client provides callback_on_eof then cohttp waits for EOF in Lwt.choose which will not interrupt the promise execution. If EOF is received, the callback is called and then it's up to the client what to do with the main handler

The new callback should accept conn in the same way as callback to identify the connection:

  type t = {
    callback : conn -> Cohttp.Request.t -> Body.t -> response_action Lwt.t;
    conn_closed : conn -> unit;
    sleep_fn : (unit -> unit Lwt.t) option;
    callback_on_eof : conn -> unit;
  }

Please let us know what you think, we would be happy to chat about it. We can also showcase this adaptation in a fixup commit, if that is clearer for you. Would you prefer to continue this discussion in mails/comments here, or would it be easier for you to have a call?

gabrielmoise17 added 2 commits June 11, 2024 10:51

[WIP] Add test for EOF leak

46bc5e7

Fix Cohttp EOF read with Lwt_unix recv MSG_PEEK

c29f936

gabrielmoise17 commented Jun 11, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix cohttp EOF read #1035

Fix cohttp EOF read #1035

gabrielmoise17 commented Jun 11, 2024

gabrielmoise17 Jun 11, 2024

raphael-proust commented Jun 11, 2024

mefyl commented Jul 15, 2024

gabrielmoise17 commented Jul 24, 2024

Fix cohttp EOF read #1035

Are you sure you want to change the base?

Fix cohttp EOF read #1035

Conversation

gabrielmoise17 commented Jun 11, 2024

Motivation

Solution

PR format

Testing

gabrielmoise17 Jun 11, 2024

Choose a reason for hiding this comment

raphael-proust commented Jun 11, 2024

mefyl commented Jul 15, 2024

gabrielmoise17 commented Jul 24, 2024