
Http optimizations #8002

Merged: @straight-shoota merged 4 commits into crystal-lang:master from http-optimizations on Jul 29, 2019

Conversation

@asterite (Member)

This is a series of refactors to improve parsing of HTTP requests. Check each commit for more details (each has its own description).

This is more code, more complex code, and more low-level code, but that's the whole idea of Crystal: if you want to go deeper and optimize hot paths you can do it in Crystal itself.

I wrote this benchmark:

require "benchmark"
require "http"

# # request.txt was generated using this code
# request = <<-REQUEST.lines.join("\r\n") + "\r\n\r\n"
# GET /hello.htm HTTP/1.1
# User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
# Host: www.tutorialspoint.com
# Accept-Language: en-us
# Accept-Encoding: gzip, deflate
# Connection: Keep-Alive
# REQUEST
# File.write("request.txt", request)

io = File.open("request.txt")

Benchmark.ips do |x|
  x.report("from_io") do
    io.rewind
    HTTP::Request.from_io(io)
  end
end

I read from a file to simulate reading from an external resource, though the entire file probably fits in IO::Buffered's buffer (the same should be true for a Socket). I'll also show the results with IO::Memory below.

If I run the above benchmark against master:

$ crystal foo.cr --release
from_io 335.90k (  2.98µs) (± 1.07%)  2.1kB/op  fastest

When against this PR:

$ bin/crystal foo.cr --release
from_io 495.93k (  2.02µs) (± 0.85%)  816B/op  fastest

So about a 30% reduction in time per call (2.98µs down to 2.02µs), or roughly 48% more iterations per second! 😄

Also note the memory allocated per op: 2.1kB before, now 816B. This is the main reason it's faster.

If I use IO::Memory instead of File I get:

Before: from_io 509.34k (  1.96µs) (± 0.75%)  2.1kB/op  fastest
After:  from_io   1.01M (991.82ns) (± 0.75%)  816B/op  fastest
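For reference, the IO::Memory variant only changes how the io is created; everything else stays the same (a sketch, assuming the same request.txt generated above):

require "benchmark"
require "http"

# Load the whole request into memory instead of going through a File
io = IO::Memory.new(File.read("request.txt"))

Benchmark.ips do |x|
  x.report("from_io") do
    io.rewind
    HTTP::Request.from_io(io)
  end
end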

Since HTTP servers are pretty common in Crystal, I thought this was a good way to optimize all the apps out there. I know a typical web server usually does a lot more (rendering, for example), but the less time and memory the framework takes for itself, the better.

I didn't benchmark an HTTP::Server with ab or similar after this change, but you are welcome to do that and post the results here!

@asterite force-pushed the http-optimizations branch 4 times, most recently from 454f744 to d8e2294 on July 27, 2019
Instead of storing headers internally as `Hash(String, Array(String))`
we store them as `Hash(String, String | Array(String))`. This involves
a bit more logic when dealing with this union but it saves a fair
amount of memory because for headers with just a single value, which
is the most common case, we avoid allocating memory for an array.
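To illustrate the idea, here is a minimal sketch of that storage scheme (the alias and helper names are illustrative, not the actual HTTP::Headers internals):

alias HeaderValue = String | Array(String)

def add_header(store : Hash(String, HeaderValue), key : String, value : String)
  case existing = store[key]?
  when Nil
    store[key] = value             # common case: no Array allocated
  when String
    store[key] = [existing, value] # promote to an Array on the second value
  when Array(String)
    existing << value
  end
end

store = Hash(String, HeaderValue).new
add_header(store, "Host", "example.com")
add_header(store, "Accept", "text/html")
add_header(store, "Accept", "application/json")
p store # => {"Host" => "example.com", "Accept" => ["text/html", "application/json"]}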
@asterite (Member, Author)

Sorry about the constant rebasing; next time I'll keep adding commits to make reviewing easier, and only rebase everything at the end before merging (I learned rebase -i and now I can't stop, hehe :-P)

Instead of using `String#split`, which creates an array with three
strings, we find the space indexes and create subslices/substrings for
each of the pieces.
We also avoid allocating a string for common HTTP methods (GET, POST,
etc.) and for the supported HTTP versions.
Finally, we use `IO#peek` to see if we can find the request line there
instead of allocating a String for it.
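A rough sketch of that technique applied to the request line (illustrative only; the real code additionally matches the method and version against shared string constants to skip those allocations):

line = "GET /hello.htm HTTP/1.1"

if (first = line.index(' ')) && (second = line.index(' ', first + 1))
  method  = line[0, first]                      # "GET"
  path    = line[first + 1, second - first - 1] # "/hello.htm"
  version = line[(second + 1)..]                # "HTTP/1.1"
  p({method, path, version})
end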
@straight-shoota (Member)

git commit --fixup is great for this. It annotates a commit as a fixup of a previous one, and with git rebase -i --autosquash the fixups are automatically inserted at the right place.
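For example (the sha placeholder is whatever commit the fixup targets, and this assumes the branch was forked from master):

git commit --fixup=<sha-of-the-commit-to-amend>
git rebase -i --autosquash master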

@asterite (Member, Author)

Ooooh... I didn't know that. I'll try it next time. Thanks!

When creating an `HTTP::Request` and passing it some `HTTP::Headers`,
the headers are dupped to prevent the request from modifying data that
the user might hold. However, dupping the headers when parsing a
request from an IO is not necessary. This avoids some unneeded memory
allocations.
We try to use `IO#peek` and read header lines directly from there,
avoiding an extra String allocation for each header line. Then we
avoid allocating strings for common header field names like `Host`
and `Content-Length`.
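Roughly, the peek technique looks like this (a simplified sketch; the real implementation also handles the case where the line isn't fully buffered and falls back to reading a String):

io = IO::Memory.new("Host: example.com\r\nContent-Length: 0\r\n\r\n")

if (peek = io.peek) && (newline = peek.index('\n'.ord.to_u8))
  line = peek[0, newline + 1] # the header line, still inside the buffer
  # ... parse the field name/value directly from this slice,
  # matching common names like "Host" against string constants ...
  io.skip(newline + 1)        # consume only what was parsed
end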
@straight-shoota added this to the 0.30.0 milestone on Jul 29, 2019
@straight-shoota merged commit fe7e663 into crystal-lang:master on Jul 29, 2019
@straight-shoota (Member)

Thank you @asterite

@RX14 (Contributor) commented Jul 29, 2019

I was still reviewing this and had changes to request :<

@straight-shoota (Member)

Oh, I'm sorry 😢 I should've just merged it yesterday right away^^

Just a suggestion: when I'm reviewing a PR that might get merged in the meantime, I tend to request a review from myself to signal that I'm currently looking at it (or intend to do so).

@asterite (Member, Author)

By the way, I benchmarked the simple HTTP server in the samples directory before and after this change with ab. It's just a silly example, but it can show whether this change has an effect on the overall HTTP round trip.

Doing:

ab -k -c 100 -n 200000 127.0.0.1:8080/

Before:

Time taken for tests:   1.918 seconds
Requests per second:    104299.05 [#/sec] (mean)
Time per request:       0.959 [ms] (mean)
Time per request:       0.010 [ms] (mean, across all concurrent requests)
Transfer rate:          10287.31 [Kbytes/sec] received

After:

Time taken for tests:   1.716 seconds
Requests per second:    116551.95 [#/sec] (mean)
Time per request:       0.858 [ms] (mean)
Time per request:       0.009 [ms] (mean, across all concurrent requests)
Transfer rate:          11495.85 [Kbytes/sec] received

And I'm almost sure that if we change Hash to an open addressing implementation it could go up to 121750.26 requests per second (just a number off the top of my head 😉).

By comparison, doing the same benchmark against a simple server in Go gives these results:

Time taken for tests:   2.767 seconds
Requests per second:    72285.75 [#/sec] (mean)
Time per request:       1.383 [ms] (mean)
Time per request:       0.014 [ms] (mean, across all concurrent requests)
Transfer rate:          10800.51 [Kbytes/sec] received

However, Go handles parallelism. Once we have parallelism too, performance will probably get a bit worse, but on the other hand a single request doing expensive CPU work won't be able to stop the server from receiving other requests.

@RX14 (Contributor) commented Jul 29, 2019

@asterite could you test with wrk instead in the future? It gives much better and more realistic results, and it's the industry-standard HTTP benchmarking tool these days.

@asterite (Member, Author)

@RX14 Sure! Here it is:

wrk -t12 -c400 -d30s http://127.0.0.1:8080/

(I don't know if those values are good, I just copied them from their GitHub repo.)

Before this PR:

Running 30s test @ http://127.0.0.1:8080/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.51ms  750.62us  14.12ms   80.25%
    Req/Sec     9.31k   687.19    26.23k    96.78%
  3339715 requests in 30.10s, 321.69MB read
  Socket errors: connect 0, read 244, write 0, timeout 0
Requests/sec: 110942.98
Transfer/sec:     10.69MB

After this PR:

Running 30s test @ http://127.0.0.1:8080/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.26ms  603.93us   9.90ms   82.14%
    Req/Sec     9.92k   594.01    14.69k    90.38%
  3562875 requests in 30.10s, 343.18MB read
  Socket errors: connect 0, read 239, write 0, timeout 0
Requests/sec: 118355.73
Transfer/sec:     11.40MB

With a "hypothetical" Hash with open addressing:

Running 30s test @ http://127.0.0.1:8080/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.13ms  446.90us  12.34ms   81.88%
    Req/Sec    10.33k   434.48    12.97k    83.14%
  3699575 requests in 30.00s, 356.35MB read
  Socket errors: connect 0, read 237, write 0, timeout 0
Requests/sec: 123302.09
Transfer/sec:     11.88MB

Go:

Running 30s test @ http://127.0.0.1:8080/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.78ms    7.57ms 336.03ms   94.45%
    Req/Sec     8.50k     1.07k   45.00k    96.03%
  3046719 requests in 30.10s, 374.82MB read
  Socket errors: connect 0, read 250, write 0, timeout 0
Requests/sec: 101219.02
Transfer/sec:     12.45MB

So the results are quite different from ab's, but we can see each optimization is still noticeable. Go does pretty well too.

@RX14 (Contributor) commented Jul 29, 2019

@asterite is that Go with GOMAXPROCS=1, or is Go actually using all cores and still losing?

@RX14 (Contributor) commented Jul 29, 2019

Also, are you going to address the review?

@asterite (Member, Author)

> @asterite is that Go with GOMAXPROCS=1, or is Go actually using all cores and still losing?

No, that's without specifying GOMAXPROCS (so I guess it's using all cores).

If I pass GOMAXPROCS=1 I get:

Running 30s test @ http://127.0.0.1:8080/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.14ms   88.69ms   2.00s    98.73%
    Req/Sec     3.14k   339.80     8.62k    85.96%
  1127120 requests in 30.10s, 138.66MB read
  Socket errors: connect 0, read 397, write 0, timeout 118
Requests/sec:  37442.65
Transfer/sec:      4.61MB

So I guess if, with parallelism, we can stay close to Go's numbers, it'll be more than good (/cc @waj @bcardiff)

> Also, are you going to address the review?

Yes, but in a couple of hours.
