Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworked header lines logic #968

Draft
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

Vlix
Copy link
Contributor

@Vlix Vlix commented Jan 4, 2024

Before submitting your PR, check that you've:

  • Bumped the version number

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Check that CI passes (or if it fails, for reasons unrelated to your change, like CI timeouts)

Had a go at making the header lines parsing more efficient.

I want to get some feedback on this first, and I'll want to check the benchmarks as well to see if this doesn't make anything worse. (Maybe add some benchmarks too)

I'm also wondering if we want to support multiline functionality in headers, still? The RFC 7230 states it deprecates this functionality. This RFC is from almost 10 years ago. It would make the parsing a lot easier.

@kazu-yamamoto
Copy link
Contributor

Let's throw away the multiple line support to make the code simpler.

@Vlix
Copy link
Contributor Author

Vlix commented Jan 9, 2024

If we do that, we'll just quickly split on any (/r)/n and let the validity checking be done by the header parser, right?

@kazu-yamamoto
Copy link
Contributor

I think so.

@Vlix Vlix mentioned this pull request Jan 14, 2024
@Vlix Vlix force-pushed the reworked-headerLines-logic branch from 5092f23 to 479b9a2 Compare January 14, 2024 21:42
@Vlix
Copy link
Contributor Author

Vlix commented Jan 14, 2024

This is rebased on top of vlix:format-files-with-cpp #973, so first merge in that one to get a better diff of this PR.

After this we don't accept multiline headers anymore, so should we bump the major version?
Also, we're not actually rejecting requests with invalidly formatted headers, we just add them as ("string", ""). So should we throw an InvalidRequest exception ASAP, or do we want to defer that decision to users of warp?

@Vlix Vlix force-pushed the reworked-headerLines-logic branch from 7f38acf to b017ce1 Compare January 14, 2024 22:00
@Vlix
Copy link
Contributor Author

Vlix commented Jan 15, 2024

Hmmm, I'll probably have to rebase on master again for the formatting commits to be excluded from this PR 🤔 will do that later today, hopefully.

@kazu-yamamoto
Copy link
Contributor

I have done it already on my local branch.
I felt that stash for all commits are beneficial for reviewing.

@Vlix Vlix force-pushed the reworked-headerLines-logic branch from b017ce1 to 40e51a2 Compare January 16, 2024 23:46
@Vlix Vlix marked this pull request as ready for review January 17, 2024 01:16
@Vlix
Copy link
Contributor Author

Vlix commented Jan 17, 2024

So only two issues before we might merge this:

  • Should this be a major version bump? (i.e. 3.4.0)
  • Should we throw an exception on bad headers? Since they have specs in the RFC and we're just letting them go through. (with illegal bytes, or with missing colons, we don't abort at all)

@kazu-yamamoto
Copy link
Contributor

  • Should this be a major version bump? (i.e. 3.4.0)

Probably yes.

  • Should we throw an exception on bad headers? Since they have specs in the RFC and we're just letting them go through. (with illegal bytes, or with missing colons, we don't abort at all)

#972 provides such an exception. We might want to integrate it into warp.

@Vlix
Copy link
Contributor Author

Vlix commented Jan 17, 2024

I'll take a look at it. Hopefully this weekend.

I do feel it's better to abort early, though we should keep the code fast for correct requests, if those are ever at odds.

@Vlix Vlix marked this pull request as draft January 17, 2024 01:50
@Vlix
Copy link
Contributor Author

Vlix commented Jan 20, 2024

Added benchmarks and adjusted according to results.

The number after the bench name is the amount of chunks the request is split into.
So it's obvious that smaller/more chunks take longer, given the same size request.

In short:

3.3.13          3.4.0           3.4.0 ByteString.Builder

benchmarked parsing request/new parsing 4
time 1.728 μs   time 1.039 μs   time 2.240 μs

benchmarked parsing request/new parsing 10
time 2.519 μs   time 1.830 μs   time 5.322 μs

benchmarked parsing request/new parsing 25
time 4.798 μs   time 3.648 μs   time 9.501 μs

benchmarked parsing request/new parsing 100
time 12.45 μs   time 10.64 μs   time 18.19 μs
In full:
<warp 3.3.13>                                            <warp 3.4.0 ByteString>                                  <warp 3.4.0 ByteString.Builder>

warp> benchmarks                                         warp> benchmarks                                         warp> benchmarks
Running 1 benchmarks...                                  Running 1 benchmarks...                                  Running 1 benchmarks...
Benchmark parser: RUNNING...                             Benchmark parser: RUNNING...                             Benchmark parser: RUNNING...
benchmarked parsing request/new parsing 4                benchmarked parsing request/new parsing 4                benchmarked parsing request/new parsing 4
time                 1.728 μs   (1.687 μs .. 1.783 μs)   time                 1.039 μs   (1.027 μs .. 1.062 μs)   time                 2.240 μs   (2.185 μs .. 2.334 μs)
                     0.995 R²   (0.991 R² .. 0.999 R²)                        0.998 R²   (0.996 R² .. 1.000 R²)                        0.996 R²   (0.991 R² .. 0.999 R²)
mean                 1.743 μs   (1.724 μs .. 1.799 μs)   mean                 1.042 μs   (1.037 μs .. 1.051 μs)   mean                 2.416 μs   (2.377 μs .. 2.475 μs)
std dev              95.93 ns   (51.89 ns .. 194.6 ns)   std dev              22.63 ns   (14.73 ns .. 32.97 ns)   std dev              155.7 ns   (121.4 ns .. 219.8 ns)

benchmarked parsing request/new parsing 10               benchmarked parsing request/new parsing 10               benchmarked parsing request/new parsing 10
time                 2.519 μs   (2.468 μs .. 2.598 μs)   time                 1.830 μs   (1.809 μs .. 1.859 μs)   time                 5.322 μs   (5.050 μs .. 5.584 μs)
                     0.998 R²   (0.996 R² .. 1.000 R²)                        0.999 R²   (0.998 R² .. 1.000 R²)                        0.990 R²   (0.985 R² .. 0.997 R²)
mean                 2.494 μs   (2.481 μs .. 2.519 μs)   mean                 1.815 μs   (1.810 μs .. 1.824 μs)   mean                 5.251 μs   (5.195 μs .. 5.324 μs)
std dev              56.04 ns   (32.73 ns .. 86.59 ns)   std dev              21.77 ns   (13.37 ns .. 35.44 ns)   std dev              217.5 ns   (181.3 ns .. 290.0 ns)

benchmarked parsing request/new parsing 25               benchmarked parsing request/new parsing 25               benchmarked parsing request/new parsing 25
time                 4.798 μs   (4.746 μs .. 4.853 μs)   time                 3.648 μs   (3.567 μs .. 3.743 μs)   time                 9.501 μs   (9.324 μs .. 9.775 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)                        0.994 R²   (0.986 R² .. 0.999 R²)                        0.997 R²   (0.994 R² .. 1.000 R²)
mean                 4.804 μs   (4.781 μs .. 4.850 μs)   mean                 3.647 μs   (3.600 μs .. 3.762 μs)   mean                 9.743 μs   (9.665 μs .. 9.849 μs)
std dev              106.3 ns   (65.26 ns .. 173.5 ns)   std dev              233.2 ns   (90.28 ns .. 422.4 ns)   std dev              309.9 ns   (237.0 ns .. 482.5 ns)

benchmarked parsing request/new parsing 100              benchmarked parsing request/new parsing 100              benchmarked parsing request/new parsing 100
time                 12.45 μs   (12.23 μs .. 12.83 μs)   time                 10.64 μs   (10.50 μs .. 10.87 μs)   time                 18.19 μs   (17.89 μs .. 18.61 μs)
                     0.998 R²   (0.995 R² .. 1.000 R²)                        0.998 R²   (0.994 R² .. 1.000 R²)                        0.998 R²   (0.995 R² .. 1.000 R²)
mean                 12.31 μs   (12.27 μs .. 12.40 μs)   mean                 10.65 μs   (10.59 μs .. 10.75 μs)   mean                 18.49 μs   (18.36 μs .. 18.66 μs)
std dev              210.4 ns   (97.23 ns .. 392.0 ns)   std dev              261.1 ns   (156.1 ns .. 383.1 ns)   std dev              480.0 ns   (323.8 ns .. 741.7 ns)

Benchmark parser: FINISH                                 Benchmark parser: FINISH                                 Benchmark parser: FINISH

kazu-yamamoto added a commit to kazu-yamamoto/wai that referenced this pull request Jan 22, 2024
@kazu-yamamoto
Copy link
Contributor

Rebased, added PackIntSpec to the cabal and merged.

I will release a new version of warp soon to support a new major version of http2.

@Vlix
Copy link
Contributor Author

Vlix commented Jan 22, 2024

Rebased, added PackIntSpec to the cabal and merged.

I will release a new version of warp soon to support a new major version of http2.

Cool 👍 I'll be chipping away at this PR from time to time to hopefully improve the parsing of requests.

@Vlix
Copy link
Contributor Author

Vlix commented Jan 22, 2024

Wait, why did you close this PR?

@kazu-yamamoto
Copy link
Contributor

My bad.
I did not read the last line, sorry.

@kazu-yamamoto kazu-yamamoto reopened this Jan 22, 2024
@Vlix
Copy link
Contributor Author

Vlix commented May 21, 2024

Note to self:

  • Check Ocramz safe parserequestbodyex #963 #972 for ideas to implement in core wai
  • sanitizeHeaders seems to also be a "multiline header" compatibility addition (for responses).
    This isn't necessary anymore according to RFC9110 and any newline should just be replaced with a space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants