-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of HTTPHeaders.subscript(canonicalForm:)
#1952
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I've left a note in the diff.
Can I also ask you to add an allocation counter test for this? Perhaps in a separate PR so we get a nice before-and-after? Re-using either or both of the perf tests would be fine.
case UInt8(ascii: " "), | ||
UInt8(ascii: "\t"), | ||
UInt8(ascii: "\r"), | ||
UInt8(ascii: "\n"): | ||
return true | ||
|
||
default: | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently two are missing:
- Vertical tab
\v
- Form feed
\f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm disinclined to make this more complex. We're only really trying to catch the OWS
rule from the HTTP -semantics draft, which strictly forbids newline characters, and I don't think we really expect to see them, so arguably we could even remove the \r
and \n
cases. VTAB and form-feed should also be disregarded by this loop for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me tbh, although it says SHOULD
which I'm sure someone somewhere interprets as "not necessary"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, you can put a VTAB in there: the mistake would be for us to strip it, not for us to fail to. If your VTAB makes it to the app the app is naturally going to flip its lid: what we shouldn't do is treat it as though it was accidentally equivalent to SP or HTAB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we follow up with a PR to remove \r
and \n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Or perhaps a good first issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think follow-up, happy to have you do it.
Do you expect a difference or is this just for completeness? (I ask because there is no difference) |
Completeness. We should actually add a test case that has a long header field name (more than 15 characters long both before and after the slice), because I had to do a lot of code reading to convince myself that allocating a |
Motivation: When getting the canonical form of header values any ascii whitespace is stripped from the values. The current implementation does this by doing equality checks on `Character` which is quite expensive. Modifications: - Update the `Substring.trimWhitespace()` function to trim on a UTF8 view Result: Retrieving the canonical form of header values is cheaper, significantly so when values contain whitespace.
e0fb43b
to
dd59d5c
Compare
@swift-nio-bot test perf please |
performance reportbuild id: 77 timestamp: Mon Sep 13 16:54:05 UTC 2021 results
comparison
significant differences found |
Relevant bits:
|
Motivation: Follow up to apple#1952 (comment) The OWS rule from the HTTP semantics draft only considers 'SP' and 'HTAB' to be whitespace. We also (unnecessarily) consider CR and LF to be whitespace. Modifications: - Remove CR and LF from the characters we consider to be whitespace - Update tests Result: We no longer consider CR or LF to be whitespace when trimming whitespace when producing the canonical form of header values.
…ues (#1954) Motivation: Follow up to #1952 (comment) The OWS rule from the HTTP semantics draft only considers 'SP' and 'HTAB' to be whitespace. We also (unnecessarily) consider CR and LF to be whitespace. Modifications: - Remove CR and LF from the characters we consider to be whitespace - Update tests Result: We no longer consider CR or LF to be whitespace when trimming whitespace when producing the canonical form of header values.
Motivation:
When getting the canonical form of header values any ascii whitespace
is stripped from the values. The current implementation does this by
doing equality checks on
Character
which is quite expensive.Modifications:
Substring.trimWhitespace()
function to trim on a UTF8view
Result:
Retrieving the canonical form of header values is cheaper, significantly
so when values contain whitespace.
On my machine: