-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Possible easy performance improvement: tuple/list literals -> set literals #7737
Comments
I suspect that there's no reason not to use set literals in those places (even if they're iterated over for some reason, a set is still an iterable). Any interest in preparing a PR or two? |
I'm unfortunately totally not set up to develop on Synapse; this discovery was the result of an idle browse through the code, and I don't expect to have the time any time soon to get things set up properly for testing and measuring the changes (especially as I use NixOS, which tends to make it a bit more work to get non-Nix development environments going). So I could prepare a PR, but it will probably take a while before I get around to it, and it looks like a relatively fast change for someone who already has a functioning testing/measuring environment set up anyway :)
While true, it's possible that an internal conversion back to a sequence has a higher runtime cost than is being gained by improving lookup performance. I don't really do Python, so I have no idea how this is implemented internally. The performance differences here are individually measured in nanoseconds, so it's not impossible for such a normally-insignificant difference to make things worse in this particular situation. |
I looked briefly and only found a couple of instances. Curious what parts you had in mind? |
There are quite a few cases (too many to list here) where there's an existence lookup in a literal that's specified either a) directly in the This is the regex I used to find them: I haven't looked through the entire resultset, but I've gone through probably 40-50 results, and most of them are either a dict lookup or a list/tuple presence check. I've been using the Python extension for VS Code to identify and discard many of the dict checks (through its type inference). |
One downside of this approach is that it has the potential to raise a >>> x = []
>>> x in ("foo", "bar")
False
>>> x in {"foo", "bar"}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list' So a better microbenchmark might compare |
This doesn't really have to do with type annotations. There's quite a few places where we check user input against a list of constants to ensure it is valid. If the user input is not validated first it can be a non-hashable type, which raises an error. I have a WIP branch that adds some workarounds for that though. |
It does have to do with type annotations and validation, once type annotations have been set in place, static analysis can root out the bugs that eventually could cause that situation you described ( While If possible, I'd be willing to do this after annotations and checking is in place. P.S: If a developer has to do |
I disagree. You don't know the incoming types of JSON data against APIs.
I agree, but it is the reality right now. |
Then that JSON should be checked against specification when received, before being passed through for processing. If it's an actual variant type that's allowed by spec, then a |
We're saying the same thing. My point is that until that is done, this issue is harder to fix. |
Is there a tracking issue for the validation work that this depends on? |
I mentioned #8351, but that doesn't inherently fix the JSON spec validation side of it. |
This doesn't feel terribly actionable. We'd welcome PRs improve performance in this way, but I don't think it's a wholesale project we are likely to schedule. |
Description
In many places in the Synapse code, some variation of the following code exists:
These use tuple and list literals respectively, while the resulting tuples/lists are only ever used for presence checking of particular entries. Some quick microbenchmarking suggested, however, that set literals would be significantly faster here (relatively speaking), for both hits and misses:
Even for very small collections:
... pretty much only being slower - and even then, only marginally slower - when literally the first element in the collection is a hit:
While I have not analyzed it in detail, I strongly suspect that at least some of these checks are in a hot path, where they could provide a significant performance improvement - and I expect that blindly changing all non-iterated list/tuple literals to set literals across the codebase, could provide a significant performance improvement with very little work.
It appears that the performance benefit comes from list/tuple literals with constant values being compiled to tuples, whereas set literals with constant values are compiled to
frozenset
s, paying the entire set-building cost at compile time while incurring no runtime overhead.Steps to reproduce
N/A
Version information
Current
develop
HEAD.The text was updated successfully, but these errors were encountered: