Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of HPRtree #1012

Merged
merged 12 commits into from
Jan 2, 2024
Merged

Improve performance of HPRtree #1012

merged 12 commits into from
Jan 2, 2024

Conversation

msbarry
Copy link
Contributor

@msbarry msbarry commented Nov 4, 2023

While profiling performance of planetiler I noticed that STRtree in MCIndexNoder was one of the bottlenecks. I looked into HPRtree and saw there were a few opportunities to improve performance using some ideas from the flatbush library:

  • compute hilbert indexes only once for each item while sorting
  • limit the sort so it just sorts items into the correct node, but doesn't bother sorting within the node
  • extract item bounds and values into arrays for faster lookups during query

This version performs faster in some of the microbenchmarks (especially when result sets are large) but when I swap out STRtree with this improved HPRtree planetiler spends about 30% less time in MCIndexNoder.computeNodes, isValid checks, and about 10% less time in the most expensive operations planetiler uses: GeometryPrecisionReducer.reduce and bufferUnionUnbuffer.

Signed-off-by: Mike Barry <msb5014@gmail.com>
Signed-off-by: Mike Barry <msb5014@gmail.com>
Signed-off-by: Mike Barry <msb5014@gmail.com>
Signed-off-by: Mike Barry <msb5014@gmail.com>
@dr-jts
Copy link
Contributor

dr-jts commented Nov 6, 2023

The code is certainly nicer with generics. Do you think the compile warnings are going to be obnoxious for downstream users?

Generics have been asked for for a while for index containers, so maybe now is the time to add them.

@msbarry
Copy link
Contributor Author

msbarry commented Nov 6, 2023

I don't think they would be too bad for consumers if it means they can get rid of instanceof checks and casting and don't cause any compile failures.

I could go and update the other spatial index classes too. Think I should do that in this PR or a separate one? It might touch a lot of files.

@dr-jts
Copy link
Contributor

dr-jts commented Nov 6, 2023

I could go and update the other spatial index classes too. Think I should do that in this PR or a separate one? It might touch a lot of files.

No, that should be done in a separate PR. Actually I'm still wondering if SpatialIndex generics are too much for this PR. I hate to lose all the work you've done - but perhaps you can move it to a different PR?

@msbarry
Copy link
Contributor Author

msbarry commented Nov 6, 2023

No worries. Sounds good to keep this one focused on the performance improvements. I'll move generics out into another pr

Signed-off-by: Mike Barry <msb5014@gmail.com>
Signed-off-by: Mike Barry <msb5014@gmail.com>
@msbarry
Copy link
Contributor Author

msbarry commented Nov 7, 2023

For the generics PR, there are a bunch of things that it could include but it starts to get big pretty fast:

parameterizing the external API of index classes

  • SpatialIndex and its subclasses (small)
  • ItemVisitor and its subclasses (small)
  • every other index in jts.index package (medium)

parameterizing the internal implementation of index classes

  • STRtree/AbstractSTRtree/SIRtree (medium)
  • KDTree (medium)
  • each other is probably small-medium

updating places in the code that use these indexes to benefit from the parameterized versions:

  • STRtree (medium/large)
  • KDTree (medium/large)
  • Quadtree (medium)
  • each other one is probably small

What do you think the scope of an initial PR should be?

@dr-jts
Copy link
Contributor

dr-jts commented Nov 7, 2023

That's a great list, @msbarry.

What do you think the scope of an initial PR should be?

Ideally - all of it! Or, at least SpatialIndex implementors and usage.

@dr-jts
Copy link
Contributor

dr-jts commented Nov 22, 2023

Sorry, @msbarry , I've been delayed on getting on to merging this PR.

Why does switching to using two arrays instead of ArrayList<Item> improve performance?

@msbarry
Copy link
Contributor Author

msbarry commented Dec 2, 2023

@dr-jts thanks for getting back, I think it's because ArrayList<Item> does more memory reads scattered across the heap when processing the items in one node: item -> envelope then all of the doubles in the envelope are nearby, and item -> data for any match.

The biggest gain comes from double[] itemBounds because it can do one hop to find the doubles to check against, and they are nearby in ram for a given node.

Moving to Object[] itemValues is a smaller improvement since it only saves one memory hop. It makes HPRTree run roughly 10x faster than flatbush js library on their posted benchmarks but that's probably because the benchmarks match a lot of items and don't do anything with them so the JVM doesn't even need to dereference the item value. In a more real-world scenario where you're looking for a needle in a haystack and then doing something with I'd imagine the difference would be smaller.

That being said JVM performance often defies my intuition and this represents the fastest I could get the benchmarks and planetiler code that uses JTS a lot to run through mostly trial and error 😄

@dr-jts
Copy link
Contributor

dr-jts commented Dec 5, 2023

Moving to Object[] itemValues is a smaller improvement since it only saves one memory hop. It makes HPRTree run roughly 10x faster than flatbush js library on their posted benchmarks

@msbarry did you port the Flatbush benchmark code? If so it would be nice to include that in this PR. Performance tests go in this package, and can use the performance test harness framework via PerformanceTestCase.

@msbarry
Copy link
Contributor Author

msbarry commented Dec 7, 2023

@msbarry did you port the Flatbush benchmark code? If so it would be nice to include that in this PR. Performance tests go in this package, and can use the performance test harness framework via PerformanceTestCase.

Sure thing - I ported the tests over and have them run against STRTree and HPRtree. Here are the results from my 2021 m1 macbook pro in comparison to flatbush on node.js 20.3.0:

  flatbush js strtree hprtree before hprtree after
build 1m rectangles 180ms 1001ms 1103ms 241ms
query 0.01% 4.5ms 14ms 11ms 4ms
query 1% 37ms 165ms 82ms 12ms
query 10% 295ms 1227ms 510ms 61ms
Raw hprtree/strtree benchmark output

master d26cef9

HPRTree Build time = 1103 ms
STRTree Build time = 1017 ms
----- Query size: 1
HPRTree query result items = 101786
runQueriesHPR : 11 ms
STRTree query result items = 101786
runQueriesSTR : 15 ms
----- Query size: 10
HPRTree query result items = 3046824
runQueriesHPR : 82 ms
STRTree query result items = 3046824
runQueriesSTR : 173 ms
----- Query size: 31
HPRTree query result items = 25770594
runQueriesHPR : 510 ms
STRTree query result items = 25770594
runQueriesSTR : 1247 ms

this branch 0001c16

HPRTree Build time = 241 ms
STRTree Build time = 1001 ms
----- Query size: 1
HPRTree query result items = 101786
runQueriesHPR : 4 ms
STRTree query result items = 101786
runQueriesSTR : 14 ms
----- Query size: 10
HPRTree query result items = 3046824
runQueriesHPR : 12 ms
STRTree query result items = 3046824
runQueriesSTR : 165 ms
----- Query size: 31
HPRTree query result items = 25770594
runQueriesHPR : 61 ms
STRTree query result items = 25770594
runQueriesSTR : 1227 ms
Raw flatbush benchmark output
❯ node bench          
1000000 rectangles
node size: 16

+ flatbush: 178.922ms
index size: 38,400,092
+ 1000 searches 10%: 294.68ms
+ 1000 searches 1%: 37.187ms
+ 1000 searches 0.01%: 4.476ms
1000 searches of 100 neighbors: 18.683ms
1 searches of 1000000 neighbors: 111.913ms
100000 searches of 1 neighbors: 482.469ms

rbush: 843.683ms
1000 searches 10%: 640.872ms
1000 searches 1%: 155.409ms
1000 searches 0.01%: 17.523ms
1000 searches of 100 neighbors: 47.962ms
1 searches of 1000000 neighbors: 271.478ms
100000 searches of 1 neighbors: 1.212s

Since Flatbush builds in ~180ms but HPRtree takes ~250ms there may be more room for improvement when building... It looks like of those 250ms, it's:

  • 111ms in quickSortItemsIntoNodes (105ms of that in hoarePartition)
  • 75ms in prepareItems (flatbush doesn't need to do this since it only stores envelopes and returns the item index when querying)
  • 30ms in computeLeafNodeBounds
  • 18ms in HilbertEncoder.encode

@msbarry
Copy link
Contributor Author

msbarry commented Jan 1, 2024

Hello, just checking in here, are there any other tests to run or changes to make before merging this?

@dr-jts
Copy link
Contributor

dr-jts commented Jan 1, 2024

Hello, just checking in here, are there any other tests to run or changes to make before merging this?

No, I think this looks good. I'll merge soon.

@dr-jts dr-jts merged commit 59f6482 into locationtech:master Jan 2, 2024
2 checks passed
@jodygarnett jodygarnett added this to the 1.20.0 milestone Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants