Add tree walking functions #199

dgleich · 2024-07-07T11:53:48Z

This is an initial take at how to set up the tree-walking code.

This addresses #194.

I need to add more documentation still, but this should be enough to get some initial feedback before writing more docs.

dgleich · 2024-07-09T07:04:01Z

I switched the names to ...

leafpoints, leaf_points_indices, treeindex ... I'm also thinking that using treeroot would be more consistent with these than root...

KristofferC · 2024-07-09T08:40:57Z

Random thought, would it make sense to implement the https://github.com/JuliaCollections/AbstractTrees.jl interface for the trees in this package?

dgleich · 2024-07-09T10:53:21Z

Good question. Let me think....

dgleich · 2024-07-09T11:45:54Z

On a related note, currently, we can implement a parent function for BallTree nodes because it stores the associated regions explicitly, but I don't see an obvious way to do this for KDTree nodes without storing the min and max values for the dimension that was split in the tree.

There are some key advantages to having a parent function, e.g., you can then do tree traversal and iteration without a stack. (And I think some of the AbstractTrees methods would need this...)

On the other hand, for pure NearestNeighbors functions, storing this extra information is unneeded.

How amenable would you be to adding a split_minmax value to the KDTree struct that stores that information?

This would just store a tuple of values for the boundaries of the dimension that is split so they can be restored via a parent call.
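To make the idea concrete, here is a minimal sketch of why a stored (min, max) pair is enough to undo a split. All names in it (`ToyRect`, `child_rect`, `parent_rect`) are hypothetical and are not the package's internals; it only assumes that a KDTree node's region is an axis-aligned box and that each split changes one bound along one dimension.

```julia
# Hypothetical stand-in for a KDTree node region (an axis-aligned box).
struct ToyRect{N}
    mins::NTuple{N,Float64}
    maxes::NTuple{N,Float64}
end

# Descend: the left child keeps the part below the split value,
# the right child keeps the part above it.
function child_rect(r::ToyRect{N}, dim::Int, split::Float64, left::Bool) where {N}
    if left
        return ToyRect{N}(r.mins, Base.setindex(r.maxes, split, dim))
    else
        return ToyRect{N}(Base.setindex(r.mins, split, dim), r.maxes)
    end
end

# Ascend: restore the parent's bounds along the split dimension from the
# stored (min, max) pair, which is the role split_minmax would play.
function parent_rect(r::ToyRect{N}, dim::Int, minmax::Tuple{Float64,Float64}) where {N}
    return ToyRect{N}(Base.setindex(r.mins, minmax[1], dim),
                      Base.setindex(r.maxes, minmax[2], dim))
end

rect = ToyRect{2}((0.0, 0.0), (1.0, 1.0))
stored = (rect.mins[1], rect.maxes[1])   # what the tree would keep for this split
left = child_rect(rect, 1, 0.4, true)
restored = parent_rect(left, 1, stored)  # same box as rect
```

Without the stored pair, the child box alone does not say what the overwritten bound used to be, which is the gap described above.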

KristofferC · 2024-07-11T11:38:11Z

src/tree_ops.jl

@@ -12,6 +12,175 @@ function show(io::IO, tree::NNTree{V}) where {V}
    print(io,   "  Reordered: ", tree.reordered)
 end

+struct NNTreeNode{T <: NNTree, R}
+    index::Int
+    tree::T


I have a general "philosophy" that storing something big (a full KD Tree) in something that is conceptually small (a tree node) is often a mistake.

As you traverse the tree you will create all these nodes that will all contain the same tree. What do you think about dropping the tree field and instead requiring the user to provide the tree as an argument to the traverse functions?

This is a good question and good rationale. My own experience has been that Julia is very good at optimizing code when the types are immutable, so I doubt it is really creating different copies if you use it in a function.

My argument for the current organization is that node ids are tied to the tree, so this way you don't have an additional argument hanging around everywhere, and it is easy and simple to write code that does the right thing and gets the answer right. But as I said, I hadn't considered your particular perspective here.

Is there a test we could do to resolve whether this is an issue? (I.e., to convince me that your perspective is correct, or for me to convince you that it isn't a problem to store the tree and the compiler really is smart enough?)

Maybe vectors of nodes would be bad for including the tree? But do we ever actually need them?

Another argument for keeping it linked is that the AbstractTrees interface is 'node' oriented, so you define children, parent, etc. on a node level; which would require keeping the tree as part of the struct.

Okay -- you are right -- this does make a big difference. I took a trivial "walk the list and count up the sizes of the leaves" code that just benchmarks the traversal (total number of points: 100k). By storing the NNTree variable, it takes ~131 μs. If I just do it by raw calls with node ids, passing the tree as a parameter to the function, it takes ~29 μs. But... if I store a ref to the tree rather than the full tree structure, then I get all the functionality and it takes ~45 μs. I think the latter is worth doing, so I'll implement that and update the pull request. Note that all of this skips the region computations for the KDTree, so including those will narrow the difference.

> But... if I store a ref to the tree rather than the full tree structure

I don't fully understand what that means.

Here's the updated structure. This stores a pointer to the tree information instead of a copy of all the information.

    struct NNTreeNode{TreeRef <: Ref{<:NNTree}, R}
        index::Int
        treeref::TreeRef
        region::R
    end
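As a rough illustration of how a node carrying a `Ref` to its tree can be used, here is a self-contained toy version. `ToyTree`, `RefNode`, and the heap-style numbering (node `i` has children `2i` and `2i + 1`) are assumptions made for this sketch only; they are not the package's layout, and the `region` field is left out.

```julia
# Hypothetical toy tree: internal nodes are 1:n_internal, leaves follow.
struct ToyTree
    n_internal::Int
    leaf_sizes::Vector{Int}   # number of points in each leaf
end

# The node holds a reference to the tree rather than the tree itself.
struct RefNode{TR <: Ref{ToyTree}}
    index::Int
    treeref::TR
end

tree(n::RefNode) = n.treeref[]   # dereference to reach the tree
isleaf(n::RefNode) = n.index > tree(n).n_internal
children(n::RefNode) = (RefNode(2 * n.index, n.treeref),
                        RefNode(2 * n.index + 1, n.treeref))

# Recursive walk that sums leaf sizes; no separate tree argument is needed.
function count_points(n::RefNode)
    isleaf(n) && return tree(n).leaf_sizes[n.index - tree(n).n_internal]
    l, r = children(n)
    return count_points(l) + count_points(r)
end

rootnode = RefNode(1, Ref(ToyTree(3, [10, 7, 9, 12])))
total = count_points(rootnode)   # 38
```

Every child node reuses the same `treeref`, so creating nodes during the walk copies only an index and a reference.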

The issue I have with the iterate analogy is that iterate is designed to execute within a single function context -- and has some nice syntax to hide the complexity and different types of objects -- whereas most of the tree walking functions are designed to execute recursively, where there is no such affordance that I know of. So you'll have to pass the tree structure to any subfunction -- as well as the node structure.

The current design is just designed to be easy to use; it's also feasible to adapt to the AbstractTrees.jl interface (although I haven't done that yet...) where they do the same thing with parent/children/etc. functions.

But it seems like you are still leaning against it even though there is minimal overhead, is that correct?

Just to be precise, the interface you would like is:

    children(T::Tree, n::Node) -> (nl::Node, nr::Node)
    parent(T::Tree, n::Node) -> (p::Node)
    region(n::Node) ->
    leaf_points(T::Tree, n::Node) -> something that iterates over points in the leaf node
    etc...

where node is something simple like:

    struct Node{R}
        index::Int
        region::R
    end
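A small sketch of what recursive code could look like under this explicit-tree interface follows. `RangeTree`, `TNode`, and `treeroot` are hypothetical, the `region` field is omitted, and the parent function is named `parentnode` here only to avoid clashing with `Base.parent`; the heap-style numbering is also an assumption of the sketch.

```julia
# Hypothetical toy tree: internal nodes are 1:n_internal, leaves follow,
# and each leaf owns a contiguous range of point indices.
struct RangeTree
    n_internal::Int
    leaf_ranges::Vector{UnitRange{Int}}
end

# The node is just an index; the tree is always passed alongside it.
struct TNode
    index::Int
end

treeroot(::RangeTree) = TNode(1)
isleaf(t::RangeTree, n::TNode) = n.index > t.n_internal
children(t::RangeTree, n::TNode) = (TNode(2 * n.index), TNode(2 * n.index + 1))
parentnode(t::RangeTree, n::TNode) = TNode(n.index ÷ 2)   # only meaningful for non-root nodes
leaf_points(t::RangeTree, n::TNode) = t.leaf_ranges[n.index - t.n_internal]

# Recursive walk: both the tree and the node are threaded through each call.
function collect_points!(out::Vector{Int}, t::RangeTree, n::TNode)
    if isleaf(t, n)
        append!(out, leaf_points(t, n))
    else
        l, r = children(t, n)
        collect_points!(out, t, l)
        collect_points!(out, t, r)
    end
    return out
end

rt = RangeTree(3, [1:3, 4:5, 6:9, 10:10])
pts = collect_points!(Int[], rt, treeroot(rt))
```

The cost of this design is the extra `t` argument in every helper; the benefit is that a node stays a few machine words.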

Just a quick nudge on this question of interface. Would love to get this wrapped up in the next week or so before some obligations for school start.

Okay, since I had a moment, I just implemented the interface above. As a check, we can do non-recursive exploration of the tree using the current children, parent, next/prev sibling structure; see, e.g., the points iterator...
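For readers unfamiliar with stackless traversal, the following sketch shows the general technique on a toy tree: go down to the leftmost leaf, then repeatedly climb to the parent while the current node is a right child and step to the next sibling. It is not the points iterator from the pull request; `HeapTree`, the helper names, and the heap-style numbering (children of `i` are `2i` and `2i + 1`) are assumptions of the sketch.

```julia
# Hypothetical complete binary tree: internal nodes are 1:n_internal.
struct HeapTree
    n_internal::Int
end

isleafindex(t::HeapTree, i::Int) = i > t.n_internal
isrootindex(i::Int) = i == 1
isleftchild(i::Int) = iseven(i)

# Visit leaves left to right using only child, parent, and next-sibling moves.
function leaf_order(t::HeapTree)
    visited = Int[]
    i = 1
    while true
        while !isleafindex(t, i)      # descend along left children
            i = 2i
        end
        push!(visited, i)
        while !isrootindex(i) && !isleftchild(i)
            i ÷= 2                    # climb while we are a right child
        end
        isrootindex(i) && return visited
        i += 1                        # move to the next sibling
    end
end

order = leaf_order(HeapTree(3))       # leaves are nodes 4, 5, 6, 7
```

The only state carried between steps is the current index, which is why a parent function makes a stack unnecessary.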

Hi, sorry for the slow response here and sorry for being a bit "annoying" with trying to figure out the "best" interface to use.

A reason for this is that this is my first Julia package so it holds a bit of a special place in my heart and I have also worked quite a bit to reduce memory footprint and improve performance.

I can add your package so it is tested as part of the CI here (and you could then at any time also implement whatever tree walking interface you want there and it will not be broken, or at least it can be updated if changes are made here that would be incompatible with it).

KristofferC · 2024-07-11T11:41:08Z

As a check of the functionality here it would be nice to reimplement https://github.com/KristofferC/NearestNeighbors.jl/blob/master/examples/balltree_illustration.ipynb using these official traverse functions. Doesn't strictly have to be done here but it would serve as somewhat of a use case check.

dgleich added 4 commits June 26, 2024 21:17

Add walking functions

30a1ac0

Working code that passes tests

857c66a

Update naming of functions.

536607b

Implement the range query with the new tree functions for the BallTree type

4204675

KristofferC reviewed Jul 11, 2024

dgleich added 2 commits July 22, 2024 17:12

Add parent function and tree refs instead of trees

da92557

Update to new interface with explicit tree driving

5eb006e


KristofferC Jul 11, 2024

dgleich Jul 12, 2024

dgleich Jul 18, 2024

KristofferC Jul 23, 2024

dgleich Jul 23, 2024

dgleich Jul 24, 2024

dgleich Jul 24, 2024

dgleich Jul 29, 2024

dgleich Jul 31, 2024

KristofferC Aug 1, 2024
