Expose clustering functions to C API #823

Open
mwtoews opened this issue Feb 13, 2023 · 4 comments
@mwtoews
Contributor

mwtoews commented Feb 13, 2023

This is an enhancement request, and a reminder, to expose the clustering functions from #688 in the C API.

As the previous discussion noted, there are several ways to implement this, such as some form of GeometryCollection, or array(s) of geometries and/or cluster IDs.

@mwtoews mwtoews added the Enhancement New feature or feature improvement. label Feb 13, 2023
@dbaston
Member

dbaston commented Feb 20, 2023

Any thoughts on an API that would work well for shapely?

@mwtoews
Contributor Author

mwtoews commented Feb 20, 2023

Generally, arrays of ints representing cluster IDs would probably work best. Shapely is now array-oriented via the NumPy C-API.

I'm guessing the user would first collect their input geometry array:

input = GEOSGeom_createCollection(GEOS_GEOMETRYCOLLECTION, geoms, n);

Perhaps an alternative, non-owning variant of GeometryCollection, e.g. GeometryArray, could be used instead:

input = GEOSGeometryArray_create(geoms, n);

Once the input geometries are collected, a new function could be called. It could be reused by several clustering methods, with an optional distance parameter:

extern int GEOS_DLL *GEOSClusterFinder_create(
    const GEOSGeometry* input,
    const int method, /* e.g. DBSCAN, ClusterWithin, or other method enumerated by `enum GEOSClusterMethods` */
    const double distance, /* if needed by method, otherwise ignore */
    int* clusterIds);

where clusterIds is an array of the same size as the input geometry array. As for ID values, perhaps reserve 0 for "no cluster assigned", and otherwise use positive values 1, 2, 3, ...
I'm not sure about the ownership of clusterIds, but it could be allocated by the caller, since its size is known. A follow-up GEOSClusterFinder_destroy() would be expected at the end (and possibly GEOSGeometryArray_destroy(), depending on how the input is built).
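
To make the ID convention concrete, here is a minimal sketch of how a caller might consume such an array, assuming 0 means "no cluster assigned" and assigned IDs run 1, 2, 3, ... without gaps. The helper names `max_cluster_id` and `cluster_sizes` are hypothetical and are not part of GEOS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GEOS: returns the largest cluster ID.
 * If IDs are 1..k with no gaps, this is also the number of clusters. */
static int max_cluster_id(const int *cluster_ids, size_t n)
{
    int max_id = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cluster_ids[i] >= 0); /* 0 = unassigned, positive = cluster */
        if (cluster_ids[i] > max_id) {
            max_id = cluster_ids[i];
        }
    }
    return max_id;
}

/* Hypothetical helper: counts members per cluster ID.
 * `sizes` must hold max_id + 1 entries; sizes[0] counts unassigned inputs. */
static void cluster_sizes(const int *cluster_ids, size_t n,
                          int *sizes, int max_id)
{
    for (int k = 0; k <= max_id; k++) {
        sizes[k] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        assert(cluster_ids[i] >= 0 && cluster_ids[i] <= max_id);
        sizes[cluster_ids[i]]++;
    }
}
```

Because the array has exactly one entry per input geometry, the caller can size every buffer up front, which is what makes caller-side allocation plausible.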

@caspervdw and @jorisvandenbossche might have thoughts on the best approach too.

@dr-jts
Contributor

dr-jts commented Feb 22, 2023

+1 for this.

The proposed mechanism of providing input geometries and returning cluster information sounds like the right direction. I'd suggest having a separate function for each clustering method, all using the same calling pattern. This allows different parameter(s) for each clustering method, if required. It also makes the documentation more straightforward.
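
As a rough illustration of "one function per method, same calling pattern", the sketch below uses plain 2D points as a stand-in for geometries. Everything here is an assumption for illustration: the `toy_` names, the `ToyPoint` type, the return convention (number of clusters, or -1 on allocation failure), and the simple O(n^2) union-find approach. None of it is GEOS API or the implementation from #688.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, y; } ToyPoint; /* stand-in for a geometry */

static int toy_near(ToyPoint a, ToyPoint b, double d)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy <= d * d;
}

/* Union-find root lookup with path halving. */
static size_t toy_find(size_t *parent, size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Convert union-find roots to IDs 1, 2, 3, ...; inputs with member[i] == 0
 * keep ID 0. A NULL `member` means every input is a member. */
static int toy_label(size_t *parent, const unsigned char *member,
                     size_t n, int *cluster_ids)
{
    int next = 0;
    for (size_t i = 0; i < n; i++) cluster_ids[i] = 0;
    for (size_t i = 0; i < n; i++) {
        if (member && !member[i]) continue;
        size_t r = toy_find(parent, i);
        if (cluster_ids[r] == 0) cluster_ids[r] = ++next;
        cluster_ids[i] = cluster_ids[r];
    }
    return next;
}

/* Method 1: chains of points within `distance` share a cluster. */
int toy_cluster_within(const ToyPoint *pts, size_t n, double distance,
                       int *cluster_ids)
{
    assert(n == 0 || (pts != NULL && cluster_ids != NULL));
    size_t *parent = malloc(n * sizeof *parent);
    if (n > 0 && !parent) return -1;
    for (size_t i = 0; i < n; i++) parent[i] = i;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (toy_near(pts[i], pts[j], distance)) {
                size_t a = toy_find(parent, i), b = toy_find(parent, j);
                parent[a] = b;
            }
    int k = toy_label(parent, NULL, n, cluster_ids);
    free(parent);
    return k;
}

/* Method 2: DBSCAN-style. Same pattern, one extra parameter. */
int toy_cluster_dbscan(const ToyPoint *pts, size_t n, double eps,
                       size_t min_points, int *cluster_ids)
{
    assert(n == 0 || (pts != NULL && cluster_ids != NULL));
    size_t *parent = malloc(n * sizeof *parent);
    unsigned char *core = malloc(n);
    if (n > 0 && (!parent || !core)) { free(parent); free(core); return -1; }
    for (size_t i = 0; i < n; i++) {
        size_t count = 0; /* neighbours within eps, including the point itself */
        for (size_t j = 0; j < n; j++) count += toy_near(pts[i], pts[j], eps);
        core[i] = count >= min_points;
        parent[i] = i;
    }
    for (size_t i = 0; i < n; i++) /* link core points that are neighbours */
        for (size_t j = 0; j < i; j++)
            if (core[i] && core[j] && toy_near(pts[i], pts[j], eps)) {
                size_t a = toy_find(parent, i), b = toy_find(parent, j);
                parent[a] = b;
            }
    int k = toy_label(parent, core, n, cluster_ids);
    for (size_t i = 0; i < n; i++) { /* border points join a core neighbour */
        if (core[i]) continue;
        for (size_t j = 0; j < n; j++)
            if (core[j] && toy_near(pts[i], pts[j], eps)) {
                cluster_ids[i] = cluster_ids[j];
                break;
            }
    }
    free(parent);
    free(core);
    return k;
}
```

Both functions take the inputs and a caller-allocated `cluster_ids` array in the same positions; only the method-specific parameters differ, so each can be documented on its own.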

@dr-jts
Contributor

dr-jts commented Feb 22, 2023

Just to be clear, there is currently no such structure as GEOSGeometryArray. But such a structure would have some advantages over GeometryCollection:

  • it can allow null or empty elements
  • perhaps its implementation could be simpler?

I've been working on functions that operate on Simple Polygonal Coverages, and this kind of structure would be useful to define a C API for them as well.
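
For illustration only, a non-owning array along these lines could be as small as a pointer table plus a length. The sketch below is an assumption about what such a structure might look like, not an existing or planned GEOS type; `ToyGeom`, `ToyGeomArray`, and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int id; } ToyGeom; /* stand-in for GEOSGeometry */

/* Hypothetical non-owning array: it borrows the geometries and never
 * frees them. NULL entries are allowed. */
typedef struct {
    const ToyGeom **items;
    size_t size;
} ToyGeomArray;

static ToyGeomArray *toy_geom_array_create(const ToyGeom *const *geoms,
                                           size_t n)
{
    assert(n == 0 || geoms != NULL);
    ToyGeomArray *arr = malloc(sizeof *arr);
    if (!arr) return NULL;
    arr->items = n ? malloc(n * sizeof *arr->items) : NULL;
    if (n && !arr->items) {
        free(arr);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        arr->items[i] = geoms[i]; /* copy the pointer, not the geometry */
    }
    arr->size = n;
    return arr;
}

static void toy_geom_array_destroy(ToyGeomArray *arr)
{
    if (!arr) return;
    free(arr->items); /* frees only the pointer table */
    free(arr);
}
```

Destroying the array leaves the input geometries untouched, so the caller keeps ownership of them throughout.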

@dbaston dbaston self-assigned this Aug 31, 2024
@dbaston dbaston added this to the 3.13.0 milestone Aug 31, 2024
@pramsey pramsey modified the milestones: 3.13.0, 3.14.0 Sep 6, 2024