Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] re-design of AND versus OR queries #255

Open
mpadge opened this issue Nov 30, 2021 · 11 comments
Open

[FEATURE] re-design of AND versus OR queries #255

mpadge opened this issue Nov 30, 2021 · 11 comments

Comments

@mpadge
Copy link
Member

mpadge commented Nov 30, 2021

@Mashin6 raised valid concerns in #252, supported by @stragu, that the current design in counter-intuitive:

OR

add_osm_features (features = c (...))

AND

add_osm_feature() |>
    add_osm_feature() |> ...

This needs to be improved, for which suggestions are requested!

@stragu
Copy link
Contributor

stragu commented Dec 3, 2021

Similar to what @Mashin6 said in the mentioned issue, I believe it would make a lot more sense for newcomers to:

  1. build a dataset by "getting" the results of various key-pair combinations, i.e. an OR operation, using get_osm_features(key = "xxx", value = "xxx") |> get_osm_features(key = "xxx", value = "xxx")
  2. use several key-value pairs that all need to be matched for the returned features, i.e. an AND operation, using get_osm_features(match_all = c(...))

The name get_osm_features() is chosen so it doesn't get confused with the existing add_osm_feature() and add_osm_features() functions (which could be deprecated over several osmdata versions, as the new get_osm_features() would replace them). The argument match_all leaves no space for misinterpretation.

That would fit better the way most people would read the code, replacing the pipe with "then":

  1. "get these OSM features, and then also get these OSM features, etc."
  2. "Add the OSM features that match_all these key-value pairs."

Adapting Mashin6's hypothetical example, a sample command would look like this:

opq() |>
   get_osm_features(match_all = c("natural = tree", "leaf_type = broadleaved")) |>
   get_osm_features(match_all = c("leisure = park", "name ~ public")) |>
   get_osm_features(key = "amenity", value = "bench")

... which would return an opq object that would retrieve broad leaved trees, benches, and parks with "public" in their name.

In the meantime, I'm wondering if aliasing add_osm_feature() and add_osm_features() with filter_osm_feature() and filter_osm_features() respectively would help clarifying how they behave, given that even the documentation draws a parallel with dplyr::filter() ?

@stragu
Copy link
Contributor

stragu commented Dec 3, 2021

I'm also thinking an ellipsis argument could be used instead of a match_all, if it is clear enough that all these values will be combined with "AND". You could then simplify the same command with:

opq() |>
   get_osm_features("natural = tree", "leaf_type = broadleaved") |>
   get_osm_features("leisure = park", "name ~ public") |>
   get_osm_features(key = "amenity", value = "bench")

Edit: hmm that could easily be misinterpreted as an OR.... e.g. people trying things like get_osm_features("building = house", "building = residential")

...or should there be two unambiguous arguments match_all and match_either, so users have the option not to pipe for OR operations?

@mpadge
Copy link
Member Author

mpadge commented Dec 3, 2021

Thanks @stragu, this will all start coming together in a branch after I've addressed #252. It's going to be difficult to do this well, but it is clearly pretty important, so please keep an eye on this issue and help along the way whenever you can. Thanks!

@Mashin6
Copy link
Contributor

Mashin6 commented Dec 4, 2021

This name change looks like a good strategy for transition.

@urswilke
Copy link

urswilke commented Dec 9, 2021

I just played a bit around with the package some years ago - so please don't consider my thoughts as important...

Building on @stragu's comments, I was just wondering, if you could pass named arguments to the ellipsis in the style of get_osm_features(amenity = "bench", leisure = "park")

I think it's completely OK that passing multiple corresponds to a logical AND, as I'm used to this by dplyr::filter().

A similar idea was to approach it as expressions that return a logical vector like dplyr::filter(), i.e.
get_osm_features(amenity == "bench" & leisure == "park").
However, that probably doesn't make sense. So I'm not sure if it makes sense to imitate this syntax...

Anyways thank you for the package! Really hope to use it more one day.

@jmaspons
Copy link
Collaborator

Is there any consensus about the API or somebody working on it?

The easier thing could be to modify the existing functions and make add_osm_feature add the new filters to the existing ones in the query (paste(q$features, new_features, collapse=" ")), and add_osm_features to add new items to q$features (c(q$features, new_features)).

I can try this approach and see what else is needed to make it work.

@jmaspons
Copy link
Collaborator

I found that:

  • add_osm_feature collapse features from add_osm_features if called first
  • if add_osm_feature is called first, features get eliminated after calling add_osm_features

I would like to make possible to chain add_osm_feature and add_osm_features regardless of the order.

@jmaspons
Copy link
Collaborator

Got the code working at jmaspons@2ccf6d1

@Mashin6
Copy link
Contributor

Mashin6 commented Dec 29, 2022

Thanks, that's a useful fix. Though I don't think it addresses the main point of this issue.
If someone unaware of how the insides of Overpass work would try to use add_osm_feature repeatedly, they would most likely expect retrieving more features. But currently they would receive less features with every subsequent use of that function, because it simply applies more filters.

IMO the solution would be to soft deprecate both add_osm_feature and add_osm_features for backwards compatibility and instead establish a new fuction e.g. already suggested get_osm_features.

  • Its parameters would work as filters (logical AND) get_osm_features(c(x=y, z=w, q=r))nwr[x=y][z=w][q=r];
  • And its every subsequent use would work as logical OR: get_osm_features(c(x=y, z=w, q=r)) |> get_osm_features(c(a=b, c=d))nwr[x=y][z=w][q=r]; nwr[a=b][c=d];

@jmaspons
Copy link
Collaborator

Yes, I agree that the current naming is not very explicit and can be improved, but the docs are quite clear. The problem I see is that the proposed solution is not as powerful, as it doesn't allow to chain features with AND or OR in arbitrary order. For example, is not possible to chain and OR followed but an AND as in

waterarea<- list(waterway='riverbank',
    landuse=c('reservoir', 'water', 'basin', 'salt_pond'),
    natural=c('lake', 'water'),
    amenity='swimming_pool',
    leisure = 'swimming_pool') # minzoom: 12
waterway_low<- list(waterway=c('river','canal')) # minzoom: 8, maxzoom: 12
placenames_small<- list(place=c('suburb','village','locality','hamlet','quarter','neighbourhood','isolated_dwelling','farm')) # minzoom: 10; maxzoom: 17
mainroad_label<- list(highway=c('primary', 'secondary', 'tertiary')) # minzoom: 12
leisure_label<- list(leisure=c('park', 'sports_centre', 'stadium', 'pitch')) # minzoom: 14

objects_osm<- c(waterarea, waterway_low, placenames_small, mainroad_label, leisure_label)

areaPPCC<- getbb("Països Catalans", format_out="data.frame")
consulta<- opq(areaPPCC, out="tags center", osm_types="nwr", timeout=500) %>%
  add_osm_features(features=objects_osm, value_exact=TRUE) %>%
  add_osm_feature(key=c("name", "!name:ca"))

With your proposal, @Mashin6, the code above would require much more verbose code and, in current main, is not possible (would require the bonus from jmaspons@8736310 , using paste_features to parse the features list also in add_osm_features )

A better solution should include different functions or a function with different parameters for AND and OR additions

@Mashin6
Copy link
Contributor

Mashin6 commented Jan 2, 2023

It's certainly an interesting use case. I don't know how often users run queries that involve adding the same set of filters to all statements. Personally would try to approach it with lapply or for loop, but having a shorthand function that does that seem like a good idea.
In that case I would suggest to introduce it as a brand new function with a less confusing name e.g. filter_all_features or add_osm_filter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants