Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source filters #2141

Closed
ivanmalagon opened this issue Jun 14, 2018 · 18 comments
Closed

Source filters #2141

ivanmalagon opened this issue Jun 14, 2018 · 18 comments
Assignees
Labels
Milestone

Comments

@ivanmalagon
Copy link
Contributor

Context

After seeing the customers using CARTO.js v4 in the wild, we've realized that they struggle when it comes to filtering sources. Right now, they have to add helper functions to translate their widgets filters to a source node. This doesn't scalate.

We need to add another object to ease source nodes filtering:

  • Filter by category, like in SELECT * FROM wadus WHERE hood IN ['brooklyn', 'queens].
  • Filter by range, like in SELECT * FROM wadus WHERE price >= 34 AND price <= 100

Source nodes should be allowed to be filtered by several filters. The way to do it and what do we allow (ANDs, ORs) is to be discussed in the task of API definition.

Tasks

  • Define the API: define the API first and ask the team for review. Respect the existing API style.
  • Implement filtering to source nodes. Use wrapping* to enable filtering in complex SQL source nodes.
  • Specs.
  • Add documentation:
    • Reference docs in JSDoc.
    • New example.
    • Revamp / upgrade existing references to filtering. The most likely place to add help about this is in dataviews docs.
  • Add example in examples section.

### Remarks

  • Respect existing API:
@ivanmalagon ivanmalagon added this to the v4.1 milestone Jun 14, 2018
@jesusbotella
Copy link
Contributor

jesusbotella commented Jun 25, 2018

As I read the issue here, I came up with several questions.

From what I understood, we are looking for a new Object that can serve as a filter for a source object. It is an object that creates a filter that can be attached to as many sources as we want, but only to sources (as opposed to the BoundingBox filter that is attached to Dataviews), right?.

In the original comment, there are only two possible filters mentioned, the category one and the range one, there are some other SQL filters such as LIKE, or IS NULL. Don't we want to support them, or is it that they aren't that useful?

@jesusbotella jesusbotella self-assigned this Jun 25, 2018
@ivanmalagon
Copy link
Contributor Author

@jesusbotella
Copy link
Contributor

jesusbotella commented Jun 25, 2018

I have been thinking about this all day long and here's what I came up with.

When reading the API reference and per my previous usages of carto.js, I saw that we tend to create instances of various classes to handle all the behaviour of the library, and I think it is better to do it that way rather than using plain old javascript objects. So I followed that path when creating the API of the source filters.

To start, I thought about creating a new filter class named carto.filter.SQL, which will handle all the state of the current instance of the filter, allowing the users to add or remove filters as they wish.

So the specification of the methods of the new class could be:

carto.filter.SQL

  • getFilters()
  • addFilter(carto.filter.SQL.BaseFilter)
  • addFilters(carto.filter.SQL.BaseFilter[])
  • removeFilter(carto.filter.SQL.BaseFilter)
  • removeFilters(carto.filter.SQL.BaseFilter)

These are the base methods that I thought but we can add as much methods as we want, meanwhile they're useful.

The next phase is to be able to add new filters to our filter. (Cpt. Obvious to the rescue 😂)

So, I thought that the most useful ones would be a range filter (any filter that has to do with numbers) and a category filter. The names may not be as accurate as possible but we can change them.

carto.filter.SQL.Range and carto.filter.SQL.Category.
The parameters in these filters will allow you to specify a column and the data that you want to apply the filter with.

The constructor's specification of both filters will be very similar.

new carto.filter.SQL.Range('age', { gte: 26, lte: 50 })
new carto.filter.SQL.Category('customerType', {
  in: ['Premium', 'Flagship'], not_in: ['Standard']
});

The first parameter will contain the column name, the second will contain the filters that we want to apply to the column rows, and an optional third parameter with options like including null values in the matched set or not.

As you can see, there are some filters named gte, lte, in, not_in. The idea is to map the different SQL operators to a readable string, and use them in that object to filter by the different values. So gte will be >=, lte will be <=, etc... And by values I mean, number, strings, any other SQL source...

By this way, we can define the Filters tree, contained in a carto.filter.SQL instance.

The key in here is that every range and category filter will have a .getSQL() method to generate the WHERE clause of the filters below it in the tree. And we can generate the SQL query by traversing the tree and invoking the same method on every instance and concatenating all the strings until we have the final SQL clause.

The powerful thing about this is that as we have a tree to hold all filters, we can change any of them and it will be updated and we can trigger a reload of the SQL query as if we were changing its content.

To make this work with a SQL source, we'll need to add the SQL filter to the source by invoking addFilter(filter) in the source that you want to add the filter to.

But I think that examples are worth more than thousand words, so here they are:
⚠️⚠️⚠️ Updated in #2141 (comment) ⚠️⚠️⚠️

Basic Example with one range filter

const sqlFilter = new carto.filter.SQL();
const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
sqlFilter.addFilter(olderThan26);

Basic Example with one filter and multiple conditions

const sqlFilter = new carto.filter.SQL();
const between26and50 = new carto.filter.SQL.Range('age', { gte: 26, lte: 50 });
sqlFilter.addFilter(between26and50);

Example with two filters

const sqlFilter = new carto.filter.SQL();

const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
const premiumCustomers = new carto.filter.SQL.Category('customerType', {
  in: ['Premium', 'Flagship'], not_in: ['Standard'], like: 'Premium%'
});

sqlFilter.addFilters([olderThan26, premiumCustomers]);

Example with subquery category filtering

const sqlFilter = new carto.filter.SQL();

const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });

const premiumCustomersSource = new carto.source.SQL("SELECT * FROM customercategories WHERE name IN ('Premium', 'Flagship')");
const premiumCustomers = new carto.filter.SQL.Category('customerType', { in: premiumCustomersSource });

sqlFilter.addFilters([olderThan26, premiumCustomers]);

Example with OR filtering with several filters

const sqlFilter = new carto.filter.SQL();

const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });

const premiumCustomersSource = new carto.source.SQL("SELECT * FROM customercategories WHERE name IN ('Premium', 'Flagship')");
const premiumCustomers = new carto.filter.SQL.Category('customerType', { in: premiumCustomersSource });

sqlFilter.addFilter(carto.filter.SQL.OR([olderThan26, premiumCustomers]));

Complex example extracted from SalesQuest

I'd put these example without splitting into different variables for easiness of writing 😂

const sqlFilter = new carto.filter.SQL();

sqlFilter
  .addFilter(new carto.filter.SQL.Category('customerIndustry', { in: ['Banking & Finance', 'Real State'] }))
  .addFilter(
    new carto.filter.SQL.Range('customerLastContact',
      { lte: new Date("2018-06-24T23:59:59Z"), gte: new Date("2017-01-01T00:00:00Z") },
      { includeNull: true }
    )
  )
  .addFilter(new carto.filter.SQL.Range('customerLastRevenue', { gte: 0 })
  .addFilter(new carto.filter.SQL.Category('customerType', { in: ['Premium', 'Flagship'], not_in: ['Standard'] })
  .addFilter(new carto.filter.SQL.Category('opportunityType', { in: ['New Business', 'Expansion', "Renewal"] })

Updating filters

const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });

olderThan26.updateAttributes({ in: anotherSQLSource });

Removing filters

const sqlFilter = new carto.filter.SQL();
const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
sqlFilter.addFilter(olderThan26);

sqlFilter.removeFilter(olderThan26);

@jesusbotella
Copy link
Contributor

jesusbotella commented Jun 25, 2018

We might need some AND filter to allow more complex situations like this one:

sqlFilter.addFilter(
  carto.filter.SQL.OR([
    carto.filter.SQL.Category(),
    carto.filter.SQL.AND([
      carto.filter.SQL.Category(),
      carto.filter.SQL.Range()
    ])
  ])
)

We need to define which operators we want to allow in the filters, but I'd add all the ones that might be compelling, like equal, not equal. We can add as many operators as we want.

I updated the previous comment so it's better to read it in GitHub rather than in the mail :)

@ivanmalagon
Copy link
Contributor Author

Very good proposal. 👍

First thoughts:

  • I don't find any need to have a carto.filter.SQL object. Its only purpose is to gather all filters but, in the end, it has to be added to a carto.source. We have addFilter already in dataviews so it seems reasonable to add addFilter to carto.source and save one object.
  • What happens when we call multiple times to addFilter? I'm for adding filters like an AND. It's the most intuitive behaviour.
  • I find using SQL.Source as an attribute for filtering cumbersome and error prone. We're using an object meant for another usage (as source of layers) as a query holder. And the SQL must be well formed in order to not break the whole filter. Let's keep it simple and not use queries for filtering.
  • For each filter, instead of updateAttributes let's add:
    • setXXX for each individual options (as we already do in dataviews)
    • setOptions in case someone wants to set several options at the same time.
    • resetOptions to erase all filters.

@jesusbotella
Copy link
Contributor

jesusbotella commented Jun 26, 2018

  • Yes, it may make sense to use the SQL filter directly with carto.source, it may be more intuitive than using it separately. I did it to separate concerns between the source and the source filters.

  • As we've talked in person, I thought about linking filters with AND in .addFilter()because it is more common and intuitive. If anyone wants to make an OR filter at the first level, they'll have to use carto.filter.SQL.OR.

  • I thought about using a SQL.source as a simple way to introduce subqueries in our filters, but you're right, it will lead to unexpected errors for users. So we can remove it. You can still use subqueries without SQL.source by setting them as string like this:

new carto.filter.SQL.Category('customerType', { in: "SELECT * FROM customercategories WHERE name IN ('Premium', 'Flagship')"});
  • I wasn't very convinced about updateAttributes, so your proposal is much better that mine :). Let's use that one!

Let me update the examples and I'll post them here as soon as I finish them.

I have another concern related to filters. I want to add comparison filters for numbers, like equal or not equal, but they don't fit into the current carto.filter.SQL.Range filter when I thought about it. Would it be better to create a carto.filter.SQL.Number filter?

@jesusbotella
Copy link
Contributor

jesusbotella commented Jun 26, 2018

Well, after giving a second round of thought, we decided to remove .resetOptions, and make .setOptions override all the properties.

So, all the new shiny examples are:

Basic Example with one range filter

const sqlSource = new carto.source.SQL('SELECT * FROM fake_table');
const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
sqlSource.addFilter(olderThan26);

Basic Example with one filter and multiple conditions

const sqlSource = new carto.source.SQL('SELECT * FROM fake_table');
const between26and50 = new carto.filter.SQL.Range('age', { gte: 26, lte: 50 });
sqlSource.addFilter(between26and50);

Example with two filters

const sqlSource = new carto.source.SQL('SELECT * FROM fake_table');

const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
const premiumCustomers = new carto.filter.SQL.Category('customerType', {
  in: ['Premium', 'Flagship'], not_in: ['Standard'], like: 'Premium%'
});

sqlSource.addFilters([olderThan26, premiumCustomers]);

Example with OR filtering containing several filters

const sqlSource = new carto.source.SQL('SELECT * FROM fake_table');

const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
const premiumCustomers = new carto.filter.SQL.Category('customerType', { in: ['Premium', 'Flagship'] });

sqlSource.addFilter(carto.filter.SQL.OR([olderThan26, premiumCustomers]));

Complex example extracted from SalesQuest

I'd put this example without splitting into different variables for easiness of writing 😂

const sqlSource = new carto.source.SQL('SELECT * FROM fake_table');

sqlSource
  .addFilter(new carto.filter.SQL.Category('customerIndustry', { in: ['Banking & Finance', 'Real State'] }))
  .addFilter(
    new carto.filter.SQL.Range('customerLastContact',
      { lte: new Date("2018-06-24T23:59:59Z"), gte: new Date("2017-01-01T00:00:00Z") },
      { includeNull: true }
    )
  )
  .addFilter(new carto.filter.SQL.Range('customerLastRevenue', { gte: 0 })
  .addFilter(new carto.filter.SQL.Category('customerType', { not_in: ['Standard'] })
  .addFilter(new carto.filter.SQL.Category('opportunityType', { in: ['New Business', 'Expansion', "Renewal"] })

Updating all filter properties

const ageFilter = new carto.filter.SQL.Range('age', { gte: 26 });

ageFilter.setOptions({ lte: 26, gte: 35 });

Updating single filter property while maintaining other properties

const ageFilter = new carto.filter.SQL.Range('age', { gte: 26 });

olderThan26.set('lte', 26);
olderThan26.set('gte', 35);

Removing filters

const sqlSource = new carto.source.SQL('SELECT * FROM fake_table');
const olderThan26 = new carto.filter.SQL.Range('age', { gte: 26 });
sqlSource.addFilter(olderThan26);

sqlSource.removeFilter(olderThan26);

@jsanz
Copy link
Contributor

jsanz commented Jun 26, 2018

@jesusbotella chiming in summoned by @ivanmalagon 😄

examples look OK to me but I have a couple comments from the SalesQuest example:

  • The customerLastContact filter accepts two objects, but I don't understand the second with includeNull purpose, how is that used on a range comparison?
  • The category filter for a column does not need to accept in and not_in conditions as they are mutually exclusive right?

Sorry if this looks a bit obvious but apart from that, this extension is very likely going to make developers work with widgets way simpler! \o/

@Jesus89
Copy link
Member

Jesus89 commented Jun 26, 2018

Nice job! ✨

I have a little proposal about namespaces and naming: carto.filter.SQL. SQL here could be a bit confusing for new users because you think in carto.source type SQL, but it works also with Dataset. Maybe it would be nicer to remove SQL from the filters namespace:

const olderThan26 = new carto.filter.Range('age', { gte: 26 });
const premiumCustomers = new carto.filter.Category('customerType', { in: ['Premium', 'Flagship'] });

The issue here is that there are other filters, like carto.filter.BoundingBox that only works for dataviews, instead of the new filters that only works in sources. Since BoundingBox filters do not contain Dataview in their namespace I think a nice CartoError could do the trick for all cases:

dataview.addFilter(new carto.filter.Range(...)) -> CartoError
source.addFilter(new carto.filter.BoundingBox(...)) -> CartoError

@oriolbx
Copy link
Contributor

oriolbx commented Jun 26, 2018

Nice!

@jesusbotella
Copy link
Contributor

Thank you very very much for your feedback!

@jsanz Yes, you are totally right. I messed it up with that example 😂.
The includeNull property is meant to include rows with null values so that you can filter by that range and include rows that contain null in the selected column. I don't know if I explained myself correctly. I don't know if it is useful for your queries or not 🤷‍♂️ . I included that because I saw it in the SalesQuest example I got.

Yes, that filters are mutually exclusive so it makes no sense. I'll modify the example right away. It was just to show any other example but it makes no sense in reality.

@Jesus89 You're right. The thing is that I didn't realize that the filters can work with Dataset sources as well 😂, that's why I included the SQL namespace. I thought about throwing a CartoError when an incorrect filter is passed in parameters, too. 😉

@oriolbx

@ivanmalagon
Copy link
Contributor Author

Thank you all for your feedback. +1 to the @Jesus89 suggestion.

Bandera! 🇯🇲 We're good to start the implementation

@jesusbotella
Copy link
Contributor

Thank you! 💪

@jsanz
Copy link
Contributor

jsanz commented Jun 26, 2018

@jesusbotella then I understand the field IS NULL as another filter to combine because that can be interesting for both numeric and categoric fields, right?

Related with that, apart from the AND and OR operators, are you considering also NOT? that would bring more flexibility than just the not_in for the category widgets. I'm imagining something on a custom widget like Reverse selection which is something very common to see.

@jesusbotella
Copy link
Contributor

@jsanz Yes! I thought so at least 😃.

I didn't think about that, but might be interesting as well. But as it would be difficult to implement as an operator like AND or OR, what do you think about adding that as an option like includeNull?. It would be something like reverseConditions.

@jsanz
Copy link
Contributor

jsanz commented Jun 26, 2018 via email

@stevelewis99
Copy link

This looks totally awesome, everyone. VAMOS!

@rubenmoya
Copy link
Contributor

Available in version 4.0.16 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants