Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support input validators in dashboard native filter #13551

Closed
cccs-jc opened this issue Mar 10, 2021 · 12 comments
Closed

Support input validators in dashboard native filter #13551

cccs-jc opened this issue Mar 10, 2021 · 12 comments
Labels
enhancement:committed Enhancement planned / committed by a contributor need:followup Requires followup

Comments

@cccs-jc
Copy link
Contributor

cccs-jc commented Mar 10, 2021

We use Superset in the context of Security information and event management (SIEM). Our data often include IP values which users want to filter by adhoc IP ranges (using CIDR notation).

For example when the user inputs a filter value of 192.168.0.0/24 this filter is converted into a range query on the server side. We leverage superset's SQL_QUERY_MUTATOR to convert the IP range filter from an SQL statement like

WHERE IP IN '192.168.0.0/24'
to
((IP >= 3232235520) AND (IP<= 3232235775))

The IP column is number for performance reasons.

This works quite well however there is no validation of the text provided by the user. Ideally we would like to validate the input in the client a bit like is done in the explorer UI.

It's possible to validate inputs in the chart explore UI since the viz populates the control panel and is free to customize the input control. However there is no way to specify a validator in the dashboard native filter.

There could be a registry of validator functions that the dashboard native filter would allow you to choose from when creating native filters.

Is this a feature being developed? Are there alternatives?

@junlincc junlincc added the enhancement:request Enhancement request submitted by anyone from the community label Mar 11, 2021
@junlincc
Copy link
Member

@cccs-jc sorry, I need more context to understand the use case. could you share video or gif to demonstrate your work flow?

@junlincc junlincc added the need:more-info Requires more information from author label Mar 11, 2021
@cccs-jc
Copy link
Contributor Author

cccs-jc commented Mar 12, 2021

Use Case

We use Superset as a Security information and event management (SIEM). Network security logs are frequently used in our analysis.
Users search security logs using IP values (e.g.: 192.168.0.7) or for entire networks using CIDR notation (e.g.: 192.168.0.0/24). The CIDR is a well know networking concept expressing a range of IPs 192.168.0.0 to 192.168.0.255 in this case.

Users search for an IP on a particular column.

search-ipsrc.mp4

Users search for a range of IPs on a particular column.

search-cidr.mp4

Users search for IP or range of IPs on either the source or destination columns.

search-ip.mp4

For performance reasons our IP values are physically stored as INTEGER in RDBMS. This is what a typical network IP flow table might look like

table

However, users think of IPs in terms dot notation or CIDR notation, not in terms of numbers. We thus provide a conversion mechanism to hide this implementation detail from our users. Results are thus presented to the user in tables like this one.

result-table

How we achieve this with Superset

Rendering

In superset we’ve modelled the flow table by typing the IP_SRC and IP_DST columns as IPV4. We use this typing information when rendering the values on the client side (in our custom visualization). Our rendering also generates a hyperlink to a page giving further information about the given IP.

Query Mutator

We leverage superset's SQL_QUERY_MUTATOR to convert the IP strings provided by the user into the corresponding number value for the RDBMS.
A query criterion of SRC_IP = ‘2.2.2.2’ is mutated into SRC_IP = 33686018
To query both SRC_IP or DST_IP we have introduced a “filter column” named IP. This column does not exist in the RDBMS. It’s only used as a filter (like Looker’s bind_filters feature https://docs.looker.com/reference/view-params/explore_source).
A query criterion of IP IN '2.2.2.2' is mutated to ( ("SRC_IP" = 33686018) OR ("DST_IP" = 33686018) )

Superset Model

We use the column data type to render IPV4 columns in dot notation.

result-table

We mark the IP (filter column) as not a dimension. Because the IP column is not a dimension it is not available option in the “COLUMNS” box of the Chart explorer. This is good because it’s not actually a column it’s just a filter.
pick-columns-to-display

However, the IP filter column is available in the ad-hoc filter. Which is great.

use-ipfilter-in-adhoc-filter

The SQL_QUERY_MUTATOR not only converts IP strings to numbers but can also handle more complex scenarios like querying on either the SRC_IP or DST_IP columns.
For example, given the configuration above, superset would produce this query
FROM FLOW WHERE IP IN '2.2.2.2'
Which our SQL_QUERY_MUTATOR transforms into
FROM FLOW WHERE ( ("SRC_IP" = 33686018) OR ("DST_IP" = 33686018) )
Before it is sent to the RDBMS.

On the client side we have a custom visualization which renders columns of type IPV4 in dot notation and generates hyperlinks.

results-rendered-as-dot-notation

IP Filter integration in a Dashboard

The IP filter column works the same inside Dashboards. Since IP is a column it can be used in a filter box or in the new native filter support. The IP filter column can also handle CIDR notation (IP range query).

ip-filter-works-in-dashboard

FROM FLOW
WHERE IP IN '2.2.2.0/24'

Is mutated into a range query on either SRC_IP or DST_IP columns
FROM FLOW
WHERE
( ("SRC_IP" >= 33686016 AND "SRC_IP" <= 33686271) OR
("DST_IP" >= 33686016 AND "DST_IP" <= 33686271) )

Gaps and improvements

Our current implementation is working well but the user experience needs some improvement.

Inconsistent SQL operators

The IP filter column is really a filtering function. It accepts one or more IPs and generates a query which applies a filter on either SRC_IP or DST_IP columns. Only the “IN” and “=” operator really make sense for the IP filter column. However, there is no means to control which operators to show the user.

only-valid-operators

The same argument can be made for port number. String operators “LIKE”, “IS NOT NULL”, “IS NULL” are not applicable for the port number column.

port-number-column

It would be desirable for the ad-hoc filter UI and for the dashboard filters to only present to the user the applicable operators. The dataset model could be extended to support a list of applicable operators per column.

No input validation

We would like to be able to validate the values entered for the IP filter. It should be of the form 0.0.0.0 or 0.0.0.0/32.

invalid-input

There are validators in superset to validate numerical and empty values. It would be great if we could create new ones for IPv4, IPv6 etc. Examples in other business lines might be credit card number, social insurance number, validating wildcard search expressions etc.
The same argument applies to the port number column. It should only be possible to enter a number value and in this specific case a value between 0 and 65536 (2^32).
The dataset model could be extended to support a list of applicable input validators for a given column.

The ad-hoc filter UI and the dashboard filters could use this information and apply the appropriate validation based on the chosen column.

@cccs-jc
Copy link
Contributor Author

cccs-jc commented Mar 16, 2021

@junlincc as you see above being able to validate inputs and configure available operators on a column/filter basis would be greatly beneficial. Any recommendations, are there any efforts towards these features. Do you have any questions?

@junlincc
Copy link
Member

Hi @cccs-jc thank you for providing such detailed description of your cases. I agree this is currently a limitation, but more as an edge case. We probably can include the enhancement in 2022 roadmap when we refractor and redesign the ad hoc filter component. I will revisit this issue next month and let you know. If you already have a solution, feel free to open a PR. we will make sure it gets reviewed. 🙏

@junlincc junlincc added need:followup Requires followup and removed need:more-info Requires more information from author labels Mar 17, 2021
@cccs-jc
Copy link
Contributor Author

cccs-jc commented Mar 17, 2021

I would like to make progress on this earlier than 2022. My team and I are willing to contribute to Superset however we need guidance on how best to provide hooks into Superset for customization. We don't want to hack support for our features directly into the code. But given proper hooks we can implement our features ourselves. @villebro has offered to help us create a custom filter for the dashboard. Hopefully we could apply some of that knowledge to the adhoc filter. In fact since this post we were able to modify the adhoc filter to validate inputs but only for our own visualization. Ideally we would like to be able to override the validation of the adhoc filter globally.

@junlincc
Copy link
Member

@villebro let's allocate half a day to enable @cccs-jc 's team in Q2. Thanks for expressing the interest in contributing high quality code to Superset 🙏

@junlincc junlincc added enhancement:committed Enhancement planned / committed by a contributor and removed enhancement:request Enhancement request submitted by anyone from the community labels Apr 21, 2021
@jhult
Copy link
Contributor

jhult commented Oct 29, 2021

Any news on this?

@cccs-rc
Copy link
Contributor

cccs-rc commented Nov 1, 2021

There's some work going on. We've tied this in with a notion of custom business types (SIP to come). We're working in a fork to prototype some stuff right now. But we're eager to get at least a first iteration of this into superset behind a feature flag.

@stale
Copy link

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 17, 2022
@jhult
Copy link
Contributor

jhult commented Apr 20, 2022

@cccs-rc, any news on this?

@stale stale bot removed the inactive Inactive for >= 30 days label Apr 20, 2022
@cccs-rc
Copy link
Contributor

cccs-rc commented Apr 25, 2022

@cccs-rc, any news on this?

The SIP for Advanced Types has passed. The PR is here but not yet approved/merged.

This won't add validation specifically to dashboard native filters, but it will allow you to add some validation for advanced types in the Explore view. If/how to extend this to dashboard native filters is on our list of things for future work.

@rusackas
Copy link
Member

This one sounds safe to close, and is more feature request than bug report anyway, so I'll close it. Please re-open or start a fresh Issue/PR if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement:committed Enhancement planned / committed by a contributor need:followup Requires followup
Projects
None yet
Development

No branches or pull requests

5 participants