feat(SIP-39): Async query support for charts #11499

robdiciuccio · 2020-10-30T01:17:25Z

SUMMARY

Adds support for running chart queries in async workers, as proposed by SIP-39, under a new GLOBAL_ASYNC_QUERIES feature flag.

Formalizes async event publishing using Redis Streams, with this implementation using a new /api/v1/async_event/ endpoint for polling.
Adds async event middleware to superset-frontend to poll for events when the feature flag is enabled and there are chart components in a loading state.
Adds a new JWT token cookie (name configurable) to the client for validating async requests.
This PR affects charts using both the new /api/v1/chart/data API and the legacy /superset/explore_json endpoint in both dashboards and Explore (behind GLOBAL_ASYNC_QUERIES feature flag).
Abstracts and moves set_and_log_cache out of viz.py.

Remaining issues:

Multi-line chart not rendering correctly (component will need updating as it does its own data fetching)

TODO in separate follow-up PRs:

Query cancellation for async queries (see limitations in SIP-39)
Support async operation for CSV, raw results and sample data
Add support for websocket event handling on the frontend
Update SQL Lab async query execution to use new async architecture (backwards compatible)

TEST PLAN

Enable the GLOBAL_ASYNC_QUERIES feature flag in your test environment and ensure correct chart rendering in both dashboards and Explore
Ensure synchronous chart rendering with the feature flag disabled functions correctly.

ADDITIONAL INFORMATION

Has associated issue: [SIP-39] Global Async Query Support #9190
Changes UI
Requires DB Migration.
Confirm DB Migration upgrade and downgrade tested.
Introduces new feature or API
Removes existing feature or API

codecov-io · 2020-10-30T01:28:10Z

Codecov Report

Merging #11499 (df673cf) into master (a4b06d2) will decrease coverage by 13.14%.
The diff coverage is 23.52%.

@@             Coverage Diff             @@
##           master   #11499       +/-   ##
===========================================
- Coverage   67.56%   54.42%   -13.15%     
===========================================
  Files         942      432      -510     
  Lines       45812    15253    -30559     
  Branches     4395     3891      -504     
===========================================
- Hits        30955     8302    -22653     
+ Misses      14752     6951     -7801     
+ Partials      105        0      -105

Flag	Coverage Δ
cypress	`54.42% <23.52%> (-0.01%)`	⬇️
javascript	`?`
python	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
superset-frontend/src/SqlLab/actions/sqlLab.js	`24.57% <ø> (-37.40%)`	⬇️
...rontend/src/SqlLab/components/ShareSqlLabQuery.jsx	`35.48% <ø> (-58.07%)`	⬇️
superset-frontend/src/chart/chartReducer.js	`57.14% <0.00%> (-11.71%)`	⬇️
superset-frontend/src/components/AsyncSelect.jsx	`74.07% <ø> (-22.23%)`	⬇️
...t-frontend/src/dashboard/actions/dashboardState.js	`65.56% <ø> (-1.99%)`	⬇️
...rset-frontend/src/dashboard/actions/datasources.js	`57.89% <ø> (ø)`
...et-frontend/src/dashboard/actions/sliceEntities.js	`75.00% <ø> (ø)`
...ntend/src/dashboard/components/PropertiesModal.jsx	`59.45% <ø> (-25.37%)`	⬇️
...-frontend/src/datasource/ChangeDatasourceModal.tsx	`8.92% <ø> (-65.32%)`	⬇️
...erset-frontend/src/datasource/DatasourceEditor.jsx	`53.30% <ø> (-20.29%)`	⬇️
... and 805 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4b06d2...df673cf. Read the comment docs.

etr2460

fyi, once this is implemented on the frontend, you'll want to bypass all the domain sharding logic as it'll no longer be needed and it probably wouldn't work with the new cookies anyway

robdiciuccio · 2020-11-11T00:05:54Z

superset/utils/cache.py

+    return f"{key_prefix}{hash}"
+
+
+def set_and_log_cache(


@bkyryliuk I'm starting to move/centralize the caching logic, and have not included the CacheKey write that is present in the viz.py implementation because 1) I don't understand the use case, and 2) we can't afford the performance hit of writing to the metadata DB in the same method that writes to a K/V store (we're currently using Redis). Can you elaborate on CacheKey and how it's used?

Sure, cache key tables stores the mapping between datasources and cache keys to enable the invalidation by datasource though api endpoint. In our use case we have external ETL systems that are updating tables that are powering the superset charts and we are using this model and API end point to evict / invalidate cache records for the tables that were updated in our ETL systems. It is fairly critical for us to keep data in superset fresh and trustworthy. This endpoint has fairly low traffic as is hit only on the explore actions compared to the logging calls.

I would prefer to keep it, it would be fine to hide it behind the feature flag is you expect to experience performance issues

I added a STORE_CACHE_KEYS_IN_METADATA_DB config value to enable this.

villebro

First pass comments. Can't wait to see this merged!

superset-frontend/src/chart/Chart.jsx

superset/common/query_context.py

superset/config.py

superset/common/query_context.py

craig-rueda · 2020-12-10T18:44:34Z

superset/charts/commands/data.py

+        return job_metadata
+
+    def set_query_context(self, form_data: Dict[str, Any]) -> QueryContext:
+        self._form_data = form_data


I suggest doing this "init" logic in the ctor, as these set_xxx methods may be missed, and adding extra functionality down the road will require all usages of this CMD to be updated to call these

QueryContext was created via the constructor originally, but varying use cases required breaking it out into it's own method.

craig-rueda · 2020-12-10T18:52:53Z

superset/utils/async_query_manager.py

+                    {"channel": async_channel_id, "user_id": user_id}
+                )
+
+                response.set_cookie(


Can we just use FLASK_JWT_EXTENDED here? (https://flask-jwt-extended.readthedocs.io/en/stable/installation/) All of this functionality (cookie/header handling) is handled by them, and as far as I know FAB is adding it soon.

My initial intent was to use FLASK_JWT_EXTENDED for this, but it was incompatible for reasons I don't now remember.

Can we at least provide a means by which we can extend/override this functionality that doesn't involve monkey patching? I'm thinking something like defining a "JWTManager" in superset which just does these things, and then instantiate it from a string inside the init_app method, which would allow folks to tune this behavior. Said manager would define a few hooks like "load_jwt" "store_jwt" etc. Also, it would deal with the basic skeleton of the JWT structure, which would be customizable this way

This JWT is narrowly-scoped in terms of functionality, and is intentionally separate from any (optional) application auth JWT usage. The main (currently only) use case here is to securely pass a channel ID between the client, Flask app and (future) websocket server. What kind of customization to you foresee being required here?

Looking back at FLASK_JWT_EXTENDED, I think the main issue here was that it doesn't support multiple tokens/cookies in the app, if JWT is being used as the main auth mechanism.

I wonder have you looked into Authlib before? It seems to have terrific documentation and could be incorporated for future API oauth work, too.

Yep, we've evaluated Authlib at Preset in other contexts, but it feels like overkill for this use case.

craig-rueda · 2020-12-10T18:56:42Z

superset/utils/async_query_manager.py

+            return response
+
+    def generate_jwt(self, data: Dict[str, Any]) -> str:
+        encoded_jwt = jwt.encode(data, self._jwt_secret, algorithm="HS256")


This algo needs to be configurable. Also, the JWT contents here aren't following the JWT standard. Typically, sub is used to track the "id" of a user. Again, the lib mentioned above deals with this.

The JWT standard doesn't prescribe any required claim fields, but happy to use the sub field for user_id here (or just remove it altogether). Also happy to make the algo configurable, but there's a point of diminishing returns for configuration here, IMO.

Additional configuration can also be added in future PRs as needs are more clearly defined.

Right. The claim field is up to the implementor. A typical "bare bones" JWT would look something like:

{ "sub": "1234567890", "iat": 1516239022, "exp": 1516239022, "claims": { ... whatever stuff you want } }

4.1. Registered Claim Names
None of the claims defined below are intended to be mandatory to use or implement in all
cases, but rather they provide a starting point for a set of useful,
interoperable claims. Applications using JWTs should define which
specific claims they use and when they are required or optional.

craig-rueda · 2020-12-10T18:59:10Z

superset/utils/async_query_manager.py

+        return data
+
+    def parse_jwt_from_request(self, request: Request) -> Dict[str, Any]:
+        token = request.cookies.get(self._jwt_cookie_name)


Again, I'd delegate this to the lib. One other case that's missing here is Authorization header support

See comments above re: FLASK_JWT_EXTENDED

craig-rueda · 2020-12-10T19:02:36Z

superset/views/core.py

+            form_data = cached.get("form_data")
+            response_type = cached.get("response_type")
+
+            datasource_id, datasource_type = get_datasource_info(None, None, form_data)


I'm assuming perm checks happen in here?

Actually, no. Permission check happens in the check_explore_cache_perms method in the @etag_cache decorator ¯\_(ツ)_/¯

craig-rueda · 2020-12-10T19:14:00Z

Overall, looks good! One thing I noticed was a bit of hand-rolling around JWT handling. I suggest leaning on flask-jwt-extended for this, along with the ability for other teams to customize their "jwt manager" as to adhere to their internal use cases (flask-jwt-extended allows for customization of all things JWT related via a set of config options). I can see wanting to keep this functionality separate from other global JWT handling. At the very least, it would be great to make the security-related functionality pluggable so that it can be easily overridden and extended

robdiciuccio · 2020-12-11T00:20:06Z

OK @villebro @ktmud @dpgaspar @craig-rueda finally got all tests passing, and have addressed the majority of the feedback here. How are folks feeling about merging this MVP, with the understanding that this feature is still experimental and will continue to be iterated upon?

craig-rueda · 2020-12-11T00:28:59Z

Can we update the JWT structure at least?

ktmud · 2020-12-11T00:45:38Z

superset/config.py

@@ -327,6 +327,8 @@ def _try_json_readsha(  # pylint: disable=unused-argument
    "DISPLAY_MARKDOWN_HTML": True,
    # When True, this escapes HTML (rather than rendering it) in Markdown components
    "ESCAPE_MARKDOWN_HTML": False,
+    "GLOBAL_ASYNC_QUERIES": False,
+    "GLOBAL_ASYNC_QUERIES_OPTIONS": {"transport": "polling", "polling_delay": 250},


Ah, I really think we should keep feature flags booleans... What I imagined was to add the options as a global config value, add a separate export entry here, change initiFeatureFlags to something like initCommonBootstrap(), and add such global options there. Or, pass it to Redux via each page's getInitialState.

But if you find it easier to expose these in the feature flags, we can always refactor later.

You should be able to expose config vars to the frontend by adding them here, then they should be available in boostrap_data.common

Yep, agreed this it not ideal, and that Feature Flags should remain booleans. I'll refactor.

Co-authored-by: John Bodley <john.bodley@airbnb.com>

Co-authored-by: John Bodley <john.bodley@airbnb.com> (cherry picked from commit c306368)

* upstream/master: (55 commits) feat(explore): time picker enhancement (apache#11418) feat: update alert/report icons and column order (apache#12081) feat(explore): metrics and filters controls redesign (apache#12095) feat(alerts/reports): add refresh action (apache#12071) chore: add latest tag action (apache#11148) fix(reports): increase crontab size and alert fixes (apache#12056) Small typo fix in Athena connection docs (apache#12099) feat(queries): security perm simplification (apache#12072) feat(databases): security perm simplification (apache#12036) feat(dashboards): security permissions simplification (apache#12012) feat(logs): security permissions simplification (apache#12061) chore: Remove unused CodeModal (apache#11972) Fix typescript error (apache#12074) fix: handle context-dependent feature flags in CLI (apache#12088) fix: Fix "View in SQLLab" bug (apache#12086) feat(alert/report): add 'not null' condition option to modal (apache#12077) bumping superset ui to 15.18 and deckgl to 0.3.2 (apache#12078) fix: Python dependencies in apache#11499 (apache#12079) reset active tab on open (apache#12048) fix: improve import flow UI/UX (apache#12070) ...

robdiciuccio added 4 commits October 29, 2020 17:58

Generate JWT in Flask app

67f7d5f

Refactor chart data API query logic, add JWT validation and async worker

1e8c039

Add redis stream implementation, refactoring

a50750b

Add chart data cache endpoint, refactor QueryContext caching

64fbfae

superset-github-bot bot added the preset-io label Oct 30, 2020

pull-request-size bot added the size/XL label Oct 30, 2020

robdiciuccio mentioned this pull request Oct 30, 2020

feat(SIP-39): Websocket sidecar app #11498

Merged

6 tasks

etr2460 reviewed Nov 9, 2020

View reviewed changes

robdiciuccio added 2 commits November 10, 2020 12:12

Merge branch master

219ce77

Typing, linting, refactoring

e377b31

robdiciuccio commented Nov 11, 2020

View reviewed changes

robdiciuccio added 6 commits November 10, 2020 19:30

pytest fixes and openapi schema update

f7ac5b6

Merge branch master

3a16283

Enforce caching be configured for async query init

6805180

Async query processing for explore_json endpoint

d5eef4f

Add /api/v1/async_event endpoint

467c6bb

Async frontend for dashboards [WIP]

f867cd5

pull-request-size bot added size/XXL and removed size/XL labels Nov 24, 2020

robdiciuccio added 6 commits November 29, 2020 19:24

Chart async error message support, refactoring

8111ea6

Abstract asyncEvent middleware

3b41f16

Async chart loading for Explore

49b6b52

Merge branch master

0e5c09e

Pylint fixes

e2dc30e

asyncEvent middleware -> TypeScript, JS linting

276c84e

robdiciuccio mentioned this pull request Dec 2, 2020

feat: Support multiple queries per request #11880

Merged

6 tasks

Chart data API: enforce forced_cache, add tests

4566c01

villebro reviewed Dec 2, 2020

View reviewed changes

Add tests for explore_json endpoints

9eba0c4

craig-rueda reviewed Dec 10, 2020

View reviewed changes

Fix failing py tests

0ad7234

robdiciuccio added 3 commits December 10, 2020 13:15

asyncEvent_spec.js -> asyncEvent_spec.ts

c72b4c6

Refactor flakey Python 3.7 mock assertions

1fb7489

Fix another shared state issue in Py tests

024da76

ktmud reviewed Dec 11, 2020

View reviewed changes

Use 'sub' claim in JWT for user_id

887754b

craig-rueda approved these changes Dec 11, 2020

View reviewed changes

ktmud approved these changes Dec 11, 2020

View reviewed changes

robdiciuccio added 2 commits December 10, 2020 19:03

Refactor async middleware config

601ec51

Fixup: restore FeatureFlag boolean type

df673cf

robdiciuccio merged commit 4d32907 into apache:master Dec 11, 2020

robdiciuccio deleted the rd/async-queries-mvp branch December 11, 2020 04:22

john-bodley pushed a commit to john-bodley/superset that referenced this pull request Dec 16, 2020

fix: Python dependencies in apache#11499

02647df

john-bodley mentioned this pull request Dec 16, 2020

fix: Python dependencies in #11499 #12079

Merged

6 tasks

john-bodley added a commit that referenced this pull request Dec 16, 2020

fix: Python dependencies in #11499 (#12079)

c306368

Co-authored-by: John Bodley <john.bodley@airbnb.com>

john-bodley added a commit to airbnb/superset-fork that referenced this pull request Dec 16, 2020

fix: Python dependencies in apache#11499 (apache#12079)

cf67221

Co-authored-by: John Bodley <john.bodley@airbnb.com> (cherry picked from commit c306368)

robdiciuccio mentioned this pull request Jan 17, 2021

docs: Add docs for GLOBAL_ASYNC_QUERIES (SIP-39) #12573

Merged

6 tasks

This was referenced Feb 25, 2021

fix(celery): Reset DB connection pools for forked worker processes #13350

Merged

Global Async Queries doesn't work #12766

Closed

mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 labels Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(SIP-39): Async query support for charts #11499

feat(SIP-39): Async query support for charts #11499

robdiciuccio commented Oct 30, 2020 •

edited

Loading

codecov-io commented Oct 30, 2020 •

edited

Loading

etr2460 left a comment

robdiciuccio Nov 11, 2020

bkyryliuk Nov 11, 2020 •

edited

Loading

bkyryliuk Nov 11, 2020

robdiciuccio Dec 4, 2020

villebro left a comment

craig-rueda Dec 10, 2020

robdiciuccio Dec 10, 2020

craig-rueda Dec 10, 2020

robdiciuccio Dec 10, 2020

craig-rueda Dec 10, 2020

robdiciuccio Dec 10, 2020

ktmud Dec 10, 2020

robdiciuccio Dec 10, 2020 •

edited

Loading

craig-rueda Dec 10, 2020

robdiciuccio Dec 10, 2020

robdiciuccio Dec 10, 2020

craig-rueda Dec 11, 2020 •

edited

Loading

robdiciuccio Dec 11, 2020 •

edited

Loading

craig-rueda Dec 10, 2020

robdiciuccio Dec 10, 2020

craig-rueda Dec 10, 2020

robdiciuccio Dec 10, 2020

robdiciuccio Dec 10, 2020

craig-rueda commented Dec 10, 2020

robdiciuccio commented Dec 11, 2020

craig-rueda commented Dec 11, 2020

ktmud Dec 11, 2020

nytai Dec 11, 2020

robdiciuccio Dec 11, 2020

robdiciuccio Dec 11, 2020

feat(SIP-39): Async query support for charts #11499

feat(SIP-39): Async query support for charts #11499

Conversation

robdiciuccio commented Oct 30, 2020 • edited Loading

SUMMARY

TEST PLAN

ADDITIONAL INFORMATION

codecov-io commented Oct 30, 2020 • edited Loading

Codecov Report

etr2460 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bkyryliuk Nov 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

villebro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robdiciuccio Dec 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

craig-rueda Dec 11, 2020 • edited Loading

Choose a reason for hiding this comment

robdiciuccio Dec 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

craig-rueda commented Dec 10, 2020

robdiciuccio commented Dec 11, 2020

craig-rueda commented Dec 11, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robdiciuccio commented Oct 30, 2020 •

edited

Loading

codecov-io commented Oct 30, 2020 •

edited

Loading

bkyryliuk Nov 11, 2020 •

edited

Loading

robdiciuccio Dec 10, 2020 •

edited

Loading

craig-rueda Dec 11, 2020 •

edited

Loading

robdiciuccio Dec 11, 2020 •

edited

Loading