Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(SIP-39): Async query support for charts #11499

Merged
merged 50 commits into from
Dec 11, 2020

Conversation

robdiciuccio
Copy link
Member

@robdiciuccio robdiciuccio commented Oct 30, 2020

SUMMARY

Adds support for running chart queries in async workers, as proposed by SIP-39, under a new GLOBAL_ASYNC_QUERIES feature flag.

  • Formalizes async event publishing using Redis Streams, with this implementation using a new /api/v1/async_event/ endpoint for polling.
  • Adds async event middleware to superset-frontend to poll for events when the feature flag is enabled and there are chart components in a loading state.
  • Adds a new JWT token cookie (name configurable) to the client for validating async requests.
  • This PR affects charts using both the new /api/v1/chart/data API and the legacy /superset/explore_json endpoint in both dashboards and Explore (behind GLOBAL_ASYNC_QUERIES feature flag).
  • Abstracts and moves set_and_log_cache out of viz.py.

Remaining issues:

  • Multi-line chart not rendering correctly (component will need updating as it does its own data fetching)

TODO in separate follow-up PRs:

  • Query cancellation for async queries (see limitations in SIP-39)
  • Support async operation for CSV, raw results and sample data
  • Add support for websocket event handling on the frontend
  • Update SQL Lab async query execution to use new async architecture (backwards compatible)

TEST PLAN

  • Enable the GLOBAL_ASYNC_QUERIES feature flag in your test environment and ensure correct chart rendering in both dashboards and Explore
  • Ensure synchronous chart rendering with the feature flag disabled functions correctly.

ADDITIONAL INFORMATION

  • Has associated issue: [SIP-39] Global Async Query Support #9190
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@codecov-io
Copy link

codecov-io commented Oct 30, 2020

Codecov Report

Merging #11499 (df673cf) into master (a4b06d2) will decrease coverage by 13.14%.
The diff coverage is 23.52%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #11499       +/-   ##
===========================================
- Coverage   67.56%   54.42%   -13.15%     
===========================================
  Files         942      432      -510     
  Lines       45812    15253    -30559     
  Branches     4395     3891      -504     
===========================================
- Hits        30955     8302    -22653     
+ Misses      14752     6951     -7801     
+ Partials      105        0      -105     
Flag Coverage Δ
cypress 54.42% <23.52%> (-0.01%) ⬇️
javascript ?
python ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset-frontend/src/SqlLab/actions/sqlLab.js 24.57% <ø> (-37.40%) ⬇️
...rontend/src/SqlLab/components/ShareSqlLabQuery.jsx 35.48% <ø> (-58.07%) ⬇️
superset-frontend/src/chart/chartReducer.js 57.14% <0.00%> (-11.71%) ⬇️
superset-frontend/src/components/AsyncSelect.jsx 74.07% <ø> (-22.23%) ⬇️
...t-frontend/src/dashboard/actions/dashboardState.js 65.56% <ø> (-1.99%) ⬇️
...rset-frontend/src/dashboard/actions/datasources.js 57.89% <ø> (ø)
...et-frontend/src/dashboard/actions/sliceEntities.js 75.00% <ø> (ø)
...ntend/src/dashboard/components/PropertiesModal.jsx 59.45% <ø> (-25.37%) ⬇️
...-frontend/src/datasource/ChangeDatasourceModal.tsx 8.92% <ø> (-65.32%) ⬇️
...erset-frontend/src/datasource/DatasourceEditor.jsx 53.30% <ø> (-20.29%) ⬇️
... and 805 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4b06d2...df673cf. Read the comment docs.

Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, once this is implemented on the frontend, you'll want to bypass all the domain sharding logic as it'll no longer be needed and it probably wouldn't work with the new cookies anyway

return f"{key_prefix}{hash}"


def set_and_log_cache(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkyryliuk I'm starting to move/centralize the caching logic, and have not included the CacheKey write that is present in the viz.py implementation because 1) I don't understand the use case, and 2) we can't afford the performance hit of writing to the metadata DB in the same method that writes to a K/V store (we're currently using Redis). Can you elaborate on CacheKey and how it's used?

Copy link
Member

@bkyryliuk bkyryliuk Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, cache key tables stores the mapping between datasources and cache keys to enable the invalidation by datasource though api endpoint. In our use case we have external ETL systems that are updating tables that are powering the superset charts and we are using this model and API end point to evict / invalidate cache records for the tables that were updated in our ETL systems. It is fairly critical for us to keep data in superset fresh and trustworthy. This endpoint has fairly low traffic as is hit only on the explore actions compared to the logging calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep it, it would be fine to hide it behind the feature flag is you expect to experience performance issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a STORE_CACHE_KEYS_IN_METADATA_DB config value to enable this.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass comments. Can't wait to see this merged!

superset-frontend/src/chart/Chart.jsx Outdated Show resolved Hide resolved
superset/common/query_context.py Outdated Show resolved Hide resolved
superset/common/query_context.py Show resolved Hide resolved
superset/config.py Outdated Show resolved Hide resolved
superset/common/query_context.py Show resolved Hide resolved
return job_metadata

def set_query_context(self, form_data: Dict[str, Any]) -> QueryContext:
self._form_data = form_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest doing this "init" logic in the ctor, as these set_xxx methods may be missed, and adding extra functionality down the road will require all usages of this CMD to be updated to call these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryContext was created via the constructor originally, but varying use cases required breaking it out into it's own method.

{"channel": async_channel_id, "user_id": user_id}
)

response.set_cookie(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use FLASK_JWT_EXTENDED here? (https://flask-jwt-extended.readthedocs.io/en/stable/installation/) All of this functionality (cookie/header handling) is handled by them, and as far as I know FAB is adding it soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial intent was to use FLASK_JWT_EXTENDED for this, but it was incompatible for reasons I don't now remember.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we at least provide a means by which we can extend/override this functionality that doesn't involve monkey patching? I'm thinking something like defining a "JWTManager" in superset which just does these things, and then instantiate it from a string inside the init_app method, which would allow folks to tune this behavior. Said manager would define a few hooks like "load_jwt" "store_jwt" etc. Also, it would deal with the basic skeleton of the JWT structure, which would be customizable this way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This JWT is narrowly-scoped in terms of functionality, and is intentionally separate from any (optional) application auth JWT usage. The main (currently only) use case here is to securely pass a channel ID between the client, Flask app and (future) websocket server. What kind of customization to you foresee being required here?

Looking back at FLASK_JWT_EXTENDED, I think the main issue here was that it doesn't support multiple tokens/cookies in the app, if JWT is being used as the main auth mechanism.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder have you looked into Authlib before? It seems to have terrific documentation and could be incorporated for future API oauth work, too.

Copy link
Member Author

@robdiciuccio robdiciuccio Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we've evaluated Authlib at Preset in other contexts, but it feels like overkill for this use case.

return response

def generate_jwt(self, data: Dict[str, Any]) -> str:
encoded_jwt = jwt.encode(data, self._jwt_secret, algorithm="HS256")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algo needs to be configurable. Also, the JWT contents here aren't following the JWT standard. Typically, sub is used to track the "id" of a user. Again, the lib mentioned above deals with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JWT standard doesn't prescribe any required claim fields, but happy to use the sub field for user_id here (or just remove it altogether). Also happy to make the algo configurable, but there's a point of diminishing returns for configuration here, IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional configuration can also be added in future PRs as needs are more clearly defined.

Copy link
Member

@craig-rueda craig-rueda Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The claim field is up to the implementor. A typical "bare bones" JWT would look something like:

{
  "sub": "1234567890",
  "iat": 1516239022,
  "exp": 1516239022,
  "claims": {
     ... whatever stuff you want
   }
}

Copy link
Member Author

@robdiciuccio robdiciuccio Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.1. Registered Claim Names
None of the claims defined below are intended to be mandatory to use or implement in all
cases, but rather they provide a starting point for a set of useful,
interoperable claims. Applications using JWTs should define which
specific claims they use and when they are required or optional.

return data

def parse_jwt_from_request(self, request: Request) -> Dict[str, Any]:
token = request.cookies.get(self._jwt_cookie_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'd delegate this to the lib. One other case that's missing here is Authorization header support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above re: FLASK_JWT_EXTENDED

form_data = cached.get("form_data")
response_type = cached.get("response_type")

datasource_id, datasource_type = get_datasource_info(None, None, form_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming perm checks happen in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no. Permission check happens in the check_explore_cache_perms method in the @etag_cache decorator ¯\_(ツ)_/¯

@craig-rueda
Copy link
Member

Overall, looks good! One thing I noticed was a bit of hand-rolling around JWT handling. I suggest leaning on flask-jwt-extended for this, along with the ability for other teams to customize their "jwt manager" as to adhere to their internal use cases (flask-jwt-extended allows for customization of all things JWT related via a set of config options). I can see wanting to keep this functionality separate from other global JWT handling. At the very least, it would be great to make the security-related functionality pluggable so that it can be easily overridden and extended

@robdiciuccio
Copy link
Member Author

OK @villebro @ktmud @dpgaspar @craig-rueda finally got all tests passing, and have addressed the majority of the feedback here. How are folks feeling about merging this MVP, with the understanding that this feature is still experimental and will continue to be iterated upon?

@craig-rueda
Copy link
Member

Can we update the JWT structure at least?

@@ -327,6 +327,8 @@ def _try_json_readsha( # pylint: disable=unused-argument
"DISPLAY_MARKDOWN_HTML": True,
# When True, this escapes HTML (rather than rendering it) in Markdown components
"ESCAPE_MARKDOWN_HTML": False,
"GLOBAL_ASYNC_QUERIES": False,
"GLOBAL_ASYNC_QUERIES_OPTIONS": {"transport": "polling", "polling_delay": 250},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I really think we should keep feature flags booleans... What I imagined was to add the options as a global config value, add a separate export entry here, change initiFeatureFlags to something like initCommonBootstrap(), and add such global options there. Or, pass it to Redux via each page's getInitialState.

But if you find it easier to expose these in the feature flags, we can always refactor later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to expose config vars to the frontend by adding them here, then they should be available in boostrap_data.common

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agreed this it not ideal, and that Feature Flags should remain booleans. I'll refactor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@robdiciuccio robdiciuccio merged commit 4d32907 into apache:master Dec 11, 2020
@robdiciuccio robdiciuccio deleted the rd/async-queries-mvp branch December 11, 2020 04:22
john-bodley pushed a commit to john-bodley/superset that referenced this pull request Dec 16, 2020
john-bodley added a commit that referenced this pull request Dec 16, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Dec 16, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
(cherry picked from commit c306368)
amitmiran137 pushed a commit to simcha90/incubator-superset that referenced this pull request Dec 18, 2020
* upstream/master: (55 commits)
  feat(explore): time picker enhancement (apache#11418)
  feat: update alert/report icons and column order (apache#12081)
  feat(explore): metrics and filters controls redesign (apache#12095)
  feat(alerts/reports): add refresh action (apache#12071)
  chore: add latest tag action (apache#11148)
  fix(reports): increase crontab size and alert fixes (apache#12056)
  Small typo fix in Athena connection docs (apache#12099)
  feat(queries): security perm simplification (apache#12072)
  feat(databases): security perm simplification (apache#12036)
  feat(dashboards): security permissions simplification (apache#12012)
  feat(logs): security permissions simplification (apache#12061)
  chore: Remove unused CodeModal (apache#11972)
  Fix typescript error (apache#12074)
  fix: handle context-dependent feature flags in CLI (apache#12088)
  fix: Fix "View in SQLLab" bug (apache#12086)
  feat(alert/report): add 'not null' condition option to modal (apache#12077)
  bumping superset ui to 15.18 and deckgl to 0.3.2 (apache#12078)
  fix: Python dependencies in apache#11499 (apache#12079)
  reset active tab on open (apache#12048)
  fix: improve import flow UI/UX (apache#12070)
  ...
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XXL 🚢 1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants