Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Allowing limit ordering by post-aggregation metrics #4646

Merged
merged 9 commits into from
Apr 3, 2018

Conversation

jeffreythewang
Copy link
Contributor

@jeffreythewang jeffreythewang commented Mar 19, 2018

This is an extension of this fix #4203 to account for post-aggregation metrics as well. I also did a bit of refactoring to make it clear that aggregations and post-aggregations are different things.

The old fix replaced the existing fields of the query, which caused issues for single phase queries that select multiple metrics, or metrics that are not the timeseries_limit_metric. I instead updated the existing fields for single-phase queries, and left replacement for two-phase queries.

@fabianmenges @GeorgeSirois

@codecov-io
Copy link

codecov-io commented Mar 20, 2018

Codecov Report

Merging #4646 into master will increase coverage by 0.12%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4646      +/-   ##
==========================================
+ Coverage   72.22%   72.35%   +0.12%     
==========================================
  Files         204      204              
  Lines       15323    15343      +20     
  Branches     1180     1180              
==========================================
+ Hits        11067    11101      +34     
+ Misses       4253     4239      -14     
  Partials        3        3
Impacted Files Coverage Δ
superset/connectors/druid/models.py 81.61% <87.5%> (+2.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 069d61c...10f4c01. Read the comment docs.

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code is somewhat brittle, let's proceed with caution here.

@@ -925,7 +925,18 @@ def metrics_and_post_aggs(metrics, metrics_dict):
visited_postaggs.add(postagg_name)
DruidDatasource.resolve_postagg(
postagg, post_aggs, agg_names, visited_postaggs, metrics_dict)
return list(agg_names), post_aggs
aggs = DruidDatasource.get_aggregations(agg_names, metrics_dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by changing the method definition to:

@classmethod
def aggs_and_post_aggs(cls, metrics, metrics_dict):

you can then cls.get_aggregations

@@ -1078,11 +1082,10 @@ def run_query( # noqa / druid
metrics_dict = {m.metric_name: m for m in self.metrics}
columns_dict = {c.column_name: c for c in self.columns}

all_metrics, post_aggs = DruidDatasource.metrics_and_post_aggs(
aggregations, post_aggs = DruidDatasource.aggs_and_post_aggs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.aggs_and_post_aggs

@@ -925,7 +925,18 @@ def metrics_and_post_aggs(metrics, metrics_dict):
visited_postaggs.add(postagg_name)
DruidDatasource.resolve_postagg(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as comment bellow, should be @staticmethod and cls.resolve_postagg

aggs_dict, post_aggs_dict = DruidDatasource.aggs_and_post_aggs(
[timeseries_limit_metric],
metrics_dict)
pre_qry['aggregations'].update(aggs_dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear here as to why .update as opposed to direct affectations for both 1143 and 1144

Copy link
Contributor Author

@jeffreythewang jeffreythewang Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For two-phase queries, replacement is fine since the pre_qrys only purpose is to query for the top N groups. For single-phase queries, the pre_qry is the only query that runs, so the existing metrics that are queried for must be preserved as well.

This situation may not be as obvious since the Table View (in my experience, the view most frequently used for single-phase queries) rarely uses the Sort By field and instead just sorts via the Table UI. I've written a few tests for the scenarios where single-phase queries run with an order and limit specified.

@jeffreythewang
Copy link
Contributor Author

jeffreythewang commented Mar 20, 2018

I've added the fix for groupby queries as well, and only scoped the query field updates (with .update as opposed to field replacements) to single phase queries.

@jeffreythewang jeffreythewang changed the title Allowing limit ordering by post-aggregation metrics [BugFix] Allowing limit ordering by post-aggregation metrics Mar 20, 2018
@fabianmenges
Copy link
Contributor

This looks good to me.

@jeffreythewang
Copy link
Contributor Author

I've rebased this onto latest master. I've made the assumption that you cannot order-by an adhoc metric.

@mistercrunch
Copy link
Member

@fabianmenges do you have the rights to merge yet?! Do the honors!

@fabianmenges
Copy link
Contributor

Sadly, I don't have the rights yet.

@mistercrunch
Copy link
Member

@fabianmenges we've had to push on Apache folks quite a bit for the last round of committers to get their access, let's push again and hopefully it becomes easier for the next wave

@mistercrunch mistercrunch merged commit 8be0bde into apache:master Apr 3, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
…4646)

* Allowing limit ordering by post-aggregation metrics

* don't overwrite og dictionaries

* update tests

* python3 compat

* code review comments, add tests, implement it in groupby as well

* python 3 compat for unittest

* more self

* Throw exception when get aggregations is called with postaggs

* Treat adhoc metrics as another aggregation
timifasubaa pushed a commit to timifasubaa/incubator-superset that referenced this pull request May 31, 2018
…4646)

* Allowing limit ordering by post-aggregation metrics

* don't overwrite og dictionaries

* update tests

* python3 compat

* code review comments, add tests, implement it in groupby as well

* python 3 compat for unittest

* more self

* Throw exception when get aggregations is called with postaggs

* Treat adhoc metrics as another aggregation
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
…4646)

* Allowing limit ordering by post-aggregation metrics

* don't overwrite og dictionaries

* update tests

* python3 compat

* code review comments, add tests, implement it in groupby as well

* python 3 compat for unittest

* more self

* Throw exception when get aggregations is called with postaggs

* Treat adhoc metrics as another aggregation
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants