[BugFix] Allowing limit ordering by post-aggregation metrics #4646
Codecov Report
@@            Coverage Diff             @@
##           master    #4646      +/-   ##
==========================================
+ Coverage   72.22%   72.35%   +0.12%
==========================================
  Files         204      204
  Lines       15323    15343      +20
  Branches     1180     1180
==========================================
+ Hits        11067    11101      +34
+ Misses       4253     4239      -14
  Partials        3        3
This part of the code is somewhat brittle; let's proceed with caution here.
superset/connectors/druid/models.py
@@ -925,7 +925,18 @@ def metrics_and_post_aggs(metrics, metrics_dict):
visited_postaggs.add(postagg_name)
DruidDatasource.resolve_postagg(
postagg, post_aggs, agg_names, visited_postaggs, metrics_dict)
return list(agg_names), post_aggs
aggs = DruidDatasource.get_aggregations(agg_names, metrics_dict)
By changing the method definition to:
@classmethod
def aggs_and_post_aggs(cls, metrics, metrics_dict):
you can then call cls.get_aggregations.
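The suggestion above can be sketched as follows. This is a minimal illustration, not the Superset code: `Datasource` and `CustomDatasource` are hypothetical stand-ins for `DruidDatasource`, and the bodies are simplified to plain dictionary lookups. It shows that a `@classmethod` receives `cls`, so sibling helpers can be called without hard-coding the class name, and that the method is still reachable from an instance.

```python
class Datasource:
    """Hypothetical stand-in for DruidDatasource (not the real class)."""

    @staticmethod
    def get_aggregations(agg_names, metrics_dict):
        # Simplified: look up each requested aggregation by name.
        return {name: metrics_dict[name] for name in agg_names}

    @classmethod
    def aggs_and_post_aggs(cls, metrics, metrics_dict):
        # `cls` is bound automatically, so helpers are reached through it
        # instead of through the literal class name.
        agg_names = [m for m in metrics if m in metrics_dict]
        aggs = cls.get_aggregations(agg_names, metrics_dict)
        post_aggs = {}  # post-aggregation resolution omitted in this sketch
        return aggs, post_aggs


class CustomDatasource(Datasource):
    """Hypothetical subclass: its override is picked up through `cls`."""

    @staticmethod
    def get_aggregations(agg_names, metrics_dict):
        return {name: metrics_dict[name].upper() for name in agg_names}
```

Calling `Datasource().aggs_and_post_aggs(...)` on an instance works as well, which is why call sites inside instance methods can use `self.aggs_and_post_aggs`.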
superset/connectors/druid/models.py
@@ -1078,11 +1082,10 @@ def run_query( # noqa / druid
metrics_dict = {m.metric_name: m for m in self.metrics}
columns_dict = {c.column_name: c for c in self.columns}
all_metrics, post_aggs = DruidDatasource.metrics_and_post_aggs(
aggregations, post_aggs = DruidDatasource.aggs_and_post_aggs(
self.aggs_and_post_aggs
superset/connectors/druid/models.py
@@ -925,7 +925,18 @@ def metrics_and_post_aggs(metrics, metrics_dict):
visited_postaggs.add(postagg_name)
DruidDatasource.resolve_postagg(
Same as the comment below: this should be a @classmethod
that calls cls.resolve_postagg
superset/connectors/druid/models.py
aggs_dict, post_aggs_dict = DruidDatasource.aggs_and_post_aggs(
[timeseries_limit_metric],
metrics_dict)
pre_qry['aggregations'].update(aggs_dict)
I'm unclear as to why this uses .update
as opposed to direct assignment on both lines 1143 and 1144.
For two-phase queries, replacement is fine since the pre_qry's only purpose is to query for the top N groups. For single-phase queries, the pre_qry is the only query that runs, so the existing metrics that are queried for must be preserved as well.
This situation may not be as obvious since the Table View (in my experience, the view most frequently used for single-phase queries) rarely uses the Sort By field and instead just sorts via the Table UI. I've written a few tests for the scenarios where single-phase queries run with an order and limit specified.
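The difference between updating and replacing can be sketched like this. It is an illustrative helper under stated assumptions, not the code in this PR: `apply_limit_metric` and its `two_phase` flag are hypothetical, and the `post_aggregations` key is assumed to mirror the `aggregations` key shown in the diff. The sketch also copies the dictionaries rather than mutating them in place.

```python
def apply_limit_metric(pre_qry, aggs_dict, post_aggs_dict, two_phase):
    """Hypothetical helper: add the limit metric's aggregations to a query.

    Two-phase: the pre-query only finds the top N groups, so its
    aggregations can be replaced outright.
    Single-phase: the pre-query is the only query that runs, so the
    metrics already requested must be kept and the new ones merged in.
    """
    qry = dict(pre_qry)  # shallow copy; the caller's dict is left untouched
    if two_phase:
        qry["aggregations"] = dict(aggs_dict)
        qry["post_aggregations"] = dict(post_aggs_dict)
    else:
        qry["aggregations"] = {
            **pre_qry.get("aggregations", {}), **aggs_dict}
        qry["post_aggregations"] = {
            **pre_qry.get("post_aggregations", {}), **post_aggs_dict}
    return qry
```

With a pre-query that already selects `count`, a single-phase call keeps `count` alongside the limit metric, while a two-phase call leaves only the limit metric.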
I've added the fix for
This looks good to me.
c1d663e to 10f4c01
I've rebased this onto the latest master. I've made the assumption that you cannot order by an adhoc metric.
@fabianmenges do you have the rights to merge yet?! Do the honors!
Sadly, I don't have the rights yet.
@fabianmenges we've had to push on Apache folks quite a bit for the last round of committers to get their access. Let's push again, and hopefully it becomes easier for the next wave.
…4646)
* Allowing limit ordering by post-aggregation metrics
* don't overwrite og dictionaries
* update tests
* python3 compat
* code review comments, add tests, implement it in groupby as well
* python 3 compat for unittest
* more self
* Throw exception when get aggregations is called with postaggs
* Treat adhoc metrics as another aggregation
This is an extension of the fix in #4203 to account for post-aggregation metrics as well. I also did a bit of refactoring to make it clear that aggregations and post-aggregations are different things.
The old fix replaced the existing fields of the query, which caused issues for single-phase queries that select multiple metrics, or metrics that are not the timeseries_limit_metric. I instead updated the existing fields for single-phase queries, and left replacement for two-phase queries.
@fabianmenges @GeorgeSirois
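To illustrate why ordering by a post-aggregation needs both kinds of fields, here is a rough sketch that builds a fragment of a Druid groupBy query as a plain dictionary. The helper name `build_limited_groupby` is hypothetical, the fragment omits required fields such as the data source and intervals, and the JSON field names reflect my reading of Druid's native query format, so they should be checked against the Druid documentation before use.

```python
def build_limited_groupby(limit_metric, aggregations, post_aggregations, limit):
    """Hypothetical helper: query fragment ordered by `limit_metric`.

    The limit metric may name either an aggregation or a post-aggregation,
    but it has to be present in the query for the ordering to refer to it.
    """
    known = {a["name"] for a in aggregations}
    known |= {p["name"] for p in post_aggregations}
    if limit_metric not in known:
        raise ValueError(
            "limit metric %r is not part of the query" % limit_metric)
    return {
        "queryType": "groupBy",
        "aggregations": list(aggregations),
        "postAggregations": list(post_aggregations),
        "limitSpec": {
            "type": "default",
            "limit": limit,
            # Order by the metric's output name, highest values first.
            "columns": [
                {"dimension": limit_metric, "direction": "descending"}],
        },
    }
```

A post-aggregation such as a ratio of two sums only resolves if the sums it reads are also listed under `aggregations`, which is the case this PR extends the earlier fix to cover.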