
feat: handle temporal columns in group bys #16795

Merged on Sep 23, 2021 (2 commits)


@betodealmeida (Member) commented Sep 22, 2021

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Queries before (the second one is invalid, because its filter compares the floored timestamp with a plain string literal):

SELECT FLOOR("__time" TO DAY) AS "__time",
       COUNT(*) AS "count"
FROM "druid"."wikipedia"
GROUP BY FLOOR("__time" TO DAY)
ORDER BY "count" DESC
LIMIT 100;

SELECT FLOOR("__time" TO DAY) AS "__timestamp",
       FLOOR("__time" TO DAY) AS "__time",
       COUNT(*) AS "count"
FROM "druid"."wikipedia"
WHERE FLOOR("__time" TO DAY) = '2016-06-27T00:00:00.000Z' -- THIS FAILS
GROUP BY FLOOR("__time" TO DAY),
         FLOOR("__time" TO DAY)
ORDER BY "count" DESC
LIMIT 10000;

Queries after (the filter value is now parsed with TIME_PARSE):

SELECT FLOOR("__time" TO DAY) AS "__time",
       COUNT(*) AS "count"
FROM "druid"."wikipedia"
GROUP BY FLOOR("__time" TO DAY)
ORDER BY "count" DESC
LIMIT 100;

SELECT FLOOR("__time" TO DAY) AS "__timestamp",
       FLOOR("__time" TO DAY) AS "__time",
       COUNT(*) AS "count"
FROM "druid"."wikipedia"
WHERE FLOOR("__time" TO DAY) = TIME_PARSE('2016-06-27T00:00:00+00:00') -- THIS IS CORRECT
GROUP BY FLOOR("__time" TO DAY),
         FLOOR("__time" TO DAY)
ORDER BY "count" DESC
LIMIT 10000;
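The change can be pictured as follows: for a temporal column, the string filter value is parsed into a datetime and rendered as a Druid `TIME_PARSE` call instead of a bare string literal. This is a minimal sketch, assuming a Druid-style conversion that wraps `isoformat()` output in `TIME_PARSE` for `TIMESTAMP` targets, which matches the literal in the corrected query. The functions `convert_dttm_sketch` and `render_temporal_filter` are hypothetical, and the stdlib `datetime.fromisoformat` stands in for a more lenient parser such as `dateutil.parser.parse`.

```python
from datetime import datetime
from typing import Optional


def convert_dttm_sketch(target_type: str, dttm: datetime) -> Optional[str]:
    """Hypothetical Druid-style conversion: only TIMESTAMP targets are handled."""
    if target_type.upper() == "TIMESTAMP":
        return f"TIME_PARSE('{dttm.isoformat()}')"
    # Unsupported target types yield None, so callers must handle that case.
    return None


def render_temporal_filter(column: str, value: str) -> str:
    """Build an equality filter on a temporal column from a string value (hypothetical)."""
    # Rewrite the trailing "Z" as an explicit offset; fromisoformat() only
    # accepts "Z" directly from Python 3.11 onwards.
    dttm = datetime.fromisoformat(value.replace("Z", "+00:00"))
    literal = convert_dttm_sketch("TIMESTAMP", dttm)
    if literal is None:
        # Fall back to a quoted string literal (the old, failing behavior).
        literal = f"'{value}'"
    return f"{column} = {literal}"


print(render_temporal_filter('FLOOR("__time" TO DAY)', "2016-06-27T00:00:00.000Z"))
# FLOOR("__time" TO DAY) = TIME_PARSE('2016-06-27T00:00:00+00:00')
```

Note that the parsed value loses its `.000` fraction and its `Z` becomes `+00:00`, which is why the corrected query shows `TIME_PARSE('2016-06-27T00:00:00+00:00')`.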

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro (Member) left a comment

LGTM - just needs a rebase

Comment on lines +104 to +117
@classmethod
def epoch_to_dttm(cls) -> str:
    """
    Convert from number of seconds since the epoch to a timestamp.
    """
    return "MILLIS_TO_TIMESTAMP({col} * 1000)"

@classmethod
def epoch_ms_to_dttm(cls) -> str:
    """
    Convert from number of milliseconds since the epoch to a timestamp.
    """
    return "MILLIS_TO_TIMESTAMP({col})"
Member

Oh wow, I didn't know these weren't defined yet for Druid!

Member Author

Yeah, I implemented epoch_to_dttm while working on a solution and decided to leave it in even though I wasn't using it, and added epoch_ms_to_dttm for completeness.
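To illustrate what the two templates expand to, the sketch below fills the `{col}` placeholder with a column expression. It assumes the placeholder is substituted by plain string replacement; the helper `expand`, the column name `"ts"`, and the Python stand-in `millis_to_timestamp` are hypothetical. Both templates end in Druid's `MILLIS_TO_TIMESTAMP`; the seconds variant first scales its input by 1000.

```python
from datetime import datetime, timezone

EPOCH_TO_DTTM = "MILLIS_TO_TIMESTAMP({col} * 1000)"  # seconds since the epoch
EPOCH_MS_TO_DTTM = "MILLIS_TO_TIMESTAMP({col})"  # milliseconds since the epoch


def expand(template: str, col: str) -> str:
    """Substitute a column expression for the {col} placeholder (hypothetical helper)."""
    return template.replace("{col}", col)


def millis_to_timestamp(ms: int) -> datetime:
    """Python stand-in for MILLIS_TO_TIMESTAMP, for illustration only."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


print(expand(EPOCH_TO_DTTM, '"ts"'))     # MILLIS_TO_TIMESTAMP("ts" * 1000)
print(expand(EPOCH_MS_TO_DTTM, '"ts"'))  # MILLIS_TO_TIMESTAMP("ts")

# 1466985600 seconds since the epoch, scaled to milliseconds, is 2016-06-27 UTC.
print(millis_to_timestamp(1466985600 * 1000))
```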

@john-bodley (Member) commented Oct 30, 2021

@betodealmeida and @villebro I don't believe this logic works in all cases. A temporal column can be encoded as a string (formatted per its Python date format), and its values shouldn't be converted to a TIMESTAMP.

Isn't the correct logic to use the type of the column rather than assuming it's a TIMESTAMP? Furthermore, self.db_engine_spec.convert_dttm("TIMESTAMP", dttm) could return None (even for the TIMESTAMP target type). I was thinking the logic should be of the form:

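import dateutil.parser
from sqlalchemy import text

# Only convert string values of temporal columns, using the column's own type.
if column_map[dimension].is_temporal and isinstance(value, str):
    # convert_dttm may return None, in which case the value is left unchanged.
    if result := self.db_engine_spec.convert_dttm(
        column_map[dimension].type,
        dateutil.parser.parse(value),
    ):
        value = text(result)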
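The proposed logic can be exercised in isolation with stand-ins. In this sketch the `Column` dataclass, `StubEngineSpec`, and `coerce_filter_value` are hypothetical substitutes for Superset's column metadata and engine spec. The stub renders only `TIMESTAMP` and returns `None` for anything else, so a temporal column stored as `VARCHAR` keeps its original string value. To avoid third-party dependencies, `datetime.fromisoformat` replaces `dateutil.parser.parse`, and the result is returned as a plain string instead of being wrapped in SQLAlchemy's `text()`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Column:
    """Hypothetical stand-in for a dataset column's metadata."""
    type: str
    is_temporal: bool


class StubEngineSpec:
    """Hypothetical engine spec: renders TIMESTAMP literals, nothing else."""

    @classmethod
    def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
        if target_type.upper() == "TIMESTAMP":
            return f"TIME_PARSE('{dttm.isoformat()}')"
        return None  # e.g. a temporal column encoded as a string


def coerce_filter_value(
    column_map: Dict[str, Column], dimension: str, value: object
) -> object:
    column = column_map[dimension]
    # Use the column's declared type rather than assuming TIMESTAMP.
    if column.is_temporal and isinstance(value, str):
        if result := StubEngineSpec.convert_dttm(
            column.type, datetime.fromisoformat(value)
        ):
            return result  # a SQL expression; the proposal wraps it in text()
    return value  # leave the value untouched when no conversion applies


columns = {
    "__time": Column(type="TIMESTAMP", is_temporal=True),
    "ds": Column(type="VARCHAR", is_temporal=True),
}
print(coerce_filter_value(columns, "__time", "2016-06-27T00:00:00+00:00"))
print(coerce_filter_value(columns, "ds", "2016-06-27"))
```

The first call yields `TIME_PARSE('2016-06-27T00:00:00+00:00')`; the second returns `2016-06-27` unchanged, because the stub has no conversion for `VARCHAR`.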

@john-bodley (Member) commented:

@betodealmeida I have proposed a fix in #17312.

opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021
* feat: handle temporal columns in group bys

* Rebase
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 28, 2021
* feat: handle temporal columns in group bys

* Rebase
@mistercrunch added the 🏷️ bot and 🚢 1.4.0 labels Mar 13, 2024
Labels: 🏷️ bot, size/M, 🚢 1.4.0

4 participants