chore(druid): Standardizing time grain transformations #17050

john-bodley
Member

@john-bodley john-bodley commented Oct 9, 2021

SUMMARY

The Apache Druid time grain transformations use a mix of the FLOOR, TIME_FLOOR, and TIMESTAMPADD functions. This PR merely standardizes them to consistently use the TIME_FLOOR and TIME_SHIFT functions, both of which use ISO 8601 durations to define periods.
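To illustrate the pattern being standardized: each time grain key maps to a SQL template with a `{col}` placeholder for the temporal column. The following is a minimal sketch that assumes plain string substitution. `TIME_GRAIN_EXPRESSIONS` and `render_time_grain` are hypothetical names, not Superset's actual API, and the mapping contains only entries quoted in this PR.

```python
# Hypothetical subset of a Druid time grain mapping, limited to entries
# quoted in this PR; "{col}" stands in for the column's SQL expression.
TIME_GRAIN_EXPRESSIONS = {
    "PT5S": "TIME_FLOOR({col}, 'PT5S')",
    "PT1M": "TIME_FLOOR({col}, 'PT1M')",
    "PT15M": "TIME_FLOOR({col}, 'PT15M')",
    "PT0.5H": "TIME_FLOOR({col}, 'PT30M')",
}


def render_time_grain(grain: str, col_sql: str) -> str:
    """Return SQL that truncates col_sql to the given grain.

    Raises KeyError for grains that are not in the mapping.
    """
    template = TIME_GRAIN_EXPRESSIONS[grain]
    return template.replace("{col}", col_sql)


print(render_time_grain("PT15M", "__time"))  # TIME_FLOOR(__time, 'PT15M')
```

With every grain expressed as `TIME_FLOOR({col}, '<ISO 8601 period>')`, the templates become uniform. Only the period string varies between entries.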

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

  "PT5S": "TIME_FLOOR({col}, 'PT5S')",
  "PT30S": "TIME_FLOOR({col}, 'PT30S')",
- "PT1M": "FLOOR({col} TO MINUTE)",
+ "PT1M": "TIME_FLOOR({col}, 'PT1M')",
  "PT5M": "TIME_FLOOR({col}, 'PT5M')",
  "PT10M": "TIME_FLOOR({col}, 'PT10M')",
  "PT15M": "TIME_FLOOR({col}, 'PT15M')",
  "PT0.5H": "TIME_FLOOR({col}, 'PT30M')",
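The substantive change in this excerpt is `PT1M` moving from `FLOOR({col} TO MINUTE)` to `TIME_FLOOR({col}, 'PT1M')`. As a rough sanity check that the two agree, the sketch below emulates both in Python. It assumes UTC timestamps and TIME_FLOOR buckets aligned to the Unix epoch (assumed here to match Druid's default origin), and it models only fixed-length periods. `time_floor` and `floor_to_minute` are illustrative helpers, not Druid or Superset APIs.

```python
from datetime import datetime, timedelta, timezone

# Assumed bucket origin: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_floor(ts: datetime, period: timedelta) -> datetime:
    """Approximate TIME_FLOOR for fixed-length periods (seconds to hours).

    Calendar periods such as months or years are not modelled.
    """
    elapsed = ts - EPOCH
    buckets = elapsed // period  # timedelta // timedelta -> int
    return EPOCH + buckets * period


def floor_to_minute(ts: datetime) -> datetime:
    """Approximate FLOOR(... TO MINUTE) by zeroing the sub-minute fields."""
    return ts.replace(second=0, microsecond=0)


ts = datetime(2021, 10, 9, 19, 30, 42, tzinfo=timezone.utc)
assert time_floor(ts, timedelta(minutes=1)) == floor_to_minute(ts)
print(time_floor(ts, timedelta(minutes=15)))  # 2021-10-09 19:30:00+00:00
```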
Member Author

@villebro I've never understood why this is PT0.5H ("Half hour") as opposed to PT30M ("30 minute"). I was thinking of doing a pass to change these, which would require a database migration. Thoughts?

Member

I've wondered about this before, as well as why P0.25Y is used in Superset instead of P3M. I tried looking at ISO-8601 to see if there's any guidance there, but I can't find anything. So I think they might be used interchangeably. However, since the decimal character varies from country to country, I think we'd be better off replacing PT0.5H and P0.25Y with PT30M and P3M respectively.
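The interchangeability point can be checked for time-of-day durations. Below is a minimal sketch that assumes only the `PT…` (hours/minutes/seconds) form needs support. ISO 8601 permits either `,` or `.` as the decimal sign, so the parser accepts both. `time_duration_seconds` is a hypothetical helper, and it is deliberately lenient: it allows a fraction on any component, whereas ISO 8601 allows one only on the lowest-order component.

```python
import re
from decimal import Decimal

# Time part of an ISO 8601 duration (PTnHnMnS); each component may carry a
# decimal fraction written with either "." or ",".
_TIME_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:[.,]\d+)?)H)?(?:(?P<m>\d+(?:[.,]\d+)?)M)?"
    r"(?:(?P<s>\d+(?:[.,]\d+)?)S)?$"
)


def time_duration_seconds(value: str) -> Decimal:
    """Return the length of a PT... duration in seconds."""
    match = _TIME_DURATION.match(value)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    total = Decimal(0)
    for name, factor in (("h", 3600), ("m", 60), ("s", 1)):
        part = match.group(name)
        if part:
            # Normalise a comma decimal sign before converting.
            total += Decimal(part.replace(",", ".")) * factor
    return total


assert time_duration_seconds("PT0.5H") == time_duration_seconds("PT30M") == 1800
assert time_duration_seconds("PT0,5H") == 1800  # comma as decimal sign
```

Calendar durations such as `P0.25Y` and `P3M` are rejected. A quarter of a year has no fixed length in seconds, so the sketch does not attempt to compare them.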

Member

I think it's because it was originally called just "half an hour", so PT0.5H made sense: it was a direct translation, and I was just looking for a way to standardize the intervals across DB engine specs.

codecov bot commented Oct 9, 2021

Codecov Report

Merging #17050 (4e8cb2a) into master (7c1c89c) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #17050   +/-   ##
=======================================
  Coverage   76.91%   76.91%           
=======================================
  Files        1031     1031           
  Lines       55163    55163           
  Branches     7501     7501           
=======================================
+ Hits        42428    42430    +2     
+ Misses      12483    12481    -2     
  Partials      252      252           
Flag       Coverage Δ
hive       81.47% <ø> (ø)
mysql      81.92% <ø> (ø)
postgres   81.93% <ø> (ø)
presto     81.80% <ø> (+<0.01%) ⬆️
python     82.43% <ø> (+<0.01%) ⬆️
sqlite     81.60% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                        Coverage Δ
superset/db_engine_specs/druid.py     86.27% <ø> (ø)
superset/db_engine_specs/presto.py    90.37% <0.00%> (+0.41%) ⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c1c89c...4e8cb2a.

@john-bodley john-bodley force-pushed the john-bodley--druid-standardize-time-grains branch from 94a8f88 to f87b171 October 9, 2021 19:30
Member

@villebro villebro left a comment

LGTM. If/when a migration is done, I'd also replace P0.25Y with P3M.
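If such a migration is written, its core would be a key rename applied to stored chart parameters, in both directions so it can be rolled back. Below is a minimal sketch under stated assumptions. The grain is assumed to be stored under a `time_grain_sqla` key in each chart's params dictionary, and `GRAIN_RENAMES`, `upgrade_params`, and `downgrade_params` are hypothetical names. Database access and Superset's migration framework are omitted.

```python
from typing import Any, Dict

# Hypothetical legacy -> canonical renames proposed in this thread.
GRAIN_RENAMES = {"PT0.5H": "PT30M", "P0.25Y": "P3M"}

# Assumption: the selected grain is stored under this key in chart params.
GRAIN_KEY = "time_grain_sqla"


def _rename_grain(params: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of params with the grain renamed if it is in mapping."""
    updated = dict(params)
    grain = updated.get(GRAIN_KEY)
    if grain in mapping:
        updated[GRAIN_KEY] = mapping[grain]
    return updated


def upgrade_params(params: Dict[str, Any]) -> Dict[str, Any]:
    return _rename_grain(params, GRAIN_RENAMES)


def downgrade_params(params: Dict[str, Any]) -> Dict[str, Any]:
    # Invert the mapping so the upgrade can be rolled back.
    reverse = {new: old for old, new in GRAIN_RENAMES.items()}
    return _rename_grain(params, reverse)


legacy = {"time_grain_sqla": "P0.25Y", "viz_type": "line"}
assert upgrade_params(legacy) == {"time_grain_sqla": "P3M", "viz_type": "line"}
assert downgrade_params(upgrade_params(legacy)) == legacy
```

The round trip is lossless only if no chart already stores `PT30M` or `P3M` before the upgrade. Otherwise the downgrade would rewrite those values as well.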

@john-bodley john-bodley merged commit 5e85f48 into apache:master Oct 12, 2021
opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021
* chore(druid): Standardizing time grain transformations

* Update druid_tests.py

* Update druid_tests.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.5.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 1.5.0
4 participants