Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Clustering for Snowflake #1689

Merged
merged 5 commits into from
Aug 20, 2019
Merged

Implement Clustering for Snowflake #1689

merged 5 commits into from
Aug 20, 2019

Conversation

bastienboutonnet
Copy link
Contributor

@bastienboutonnet bastienboutonnet commented Aug 17, 2019

Aim

Brings clustering to snowflake.

Table materialisations will leverage an order by via wrapping the SQL in a select * from ( {{sql}} ) order by ( {{cluster_by_keys}}
Incremental materialisation will create table as above and followed by an alter statement alter table {{relation}} cluster by ({{cluster_by_keys}}) to leverage Snowflake's automatic clustering.
This approach was discussed in #634

This is a duplicate of PR #1591 that I had to re open due to killing my fork in the meantime... All discussion can be found there though.

@cla-bot
Copy link

cla-bot bot commented Aug 17, 2019

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @bastienboutonnet

@bastienboutonnet
Copy link
Contributor Author

Btw @drewbanin CLA is now signed...

@drewbanin drewbanin self-requested a review August 19, 2019 13:09
@drewbanin
Copy link
Contributor

Thanks @bastienboutonnet! Kicking off the tests now :)

@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Aug 19, 2019
@cla-bot
Copy link

cla-bot bot commented Aug 19, 2019

The cla-bot has been summoned, and re-checked this pull request!

{%- if cluster_by_keys is not none and cluster_by_keys is string -%}
{%- set cluster_by_keys = [cluster_by_keys] -%}
{%- endif -%}
{%- set cluster_by_string = cluster_by_keys|join(", ")-%}
Copy link
Contributor

@drewbanin drewbanin Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this tries to call join on none when the cluster_by config is unset. Can we try something like:

{%- if cluster_by_keys is not none and cluster_by_keys is string -%}
    {%- set cluster_by_keys = [cluster_by_keys] -%}
    {%- set cluster_by_string = cluster_by_keys|join(", ")-%}
{% else %}
    {%- set cluster_by_keys = [] -%}
    {%- set cluster_by_string = none -%}
{%- endif -%}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then below, we'll want to do:

{% if cluster_by_string is not none -%}
    alter table {{relation}} cluster by ({{cluster_by_string}});
{%- endif -%}

Copy link
Contributor Author

@bastienboutonnet bastienboutonnet Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh damn, that was really stupid of me...

But actually the whole logic is pretty wild here... I was doing really stupid stuff but it should work after my last commit. Let me know if you don't agree with that flow. I tested it and it works both for list of strings, a list of one string or just one string.

I also noticed a corner case. We're allowing alter table {{relation}} resume recluster; to run on the condition that this config argument be set to true. The issue with that is that if the table was not created as a clustered table (i.e. that no cluster_by_keys were provided) then Snowflake will throw an error that the table is not clustered.

I changed this piece of logic as well. Let me know what you think.

Co-Authored-By: Drew Banin <drew@fishtownanalytics.com>
@drewbanin
Copy link
Contributor

Merging this - thanks for your hard work in this contribution @bastienboutonnet!!

@drewbanin drewbanin merged commit e7a24a2 into dbt-labs:dev/0.14.1 Aug 20, 2019
@bastienboutonnet
Copy link
Contributor Author

pleasure as always!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants