feat: Add support for Azure Data Explorer (Kusto) db engine spec #17898

Merged
merged 15 commits into from
Jan 10, 2022

Conversation

Ceridan
Contributor

@Ceridan Ceridan commented Dec 30, 2021

SUMMARY

Add support for Azure Data Explorer (Kusto). My team at @dodopizza has built and maintained the Kusto dialect for SQLAlchemy. It allows us to add a new Kusto engine spec.

Kusto can handle two dialects: SQL and KQL. Currently, we provide full support for the SQL dialect (SELECT queries only, no DML) and experimental support for KQL (work in progress). For more details, see the sqlalchemy-kusto repo.

We are using SQL dialect in production at our company and working on full support for KQL dialect.

TESTING INSTRUCTIONS

To test manually, connect to a Kusto database and run some queries against it in SQL Lab.

Additionally, sqlalchemy-kusto includes dialect tests.

ADDITIONAL INFORMATION

  • Has associated issue: #10646
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API - new database support - Kusto.
  • Removes existing feature or API

@Ceridan Ceridan changed the title Add support for Azure Data Explorer (Kusto) db engine spec feat: Add support for Azure Data Explorer (Kusto) db engine spec Dec 30, 2021
@codecov

codecov bot commented Dec 30, 2021

Codecov Report

Merging #17898 (4fa2914) into master (8ebec60) will increase coverage by 0.00%.
The diff coverage is 92.00%.


@@           Coverage Diff           @@
##           master   #17898   +/-   ##
=======================================
  Coverage   67.10%   67.10%           
=======================================
  Files        1609     1610    +1     
  Lines       64897    64970   +73     
  Branches     6866     6866           
=======================================
+ Hits        43547    43600   +53     
- Misses      19484    19504   +20     
  Partials     1866     1866           
Flag Coverage Δ
hive 53.31% <65.33%> (-28.50%) ⬇️
mysql 82.21% <92.00%> (+0.02%) ⬆️
postgres 82.26% <92.00%> (+0.02%) ⬆️
presto 53.15% <65.33%> (-28.96%) ⬇️
python 82.71% <92.00%> (-0.03%) ⬇️
sqlite 81.95% <92.00%> (+0.02%) ⬆️

Impacted Files Coverage Δ
superset/db_engine_specs/kusto.py 91.54% <91.54%> (ø)
superset/db_engine_specs/base.py 88.74% <100.00%> (+0.06%) ⬆️
superset/models/core.py 89.48% <100.00%> (-0.52%) ⬇️
superset/views/database/mixins.py 81.03% <0.00%> (-1.73%) ⬇️
superset/db_engine_specs/hive.py 85.71% <0.00%> (-1.55%) ⬇️
superset/db_engine_specs/presto.py 89.14% <0.00%> (-1.26%) ⬇️
superset/connectors/sqla/models.py 88.31% <0.00%> (-0.12%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@villebro villebro left a comment

This looks great! One nit + one question. Also, wondering if you would be open to adding some unit tests for typical cases, especially for the overridden class methods, like is_select_query and parse_sql.

superset/db_engine_specs/kustokql.py (outdated, resolved)
superset/db_engine_specs/kustosql.py (outdated, resolved)
@Ceridan
Contributor Author

Ceridan commented Jan 4, 2022

Thank you for the review and the suggestions on how to improve this code! We will add unit tests and merge the two dialects into a single file, and I will get back to you.

Member

@villebro villebro left a comment

Looking great - a few last comments, but I think this is really close to being mergeable.

setup.py Outdated
@@ -140,6 +140,7 @@ def get_git_sha() -> str:
"hana": ["hdbcli==2.4.162", "sqlalchemy_hana==0.4.0"],
"hive": ["pyhive[hive]>=0.6.1", "tableschema", "thrift>=0.11.0, <1.0.0"],
"impala": ["impyla>0.16.2, <0.17"],
"kusto": ["sqlalchemy-kusto>=1.0.1"],
Member

Just in case, could we restrict to the current major version?

Suggested change
- "kusto": ["sqlalchemy-kusto>=1.0.1"],
+ "kusto": ["sqlalchemy-kusto>=1.0.1, <2"],

Contributor Author

Yes, sounds good to me. We will add this restriction.
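
For illustration, the constraint `>=1.0.1, <2` accepts any 1.x release from 1.0.1 onward and rejects 2.0.0 and later. The sketch below is a simplified stand-in, not how pip evaluates specifiers: `parse_version` and `in_allowed_range` are hypothetical helpers, and they only handle plain dotted numeric versions (no pre-release or post-release tags).

```python
def parse_version(version: str) -> tuple:
    """Hypothetical helper: turn "1.0.1" into (1, 0, 1) for comparison."""
    return tuple(int(part) for part in version.split("."))


def in_allowed_range(version: str, lower: str = "1.0.1", upper: str = "2") -> bool:
    """Return True if lower <= version < upper, mirroring ">=1.0.1, <2"."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("1.0.0", "1.0.1", "1.4.2", "2.0.0"):
        print(candidate, in_allowed_range(candidate))
```

The upper bound is what protects against a future 2.x release that might change the dialect's behaviour in incompatible ways.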

[
("tbl | limit 100", True),
("let foo = 1; tbl | where bar == foo", True),
(".show tables", False),
Member

@villebro villebro Jan 10, 2022

Again, out of curiosity, isn't this actually a read-only query? I know that SHOW TABLES, EXPLAIN, etc. currently appear to be flagged as DML by parse_sql, but since this logic will now live in the db engine spec, it can easily be altered. So maybe we'd like to flag this as read-only in the KQL spec? Also, could we add a more typical DML case here, like .set, .append or .drop? I assume those are the cases we want to make sure we're catching.

Contributor

Your remark is valuable. Thank you. We will add more specific code.
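
A rough sketch of the classification discussed in this thread, assuming that KQL management commands always begin with a dot and that `.show` is the only one treated as read-only. The helper name `is_readonly_kql` is hypothetical, and this is not the code merged in the PR.

```python
def is_readonly_kql(query: str) -> bool:
    """Hypothetical helper: classify a KQL statement as read-only or not.

    Assumption: anything that does not start with a dot is an ordinary
    query, and among dot-prefixed management commands only `.show` is
    considered read-only.
    """
    stripped = query.strip().lower()
    if not stripped.startswith("."):
        # Plain queries such as "tbl | limit 100" or "let ...; tbl | ..."
        return True
    # First whitespace-delimited token is the management command itself.
    command = stripped.split(None, 1)[0]
    return command == ".show"


if __name__ == "__main__":
    samples = [
        "tbl | limit 100",
        "let foo = 1; tbl | where bar == foo",
        ".show tables",
        ".set tbl <| other_tbl",
        ".drop table tbl",
    ]
    for sample in samples:
        print(repr(sample), is_readonly_kql(sample))
```

Under these assumptions, `.show tables` is classified as read-only, while `.set`, `.append` and `.drop` are rejected, which matches the cases the review asks to cover.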

("INSERT INTO tbl (foo) VALUES (1)", False),
],
)
def test_sql_is_readonly_query(
Member

Could we add a few more tests for the SQL spec, like a few timegrains, convert_dttm etc, just to get coverage up? You can probably copy some template tests from the pre-existing unit tests.

@xneg
Contributor

xneg commented Jan 10, 2022

@villebro Thank you for the review! We fixed your comments.

@villebro
Member

@xneg awesome 👍 I started CI and this definitely LGTM if it passes CI. Again, to further protect this connector from the unlikely risk of regressions I'd suggest adding unit tests for all overridden methods (e.g. convert_dttm etc), but I'm fine merging as-is.

@xneg
Contributor

xneg commented Jan 10, 2022

@villebro Thank you! We already added unit tests for all overridden methods except get_dbapi_exception_mapping.

Member

@villebro villebro left a comment

LGTM!

@villebro
Member

> @villebro Thank you! We already added unit tests for all overridden methods except get_dbapi_exception_mapping.

@xneg right you are - my apologies, I see that now! ✅

@villebro villebro merged commit d2d4f8e into apache:master Jan 10, 2022
shcoderAlex pushed a commit to casual-precision/superset that referenced this pull request Feb 7, 2022
…che#17898)

* Add two Kusto engine specs: KQL and SQL. Some minor changes in core code to support Kusto engine specs.

* Remove redundant imports and logging.

* docs: Kusto sqlalchemy docs

* fix: Fix mypy and linting errors

* fix: Handle Black vs Pylint checks

* fix: isort problem

* refactor: Merge kustosql and kustokql in the single kusto module

* test: Add tests for Kusto db spec

* feat: Schema override does not require in KQL anymore

* Removed redundant imports.

* Added ".show" queries to readonly query determination.

* Fixed some bugs.
Added tests for convert_dttm.

* Fixed major sqlalchemy-kusto version.

* Fixed by isort.

Co-authored-by: Eugene Bikkinin <xnegxneg@gmail.com>
Co-authored-by: k.tomak <k.tomak@dodopizza.com>
Co-authored-by: Eugene Bikkinin <e.bikkinin@dodopizza.com>
bwang221 pushed a commit to casual-precision/superset that referenced this pull request Feb 10, 2022
…che#17898)

@anshulsharmas

Hi @xneg, thanks for creating this amazing Kusto plugin. I couldn't find the documentation for the connector, though. I can see you made commits for it, but I'm not sure why Kusto isn't listed here: https://superset.apache.org/docs/intro

@xneg
Contributor

xneg commented Apr 20, 2022

Hi @anshulsharmas, thank you.
I think Kusto isn't listed because this feature will be in v. 1.5.
https://github.com/apache/superset/blob/1.5.0rc1/CHANGELOG.md

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.5.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.5.0

5 participants