[SIP-117] Improve SQL parsing #26786
Comments
I think that this SIP and SIP-115 can be merged, since they share the same theme of SQL parsing.
To provide a brief overview of where we've got to: the main idea is not to create a new UI element for "PRE SQL", which would not be very comfortable to use and would also make it almost impossible to downgrade the metadata DB without a significant loss in functionality. So it was suggested to conduct all the necessary query parsing under the hood.
@TechAuditBI perhaps you could make sure that all your requirements (e.g. WHEN clauses in MS-SQL) are supported by this proposal, and it would effectively replace SIP-115, so we can close that one? Also CC @john-bodley since we were talking about this stuff just this morning.
Just to be clear, this proposal is to create a cleaner interface for SQL parsing that supports current functionality. I think the novel functionality from SIP-115 is needed and should be implemented in the classes being proposed in this SIP. Since we'd be using
@betodealmeida I'm very supportive of consolidating our somewhat fragmented SQL parsing landscape. Personally I was a fan of I also gave an internal talk (back in October) about non-technical learnings from open source where I mentioned:
The TL;DR is I sense SQLGlot is the way to go. Out of interest would
@john-bodley personally I'm a fan of Today
Thank you for writing this SIP @betodealmeida. These are really welcome changes.
Can we add a Pylint check for this?
+1 for sqlglot!
Yeah, we can do this: https://github.com/apache/superset/pull/26803/files
+1
SIP was approved on 2024-02-07.
Closing the issue... not because it's done, but because the SIP was approved. Keep on truckin'!
Any updates on what work remains here @betodealmeida? Might be good to give a "state of affairs" comment here.
@rusackas this is still work in progress. I'm currently working on moving the RLS functionality.
[SIP-117] Improve SQL parsing
Motivation
The current state of SQL parsing in Superset is not ideal. A few reasons:
1. Three different libraries are used for SQL parsing:
   a. sqlparse is the original library used for SQL parsing. It's non-validating and non-dialect specific, which brings a few challenges.
   b. sqloxide was introduced as an optional dependency, where performance was critical.
   c. sqlglot was introduced in fix(sqlparse): improve table parsing #26476 to fix security issues.
2. Parsing is nominally handled by superset.sql_parse.ParsedQuery, but many places in the code call sqlparse directly. This includes complex operations, like injecting RLS rules into SQL queries.
3. ParsedQuery is sometimes used for single statements, sometimes for multi-statement queries, which could lead to subtle bugs, since it has methods that assume a single statement.
Superset needs robust and reliable SQL parsing, so it can understand and modify the queries being run programmatically. Today, many of the manipulations to SQL are done using string functions, because the sqlparse AST is too low-level. When queries are manipulated programmatically, like in RLS injection, the code is complex and hard to follow, for the same reason.
Proposed Change
My proposal is to create 2 new classes that should provide a clean interface to SQL parsing:
(This is a simplified version, see #26767 for more details.)
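As a rough illustration of the script/statement split, here is a minimal sketch. It is not the code from #26767: the method is_select_only, the helper is_select, and the toy_splitter backend are hypothetical names invented for this example. The toy backend only splits on semicolons outside single-quoted literals and stands in for a real parser such as sqlglot, which would be plugged in through the same splitter argument.

```python
from dataclasses import dataclass
from typing import Callable, List

# A backend turns raw SQL text into a list of single-statement strings.
# Hypothetical type alias; a sqlglot-based backend could satisfy it too.
Splitter = Callable[[str], List[str]]


def toy_splitter(sql: str) -> List[str]:
    """Toy backend (assumption): split on ';' outside single-quoted literals.

    It ignores comments, dollar quoting, and dialect quirks, which is
    exactly why a real parser should replace it.
    """
    statements: List[str] = []
    current: List[str] = []
    in_string = False
    for char in sql:
        if char == "'":
            in_string = not in_string
        if char == ";" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    statements.append("".join(current).strip())
    # Drop empty fragments such as the one after a trailing semicolon.
    return [statement for statement in statements if statement]


@dataclass(frozen=True)
class SQLStatement:
    """Exactly one statement; helpers that assume a single statement live here."""

    text: str
    engine: str

    def is_select(self) -> bool:
        # Hypothetical helper; a real backend would inspect the AST instead
        # of looking at the leading keyword.
        return self.text.lstrip().upper().startswith("SELECT")


class SQLScript:
    """One or more statements; never assumes there is only one."""

    def __init__(self, sql: str, engine: str, splitter: Splitter = toy_splitter) -> None:
        self.engine = engine
        self.statements = [SQLStatement(text, engine) for text in splitter(sql)]

    def is_select_only(self) -> bool:
        return all(statement.is_select() for statement in self.statements)


script = SQLScript("SELECT ';' AS sep; DROP TABLE t", engine="postgresql")
print([statement.text for statement in script.statements])
print(script.is_select_only())
```

Keeping single-statement helpers on SQLStatement means a caller holding a SQLScript has to iterate over its statements explicitly, which avoids the ambiguity described above for ParsedQuery.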
These classes will provide:
... engine.
Initially these classes will be implemented using sqlglot, since it's fast, easy to install (pure Python), and has support for several dialects. The interface should be agnostic enough that it would be easy to rewrite the classes using a different parsing library in the future, if we ever need to.
New or Changed Public Interfaces
No public interfaces will be changed, but:
- sqlparse will be removed as a dependency.
- ParsedQuery will be removed, and replaced by SQLScript/SQLStatement.
- sqlglot formats SQL differently than sqlparse.
New dependencies
None.
Migration Plan and Compatibility
None.
Rejected Alternatives
None.