Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SQL validation based on grammar element #3039

Merged
merged 14 commits into from
Sep 23, 2024

Conversation

ykmr1224
Copy link
Collaborator

@ykmr1224 ykmr1224 commented Sep 17, 2024

Description

  • Implement SQL validation based on grammar element
  • This will reject query using deny list based on grammar element
  • Deny list can be defined per datasource type
  • This change validation applies only to async-query

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
@@ -130,7 +130,7 @@ jacocoTestCoverageVerification {
}
limit {
counter = 'BRANCH'
minimum = 1.0
minimum = 0.9
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced since there was a branch which could not be covered due to Spark SQL grammar.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add it to the exclude list above instead of reducing this on entire module?

HINTS,
INLINE_TABLE,
FILE,
CROSS_JOIN,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disable Join? why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because those joins are expensive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PPL add JOIN support with ALL join types. https://github.com/opensearch-project/opensearch-spark/blob/main/docs/PPL-Join-command.md

We should align PPL and SQL capability.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Copy link

codecov bot commented Sep 20, 2024

Codecov Report

Attention: Patch coverage is 99.69880% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.43%. Comparing base (37188bd) to head (683126e).

Files with missing lines Patch % Lines
...sql/spark/validator/SQLQueryValidationVisitor.java 99.46% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3039      +/-   ##
============================================
+ Coverage     94.33%   94.43%   +0.10%     
- Complexity     5278     5387     +109     
============================================
  Files           519      528       +9     
  Lines         15002    15305     +303     
  Branches       1005     1010       +5     
============================================
+ Hits          14152    14454     +302     
  Misses          804      804              
- Partials         46       47       +1     
Flag Coverage Δ
sql-engine 94.43% <99.69%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ykmr1224
Copy link
Collaborator Author

Filed a issue for update/test spark version: #3041

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
penghuo
penghuo previously approved these changes Sep 23, 2024
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx!

dai-chen
dai-chen previously approved these changes Sep 23, 2024
@@ -130,7 +130,7 @@ jacocoTestCoverageVerification {
}
limit {
counter = 'BRANCH'
minimum = 1.0
minimum = 0.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add it to the exclude list above instead of reducing this on entire module?

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
@ykmr1224 ykmr1224 dismissed stale reviews from dai-chen and penghuo via b0e545e September 23, 2024 20:30
@ykmr1224 ykmr1224 merged commit a87893a into opensearch-project:main Sep 23, 2024
12 of 14 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 23, 2024
* Implement SQL validation based on grammar element

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add function types

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* fix style

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add security lake

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add File support

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Integrate into SparkQueryDispatcher

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix style

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Integration

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add comments

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comments

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Allow join types for now

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix style

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix coverage check

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
(cherry picked from commit a87893a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@LantaoJin
Copy link
Member

This PR reminds me of another PR #2781 which is an implementation the Aggregate validation based on grammar. I haven't go through the whole details of this PR. Does this PR could resolve the problems described in #2781 (comment)? @ykmr1224 @penghuo

ykmr1224 pushed a commit that referenced this pull request Sep 26, 2024
* Implement SQL validation based on grammar element



* Add function types



* fix style



* Add security lake



* Add File support



* Integrate into SparkQueryDispatcher



* Fix style



* Add tests



* Integration



* Add comments



* Address comments



* Allow join types for now



* Fix style



* Fix coverage check



---------


(cherry picked from commit a87893a)

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@ykmr1224
Copy link
Collaborator Author

This PR reminds me of another PR #2781 which is an implementation the Aggregate validation based on grammar. I haven't go through the whole details of this PR. Does this PR could resolve the problems described in #2781 (comment)? @ykmr1224 @penghuo

No, I think #2781 is for query executed in SQL plugin, but this PR is for async-query. And it is rather prohibiting grammar elements available in Spark SQL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants