[EPIC] Improve performance of TPC-H queries #391

Open
5 tasks
andygrove opened this issue May 6, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@andygrove
Member

andygrove commented May 6, 2024

What is the problem the feature request solves?

This epic is for tracking progress on improving performance of Comet with our benchmarks derived from TPC-H.

Current status (September 2024)

Comet is not yet as fast as other DataFusion subprojects. All of these subprojects perform similar native execution, which suggests that DataFusion itself should not be the bottleneck.

[Image: Screenshot 2024-09-20 at 10 08 15 AM]

Features needed to support all queries natively

We do not run all queries fully natively yet due to these missing features:

Performance issues that affect multiple queries

Per-Query Tracking

@andygrove andygrove added the enhancement New feature or request label May 6, 2024
@viirya
Member

viirya commented May 6, 2024

BroadcastExchange should be supported, I think. We have CometBroadcastExchange.

We don't need to support AQEShuffleRead. It is a shuffle reader wrapper in Spark. It calls the wrapped shuffle's execute or executeColumnar, depending on whether it is columnar.
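The delegation described above can be pictured with a small toy model. This is only a sketch of the wrapper pattern, not Spark's implementation: the classes `ShuffleExec` and `ShuffleReadWrapper`, the `supports_columnar` flag, and the returned strings are all hypothetical stand-ins chosen for illustration.

```python
class ShuffleExec:
    """Hypothetical stand-in for a shuffle operator (not Spark's real class)."""

    def __init__(self, columnar: bool):
        self.supports_columnar = columnar

    def execute(self):
        # Row-based execution path.
        return "rows from shuffle"

    def execute_columnar(self):
        # Columnar (batch-based) execution path.
        return "batches from shuffle"


class ShuffleReadWrapper:
    """Hypothetical wrapper in the spirit of AQEShuffleRead: it adds no
    execution logic of its own and only forwards to the wrapped shuffle."""

    def __init__(self, child: ShuffleExec):
        self.child = child

    @property
    def supports_columnar(self) -> bool:
        # The wrapper is columnar exactly when the wrapped shuffle is.
        return self.child.supports_columnar

    def run(self):
        # Pick the execution path based on the wrapped shuffle.
        if self.child.supports_columnar:
            return self.child.execute_columnar()
        return self.child.execute()
```

Because the wrapper only forwards calls, whether a plan runs natively is decided by the wrapped shuffle rather than by the wrapper, which is the reason given above for not needing separate support.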

@viirya
Member

viirya commented May 6, 2024

We don't need to support Execute CreateViewCommand either. It is a command exec operator.

@viirya
Member

viirya commented May 6, 2024

Also CommandResult, which is only used to hold data from a command. CommandResult and Execute CreateViewCommand are not query execution operators.

@andygrove
Member Author

> Also CommandResult, which is only used to hold data from a command. CommandResult and Execute CreateViewCommand are not query execution operators.

Thanks. I saw those from the CREATE VIEW in q15 but I see from the Spark UI that the SELECT part of this query is already fully native. I have removed those from the list.

@andygrove
Member Author

> BroadcastExchange should be supported, I think. We have CometBroadcastExchange.

"BroadcastExchange is not supported" is the message that Comet reports for q8. I think part of this epic will be making these messages more informative.

@viirya
Member

viirya commented May 7, 2024

I added support for sort merge join with a join condition to DataFusion a while ago, but we have not incorporated the feature into Comet yet. I opened #398 to track it and I will work on it once #250 is merged and #248 is done.

@viirya
Member

viirya commented May 7, 2024

> "BroadcastExchange is not supported" is the message that Comet reports for q8. I think part of this epic will be making these messages more informative.

I will take a look at q8 and see why it is not enabled there.

@andygrove
Member Author

> I will take a look at q8 and see why it is not enabled there.

The error "BroadcastExchange is not supported" really means "BroadcastExchange is not supported because the child operators are not supported".
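As an illustration of what a more informative message could look like, here is a minimal sketch that walks a toy plan tree and names the child operators that caused the fallback. It is not Comet's code: `PlanNode`, `natively_supported`, `fallback_reason`, and the operator name `SomeOperator` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node: an operator name, a flag saying whether the
    operator itself has a native implementation, and its children."""
    name: str
    natively_supported: bool
    children: List["PlanNode"] = field(default_factory=list)


def fallback_reason(node: PlanNode) -> Optional[str]:
    """Return None if the whole subtree can run natively, otherwise a
    message that distinguishes the operator's own limitation from a
    limitation inherited from its children."""
    blocked = [c.name for c in node.children if fallback_reason(c) is not None]
    if blocked:
        return (f"{node.name} is not supported because child operators "
                f"are not supported: {', '.join(blocked)}")
    if not node.natively_supported:
        return f"{node.name} is not supported"
    return None
```

With a plan such as `PlanNode("BroadcastExchange", True, [PlanNode("SomeOperator", False)])`, the message points at the child rather than suggesting that the exchange itself lacks an implementation.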

@viirya
Member

viirya commented May 10, 2024

Please disable spark.comet.exec.broadcast.enabled, which should not be used in normal queries: #408 (comment)
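One way to turn the setting off is to pass it as a `--conf key=value` pair when launching Spark. The sketch below only builds those arguments; the helper `spark_conf_args` is hypothetical, and the value `false` is assumed to be how this flag is disabled. Only the key name comes from the comment above.

```python
def spark_conf_args(settings):
    """Hypothetical helper: turn a dict of Spark settings into the
    --conf arguments accepted by spark-submit or spark-shell."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Assumption: setting the flag to "false" disables it.
args = spark_conf_args({"spark.comet.exec.broadcast.enabled": "false"})
print(" ".join(args))
```

The printed arguments can be appended to whatever launch command the benchmark already uses.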

@andygrove andygrove added this to the 0.2.0 milestone Jul 25, 2024
@andygrove andygrove removed this from the 0.2.0 milestone Aug 16, 2024
@andygrove andygrove changed the title from "[EPIC] Support native execution for all TPC-H queries" to "[EPIC] Improve performance of TPC-H queries" Sep 20, 2024