Questions about the project #38

Closed
MadL1me opened this issue Jun 4, 2024 · 2 comments


MadL1me commented Jun 4, 2024

Hi there! I stumbled upon this project from this discussion: apache/datafusion#970

First of all, thanks everybody for this repo! Federation support for DataFusion is extremely relevant to me, and I was thinking about building something similar until I found this project. I have a few questions about performance in certain situations that matter most to me.

The first one is: how performant is a join of two remote tables, and how does it work? Are you doing something like querying the join's equality operands, doing a hash join in memory, and fetching the relevant tables? (For example, a join of two PostgreSQL tables from different servers, a.k.a. FDW.) Or are there no optimisations in this regard yet?

Also, I was curious about the final goal of the project: will it be merged into the main DataFusion repo, or is it expected to stay in a separate repo and crate? Thanks in advance!

@devinjdangelo
Collaborator

Hi @MadL1me, thank you for the interest!

The first one is: how performant is a join of two remote tables, and how does it work? Are you doing something like querying the join's equality operands, doing a hash join in memory, and fetching the relevant tables? (For example, a join of two PostgreSQL tables from different servers, a.k.a. FDW.) Or are there no optimisations in this regard yet?

Federated joins are not optimized yet; see #23 for some discussion of this. Most queries will result in a full scan of each remote table, and DataFusion will perform the join locally. It would be an awesome improvement to push down more work to the federated table providers.
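
To make the difference concrete, here is a rough conceptual sketch in plain Rust. It is not datafusion-federation code: `RemoteTable`, `full_scan`, `scan_with_keys`, and `hash_join` are hypothetical names invented for illustration. It contrasts the behaviour described above (scan both tables fully, then hash join locally) with the kind of key pushdown the question asks about, which is assumed here to act like a `WHERE key IN (...)` filter on the remote side.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a remote table holding (join key, payload) rows.
struct RemoteTable {
    rows: Vec<(u32, &'static str)>,
}

impl RemoteTable {
    // Unoptimized path: every row is pulled to the local engine.
    fn full_scan(&self) -> Vec<(u32, &'static str)> {
        self.rows.clone()
    }

    // Assumed optimization: only rows whose key appears in `keys` are
    // returned, as if a key filter had been pushed to the remote side.
    fn scan_with_keys(&self, keys: &[u32]) -> Vec<(u32, &'static str)> {
        self.rows
            .iter()
            .filter(|(k, _)| keys.contains(k))
            .cloned()
            .collect()
    }
}

// Local in-memory hash join on the key column.
fn hash_join(
    left: &[(u32, &'static str)],
    right: &[(u32, &'static str)],
) -> Vec<(u32, &'static str, &'static str)> {
    // Build phase: index the left side by key.
    let mut build: HashMap<u32, Vec<&'static str>> = HashMap::new();
    for (k, v) in left {
        build.entry(*k).or_default().push(*v);
    }
    // Probe phase: look up each right-side row in the index.
    let mut out = Vec::new();
    for (k, r) in right {
        if let Some(matches) = build.get(k) {
            for l in matches {
                out.push((*k, *l, *r));
            }
        }
    }
    out
}

fn main() {
    let users = RemoteTable { rows: vec![(1, "ann"), (2, "bob")] };
    let orders = RemoteTable {
        rows: vec![(1, "book"), (3, "pen"), (4, "ink"), (2, "cup")],
    };

    // Behaviour described above: scan both tables fully, join locally.
    let left = users.full_scan();
    let right_full = orders.full_scan();
    let naive = hash_join(&left, &right_full);

    // Pushdown variant: fetch only the rows of `orders` that can match.
    let keys: Vec<u32> = left.iter().map(|(k, _)| *k).collect();
    let right_filtered = orders.scan_with_keys(&keys);
    let pushed = hash_join(&left, &right_filtered);

    // Same join result, fewer rows transferred from the second table.
    assert_eq!(naive, pushed);
    println!(
        "rows fetched from orders: full scan = {}, key filter = {}",
        right_full.len(),
        right_filtered.len()
    );
}
```

Both paths produce the same join result; the only difference in this toy model is how many rows leave the second table, which is the cost a real pushdown would aim to reduce.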

Also, I was curious about the final goal of the project: will it be merged into the main DataFusion repo, or is it expected to stay in a separate repo and crate? Thanks in advance!

If datafusion-federation as a whole matures and proves useful to a significant portion of the overall user base, then yes, it could be merged into the upstream repo. We could also continue pushing small bits of functionality upstream over time. We have actually already done this for the Plan->SQL code in apache/datafusion#9494.

Author

MadL1me commented Jun 5, 2024

Thanks for the info! I look forward to your progress in experimenting with query federation.

MadL1me closed this as completed Jun 5, 2024