Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: issue #8922 make row group test more readable #8941

Merged
merged 1 commit into from
Jan 23, 2024

Conversation

Lordworms
Copy link
Contributor

Which issue does this PR close?

Closes #8922.

Rationale for this change

Since the original test in datafusion/core/tests/parquet/row_group_pruning.rs is hard to read, I use a test structure mentioned by lamb in comment to refracture the test

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jan 22, 2024
@Lordworms Lordworms changed the title fix issue #8922 make row group test more readable fix: issue #8922 make row group test more readable Jan 22, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @Lordworms -- this is looking really good

There seems to be one difference in prune_int32_eq_large_in_list but otherwise I think this PR is ready to go.

cc @Ted-Jiang as this will likely conflict with #8930

0,
)
.await;
.with_expected_errors(Some(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the test is different -- the results are 0, 0, 1 rather than 0,1,0

Perhaps there is a difference between test_row_group_prune and test_prune_verbose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I am confused about these two functions since they look and execute the same logic. I hope to get an answer about the differences

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just double checked and the difference appears to be that the argument order was different to the two functions 🤦

So therefore I think this change makes sense

.with_scenario(Scenario::PeriodsInColumnNames)
.with_query( "SELECT \"name\", \"service.name\" FROM t WHERE \"service.name\" = 'frontend'")
.with_expected_errors(Some(0))
.with_pruned_by_stats(Some(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.with_pruned_by_stats(Some(1))
// prune out first and last row group
.with_pruned_by_stats(Some(1))

.with_scenario(Scenario::PeriodsInColumnNames)
.with_query( "SELECT \"name\", \"service.name\" FROM t WHERE \"name\" != 'HTTP GET / DISPATCH'")
.with_expected_errors(Some(0))
.with_pruned_by_stats(Some(2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.with_pruned_by_stats(Some(2))
// prune out first and last row group
.with_pruned_by_stats(Some(2))

.with_scenario(Scenario::PeriodsInColumnNames)
.with_query( "SELECT \"name\", \"service.name\" FROM t WHERE \"service.name\" = 'frontend' AND \"name\" != 'HTTP GET / DISPATCH'")
.with_expected_errors(Some(0))
.with_pruned_by_stats(Some(2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.with_pruned_by_stats(Some(2))
// prune out first and last row group
.with_pruned_by_stats(Some(2))

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @Lordworms

@Lordworms
Copy link
Contributor Author

an unused variable that I used to debug has not been deleted, I'll fix it soon

@alamb alamb merged commit 31b9b48 into apache:main Jan 23, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 23, 2024

FYI @Ted-Jiang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make test_row_group_prune test more readable
2 participants