Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR]: Make some slt tests deterministic #8525

Merged
merged 1 commit into from
Dec 14, 2023
Merged

[MINOR]: Make some slt tests deterministic #8525

merged 1 commit into from
Dec 14, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

While working on another PR, I realized that some of the test results are not deterministic. This PR makes these test results deterministic.

What changes are included in this PR?

Are these changes tested?

Existing tests should work

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 13, 2023
Copy link
Contributor

@metesynnada metesynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 13, 2023

Quite curious, if the test is not deterministic, why we did not fail on this test before?

Oh, I think probably because of set datafusion.execution.target_partitions = 1;

Maybe we can remove unneccessary set datafusion.execution.target_partitions = 1; before test, so we know what test need specific setting easily.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- the output order some of these queries likely depends both on the threading (e.g test_threads) as well as defails of how hash values are computed (and thus what the order of groups is), etc

@@ -38,17 +38,17 @@ LOCATION '../../testing/data/csv/aggregate_test_100.csv'
# Basic example: distinct on the first column project the second one, and
# order by the third
query TI
SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3;
SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3, c9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the order by changes the results because DISTINCT ON basically picks the first values for each of the distinct values (of c1) in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed

@mustafasrepo
Copy link
Contributor Author

Quite curious, if the test is not deterministic, why we did not fail on this test before?

Oh, I think probably because of set datafusion.execution.target_partitions = 1;

Maybe we can remove unneccessary set datafusion.execution.target_partitions = 1; before test, so we know what test need specific setting easily.

Yes, because setting is not random. Results are deterministicat each run. However, also in terms of query specifications some results are not certain. For instance for the query

SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3;

Both

a 5
b 4
c 2
d 1
e 3

and

a 4
b 4
c 2
d 1
e 3

are valid. However for query

SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3, c9;

only valid result is

a 4
b 4
c 2
d 1
e 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants