Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-update mechanism for dataframe test #10373

Open
jayzhan211 opened this issue May 4, 2024 · 6 comments
Open

Auto-update mechanism for dataframe test #10373

jayzhan211 opened this issue May 4, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented May 4, 2024

Is your feature request related to a problem or challenge?

While working on #10364, I found that changing the result in the rust test is quite painful.

Currently, we need to fix the string manually one by one

It would be nice if there is an easy way to update the test.

In sqllogictest, we can easily done it with --complete flag.

Describe the solution you'd like

Given the test, having a very easy way to auto-update result string

#[tokio::test]
async fn test_fn_upper() -> Result<()> {
    let expr = upper(col("a"));

    let expected = [
        "+---------------+",
        "| upper(test.a) |",
        "+---------------+",
        "| ABCDEF        |",
        "| ABC123        |",
        "| CBADEF        |",
        "| 123ABCDEF     |",
        "+---------------+",
    ];
    assert_fn_batches!(expr, expected);

    Ok(())
}

Approach 1

One possible solution is writing the result to the file, and comparing it with similar to sqllogictest, but since we need to call expr API, the API calls remain in the rust test, and only the output goes to output file.

Approach 2

Based on #8736
We can switch between SQL string and Expr and compare the result like sqllogictest does

run_query may be like

    let ctx = SessionContext::new();
    // one csv table per test file with the same name
    // so tests/data/example.csv is the table for tests in tests/data/example.slt
    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
    // sql to expr
    let expr = ascii(col("a"));
    let df = df.select(vec![expr])?.collect().await?;
    // check the values like sqllogictest
    assert_eq!(df, "expected string");

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211 jayzhan211 added the enhancement New feature or request label May 4, 2024
@jayzhan211 jayzhan211 changed the title Auto-update mechanism for dataframe/expr test Auto-update mechanism for dataframe test May 4, 2024
@alamb
Copy link
Contributor

alamb commented May 8, 2024

Something we have used to great effect in influxdb is https://insta.rs/

You can then do the equivalent of sqllogictest --complete (even for results within files) with a command like

cargo insta review

Some downsides are that it is is yet another dependency (and to use it you need to install cargo install cargo-insta

@PsiACE
Copy link
Member

PsiACE commented Jun 19, 2024

I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile

UPDATE_GOLDENFILES=1 cargo test

If we decide to use a specific approach, I think I can try to take this issue.

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jun 20, 2024

I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile

UPDATE_GOLDENFILES=1 cargo test

If we decide to use a specific approach, I think I can try to take this issue.

We could give it a try!
As long the test is easy to maintain, I will take it.

The goal of this issue is to fix #10364 .

I stalled on the progress because #10364 is not a high priority issue (we can easily add alias to avoid the issue + it is probably solved in #11020), and since datafusion/example is not test, so I can not easily update it with cargo insta. Therefore, I just leave it there for now. I hope goldenfile could make datafusion/example easy to update too!

@PsiACE
Copy link
Member

PsiACE commented Jun 21, 2024

I think goldenfile can fulfill this requirement. I'll create some examples and submit a PR for evaluation. If everything works well, we can proceed with migrating the dataframe tests and examples. Please assign me this issue. cc @alamb @jayzhan211

@jayzhan211 jayzhan211 assigned jayzhan211 and PsiACE and unassigned jayzhan211 Jun 21, 2024
@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

Thanks @PsiACE and @jayzhan211

@jayzhan211
Copy link
Contributor Author

@PsiACE It would also be interesting that how much the size increase after the change #11105

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants