Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port tests in csv_files.rs to sqllogictest #8251

Closed
wants to merge 24 commits into from

Conversation

Asura7969
Copy link
Contributor

@Asura7969 Asura7969 commented Nov 17, 2023

Which issue does this PR close?

Closes #8197 .

Rationale for this change

What changes are included in this PR?

2 parts:

  • Add quote & escape for create external table in sql(The api method is already supported)
  • Port tests in csv_files.rs to sqllogictest

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 17, 2023
@@ -73,6 +73,8 @@ impl TableProviderFactory for ListingTableFactory {
CsvFormat::default()
.with_has_header(cmd.has_header)
.with_delimiter(cmd.delimiter as u8)
.with_quote(cmd.options.get("quote").map_or(b'"', |x| x.as_bytes()[0]))
.with_escape(cmd.options.get("escape").map_or(None, |x| Some(x.as_bytes()[0])))
.with_file_compression_type(file_compression_type),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the best way, maybe we can open another issue to improve it

Copy link
Contributor

@korowa korowa Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'd prefer to file a separate issue for integrating QUOTE & ESCAPE into CreateExternalTable pipeline. My concern here is that it probably will be confusing for end-users to pass read options into clause reserved for write options (and without any effect on writing CSV).

WITH HEADER ROW
DELIMITER ','
OPTIONS ('quote' '~')
LOCATION '../../testing/data/csv/csv_custom_quote.csv';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is needed here, but I don't know how to create it.

The contents of the file(csv_custom_quote.csv) are as follows:

c1,c2
~id0~,~value0~
~id1~,~value1~
~id2~,~value2~
~id3~,~value3~
~id4~,~value4~
~id5~,~value5~
~id6~,~value6~
~id7~,~value7~
~id8~,~value8~
~id9~,~value9~

WITH HEADER ROW
DELIMITER ','
OPTIONS ('escape' '\')
LOCATION '../../testing/data/csv/csv_custom_escape.csv';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same

The contents of the file(csv_custom_escape.csv):

c1,c2
"id0","value\"0"
"id1","value\"1"
"id2","value\"2"
"id3","value\"3"
"id4","value\"4"
"id5","value\"5"
"id6","value\"6"
"id7","value\"7"
"id8","value\"8"
"id9","value\"9"

@alamb
Copy link
Contributor

alamb commented Nov 20, 2023

Thank you @Asura7969 . This PR appears to be failing CI, so marking it as draft until that is fixed.

@Asura7969
Copy link
Contributor Author

Thank you @Asura7969 . This PR appears to be failing CI, so marking it as draft until that is fixed.

The reason for this failure is that the test file is missing. Can you help me check test?

@korowa
Copy link
Contributor

korowa commented Nov 21, 2023

The reason for this failure is that the test file is missing. Can you help me check apache/arrow-testing#97?

I believe another way is to generate custom-delimited (or escaped) .csv while configuring test context -- reference example might be csv generation for joins.slt

@Asura7969
Copy link
Contributor Author

I believe another way is to generate custom-delimited (or escaped) .csv while configuring test context -- reference example might be csv generation for joins.slt

Thanks for your tip, I'll improve it

@Asura7969
Copy link
Contributor Author

I believe another way is to generate custom-delimited (or escaped) .csv while configuring test context -- reference example might be csv generation for joins.slt

This PR also involves two attribute(escape and quote) configurations of the csv file,when creating a table using sql,For example:

CREATE EXTERNAL TABLE custom_escape (
    c1  VARCHAR DEFAULT NULL,
    c2  VARCHAR DEFAULT NULL
)
STORED AS CSV
WITH HEADER ROW
DELIMITER ','
OPTIONS ('escape' '\')  -- this
LOCATION '../core/tests/data/custom_escape.csv';

Using the API to create similar tables supports escape and quote, but SQL does not support it. Your tip may not cover the SQL method.

Thanks again for the tips.

@Asura7969
Copy link
Contributor Author

I created the test file under the path datafusion/core/tests/data/. I don’t know if it is correct.

@korowa
Copy link
Contributor

korowa commented Nov 21, 2023

I created the test file under the path datafusion/core/tests/data/. I don’t know if it is correct.

I suppose, it's more neat to use enable_testdir method of TextContext for all temporary test data.

@Asura7969
Copy link
Contributor Author

I suppose, it's more neat to use enable_testdir method of TextContext for all temporary test data.

I used enable_testdir method,thanks for your advice

@Asura7969
Copy link
Contributor Author

cc @alamb

@alamb alamb marked this pull request as ready for review November 28, 2023 15:09
@alamb alamb marked this pull request as draft November 28, 2023 15:11
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Asura7969 -- this code looks great to me. Also thank you very much for filing #8310

In order to avoid losing test coverage, I think we should fix #8310 before merging this one.

Is that something you are able to try @Asura7969 ? Conveniently, you already have written the test for the code in this PR :)

@@ -104,6 +104,10 @@ impl TestContext {
info!("Registering metadata table tables");
register_metadata_tables(test_ctx.session_ctx()).await;
}
"csv_files.slt" => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Asura7969
Copy link
Contributor Author

Is that something you are able to try @Asura7969 ? Conveniently, you already have written the test for the code in this PR :)

I'll try it this week

@alamb
Copy link
Contributor

alamb commented Nov 29, 2023

Is that something you are able to try @Asura7969 ? Conveniently, you already have written the test for the code in this PR :)

I'll try it this week

For anyone following along this is done in #8351

@simicd
Copy link
Contributor

simicd commented Jan 17, 2024

Thanks for implementing #8351 @Asura7969! Since all tests for #8197 have been ported & with #8885 being merged this draft PR here can be closed?

@alamb
Copy link
Contributor

alamb commented Jan 17, 2024

Thanks for implementing #8351 @Asura7969! Since all tests for #8197 have been ported & with #8885 being merged this draft PR here can be closed?

I agree -- thank you @simicd

@alamb alamb closed this Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port tests in csv_files.rs to sqllogictest
4 participants