8525 ingest optional skip #8532

Merged: 24 commits merged on Apr 12, 2022

lubitchv
Contributor

What this PR does / why we need it: This PR adds an optional parameter, tabIngest, to the native "add file to dataset" API. It defaults to true. If tabIngest is false, the API call skips tabular ingest.
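
As a rough illustration of the default described above, here is a minimal sketch of how an optional flag that defaults to true could be resolved. The class `TabIngestFlag`, the method `shouldIngest`, and the use of a plain `Map` in place of parsed jsonData are assumptions made for this example; this is not the code from the PR.

```java
import java.util.Map;

public class TabIngestFlag {

    /**
     * Hypothetical helper: decides whether tabular ingest should run.
     * A missing "tabIngest" key means "use the default", which is true.
     * In this sketch any value other than "true" (case-insensitive) disables ingest.
     */
    public static boolean shouldIngest(Map<String, String> jsonData) {
        String raw = jsonData.get("tabIngest");
        return raw == null || Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        // Flag omitted: ingest runs by default.
        System.out.println(shouldIngest(Map.of("restrict", "false")));  // prints true
        // Flag set to false: ingest is skipped.
        System.out.println(shouldIngest(Map.of("tabIngest", "false"))); // prints false
    }
}
```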

Which issue(s) this PR closes:

Closes #8525

Suggestions on how to test this:
The test testAddFileToDatasetTabIngest was added to DatasetsIT.
You can also test this API call using curl:
curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@test.sav" -F 'jsonData={"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "tabIngest":"false"}' http://localhost:8080/api/datasets/$DATASET_ID/add
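
For callers who assemble the request in Java rather than curl, the jsonData form field from the command above could be built along these lines. The class `AddFileJson` and its `build` method are hypothetical names for this sketch; it does no JSON escaping, so it assumes the inputs contain no quotes or backslashes, and it keeps the string-valued "restrict" and "tabIngest" fields exactly as the curl example sends them.

```java
public class AddFileJson {

    /**
     * Hypothetical helper: builds the jsonData payload used by the curl example.
     * Assumes description and directoryLabel need no JSON escaping.
     */
    public static String build(String description, String directoryLabel,
                               boolean restrict, boolean tabIngest) {
        return "{"
                + "\"description\":\"" + description + "\","
                + "\"directoryLabel\":\"" + directoryLabel + "\","
                + "\"categories\":[\"Data\"],"
                + "\"restrict\":\"" + restrict + "\","
                + "\"tabIngest\":\"" + tabIngest + "\""
                + "}";
    }

    public static void main(String[] args) {
        // Same values as the curl example: unrestricted file, tabular ingest skipped.
        System.out.println(build("My description.", "data/subdir1", false, false));
    }
}
```

A real client would normally use a JSON library instead of string concatenation; the concatenation here only keeps the sketch dependency-free.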

@coveralls

coveralls commented Mar 24, 2022

Coverage increased (+0.009%) to 18.897% when pulling fe9e7e6 on lubitchv:8525-injest-optional-skip into c5d1df2 on IQSS:develop.

@pdurbin pdurbin self-assigned this Mar 28, 2022
Member

@pdurbin pdurbin left a comment

Overall, this pull request looks great. It's API-only but sets the stage for a future GUI to allow users to skip ingest (#2199).

I do have a few small items to look at. Please see the review. Thanks!

doc/sphinx-guides/source/api/native-api.rst (outdated, resolved)
src/test/java/edu/harvard/iq/dataverse/api/DatasetsIT.java (outdated, resolved)
@qqmyers
Member

qqmyers commented Mar 28, 2022

FWIW: The current failure is just #8533 - not related to your PR.

@lubitchv lubitchv requested a review from pdurbin March 29, 2022 13:46
Comment on lines 2603 to 2613
String pathToFile = "src/test/resources/sav/dct.sav";
String jsonAsString = "{\"description\":\"My description.\",\"directoryLabel\":\"data/subdir1\",\"categories\":[\"Data\"], \"restrict\":\"false\", \"tabIngest\":\"false\"}";
Response r = UtilIT.uploadFileViaNative(datasetIdInt.toString(), pathToFile, jsonAsString, apiToken);
logger.info(r.prettyPrint());
assertEquals(200, r.getStatusCode());

pathToFile = "src/test/resources/sav/frequency-test.sav";
jsonAsString = "{\"description\":\"My description.\",\"directoryLabel\":\"data/subdir1\",\"categories\":[\"Data\"], \"restrict\":\"false\" }";
Response rTabIngest = UtilIT.uploadFileViaNative(datasetIdInt.toString(), pathToFile, jsonAsString, apiToken);
logger.info(rTabIngest.prettyPrint());
assertEquals(200, rTabIngest.getStatusCode());
Member

It might be nice to assert somehow if the file was ingested or not. I'm not sure about the easiest way. Maybe UtilIT.downloadTabularFile? Or check if there's a UNF?

Contributor

Checking the file name may be enough (making sure it's still ".sav", and not ".tab").

Contributor

But yes, I agree, we do need to confirm that the ingest was indeed skipped.
Also, the second file upload in this method, where ingest is NOT being skipped, may not be necessary, since we are already testing actual ingest elsewhere (in FilesIT.java?).

Contributor Author

OK, I renamed the test and moved it to FilesIT. I also added the ingest check by checking the label in the file metadata.
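
The label check discussed in this thread could look roughly like the sketch below: if ingest was skipped, the file label should still be the uploaded ".sav" name rather than a ".tab" name. The class `IngestLabelCheck` and the method `ingestSkipped` are hypothetical names for this example, not the assertion that was actually added to FilesIT.

```java
import java.util.Locale;

public class IngestLabelCheck {

    /**
     * Hypothetical helper: returns true if the label reported after upload
     * is unchanged and does not carry the ".tab" extension, which is the
     * sign discussed in the review that tabular ingest did not run.
     */
    public static boolean ingestSkipped(String uploadedName, String labelAfterUpload) {
        boolean unchanged = labelAfterUpload.equals(uploadedName);
        boolean isTab = labelAfterUpload.toLowerCase(Locale.ROOT).endsWith(".tab");
        return unchanged && !isTab;
    }

    public static void main(String[] args) {
        System.out.println(ingestSkipped("dct.sav", "dct.sav")); // prints true
        System.out.println(ingestSkipped("dct.sav", "dct.tab")); // prints false
    }
}
```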

@pdurbin pdurbin requested a review from landreev March 29, 2022 14:31
Member

@pdurbin pdurbin left a comment

I haven't run the code but I'm clicking Approve. Looks good. Thanks, @lubitchv!

However, I'm going to keep this PR in Review in case @landreev wants to take a look.

@lubitchv I did leave you a comment about adding extra assertions that the .sav file was or wasn't ingested, if you feel like it.

@@ -1301,6 +1301,7 @@ When adding a file to a dataset, you can optionally specify the following:
- A description of the file.
- The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset.
- Whether or not the file is restricted.
- Whether or not the file skips tabular ingest.
Contributor

This should probably say that it defaults to "true" (?).

Contributor Author

I added a sentence saying that tabIngest defaults to true if it is not specified.

@landreev landreev self-assigned this Mar 29, 2022
@@ -2583,5 +2583,48 @@ public void testFilesUnchangedAfterDatasetMetadataUpdate() throws IOException {
.body("data.latestVersion.files[0].directoryLabel", equalTo("code"));

}

@Test
public void testAddFileToDatasetTabIngest() throws IOException, InterruptedException {
Contributor

It would make sense to rename this test method, to make it clear that it is specifically testing skipping ingest; to differentiate it from all the other ingest tests in FilesIT etc.
(Thinking about it, maybe this method needs to be moved to FilesIT.java too?)

Contributor Author

I renamed it to testAddFileToDatasetSkipTabIngest and moved it to FilesIT.

@pdurbin pdurbin changed the title from "8525 injest optional skip" to "8525 ingest optional skip" Mar 29, 2022
Contributor

@landreev landreev left a comment

Looks great, thank you!

@landreev landreev removed their assignment Mar 29, 2022
Response r = UtilIT.uploadFileViaNative(datasetIdInt.toString(), pathToFile, jsonAsString, apiToken);
logger.info(r.prettyPrint());
assertEquals(200, r.getStatusCode());

Contributor

Sorry to keep bugging you but, since this hasn't been merged yet, could you please insert another sleepForLock statement here? (Otherwise, if the ingest somehow ends up happening, despite the tabIngest:false above, we will likely fail to detect that - because the file name will be checked before it gets updated!)
It should be the same entry as in line 1820:

assertTrue("Failed test if Ingest Lock exceeds max duration " + pathToFile, UtilIT.sleepForLock(datasetIdInt, "Ingest", apiToken, UtilIT.MAXIMUM_INGEST_LOCK_DURATION));

Thanks. (I should've thought about this sooner of course).
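
The reasoning behind waiting for the lock can be sketched as a generic poll-with-timeout loop: keep checking until the lock is gone or the time budget runs out, and only then inspect the file name. The class `LockPoller`, the method `waitUntilUnlocked`, and the `BooleanSupplier` standing in for the dataset lock query are assumptions for this illustration; this is not the implementation of UtilIT.sleepForLock.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class LockPoller {

    /**
     * Hypothetical helper: polls isLocked until it reports false or the
     * timeout elapses. Returns true if the lock cleared in time.
     */
    public static boolean waitUntilUnlocked(BooleanSupplier isLocked,
                                            int maxSeconds, long pollMillis) {
        long start = System.nanoTime();
        long timeoutNanos = maxSeconds * 1_000_000_000L;
        while (isLocked.getAsBoolean()) {
            if (System.nanoTime() - start >= timeoutNanos) {
                return false; // still locked after the maximum duration
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated lock that clears on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean cleared = waitUntilUnlocked(() -> polls.incrementAndGet() < 3, 5, 10);
        System.out.println(cleared); // prints true
    }
}
```

In a test, the file name check would run only after a call like this returns true, so that a late rename to ".tab" cannot slip past the assertion.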

Contributor Author

Added sleep for lock.

Contributor

Great, thanks.

@kcondon
Contributor

kcondon commented Apr 8, 2022

@lubitchv would you mind refreshing from the develop branch since we've had a version change?

@lubitchv
Contributor Author

Updated

@kcondon kcondon self-assigned this Apr 11, 2022
@kcondon
Contributor

kcondon commented Apr 11, 2022

@pdurbin
Member

pdurbin commented Apr 11, 2022

@lubitchv just to give a little more detail on the failed test above:

org.xml.sax.SAXParseException:
Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]
at edu.harvard.iq.dataverse.api.DatasetsIT.testRestrictFileExportDdi(DatasetsIT.java:2243)
Caused by: com.ctc.wstx.exc.WstxEOFException:
Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]
at edu.harvard.iq.dataverse.api.DatasetsIT.testRestrictFileExportDdi(DatasetsIT.java:2243)

Something to do with exporting the DDI format, it seems.

@lubitchv
Contributor Author

I ran DatasetsIT.testRestrictFileExportDdi on my machine and it passed. Maybe it is a database issue?

@pdurbin
Member

pdurbin commented Apr 12, 2022

@lubitchv you're right. DatasetsIT runs fine for me on my laptop as of ecc9ffc.

I just pushed a couple small doc changes to your branch to force Jenkins to run again. Let's check back later today to see if the tests pass.

@kcondon kcondon merged commit 9094b94 into IQSS:develop Apr 12, 2022
@pdurbin pdurbin added this to the 5.11 milestone Apr 12, 2022
@qqmyers
Member

qqmyers commented Apr 12, 2022

FWIW: My suspicion is that the test failure here was related to gdcc/dataverse-ansible#231. Don turned off having the tests run with the languages and metadata languages features set to en or de-DE as of last night.

Successfully merging this pull request may close these issues.

Optional tabular ingest skipping
6 participants