Tabular Ingest - identical file names with different extensions, e.g. .xlsx and .csv #6991
@philippconzett do you think we could reproduce this issue with the files you provided at IQSS/dataverse-sample-data@f3ef7ee? You made a nice detailed commit message which I'll copy and paste below:
We should've double-checked what Dataverse v5 (release candidate) is currently doing before opening the issue, because there is a chance this is already being handled the way we want.
@pdurbin - Thanks for adding that extra text, by the way; it clarifies and disambiguates a lot. (I typed my comment above before seeing it.) So yes, this is definitely about the special case of what happens when files (and filenames) get modified after the fact by the tabular ingest process. When trying to reproduce, let's remember to test against the current develop/v5 draft branch, not the currently released v4.20, because the functionality that deals with "duplicates" and such has been modified since then.
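To make the "modified after the fact" case concrete, here is a minimal sketch of how two uploads with the same base name can collide once ingest renames them. It assumes, for illustration only, that ingest replaces the original extension of .xlsx and .csv files with .tab; the functions `name_after_ingest` and `find_ingest_collisions` are hypothetical helpers, not Dataverse code.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: extensions that the (illustrative) ingest step converts to .tab.
# Only the two formats named in this issue are listed; extend as needed.
INGESTABLE = {".xlsx", ".csv"}


def name_after_ingest(filename: str) -> str:
    """Return the name a file would get if ingest renamed it to .tab."""
    p = PurePosixPath(filename)
    if p.suffix.lower() in INGESTABLE:
        return str(p.with_suffix(".tab"))
    return filename  # non-ingestable files keep their original name


def find_ingest_collisions(filenames):
    """Map each post-ingest name to the originals that would share it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name_after_ingest(name)].append(name)
    # Keep only names claimed by more than one original file.
    return {new: olds for new, olds in groups.items() if len(olds) > 1}
```

For example, `find_ingest_collisions(["survey.xlsx", "survey.csv", "survey.txt"])` reports that both `survey.xlsx` and `survey.csv` would end up as `survey.tab`, while `survey.txt` is untouched under these assumptions.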
Thanks for creating this issue, and sorry for my late reply. I now see that you have already figured out that most parts of this issue are fixed in v5. I would just like to repeat the reason for the request once more: in DataverseNO, we require depositors to provide tabular data as tab-separated plain text files with the extension .txt (which is the default in Excel with Norwegian settings). If they want, they can also provide the same content in the original file format, e.g. .xlsx or .ods. We also require the .txt file and the file in the original format to have the same file name (except for the file extension), because this makes it much easier for me to check once a year whether all files in DataverseNO are in a preferred file format (this check is part of our Preservation Plan). Ideally, we would also like the tab-separated .txt file to be properly ingested, but I guess this is related to another issue.
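The yearly check described above relies on files sharing a base name. A minimal sketch of that check, assuming you already have a plain list of filenames for a dataset and that ".txt" is the preferred format; `stems_missing_preferred` is a hypothetical helper, not part of Dataverse or the DataverseNO Preservation Plan.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: the preferred preservation format is tab-separated .txt,
# as described in the comment above.
PREFERRED = ".txt"


def stems_missing_preferred(filenames):
    """Return base names (path without extension) lacking a .txt counterpart."""
    by_stem = defaultdict(set)
    for name in filenames:
        p = PurePosixPath(name)
        # Keep the directory part so same-named files in different folders
        # are not merged into one group.
        by_stem[str(p.with_suffix(""))].add(p.suffix.lower())
    return sorted(stem for stem, suffixes in by_stem.items()
                  if PREFERRED not in suffixes)
```

With `["survey.txt", "survey.xlsx", "budget.ods", "notes.txt"]`, only `budget` is flagged, because it has no .txt version. This grouping only works if the names stay identical apart from the extension, which is why a rename during ingest matters here.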
@philippconzett I'd feel remiss if I didn't mention that @donsizemore and I made and merged IQSS/dataverse-sample-data#20 yesterday because we were having trouble with the files you added. The files are still there and can be tested, but what I think we were observing was that the text file couldn't be added because it was a duplicate of a file that had been ingested. That's my theory, anyway. And duplicate file handling will be more permissive once #6924 gets merged.
This was merged, so it's probably time to re-test. @philippconzett do you want to try?
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
@philippconzett Thank you for the suggestion in our community meeting chat. I took the liberty of creating a new issue here in GitHub. If you could, please provide any more details about your use case here. We briefly discussed this as a team and think this is something we can improve upon.