
AWS/S3 configuration and upload problem #5062

Closed
jmjamison opened this issue Sep 17, 2018 · 10 comments
Labels
Feature: File Upload & Handling · Type: Bug (a defect) · User Role: Sysadmin (installs, upgrades, and configures the system; connects via ssh) · Vote to Close: pdurbin

Comments

@jmjamison
Contributor

I have a test Dataverse installation on AWS using S3 storage (http://54.67.118.35:8080/). We were unable to upload files.

The JVM option dataverse.files.directory was set to the default, -Ddataverse.files.directory=/usr/local/dvn/data. Once that option was deleted, we could upload and display files.

Although it appears to work, I'm really worried about just deleting a JVM option without knowing what it should be set to.

Jamie Jamison
UCLA Data Science Center

@jmjamison
Contributor Author

I should have added that I followed the installation guide's directions for setting the storage to S3 (commands sketched below):

  1. Removed -Ddataverse.files.storage-driver-id=file
  2. Replaced it with -Ddataverse.files.storage-driver-id=s3 and added the bucket name
  3. Set up the .aws credentials
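For reference, a rough sketch of those steps as asadmin commands, following my reading of the 4.9.x installation guide; the GlassFish 4 setup and the bucket name are assumptions, so double-check the option names against the guide:

```sh
# 1. remove the local file storage driver setting
./asadmin delete-jvm-options "-Ddataverse.files.storage-driver-id=file"
# 2. switch the driver to S3 and name the bucket (placeholder name)
./asadmin create-jvm-options "-Ddataverse.files.storage-driver-id=s3"
./asadmin create-jvm-options "-Ddataverse.files.s3-bucket-name=your-dataverse-bucket"
# 3. put the credentials in ~/.aws/credentials for the user running GlassFish
```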

@pdurbin
Member

pdurbin commented Sep 17, 2018

@jmjamison thanks for opening this issue and for all the chatter in various places.

It's weird to me that deleting that dataverse.files.directory JVM option had an effect, but it seems like it's been a good fix for you, which is great.

One thing I wanted to point out is that the dataverse.files.directory JVM option is set to "/usr/local/dvn/data" when using the "dataverse-ansible" configs (see https://github.com/IQSS/dataverse-ansible/blob/a7251f975c913924c8bc493264d942c5e06b56a0/defaults/main.yml#L25), but if you use the regular Dataverse installation process, the directory "/usr/local/glassfish4/glassfish/domains/domain1/files" is used instead, as described at http://guides.dataverse.org/en/4.9.2/installation/config.html#file-storage-local-filesystem-vs-swift-vs-s3. I mention this just so that developers aren't confused when working on this issue. @kcondon, @matthew-a-dunlap, and I discussed this a bit this morning, so the three of us, at least, are on the same page.

Thanks again. At minimum, I believe we need to fix up the docs. It's also quite possible you've found a bug.
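In case it helps anyone verify which value their own installation is actually using, the configured JVM options can be listed with asadmin (GlassFish 4 path assumed):

```sh
/usr/local/glassfish4/bin/asadmin list-jvm-options | grep dataverse.files
```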

@qqmyers
Member

qqmyers commented Sep 18, 2018

FWIW, dataverse.files.directory is used to determine the temp dir where files are written before transfer to S3. (If not set, the default is /tmp/files/temp.) So having it set probably isn't a problem in itself, but it is if that dir doesn't exist or isn't writable...
Maybe the fact that this dir affects the temp file location for non-file IO providers is the thing to document?
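Based on that, a minimal pre-flight check a sysadmin could run, assuming the default glassfish service user and the temp path derived as described above (adjust both to your setup):

```sh
# the temp dir is <dataverse.files.directory>/temp, or /tmp/files/temp if unset
TEMP_DIR=/usr/local/dvn/data/temp
sudo -u glassfish test -w "$TEMP_DIR" \
  && echo "temp dir is writable" \
  || echo "temp dir is missing or not writable for the glassfish user"
```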

@kcondon
Contributor

kcondon commented Sep 18, 2018

I've confirmed through testing that it behaves the way @qqmyers described. Additionally, if I configure a path the glassfish user does not have write access to, you can select files to upload and they appear to upload, but then they do not show up in the uploaded list on the upload files page; they disappear.
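For anyone reproducing this, a hypothetical setup that triggers the silent failure (the path and service user here are assumptions; use whatever your dataverse.files.directory points at):

```sh
# make the configured files directory unwritable for the glassfish user
sudo chown root:root /usr/local/dvn/data
sudo chmod 755 /usr/local/dvn/data
# then select files on the upload page: they appear to upload,
# but never show up in the uploaded-files list
```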

@poikilotherm
Contributor

poikilotherm commented Oct 2, 2018

Hey guys,
curious whether there is room for improvement here? This could hit us (FZJ) pretty badly in production if drives start to fill up or other things go wrong.

IMHO the user should see an error message that the files cannot be uploaded right now before the potentially large upload starts. At the very least there should be a clear error message.

Maybe add a check at startup time that ensures a writable path? And maybe an extra check before each upload starts?

Ideally there would also be a check that the file fits within the storage capacity of the temp folder, but that is another story.
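Until checks like these exist in the application, a rough operational stopgap could run from cron; the threshold and path here are assumptions:

```sh
#!/bin/sh
# warn when the upload temp dir drops below a free-space threshold
TEMP_DIR=/tmp/files/temp
MIN_FREE_MB=1024
FREE_MB=$(df -Pm "$TEMP_DIR" | awk 'NR==2 {print $4}')
[ "$FREE_MB" -ge "$MIN_FREE_MB" ] \
  || echo "WARNING: only ${FREE_MB}MB free under $TEMP_DIR"
```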

@pdurbin
Member

pdurbin commented Oct 2, 2018

@poikilotherm there is definitely room for improvement. Any interest in poking around in the code and maybe making a pull request?

@qqmyers
Member

qqmyers commented Oct 2, 2018

FWIW: The permission issue this thread started with should be a one-time thing, i.e., once you're set up correctly, users shouldn't run into it. A check could be written, but it might still be overkill to run it for every upload.

Managing disk space would be trickier in many ways. Since multiple users could be uploading in parallel, and one user can drag files into the upload interface sequentially, attempts to avoid running out of space would have to reserve space and release it if/when it is not needed, and the GUI would need logic to stop additions to an upload while allowing the existing files to complete.

Having just been through the code to find places where temp files were being left behind (pre-4.9.3, when users hit cancel on an upload, or deleted some files before save, temp files were left), I know there are multiple places any reservation would have to be released. (There are still cases, such as a user leaving the page without cancelling or saving, or a network drop, where I haven't yet tracked down how to delete temp files; reservations would have to be cancelled there too.) At a minimum, if you have limited disk space for temp files, 4.9.3 should be better at making sure it doesn't fill up.

Some good news, perhaps: I think the upload problem is visible when it happens and could probably be made more visible. Files dragged into the upload pane are only transferred to the list at the bottom if the ingest completes and writes the temp file. I did submit code a while back to catch and show non-2xx responses from the server during upload; I don't know whether that already catches permission or out-of-space errors (it should, if the response is not 201). If so, there's already a GUI display when the error occurs, and one could add more logic to the JavaScript that triggers it to decide whether the failure is one that should shut down other ongoing uploads (or warn the user to do a 'cancel', etc.).

@pdurbin
Member

pdurbin commented Dec 1, 2023

@jmjamison hi! Are you still having this problem? Can we close this issue? Thanks.

@jmjamison
Contributor Author

Apologies. I got sidetracked by another (non-Dataverse) problem. I'll go ahead and close this.

