storage: resumable uploads #298

Closed · stephenplusplus opened this issue Nov 11, 2014 · 11 comments · Fixed by #299
Labels: api: storage (Issues related to the Cloud Storage API)

@stephenplusplus (Contributor)

Big files take big time. Big time means big chances for failure. Currently, if something goes wrong during an upload at 95%, the user is forced to re-upload starting from 0%. Using the resumable upload capabilities of the storage API, we should be able to handle this for our users.

It's not exactly trivial, however. My current idea for a solution involves keeping a configuration file on the user/server's drive, where we can store tokens/state for in-flight uploads. Yeoman happens to have a tool that enables this for us: https://github.com/yeoman/configstore

Unless there are any objections, or more ideas for solutions, I will try to have a PR this week with this functionality.
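
For illustration, a minimal sketch of what that could look like with configstore; the key layout and stored fields here are hypothetical, not a settled design:

var Configstore = require('configstore');
var configstore = new Configstore('gcloud-node');

// Hypothetical layout: one entry per upload, keyed by file path, holding the
// resumable session URI the API hands back when the upload starts.
configstore.set('upload:/home/user/photos.zip', {
  uri: 'https://www.googleapis.com/upload/storage/v1/b/bucket/o?uploadType=resumable&upload_id=...'
});

// On a retry, look the entry up before starting from scratch.
var saved = configstore.get('upload:/home/user/photos.zip');

// Once the upload completes, clear the entry. (Older configstore releases
// spell this .del instead of .delete.)
configstore.delete('upload:/home/user/photos.zip');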

@ryanseys (Contributor)

What strategy will be taken if writing to disk is not possible due to permission issues or some other arbitrary reason?

@stephenplusplus (Contributor, Author)

I guess fall back to non-resumable behavior. What would make the most sense?

@ryanseys (Contributor)

Exposing the id and other necessary information to me as a developer might be nice, so I can store it wherever I want (maybe in Datastore? ;) )

@silvolu (Contributor) commented Nov 11, 2014

The boto approach might be interesting: the library will automatically retry x times on failures; a tracker file is optional and can be used to save/read the state.
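
As a rough sketch of that retry shape (the retry count and the err.retryable flag marking non-retryable failures are illustrative, not boto's actual implementation):

// Stand-in for a single upload attempt; a real version would PUT the bytes.
function attemptUpload(done) {
  done(null);
}

// Retry a flaky operation up to `retries` times before surfacing the error.
function withRetries(operation, retries, callback) {
  operation(function (err, result) {
    if (err && err.retryable !== false && retries > 0) {
      return withRetries(operation, retries - 1, callback);
    }
    callback(err, result);
  });
}

withRetries(attemptUpload, 3, function (err) {
  if (err) console.error('upload failed after retries:', err);
});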

@stephenplusplus (Contributor, Author)

That boto approach makes sense. We should be able to do the same thing. If a tracker file is provided, we first read it to see if there's already data to use to resume the upload. If not, we store the state of the upload in it and are able to re-use it the next time through.

  • does the tracker file get written to if the upload is successful?
  • is there a standard format the tracker file is written in?

@silvolu (Contributor) commented Nov 11, 2014

does the tracker file get written to if the upload is successful?

If the option is passed, the file is written, and it's then removed if the upload succeeds or if a non-retryable exception is raised.

is there a standard format the tracker file is written in?

Looks like it's just a line in the tracker file that contains the URI.
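
A sketch of that lifecycle in Node (the file name and helpers are illustrative):

var fs = require('fs');

var trackerFile = './photos.zip.tracker';

// Read the session URI from a previous attempt, if one was left behind.
function readTracker() {
  try {
    return fs.readFileSync(trackerFile, 'utf8').trim() || null;
  } catch (e) {
    return null; // no tracker file: start a fresh resumable session
  }
}

// Persist the URI as a single line, matching the format described above.
function writeTracker(sessionUri) {
  fs.writeFileSync(trackerFile, sessionUri + '\n');
}

// Remove the tracker on success or on a non-retryable failure.
function clearTracker() {
  try { fs.unlinkSync(trackerFile); } catch (e) {}
}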

@stephenplusplus (Contributor, Author)

SGTM 👍

@stephenplusplus (Contributor, Author)

After thinking more on this, I think it might be beneficial to still attempt to handle resumable uploads automatically, without a tracker file having to be provided:

fs.createReadStream("./photos.zip")
  .pipe(file.createWriteStream())
  .on("error", callback)

bucket.upload("./photos.zip", callback)
  • Start resumable upload
  • Store resumable ID in a config file (using yeoman/configstore)
  • If it succeeds, remove id from file
  • If it fails, emit error/callback with resumable ID / message that says "try again to resume"

Handling the streaming upload should be possible by counting bytes: https://cloud.google.com/storage/docs/concepts-techniques#unknownresumables

edit: can't read. JSON docs: https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload#resumable
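
Per those docs, resuming boils down to asking the server how many bytes it already has: an empty PUT with Content-Range: bytes */TOTAL gets back 308 (Resume Incomplete) with a Range header reporting the last byte stored. A sketch of that status check, assuming sessionUri comes from the saved state above:

var https = require('https');
var url = require('url');

// Ask the API how much of the file it already holds for a resumable session.
function checkUploadStatus(sessionUri, fileSize, callback) {
  var options = url.parse(sessionUri);
  options.method = 'PUT';
  options.headers = { 'Content-Range': 'bytes */' + fileSize };

  var req = https.request(options, function (res) {
    if (res.statusCode === 308) {
      // Range like "bytes=0-42" means bytes 0 through 42 are stored, so we
      // resume sending at byte 43. No Range header means nothing was stored.
      var range = res.headers.range;
      callback(null, range ? parseInt(range.split('-')[1], 10) + 1 : 0);
    } else if (res.statusCode === 200 || res.statusCode === 201) {
      callback(null, fileSize); // the upload already completed
    } else {
      callback(new Error('unexpected status: ' + res.statusCode));
    }
  });

  req.on('error', callback);
  req.end();
}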

@silvolu (Contributor) commented Nov 11, 2014

After thinking more on this, I think it might be beneficial to still attempt to handle resumable uploads automatically, without a tracker file having to be provided:

Yeah, as mentioned the tracker file is optional.

Store resumable ID in a config file (using yeoman/configstore)

How is this different from using a tracker file? Can't we just store the ID in memory and on failure emit it?

@stephenplusplus (Contributor, Author)

Yeah, as mentioned the tracker file is optional ... How is this different from using a tracker file? Can't we just store the ID in memory and on failure emit it?

By defaulting to "we'll store it in a file for you" but also allowing "give us a file to store it in", I think we'd create a confusing situation where it's not clear when providing your own file makes sense. The file is useless outside of gcloud-node, as far as I know. After playing with this today, I can't even think of a good reason to return the resumable ID to the user; on its own, it's not very handy. If they don't use it with gcloud-node, they need to somehow assemble their own request (authenticate the headers, read the last byte sent, discard already-sent bytes from the retried upload stream, etc.).

I'd rather just handle everything and emit an error, as we currently do, if the upload fails for any reason. The user will try again, and we'll automatically pick up where we left off, without them having to think about it.

The configstore is preferred because the user won't even need to know there is a config file.
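
To make the intended experience concrete, a sketch (the (err, file) callback signature is assumed, matching the snippet above):

// Nothing changes from the user's perspective: retrying the same call resumes.
bucket.upload('./photos.zip', function (err, file) {
  if (err) {
    // The session URI is already tucked away in the config file, so calling
    // bucket.upload('./photos.zip', ...) again picks up at the last byte the
    // server acknowledged.
    return console.error('upload interrupted; run again to resume:', err);
  }
  console.log('upload complete');
});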

@silvolu (Contributor) commented Nov 11, 2014

SGTM.

@stephenplusplus stephenplusplus added this to the 1.0.0 milestone Nov 14, 2014
@jgeewax jgeewax added the api: storage Issues related to the Cloud Storage API. label Feb 2, 2015
@jgeewax jgeewax modified the milestones: 1.0.0, Storage Stable Feb 2, 2015