Can you elaborate on out of bound file transfer? #45

Closed
butonic opened this issue Mar 28, 2019 · 7 comments

Comments

@butonic
Contributor

butonic commented Mar 28, 2019

I am struggling to rebase our changes on top of the review branch. In the review branch you are planning to move file upload and download out of the CS3 APIs. Can you elaborate on how you plan to do the actual file transfer?

We will need to send the file stream from the ocdavsvc service to the actual storage provider. Do you want to open another HTTP/2 connection for that, or use the existing one to multiplex binary chunks over it?

AFAIR we will always have the ocdavsvc or another gateway component in front of the actual storage provider ... so what is your vision on this?

@butonic
Contributor Author

butonic commented Mar 28, 2019

Instead of initiating a file upload or download, I think it makes sense to allow passing in a reference, a chunk of a file, or a list of small files, similar to the Opaque property you introduced in other messages.
A reference would work like the proposed initiate response, a chunk could be used for directly uploading chunks, and a list of files could be used to aggregate small files into one request (or send a single smaller file).

@moscicki

moscicki commented Mar 28, 2019

This requires some discussion indeed.

Let me explain the basic idea first. We could just stay with data upload and download via gRPC for simplicity to start with. With a future outlook in mind, though, we know that gRPC is not excellent for transferring large files as (streamed) repeated messages: essentially, the maximum reasonable chunk would be the size of a gRPC message, which IIRC is 3M. Hence, for all data-intensive transfer workflows the idea is to redirect to an HTTP(S) endpoint. As a result of this call you'd get a URL with some constrained validity (e.g. you need to start the transfer within the next N seconds).

I see that there could be an opportunity to inline small files directly into the payload of the gRPC indeed for optimization. The question is how small is small. Looks like it should not be controversial to say that it would make sense for payloads in KB range.
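
As a rough illustration of inlining small files, the sketch below picks between carrying the bytes in the message and handing back a transfer URL. The `Payload` type, the `NewPayload` helper, and the 64 KiB `inlineLimit` are hypothetical; the discussion only says "KB range" and does not fix a number.

```go
package main

import "fmt"

// inlineLimit is an assumed threshold; the thread only says "KB range".
const inlineLimit = 64 * 1024

// Payload carries either the file bytes themselves or a reference
// (a transfer URL) from which they can be fetched out of band.
type Payload struct {
	Inline []byte // set when the file is small enough to embed
	URL    string // set when the transfer is redirected to HTTP(S)
}

// NewPayload embeds data when it fits under inlineLimit and falls back
// to the out-of-band URL otherwise.
func NewPayload(data []byte, transferURL string) Payload {
	if len(data) <= inlineLimit {
		return Payload{Inline: data}
	}
	return Payload{URL: transferURL}
}

func main() {
	small := NewPayload([]byte("tiny file"), "https://data.example.org/x")
	large := NewPayload(make([]byte, 5*1024*1024), "https://data.example.org/y")
	fmt.Println(len(small.Inline), small.URL == "")
	fmt.Println(len(large.Inline), large.URL)
}
```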

For things in MB range we already enter the realm of the higher-level chunking (as we know it in owncloud).

Chunking itself is interesting: I think with current usage by the sync client we could say that it mainly serves to provide resumable upload, right? It looks to me that no standard HTTP resumable upload exists (https://stackoverflow.com/questions/20969331/standard-method-for-http-partial-upload-resume-upload#20978266). Another usage is parallel upload: is this really used, and does it make a difference? If yes, then I would be in favour of providing a different gRPC call to cater for that use case (possibly the same API for both parallel and resumable).
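
One way to picture resumable upload through chunking is an offset handshake: the receiver reports how many bytes it already holds and the sender continues from there. The Go sketch below models that in memory. `Upload`, `WriteAt`, and `Resume` are hypothetical names, and this is not the ownCloud chunking protocol or any CS3 call.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Upload is a hypothetical server-side record of a partially received file.
type Upload struct {
	buf bytes.Buffer
}

// Offset reports how many bytes have been stored so far.
func (u *Upload) Offset() int64 { return int64(u.buf.Len()) }

// WriteAt appends a chunk only if it starts exactly at the current offset,
// so a retried or out-of-order chunk cannot corrupt the file.
func (u *Upload) WriteAt(offset int64, chunk []byte) error {
	if offset != u.Offset() {
		return errors.New("chunk offset does not match stored bytes")
	}
	u.buf.Write(chunk)
	return nil
}

// Resume sends the remainder of data in chunkSize pieces, starting from
// whatever offset the receiver already has.
func Resume(u *Upload, data []byte, chunkSize int) error {
	if chunkSize <= 0 {
		return errors.New("chunkSize must be positive")
	}
	total := int64(len(data))
	for off := u.Offset(); off < total; off = u.Offset() {
		end := off + int64(chunkSize)
		if end > total {
			end = total
		}
		if err := u.WriteAt(off, data[off:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	data := []byte("hello, resumable world")
	u := &Upload{}
	_ = u.WriteAt(0, data[:5]) // the first chunk arrived before an interruption
	fmt.Println("resuming at offset", u.Offset())
	if err := Resume(u, data, 4); err != nil {
		fmt.Println("resume failed:", err)
		return
	}
	fmt.Println(u.buf.String())
}
```

Parallel upload would need a different bookkeeping scheme (tracking which ranges have arrived rather than a single offset), which is part of why a separate call may be worth considering.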

The bulk operations (bundling many independent files) would definitely deserve a different approach because these operations have complex return status (some but not all files may fail to upload for various reasons). I think this requires careful consideration but perhaps may be considered a second-order optimization and a different (set of) calls?
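
The "complex return status" of a bulk operation can be sketched as one status entry per file, so that a failure in one file does not hide the outcome of the others. The types `File` and `FileStatus` and the functions `BulkUpload` and `rejectEmpty` below are hypothetical and only illustrate the shape of such a result.

```go
package main

import (
	"errors"
	"fmt"
)

// File is one entry in a hypothetical bulk upload request.
type File struct {
	Name string
	Data []byte
}

// FileStatus reports the outcome for a single file; Err is nil on success.
type FileStatus struct {
	Name string
	Err  error
}

// BulkUpload tries every file and records one status per file instead of
// aborting the whole request on the first error.
func BulkUpload(files []File, store func(File) error) []FileStatus {
	statuses := make([]FileStatus, 0, len(files))
	for _, f := range files {
		statuses = append(statuses, FileStatus{Name: f.Name, Err: store(f)})
	}
	return statuses
}

// rejectEmpty is a stand-in storage backend that refuses empty files.
func rejectEmpty(f File) error {
	if len(f.Data) == 0 {
		return errors.New("empty file")
	}
	return nil
}

func main() {
	files := []File{
		{Name: "a.txt", Data: []byte("a")},
		{Name: "b.txt"},
		{Name: "c.txt", Data: []byte("c")},
	}
	for _, s := range BulkUpload(files, rejectEmpty) {
		fmt.Println(s.Name, s.Err)
	}
}
```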

@butonic
Contributor Author

butonic commented Mar 28, 2019

You had me worried there for a second: what about listing large directories (100k files)? But the default gRPC message size limit of 4 MB does not affect streams. It only affects individual chunks. Someone actually tested this (although on a loopback device): https://ops.tips/blog/sending-files-via-grpc/

His finding is that 1k seems to be a good chunk size. But plain HTTP/2 seems to be twice as fast, albeit with a little more variance in latency.

Another, but older (2016) related post is https://andrewjesaitis.com/2016/08/25/streaming-comparison/

Thinking ...

@butonic
Contributor Author

butonic commented Apr 4, 2019

OK, best for now would be to at least keep the gRPC streaming-based file transfer, or provide an example of how you would do it. While gRPC streaming might be suboptimal, it makes implementing the API a lot easier IMO. And I think we can iterate on it after we have sharing and search in the protocol. I would prefer a more feature-complete protocol rather than trying to prematurely optimize performance.

@butonic
Contributor Author

butonic commented Apr 8, 2019

@labkode I was under the impression that clients should be able to talk directly to the services using the CS3 APIs? Auth aside, shouldn't we then have some API call to stream bytes between services?

@labkode
Member

labkode commented Apr 18, 2019

@butonic @moscicki the out-of-band file transfer mechanism is implemented in the review branch. It also includes checksum negotiation for protecting the upload with the checksums offered by the server. I'll update the docs so it is easy for everyone to understand and test it.
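
The comment above does not spell out how the checksum negotiation works, so the following Go sketch is only a guess at the general shape: the server offers checksum types in order of preference, and the client picks the first one it supports and computes it over the data. The type names (`"sha1"`, `"md5"`, `"adler32"`) and the functions `Negotiate` and `Checksum` are assumptions, not the review branch's API.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"hash"
	"hash/adler32"
)

// supported maps assumed checksum names to the hash constructors this
// client has available.
var supported = map[string]func() hash.Hash{
	"sha1":    sha1.New,
	"md5":     md5.New,
	"adler32": func() hash.Hash { return adler32.New() },
}

// Negotiate returns the first checksum type offered by the server, in the
// server's order of preference, that the client also supports.
func Negotiate(offered []string) (string, error) {
	for _, name := range offered {
		if _, ok := supported[name]; ok {
			return name, nil
		}
	}
	return "", errors.New("no common checksum type")
}

// Checksum computes the hex digest of data with the named algorithm.
func Checksum(name string, data []byte) (string, error) {
	newHash, ok := supported[name]
	if !ok {
		return "", errors.New("unsupported checksum type: " + name)
	}
	h := newHash()
	h.Write(data)
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// The server prefers sha256, which this client lacks, so md5 is chosen.
	name, err := Negotiate([]string{"sha256", "md5", "sha1"})
	if err != nil {
		fmt.Println(err)
		return
	}
	sum, _ := Checksum(name, []byte("file contents"))
	fmt.Println(name, sum)
}
```

The client would send the chosen type and digest along with the upload so the server can verify the bytes it received.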

@butonic
Contributor Author

butonic commented Apr 26, 2019

examples helped me, thx

@butonic butonic closed this as completed Apr 26, 2019
abaldacchino pushed a commit to abaldacchino/reva that referenced this issue Aug 2, 2023
abaldacchino pushed a commit to abaldacchino/reva that referenced this issue Aug 3, 2023
gmgigi96 pushed a commit to gmgigi96/reva that referenced this issue Aug 10, 2023
abaldacchino pushed a commit to abaldacchino/reva that referenced this issue Aug 24, 2023
abaldacchino pushed a commit to abaldacchino/reva that referenced this issue Aug 25, 2023

3 participants