Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable request body streaming #12

Closed
wants to merge 14 commits into from
Closed

Enable request body streaming #12

wants to merge 14 commits into from

Conversation

janko
Copy link
Member

@janko janko commented May 2, 2017

This pull request allows multipart data to be streamed into the request body using the new streaming API in HTTP.rb (httprb/http#409). The advantage of streaming is that, when File parts are backed by files on the filesystem, multipart data doesn't have to be loaded whole into memory before its sent as the request body, but rather only small parts will be loaded into memory at a time.

First I needed to modify FormData::File to support any IO object, not just File or StringIO, so that the logic can be simpler. I think this feature is useful in general, as explained in httprb/http#409 (comment).

I also modified the File object to be opened in binary mode, to prevent any conversions of carriage returns and line feeds by different operating systems. This is not related to this PR, but I noticed it and wanted to do it so that I don't forget.

FormData::File has implicitly supported Pathname objects (which is used in tests), but the previous refactoring has forced me to make this support explicit. I think that's good, since some folks might be relying on it, and it's nice to support Pathname for file paths in general.

Finally, I modified all components to be backed by an IO object, so that they can all implement a common IO interface (FormData::Readable), enabling them to be composed using CompositeIO. Most of the CompositeIO implementation was borrowed from multipart-post. I needed to include #rewind in the interface so that #to_s can be called multiple times.

janko added 9 commits May 2, 2017 00:02
Previously we allowed only File and StringIO objects as an input to
FormData::File, but we can generalize that to any IO object that
responds to #read and #size (which includes Tempfile,
ActionDispatch::Http::UploadedFile etc).
That way different operating systems won't attempt to convert newline
characters to their internal representation, instead the file content
will always be retrieved byte-for-byte as is.
Previously Pathname was implicitly supported, though extracting filename
wasn't working. With the recent refactoring this stopped working, so we
make the Pathname support explicit.
By changing all components to use an IO object as a base, we can
implement a common IO interface for all components, which delegates to
the underlying IO object.

This enables streaming multipart data into the request body, avoiding
loading the whole multipart data into memory when File parts are backed
by File objects.

See httprb/http#409 for the new streaming API.
By delegating handling strings to CompositeIO we can remove a lot of the
StringIO.new clutter when instantiating CompositeIO objects.
@janko
Copy link
Member Author

janko commented May 2, 2017

I'm not able to further reduce cyclomatic complexity of CompositeIO#read, implementation of Ruby's #read behaviour is simply complex 😛. Unless you have suggestions, is it ok that I ignore the Rubocop offense for that method?

@tarcieri
Copy link
Member

tarcieri commented May 2, 2017

@janko-m sure, or just increase the global cyclomatic complexity rating. That looks fine for what it is, which means the global limit is probably a bit too low.

@ixti ixti self-assigned this May 2, 2017
@ixti
Copy link
Member

ixti commented May 2, 2017

I'll review both http pr and this one over this week (give me couple of days - away from home atm)

janko added 2 commits May 3, 2017 18:50
We increase it for FormData::CompositeIO#read.
This way we're not creating a new string for each chunk read, instead
each chunk will be read into an existing string object (a "buffer"),
replacing any previous content.
Copy link
Member

@ixti ixti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need a bit more time on review.

class CompositeIO
# @param [Array<IO>] ios Array of IO objects
def initialize(*ios)
@ios = ios.flatten.map { |io| io.is_a?(String) ? StringIO.new(io) : io }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should fail if io is neither String nor IO

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, CompositeIO seems to be be internal class, and I don't see any point in allowing passing *any amount of args... Instead it should simply accept a single argument as flat array, thus here we will just map it.

@@ -34,30 +32,19 @@ def content_type
#
# @return [Integer]
def content_length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth get rid of this method and use alias instead?

@janko
Copy link
Member Author

janko commented May 8, 2017

@ixti Updated 😃

ixti pushed a commit that referenced this pull request May 8, 2017
This is a squashed commit of [PR#12][] with some tiny cleanups
applied on top of that.

[PR#12]: #12

Allow any IO object in FormData::File
-------------------------------------

Previously we allowed only File and StringIO objects as an input
to `FormData::File`, but we can generalize that to any IO object that
responds to `#read` and `#size` (which includes `Tempfile`,
`ActionDispatch::Http::UploadedFile` etc).

Open File for given path in binary mode
---------------------------------------

That way different operating systems won't attempt to convert newline
characters to their internal representation, instead the file content
will always be retrieved byte-for-byte as is.

Officially support Pathname in FormData::File.new
-------------------------------------------------

Previously Pathname was implicitly supported, though extracting
filename wasn't working. With the recent refactoring this stopped
working, so we make the Pathname support explicit.

Make all components into IO objects
-----------------------------------

By changing all components to use an IO object as a base, we can
implement a common IO interface for all components, which delegates to
the underlying IO object.

This enables streaming multipart data into the request body, avoiding
loading the whole multipart data into memory when File parts are backed
by File objects.

See httprb/http#409 for the new streaming API.

Make CompositeIO convert strings to StringIOs
---------------------------------------------

By delegating handling strings to CompositeIO we can remove a lot of the
StringIO.new clutter when instantiating CompositeIO objects.

Use a buffer when reading IO files in CompositeIO
-------------------------------------------------

This way we're not creating a new string for each chunk read, instead
each chunk will be read into an existing string object (a "buffer"),
replacing any previous content.
@ixti
Copy link
Member

ixti commented May 8, 2017

Merged via 5688433

@ixti ixti closed this May 8, 2017
@janko
Copy link
Member Author

janko commented May 8, 2017

@ixti Thank you 🎉

Once new version form_data.rb is released with this, I can go ahead an update HTTP.rb to utilize the multipart streaming (basically just remove the Multipart#to_s call), and bump the form_data.rb version in the gemspec.

@janko janko deleted the enable-request-body-streaming branch May 8, 2017 11:33
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Dec 9, 2017
This version is required by newer ruby-http gem 3.0.0.

Upstream changes: (from CHANGES.md)

## 2.0.0 (2017-10-01)

* [#17](httprb/form_data#17)
  Add CRLF character to end of multipart body.
  [@mhickman][]


## 2.0.0.pre2 (2017-05-11)

* [#14](httprb/form_data#14)
  Enable streaming for urlencoded form data.
  [@janko-m][]


## 2.0.0.pre1 (2017-05-10)

* [#12](httprb/form_data#12)
  Enable form data streaming.
  [@janko-m][]
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Mar 17, 2018
## 2.1.0 (2018-03-05)

* [#21](httprb/form_data#21)
  Rewind content at the end of `Readable#to_s`.
  [@janko-m][]

* [#19](httprb/form_data#19)
  Fix buffer encoding.
  [@HoneyryderChuck][]


## 2.0.0 (2017-10-01)

* [#17](httprb/form_data#17)
  Add CRLF character to end of multipart body.
  [@mhickman][]


## 2.0.0.pre2 (2017-05-11)

* [#14](httprb/form_data#14)
  Enable streaming for urlencoded form data.
  [@janko-m][]


## 2.0.0.pre1 (2017-05-10)

* [#12](httprb/form_data#12)
  Enable form data streaming.
  [@janko-m][]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants