Non-working example of how we could expose chunk boundaries #20

Merged
merged 6 commits into from
Nov 7, 2016
38 changes: 38 additions & 0 deletions docs/source/api.rst
@@ -1007,3 +1007,41 @@ easier for you to construct this header, it provides:
.. ipython:: python

   h11.PRODUCT_ID


Chunked Transfer Encoding Delimiters
------------------------------------

.. versionadded:: 0.7.0

HTTP/1.1 allows request and response bodies to be framed using chunked
transfer encoding. With this transfer encoding, the sender provides the body
as a series of length-prefixed "chunks" of data.
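
For illustration only, the framing can be sketched in a few lines of plain
Python. The ``encode_chunked`` helper below is hypothetical and not part of
h11: it writes each chunk as its size in hexadecimal, CRLF, the chunk bytes
and another CRLF, then ends the body with a zero-length chunk and an empty
trailer section.

```python
def encode_chunked(chunks):
    """Frame an iterable of bytes objects with chunked transfer encoding.

    Hypothetical helper for illustration; not part of h11. Empty chunks are
    skipped, because a zero-length chunk would be read as the end of the body.
    """
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue
        # Size line: chunk length in hexadecimal, then CRLF.
        out += b"%x\r\n" % len(chunk)
        # The chunk data itself, followed by CRLF.
        out += chunk + b"\r\n"
    # Zero-length last chunk, then the CRLF that ends the (empty) trailer.
    out += b"0\r\n\r\n"
    return bytes(out)


print(encode_chunked([b"hello", b", world"]))
# b'5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n'
```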

RFC 7230 is extremely clear that the boundaries between chunks are
non-semantic: users should not rely on them or assign any meaning to them.
This is particularly important because RFC 7230 also allows intermediaries
such as proxies and caches to change the chunk boundaries as they see fit, or
even to remove the chunked transfer encoding entirely.
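
To make this concrete, the sketch below splits two differently chunked
encodings of the same body into their chunks. ``decode_chunks`` is a
hypothetical helper, not h11's parser; it assumes the complete body is
already in memory, discards chunk extensions and does not parse trailers.
The chunk lists differ, but the bodies they reassemble to are identical.

```python
def decode_chunks(wire):
    """Split a complete chunked-encoded body into a list of chunks.

    Hypothetical helper for illustration: it assumes the whole body is in
    memory, discards chunk extensions (";name=value") and stops at the
    zero-length last chunk without parsing any trailers.
    """
    chunks = []
    pos = 0
    while True:
        line_end = wire.index(b"\r\n", pos)
        # The chunk-size line may carry extensions after a ";".
        size = int(wire[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            return chunks
        data_end = pos + size
        if wire[data_end:data_end + 2] != b"\r\n":
            raise ValueError("chunk data is not followed by CRLF")
        chunks.append(wire[pos:data_end])
        pos = data_end + 2


# The same body, framed two different ways (as an intermediary may do).
original = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
rechunked = b"3\r\nhel\r\n9\r\nlo, world\r\n0\r\n\r\n"

print(decode_chunks(original))   # [b'hello', b', world']
print(decode_chunks(rechunked))  # [b'hel', b'lo, world']
assert b"".join(decode_chunks(original)) == b"".join(decode_chunks(rechunked))
```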

However, some applications find it valuable, or even essential, to see the
chunk boundaries because the peer implementation has assigned meaning to them.
Although relying on them goes against the specification, if you really do need
this information, h11 makes it available through the :attr:`Data.chunk_start`
and :attr:`Data.chunk_end` attributes of the :class:`Data` event.

:attr:`Data.chunk_start` is set to ``True`` for the first :class:`Data` event
for a given chunk of data, and :attr:`Data.chunk_end` is set to ``True`` for
the last :class:`Data` event emitted for that chunk. h11 guarantees that it
will always emit at least one :class:`Data` event for each chunk of data
received from the remote peer, but it may emit more than one, for example when
a chunk's data arrives across several calls to
:meth:`Connection.receive_data`. A single :class:`Data` event can have both
:attr:`Data.chunk_start` and :attr:`Data.chunk_end` set to ``True``, in which
case it is the only :class:`Data` event for that chunk of data.
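
If a peer really does attach meaning to each chunk, the flags are enough to
rebuild whole chunks. The sketch below is hypothetical: ``DataEvent`` is a
stand-in namedtuple with the same three field names as :class:`Data`, so that
it runs without h11, and ``iter_whole_chunks`` is not an h11 API. With h11
itself you would pass in only the :class:`Data` events returned by
:meth:`Connection.next_event`.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as h11's Data event, so the
# sketch runs without h11. With h11, pass real Data events (and only those).
DataEvent = namedtuple("DataEvent", ["data", "chunk_start", "chunk_end"])


def iter_whole_chunks(events):
    """Reassemble complete chunks from a sequence of Data-like events.

    Only chunk_end is consulted: each chunk begins right after the previous
    one ends, so the end markers alone locate every boundary. Bytes from a
    final, unfinished chunk are not yielded.
    """
    pending = bytearray()
    for event in events:
        pending += event.data
        if event.chunk_end:
            yield bytes(pending)
            pending = bytearray()


# The same sequence of events that test_chunk_boundaries in this PR expects.
events = [
    DataEvent(b"hello", chunk_start=True, chunk_end=True),
    DataEvent(b"hel", chunk_start=True, chunk_end=False),
    DataEvent(b"l", chunk_start=False, chunk_end=False),
    DataEvent(b"o", chunk_start=False, chunk_end=True),
    DataEvent(b"hello", chunk_start=True, chunk_end=True),
]
print(list(iter_whole_chunks(events)))  # [b'hello', b'hello', b'hello']
```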

Again, you are *strongly encouraged* to avoid relying on this information if
at all possible. Treat this functionality as an escape hatch for when there is
no alternative, rather than as a general-purpose source of information.
4 changes: 4 additions & 0 deletions docs/source/changes.rst
@@ -11,6 +11,10 @@ vNEXT (????-??-??)
:func:`Connection.next_event` (see `issue #8
<https://github.com/njsmith/h11/issues/8>`).

* Added :attr:`Data.chunk_start` and :attr:`Data.chunk_end` attributes to the
  :class:`Data` event. These tell the user where the chunk delimiters fall in
  the data stream received from the remote peer when chunked transfer encoding
  is in use.

v0.6.0 (2016-10-24)
-------------------
17 changes: 16 additions & 1 deletion h11/_events.py
@@ -214,8 +214,23 @@ class Data(_EventBundle):
        which calling :func:`len` returns the number of bytes that will be
        written -- see :ref:`sendfile` for details.
 
+    .. attribute:: chunk_start
+
+       A marker that indicates whether this data object is from the start of a
+       chunked transfer encoding chunk. This field is ignored when a Data
+       event is provided to :meth:`Connection.send`: it is only valid on events
+       emitted from :meth:`Connection.next_event`.
+
+    .. attribute:: chunk_end
+
+       A marker that indicates whether this data object is the last for a given
+       chunked transfer encoding chunk. This field is ignored when a Data
+       event is provided to :meth:`Connection.send`: it is only valid on events
+       emitted from :meth:`Connection.next_event`.
+
     """
-    _fields = ["data"]
+    _fields = ["data", "chunk_start", "chunk_end"]
+    _defaults = {"chunk_start": False, "chunk_end": False}
 
 
 # XX FIXME: "A recipient MUST ignore (or consider as an error) any fields that
8 changes: 7 additions & 1 deletion h11/_readers.py
@@ -245,14 +245,20 @@ def __call__(self, buf):
             if self._bytes_in_chunk == 0:
                 self._reading_trailer = True
                 return self(buf)
+            chunk_start = True
+        else:
+            chunk_start = False
         assert self._bytes_in_chunk > 0
         data = buf.maybe_extract_at_most(self._bytes_in_chunk)
         if data is None:
             return None
         self._bytes_in_chunk -= len(data)
         if self._bytes_in_chunk == 0:
             self._bytes_to_discard = 2
-        return Data(data=data)
+            chunk_end = True
+        else:
+            chunk_end = False
+        return Data(data=data, chunk_start=chunk_start, chunk_end=chunk_end)
 
 
 class Http10Reader(object):
5 changes: 4 additions & 1 deletion h11/tests/helpers.py
@@ -17,12 +17,15 @@ def receive_and_get(conn, data):
     conn.receive_data(data)
     return get_all_events(conn)
 
-# Merges adjacent Data events, and converts payloads to bytestrings
+# Merges adjacent Data events, converts payloads to bytestrings, and removes
+# chunk boundaries.
 def normalize_data_events(in_events):
     out_events = []
     for event in in_events:
         if type(event) is Data:
             event.data = bytes(event.data)
+            event.chunk_start = False
+            event.chunk_end = False
         if out_events and type(out_events[-1]) is type(event) is Data:
             out_events[-1].data += event.data
         else:
60 changes: 55 additions & 5 deletions h11/tests/test_connection.py
@@ -5,7 +5,7 @@
 from .._state import *
 from .._connection import (
     _keep_alive, _body_framing,
-    Connection, NEED_DATA, PAUSED,
+    Connection, NEED_DATA, PAUSED
 )
 
 from .helpers import ConnectionPair, get_all_events, receive_and_get
@@ -145,23 +145,73 @@ def test_chunked():
            Request(method="GET", target="/",
                    headers=[("Host", "example.com"),
                             ("Transfer-Encoding", "chunked")]))
-    data = p.send(CLIENT, Data(data=b"1234567890"))
+    data = p.send(CLIENT,
+                  Data(data=b"1234567890", chunk_start=True, chunk_end=True))
     assert data == b"a\r\n1234567890\r\n"
-    data = p.send(CLIENT, Data(data=b"abcde"))
+    data = p.send(CLIENT,
+                  Data(data=b"abcde", chunk_start=True, chunk_end=True))
     assert data == b"5\r\nabcde\r\n"
     data = p.send(CLIENT, EndOfMessage(headers=[("hello", "there")]))
     assert data == b"0\r\nhello: there\r\n\r\n"
 
     p.send(SERVER,
            Response(status_code=200,
                     headers=[("Transfer-Encoding", "chunked")]))
-    p.send(SERVER, Data(data=b"54321"))
-    p.send(SERVER, Data(data=b"12345"))
+    p.send(SERVER, Data(data=b"54321", chunk_start=True, chunk_end=True))
+    p.send(SERVER, Data(data=b"12345", chunk_start=True, chunk_end=True))
     p.send(SERVER, EndOfMessage())
 
     for conn in p.conns:
         assert conn.states == {CLIENT: DONE, SERVER: DONE}
 
+def test_chunk_boundaries():
+    conn = Connection(our_role=SERVER)
+
+    request = (
+        b'POST / HTTP/1.1\r\n'
+        b'Host: example.com\r\n'
+        b'Transfer-Encoding: chunked\r\n'
+        b'\r\n'
+    )
+    conn.receive_data(request)
+    assert conn.next_event() == Request(
+        method="POST",
+        target="/",
+        headers=[("Host", "example.com"), ("Transfer-Encoding", "chunked")]
+    )
+    assert conn.next_event() is NEED_DATA
+
+    conn.receive_data(b'5\r\nhello\r\n')
+    assert conn.next_event() == Data(
+        data=b'hello', chunk_start=True, chunk_end=True
+    )
+
+    conn.receive_data(b'5\r\nhel')
+    assert conn.next_event() == Data(
+        data=b'hel', chunk_start=True, chunk_end=False
+    )
+
+    conn.receive_data(b'l')
+    assert conn.next_event() == Data(
+        data=b'l', chunk_start=False, chunk_end=False
+    )
+
+    conn.receive_data(b'o\r\n')
+    assert conn.next_event() == Data(
+        data=b'o', chunk_start=False, chunk_end=True
+    )
+
+    conn.receive_data(b'5\r\nhello')
+    assert conn.next_event() == Data(
+        data=b'hello', chunk_start=True, chunk_end=True
+    )
+
+    conn.receive_data(b'\r\n')
+    assert conn.next_event() == NEED_DATA
+
+    conn.receive_data(b'0\r\n\r\n')
+    assert conn.next_event() == EndOfMessage()
+
 def test_client_talking_to_http10_server():
     c = Connection(CLIENT)
     c.send(Request(method="GET", target="/",