
Please help with Error: RemoteProtocolError("illegal chunk header: bytearray(b'167 \r\n')") #134

Closed
drschlaumeier opened this issue Jul 20, 2022 · 12 comments

Comments

@drschlaumeier

My Home Assistant version: 2022.7.5

Layout-card version (FROM BROWSER CONSOLE): 2.4.2

Newest version of Multiscrape installed

What I am doing:

I am trying to read my ESP32 HTML sensor.
The HTML code is as follows:

<html lang="en">
	<head>
		<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
		<meta HTTP-EQUIV="refresh" CONTENT="30"/>
	</head>

	<body>
		AQUARIUM Temperatur: 25.00 &deg;C <br>Battery: 4.10V, 100%, -<br>BootTime: CEST, 18.Jul 2022, Mon, 18:44:54, NowTime: CEST, 20.Jul 2022, Wed, 16:55:14
 	</body>
 </html>

My multiscrape configuration looks like this:

    scan_interval: 60
    verify_ssl: false
    log_response: true
    parser: html.parser
    sensor:
    - unique_id: body_temperature
      name: HTML MultiScrape Temperature
      select: "body"
      value_template: '{{ value.split(": ")[1].split(" °C")[0] }}'
      unit_of_measurement: "°C"

What I expected to happen:

I want to get the temperature value 25.00
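
For reference, the string handling in the value_template can be checked outside Home Assistant. This is only a sketch in plain Python, assuming the selected body text arrives with tags stripped and entities decoded (roughly what an HTML parser would hand over). `body_html` is copied from the page above, and `extract_temperature` is a made-up helper name, not part of multiscrape.

```python
import html
import re

# Body line copied from the ESP32 page shown above.
body_html = (
    "AQUARIUM Temperatur: 25.00 &deg;C <br>Battery: 4.10V, 100%, -<br>"
    "BootTime: CEST, 18.Jul 2022, Mon, 18:44:54, "
    "NowTime: CEST, 20.Jul 2022, Wed, 16:55:14"
)


def extract_temperature(fragment):
    """Hypothetical helper mirroring the value_template's two split() calls."""
    # Drop tags such as <br> and turn &deg; into the degree sign.
    text = html.unescape(re.sub(r"<[^>]+>", "", fragment)).strip()
    # Same expression as in the template: the part between the first ': ' and ' °C'.
    return text.split(": ")[1].split(" °C")[0]


print(extract_temperature(body_html))
```

With the sample line this prints 25.00, so the template expression itself looks fine as long as the request succeeds.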

What happened instead:

I get the following error, but I'm not sure what it means. Is it a bug or some other problem? Hopefully someone can help me here:

2022-07-20 17:04:46 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing get request to url: http://192.168.1.114/.
Error message:
RemoteProtocolError("illegal chunk header: bytearray(b'167 \r\n')")
2022-07-20 17:04:46 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Updating failed with exception: illegal chunk header: bytearray(b'167 \r\n')
@danieldotnl
Owner

Could you post the contents of the response_headers log file?

@drschlaumeier
Author

Sorry, that's not possible, because nothing is written into the response directory. See also the error message:

Scraper_noname_0 # Error executing get request to url: http://192.168.1.114/.

@drschlaumeier
Author

Do you have any suggestions for debugging this further?

@drschlaumeier
Author

Hmm, I did some more reading about chunked HTTP encoding. I'm not sure, but maybe it's an issue with HA, perhaps because it uses NGINX etc.? I have installed HA on top of Debian.

If I run a Python script on the Debian console, I can extract the temperature value without any problems... and probably no chunked encoding is used. So a command line sensor works. I will do a packet capture to check the difference, but I'm not sure it will help find a solution...

@danieldotnl
Owner

I do indeed suspect an issue with the headers returned by your ESP32. Try the request with curl or Postman, and inspect the headers.
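
If neither tool is at hand, a raw socket shows the bytes before any HTTP parser touches them. The following is a minimal sketch, assuming the device answers plain HTTP on port 80 and closes the connection after the response. `fetch_raw` and `split_raw_response` are hypothetical helper names, not part of multiscrape or httpx.

```python
import socket


def fetch_raw(host, port=80, path="/", timeout=5.0):
    """Send a bare HTTP/1.1 GET and return every byte the server sends back."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    received = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            received.append(data)
    return b"".join(received)


def split_raw_response(raw):
    """Split raw response bytes into status line, header lines, and undecoded body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body
```

Calling `split_raw_response(fetch_raw("192.168.1.114"))` and printing `repr(body[:16])` would show the first chunk-size line exactly as sent, including any stray whitespace.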

@drschlaumeier
Author

drschlaumeier commented Jul 21, 2022

I did some packet capturing with tshark and analyzed it with Wireshark.
That's really odd:

This is the call from HA & multiscrape, where I get the illegal chunk error:

GET / HTTP/1.1
Host: 192.168.1.114
Accept: */*
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
User-Agent: HomeAssistant/2022.7.6 httpx/0.23.0 Python/3.10
--------------------------------------
HTTP/1.1 200 OK
Content-Type: text/html
Connection: close
Accept-Ranges: none
Transfer-Encoding: chunked

167 

	
	<!DOCTYPE html>
	<html lang="en">
		<head>
			<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
			<meta HTTP-EQUIV="refresh" CONTENT="30"/>
		</head>
	
		<body>
			AQUARIUM Temperatur: 23.44 &deg;C <br>Battery: 4.10V, 100%, -<br>BootTime: CEST, 21.Jul 2022, Thu, 12:53:07, NowTime: CEST, 21.Jul 2022, Thu, 13:51:58
		</body>
	</html>

... and this is the call from my Python script for the command line sensor, which works fine:

GET / HTTP/1.1
Host: 192.168.1.114
User-Agent: python-requests/2.25.1
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
----------------------------------------------------
HTTP/1.1 200 OK
Content-Type: text/html
Connection: close
Accept-Ranges: none
Transfer-Encoding: chunked

167 

	
	<!DOCTYPE html>
	<html lang="en">
		<head>
			<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
			<meta HTTP-EQUIV="refresh" CONTENT="30"/>
		</head>
	
		<body>
			AQUARIUM Temperatur: 23.50 &deg;C <br>Battery: 4.10V, 100%, +<br>BootTime: CEST, 21.Jul 2022, Thu, 12:53:07, NowTime: CEST, 21.Jul 2022, Thu, 14:13:20
		</body>
	</html>

As you can see, both responses have the 167 before the body. So I'm not sure why HA scrape has problems with that.
Here is also my simple Python code for the command line sensor. I hope you have an idea what went wrong...
Also, I checked the ESP32 code... there is no 167 in it... I think it is added by the chunked encoding...

#!/usr/bin/python3

import lxml.html as lh
import requests

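url = 'http://192.168.1.114'
page = requests.get(url)
# print(page.content)

# Parse the page and take the text directly inside <body>, up to the first <br>
tree = lh.fromstring(page.content)
body = tree.xpath('body')
bodytxt = body[0].text.strip()
# Keep the part between the first ': ' and ' °C', e.g. '23.50'
aquatemp = bodytxt.split(': ')[1].split(' °C')[0].strip()
print(aquatemp)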

@drschlaumeier
Author

...and one more comment:
The 167 seems to be the normal HTTP chunk-size line added to the response. It is hexadecimal (0x167 = 359), which matches Wireshark's "Chunk size: 359 octets". On the wire it is the six bytes 31 36 37 20 0D 0A, i.e. the ASCII text "167" followed by a space and CRLF... and that line is what causes trouble in ha-scrape...
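
To make the numbers concrete, here is a small sketch that decodes those six bytes and parses the size. The two regular expressions are only an illustration of a strict and a lenient reading of the chunk-size line (hex digits, optional ;extension, CRLF). They are assumptions for demonstration, not the code that httpx or its underlying parser actually runs, and `parse_chunk_size` is a made-up name.

```python
import re

# The chunk-size line exactly as captured: 31 36 37 20 0D 0A
raw = bytes.fromhex("313637200d0a")

# Strict reading: hex digits, optional ";extension", CRLF. No bare trailing space.
STRICT = re.compile(rb"([0-9A-Fa-f]+)(?:;[^\r\n]*)?\r\n")
# Lenient reading: same, but spaces or tabs after the digits are tolerated.
LENIENT = re.compile(rb"([0-9A-Fa-f]+)[ \t]*(?:;[^\r\n]*)?\r\n")


def parse_chunk_size(line, pattern):
    """Return the chunk size in bytes, or raise ValueError if the line is rejected."""
    match = pattern.fullmatch(line)
    if match is None:
        raise ValueError(f"illegal chunk header: {line!r}")
    return int(match.group(1), 16)


print(raw)                             # the ASCII text "167", a space, CR, LF
print(parse_chunk_size(raw, LENIENT))  # hex 167 as a decimal byte count
try:
    parse_chunk_size(raw, STRICT)
except ValueError as exc:
    print("strict parser says:", exc)
```

Under these assumptions the lenient pattern yields 359, while the strict one rejects the line because of the space between "167" and CRLF.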

Here is the entire frame from Wireshark... look at the "HTTP chunked response" part:

Frame 7: 62 bytes on wire (496 bits), 62 bytes captured (496 bits) on interface enp2s0, id 0
Ethernet II, Src: Espressi_48:3f:88 (d8:a0:1d:48:3f:88), Dst: PCPartne_81:2a:b3 (00:01:2e:81:2a:b3)
Internet Protocol Version 4, Src: 192.168.1.114 (192.168.1.114), Dst: 192.168.1.30 (192.168.1.30)
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 48
    Identification: 0x050e (1294)
    Flags: 0x00
    Fragment Offset: 0
    Time to Live: 255
    Protocol: TCP (6)
    Header Checksum: 0x32d9 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 192.168.1.114 (192.168.1.114)
    Destination Address: 192.168.1.30 (192.168.1.30)
Transmission Control Protocol, Src Port: http (80), Dst Port: 51934 (51934), Seq: 480, Ack: 145, Len: 8
    Source Port: http (80)
    Destination Port: 51934 (51934)
    [Stream index: 0]
    [TCP Segment Len: 8]
    Sequence Number: 480    (relative sequence number)
    Sequence Number (raw): 407611
    [Next Sequence Number: 488    (relative sequence number)]
    Acknowledgment Number: 145    (relative ack number)
    Acknowledgment number (raw): 1405156524
    0101 .... = Header Length: 20 bytes (5)
    Flags: 0x018 (PSH, ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 1... = Push: Set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
        [TCP Flags: ·······AP···]
    Window: 5600
    [Calculated window size: 5600]
    [Window size scaling factor: -2 (no window scaling used)]
    Checksum: 0x57d2 [correct]
    [Checksum Status: Good]
    [Calculated Checksum: 0x57d2]
    Urgent Pointer: 0
    [SEQ/ACK analysis]
        [iRTT: 0.037546106 seconds]
        [Bytes in flight: 8]
        [Bytes sent since last PSH flag: 8]
    [Timestamps]
        [Time since first frame in this TCP stream: 0.054896129 seconds]
        [Time since previous frame in this TCP stream: 0.003846908 seconds]
    TCP payload (8 bytes)
    TCP segment data (8 bytes)
[2 Reassembled TCP Segments (487 bytes): #5(479), #7(8)]
    [Frame: 5, payload: 0-478 (479 bytes)]
    [Frame: 7, payload: 479-486 (8 bytes)]
    [Segment count: 2]
    [Reassembled TCP length: 487]
    [Reassembled TCP Data: 485454502f312e3120323030204f4b0d0a436f6e74656e742d547970653a20746578742f…]
Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
        [Expert Info (Chat/Sequence): HTTP/1.1 200 OK\r\n]
            [HTTP/1.1 200 OK\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Response Version: HTTP/1.1
        Status Code: 200
        [Status Code Description: OK]
        Response Phrase: OK
    Content-Type: text/html\r\n
    Connection: close\r\n
    Accept-Ranges: none\r\n
    Transfer-Encoding: chunked\r\n
    \r\n
    [HTTP response 1/1]
    [Time since request: 0.017143793 seconds]
    [Request in frame: 4]
    [Request URI: http://192.168.1.114/]
    HTTP chunked response
        Data chunk (359 octets)
            Chunk size: 359 octets
            Data (359 bytes)
                Data: 0a090a093c21444f43545950452068746d6c3e0a093c68746d6c206c616e673d22656e22…
                [Length: 359]
            Chunk boundary: 0d0a
        End of chunked encoding
        \r\n
    File Data: 359 bytes
Line-based text data: text/html (14 lines)
    \n
    \t\n
    \t<!DOCTYPE html>\n
    \t<html lang="en">\n
    \t\t<head>\n
    \t\t\t<meta http-equiv="content-type" content="text/html; charset=utf-8"/>\n
    \t\t\t<meta HTTP-EQUIV="refresh" CONTENT="30"/>\n
    \t\t</head>\n
    \t\n
    \t\t<body>\n
    \t\t\tAQUARIUM Temperatur: 23.50 &deg;C <br>Battery: 4.10V, 100%, +<br>BootTime: CEST, 21.Jul 2022, Thu, 12:53:07, NowTime: CEST, 21.Jul 2022, Thu, 14:13:20\n
    \t\t</body>\n
    \t</html>\n
    \t

@drschlaumeier
Author

...and again... it seems to be a problem with httpx, which is used in the HA Scrape component.
A simple Python script produces the same error message:

import httpx

url='http://192.168.1.114'
response = httpx.get(url)

This produces the following error:

    response = httpx.get(url)
  File "/usr/local/lib/python3.9/dist-packages/httpx/_api.py", line 189, in get
    return request(
  File "/usr/local/lib/python3.9/dist-packages/httpx/_api.py", line 100, in request
    return client.request(
  File "/usr/local/lib/python3.9/dist-packages/httpx/_client.py", line 815, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.9/dist-packages/httpx/_client.py", line 916, in send
    raise exc
  File "/usr/local/lib/python3.9/dist-packages/httpx/_client.py", line 910, in send
    response.read()
  File "/usr/local/lib/python3.9/dist-packages/httpx/_models.py", line 792, in read
    self._content = b"".join(self.iter_bytes())
  File "/usr/local/lib/python3.9/dist-packages/httpx/_models.py", line 810, in iter_bytes
    for raw_bytes in self.iter_raw():
  File "/usr/local/lib/python3.9/dist-packages/httpx/_models.py", line 868, in iter_raw
    for raw_stream_bytes in self.stream:
  File "/usr/local/lib/python3.9/dist-packages/httpx/_client.py", line 123, in __iter__
    for chunk in self._stream:
  File "/usr/local/lib/python3.9/dist-packages/httpx/_transports/default.py", line 105, in __iter__
    yield part
  File "/usr/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.9/dist-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.RemoteProtocolError: illegal chunk header: bytearray(b'167 \r\n')

Should I take this up with the Python httpx developers, or do you have a workaround for this in ha-multiscrape?
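
In case it helps others as an interim workaround outside multiscrape: since the requests-based script copes with this response, a standard-library fetch may cope as well. The following sketch has not been tested against the ESP32 and assumes that Python's urllib/http.client is as forgiving about the trailing space as requests appears to be. `fetch_body` is a hypothetical helper name, and the proxy bypass is only there because the device sits on the local network.

```python
import urllib.request

# Opener without proxies: the sensor is assumed to be reachable directly on the LAN.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_body(url, timeout=10.0):
    """Hypothetical helper: GET the page with the standard library and return its text."""
    with _opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

It could be called as `fetch_body("http://192.168.1.114/")` from a command line sensor script, with the same split logic applied to the result.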

Thanks

@danieldotnl
Owner

Multiscrape uses httpx instead of requests.
I think this is the problem: encode/httpx#1561
(which is marked as a duplicate of encode/httpx#1363)

@drschlaumeier
Author

Thanks, but I think the one below is the right discussion in httpx, since the error is not "illegal header line" but "illegal chunk header":
encode/httpx#1735
I left a comment there and hope that someone will answer and help me fix it...

@danieldotnl
Owner

Good, I'll close this issue for now, since it doesn't seem to be a multiscrape issue, and replacing httpx is not really an option.

@rohrsh

rohrsh commented Feb 15, 2023

For the record, I also have trouble scraping a web page from my solar inverter (see ECU-3 link above).

Logs: " # Updating failed with exception: illegal header line: bytearray(b'debug9') "

Are there any new fixes in the upstream packages that we could pick up here?

cheers


3 participants