Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host header position causes certain sites to not respond #3265

Closed
Hemable1 opened this issue Sep 12, 2018 · 8 comments
Closed

Host header position causes certain sites to not respond #3265

Hemable1 opened this issue Sep 12, 2018 · 8 comments
Labels
client enhancement good first issue Good for newcomers Hacktoberfest We think it's good for https://hacktoberfest.digitalocean.com/ outdated

Comments

@Hemable1
Copy link

Outbound requests add the Host header last instead of first which causes an issue fetching certain sites. Normally this shouldn't matter, however I'm coming across servers that won't respond unless it's defined first as it is in a browser.

Example:
This is through a browser and successfully responds:

GET / HTTP/1.1
Host: www.accuweather.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Encoding: gzip, deflate
Connection: close

This is through aiohttp and does not respond (notice the Host header position):

GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Encoding: gzip, deflate
Host: www.accuweather.com
Connection: close

To replicate this behavior use the the following code and notice it will time out.

import asyncio
import aiohttp

async def req(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
    }
    timeout = aiohttp.ClientTimeout(total=10)

    async with aiohttp.ClientSession(timeout=timeout, headers=headers) as session:
        async with session.get(url, ssl=False) as resp:
            print(resp.status)
            print(await resp.text())

loop = asyncio.get_event_loop()
loop.run_until_complete(req("https://www.accuweather.com"))

Tested on:
Windows 7x64
Python 3.7.0
aiohttp 3.3.2

@asvetlov
Copy link
Member

I'm sorry but the server is not correct.

We can change an order of request headers construction to put Host first but in general there is no guarantee to satisfy other weird servers requirements.

@Hemable1
Copy link
Author

Yes the server is incorrect, however this should be developed to replicate a browser as closely as possible. There are a few of sites that fall into this category, https://www.nasdaq.com is another one. No browsers I've tried had any issues visiting these sites.

@webknjaz
Copy link
Member

😮

@asvetlov
Copy link
Member

asvetlov commented Oct 2, 2018

The patch is pretty easy.

  1. Update Host before processing default headers in client_reqrep.ClientRequest.update_auto_headers.
  2. Add a functional test to make sure that the order is saved. The test should check headers order on server side
  3. I doubt if we need support Host header overriding. Raise ValueError is Host key is passed to client session default headers.

The PR is very appreciated.

@asvetlov asvetlov added the good first issue Good for newcomers label Oct 2, 2018
@webknjaz webknjaz added Hacktoberfest We think it's good for https://hacktoberfest.digitalocean.com/ and removed Hacktoberfest We think it's good for https://hacktoberfest.digitalocean.com/ labels Oct 2, 2018
@GenericStudent
Copy link
Contributor

@asvetlov, I've created a PR with initial attempt at this. I have moved the Host header adding under update_auto_headers to be before the default headers. This works and I have added a test for it. But I don't feel the test is optimal.

How do I check / test the view / state of the headers as seen from the server perspective?

Also I'm not sure what you meant by the third point.

@asvetlov
Copy link
Member

Regarding tests: please take a look on https://github.com/aio-libs/aiohttp/blob/master/tests/test_client_functional.py
On server side request.headers has request's headers in the same order as they are sent.

The third point is about client = aiohttp.ClientSession({'host': 'overridden.host.com'}). It is weird, please don't care about

@asvetlov
Copy link
Member

Fixed by #3342

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
client enhancement good first issue Good for newcomers Hacktoberfest We think it's good for https://hacktoberfest.digitalocean.com/ outdated
Projects
None yet
Development

No branches or pull requests

4 participants