Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starlette behind reverse proxy, url_for does not return externally valid urls #604

Closed
narzeja opened this issue Aug 18, 2019 · 11 comments · Fixed by #699
Closed

Starlette behind reverse proxy, url_for does not return externally valid urls #604

narzeja opened this issue Aug 18, 2019 · 11 comments · Fixed by #699

Comments

@narzeja
Copy link

narzeja commented Aug 18, 2019

Suppose I have my application reverse proxied, such that http://my.domain/foo (e.g. Traefik or Nginx) has the upstream http://localhost:8000 (the Starlette app). url_for will generate urls relative to the upstream url. How do I make it produce urls relative to the external url (with prefix /foo)?

For other projects, e.g. Flask/Werkzeug, this can be handled by setting SCRIPT_NAME in the request environment (scope) if the 'X-Forwarded-Prefix' header is set. It does not look like there is an equivalent option in Starlette.

Am I missing something? Otherwise, consider this a feature request.

@narzeja
Copy link
Author

narzeja commented Aug 18, 2019

Forgive my haste, this is a possible duplicate of or at least related to #592

@tomchristie
Copy link
Member

See uvicorn's --root-path and --proxy-headers in order to configure mounted path, and host/scheme configuration.

@narzeja
Copy link
Author

narzeja commented Aug 29, 2019

I am aware of those configuration settings, the scheme and host works fine, but the resulting URL is still not correct.

Example:

from starlette.applications import Starlette
from starlette.responses import PlainTextResponse

async def index(request):
    return PlainTextResponse(request.url_for('index'))

app = Starlette()
app.add_route('/', index, methods=['GET'])

I use Traefik to load-balance and use the following rules to direct the incoming connections to the Starlette application: traefik.frontend.rule=Host:localhost; PathPrefixStrip:/foo/;.
I use --root-path /foo --proxy-headers in uvicorn.

Accessing http://localhost/foo/ gives the following response: http://localhost/, while the expected response should be http://localhost/foo/.

@sm-Fifteen
Copy link

The Uvicorn documentation says that --proxy-headers only adds the X-Forwarded-Proto (the protocol) and X-Forwarded-For (the client IP) headers to populate remote scheme/address info.

Looking at the proxy-headers middleware, it indeed only checks those two headers. X-Forwarded-Host (the original hostname, possibly the port as well?) isn't checked, so scope['server'] does not get updated to match the address the client is expected to try and access on subsequent requests (though url_for seems to use something else; changing it does not affect the resulting URL as it probably should).

scope.raw_path also does not get checked by any implementation of url_path_for in routing, so @narzeja's example wouldn't be affected by the use of --root-path.

@sm-Fifteen
Copy link

I looked into this further, the problem comes from [calling make_absolute_url][1], which uses the connexion's base URL. That base URL does correct itself to account for root_path, but then [make_absolute_url only copies the netloc][2] from that base URL instead of combining the two.

Also, while scope['server'] can be properly updated with a middleware like this (which I made as an attempt at a workaround):

class ProxyHeadersMiddleware2:
    def __init__(self, app, num_proxies=1):
        self.app = app
        self.num_proxies = num_proxies

    async def __call__(self, scope, receive, send):
        if scope["type"] in ("http", "websocket"):
            headers = dict(scope["headers"])

            if b"x-forwarded-host" in headers:
                x_forwarded_host = headers[b"x-forwarded-host"].decode("ascii")
                (host, sep, port) = x_forwarded_host.partition(":")
                if sep is None:
                    scope["server"] = (host, 0)
                else:
                    scope["server"] = (host, int(port))

        await self.app(scope, receive, send)

...it makes no difference as far as for_url is concerned since the base_url from earlier actually [gives priority to the Host HTTP header][3]
[1]:

return url_path.make_absolute_url(base_url=self.url)

[2]:
if self.host:
netloc = self.host
else:
netloc = base_url.netloc
return str(URL(scheme=scheme, netloc=netloc, path=str(self)))

[3]:
if host_header is not None:
url = f"{scheme}://{host_header}{path}"
elif server is None:
url = path
else:
host, port = server
default_port = {"http": 80, "https": 443, "ws": 80, "wss": 443}[scheme]
if port == default_port:
url = f"{scheme}://{host}{path}"
else:
url = f"{scheme}://{host}:{port}{path}"

@tomchristie
Copy link
Member

scope.raw_path also does not get checked by any implementation of url_path_for in routing, so @narzeja's example wouldn't be affected by the use of --root-path.

Indeed - good catch.

@sm-Fifteen
Copy link

@tomchristie: Excellent, though that doesn't fix setting the host/scheme configuration (the scheme appears to be preserved correctly, but the host isn't affected). --proxy-headers does nothing in regards to the netloc, and the config file from Uvicorn's documentation for running behind nginx just overwrites the Host header when running behind a reverse proxy instead, which works and I've seen plenty of other frameworks do it this way.

Your advice made it seem like preserving the user-facing netloc was possibly maybe part of the intended side effects, and if it was, then that should probably be raised on Uvicorn's side (unless that's something Starlette is expected to take care of, there doesn't seem to be a concensus on what the right option for this is). There's also the possibility of just serving absolute paths instead of URLs, which would circumvent the issue in most cases.

@tomchristie
Copy link
Member

Thanks, yup good follow up.

--proxy-headers does nothing in regards to the netloc, and the config file from Uvicorn's documentation for running behind nginx just overwrites the Host header when running behind a reverse proxy instead, which works and I've seen plenty of other frameworks do it this way.

So - I'd suggest opening an issue against Uvicorn there. As part of that it'd be helpful if you could review exactly what Gunicorn's handling behavior is. I think there's actually quite a range of not-entirely-standardized behaviours for what proxy headers are and aren't used.

We should get the common case sorted in Uvicorn (basically let's just stay in line with whatever Gunicorn does) and leave anything more complex/custom for custom middleware to handle. We probably also want to document all that in Uvicorn, including handling custom case.

There's also the possibility of just serving absolute paths instead of URLs, which would circumvent the issue in most cases.

Yeah, I think we may want to do that, or else at least review it.

I'm also fairly sure that we ought to be adding some API for keeping app and request context, so that you can call url_for from anywhere, without having to pass a request instance around anywhere it's needed.

@sm-Fifteen
Copy link

@tomchristie: I was going to post the following as a uvicorn issue, but as I'm typing this out, I realize that I have no idea how to wrap this up into a Uvicorn issue:

Document how proxy headers are/should be handled by Uvicorn and/or the application

Opened at @tomchristie's suggestion in #604. Part of that issue related to being unable to generate externally-valid URLs from behind a reverse-proxy, since most such proxies (as mandated by RFC7230 §5.4) change the Host header, and Starlette gives priority to that header when generating URLs.

"What authority component to use when generating URLs?" is something that most specification documents don't really address and is usually left to the proxy server configuration to spoof or the application developper to handle. The HTTP protocol spec has a list of steps on how to reconstruct the request URL, though it does not account for proxying at any point. That responsibility is de facto given to application servers (like --proxy-headers currently does for schema and client info), though how those proxy headers are to be interpreted and how they're passed by the CGI/WSGI/ASGI server to the application is entirely unstandardized.

Gunicorn takes no special measure to override any environment variable with the contents of the proxy headers in favor of following RFC 3875 (with which WSGI is supposed to be compatible). They instead advise that reverse proxies should be configured to setup any necessary X-Forwarded* headers and alter the Host header as needed, but otherwise seem to indicate that this is outside Gunicorn's responsibility and most likely the application's job to handle (uvicorn's NGINX doc does mainly the same, if a bit more implicitly).

(((Now what? Is that really a Uvicorn issue?)))

If Uvicorn is meant to be some kind of "ASGI drop-in replacement for Gunicorn" of sorts, then this would suggest that it should not handle proxy headers at all and that this responsibility should possibly be left up to the application framework or to some kind of middleware. The ASGI protocol says scope['server'] is supposed to behave like WSGI's SERVER_NAME combined with SERVER_PORT and has no additionnal channel to send this sort of "real client" information to the application.

I would definitely prefer this to be an HTTP server responsibility, but it looks like proxies (in regard to the application knowing what hostname it should use for URLs) are "somebody else's problem" everywhere you look.

@jonathanunderwood
Copy link

I am confused as to why this, and #592 , have both been closed, @tomchristie - do you regard this as something that should not be fixed in Starlette?

@sm-Fifteen
Copy link

I really just think Starlette shouldn't be trying to generate URLs with authority, here. The rule seems to be that, unless the authority you want to use is known to you in advance (such as having it stored in an environment variable or a config file), you should just specify the URL as an absolute path and leave it at that. It's what Flask's url_for does and what Django's HttpRequest.build_absolute_uri(location) does if you omit the location parameter. Now that root_path is properly acknowledged (see #690), URLs generated this way should be valid for a client making requests to a Starlette app from behind proxy.

Trying to automatically build URLs with authority using the routing headers like that is just going to lead to issues. One should still be able to build authority-form URLs, but it shouldn't be the default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants