New HTTP 1.1 RFCs versus WSGI


Posted: 2014-08-19 12:35   |   Source   |  More posts about planetpython python wsgi http

Recently new HTTP 1.1 RFCs were published that obsolete the old HTTP 1.1 RFCs. They are extensively rewritten.

Unfortunately the WSGI PEP 3333 refers to something only made explicit in the old version of the RFCs, but which is much harder to find in the new versions of the RFCs. I thought I'd leave a report of my investigations here so that others who may run into this in the future can find it.

WSGI is a protocol that's like HTTP but isn't quite HTTP. In particular WSGI defines its own iterator-based way to send larger responses out in smaller parts. It therefore cannot deal with so-called "hop-by-hop" headers, which try to control this behavior on a HTTP level. The WSGI spec says a WSGI application must not generate such headers.

This is relevant when you're dealing with a WSGI-over-HTTP proxy. This is a special WSGI application that talks to an underlying HTTP server. It presents itself as a normal WSGI application.

The underlying HTTP server could very well be sending out stuff like such as Transfer-Encoding: chunked. The WSGI spec does not allow a WSGI application to send them out though, so a WSGI proxy must strip these headers out.

So what headers are to be stripped out? The WSGI spec refers to section 13.5.1 in now-obsolete RFC 2616.

This nicely lists hop-by-hop headers:

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • TE
  • Trailers
  • Transfer-Encoding
  • Upgrade

That RFC also says:

"All other headers defined by HTTP/1.1 are end-to-end headers."

and then confusingly:

"Other hop-by-hop headers MUST be listed in a Connection header, (section 14.10) to be introduced into HTTP/1.1 (or later)."

which one is it, HTTP 1.1? I guess that's one of the reasons this text got rewritten.

In the new rewritten version of HTTP 1.1, this list is gone. Instead it specifies for some headers (such as TE and Upgrade) that these should be added to the Connection field. A HTTP proxy can then strip out the headers listed in Connection, and then also strip out Connection itself.

Confusingly, while the new RFC 7230 refers to the concept of 'hop-by-hop' early on, and also say this in the change notes in A.2:

"Also, "hop-by-hop" header fields are required to appear in the Connection header field; just because they're defined as hop- by-hop in this specification doesn't exempt them."

it doesn't actually say any headers are hop-by-hop anywhere else. Instead it mandates some headers should be added to Connection.

But wait: Transfer-Encoding is not to be listed in the Connection header, as it's not hop-by-hop. At least, not anymore. I've seen it described as 'hopX-by-hopY', but not in the RFC. This is, I think, because a HTTP proxy could let these through without having to remove them. But not for a WSGI over HTTP proxy: it MUST remove Transfer-Encoding, as WSGI applications have no such concept.

I think the WSGI PEP should be updated in terms of the new HTTP RFC. It should make explicit that some headers such as Transfer-Encoding must not be specified by a WSGI app, and that no headers that must be listed in Connection can be specified by a WSGI app, or something like that.

Relevant mailing list thread:

http://lists.w3.org/Archives/Public/ietf-http-wg/2014JulSep/thread.html#msg1710

Contents © 2005-2014 Martijn Faassen | Twitter | Github | Gittip