New HTTP 1.1 RFCs versus WSGI

Recently new HTTP 1.1 RFCs were published that obsolete the old HTTP 1.1 RFCs. They are extensively rewritten.

Unfortunately the WSGI PEP 3333 refers to something only made explicit in the old version of the RFCs, but which is much harder to find in the new versions of the RFCs. I thought I'd leave a report of my investigations here so that others who may run into this in the future can find it.

WSGI is a protocol that's like HTTP but isn't quite HTTP. In particular WSGI defines its own iterator-based way to send larger responses out in smaller parts. It therefore cannot deal with so-called "hop-by-hop" headers, which try to control this behavior on a HTTP level. The WSGI spec says a WSGI application must not generate such headers.

This is relevant when you're dealing with a WSGI-over-HTTP proxy. This is a special WSGI application that talks to an underlying HTTP server. It presents itself as a normal WSGI application.

The underlying HTTP server could very well be sending out stuff like such as Transfer-Encoding: chunked. The WSGI spec does not allow a WSGI application to send them out though, so a WSGI proxy must strip these headers out.

So what headers are to be stripped out? The WSGI spec refers to section 13.5.1 in now-obsolete RFC 2616.

This nicely lists hop-by-hop headers:

  • Connection

  • Keep-Alive

  • Proxy-Authenticate

  • Proxy-Authorization

  • TE

  • Trailers

  • Transfer-Encoding

  • Upgrade

That RFC also says:

"All other headers defined by HTTP/1.1 are end-to-end headers."

and then confusingly:

"Other hop-by-hop headers MUST be listed in a Connection header, (section 14.10) to be introduced into HTTP/1.1 (or later)."

which one is it, HTTP 1.1? I guess that's one of the reasons this text got rewritten.

In the new rewritten version of HTTP 1.1, this list is gone. Instead it specifies for some headers (such as TE and Upgrade) that these should be added to the Connection field. A HTTP proxy can then strip out the headers listed in Connection, and then also strip out Connection itself.

Confusingly, while the new RFC 7230 refers to the concept of 'hop-by-hop' early on, and also say this in the change notes in A.2:

"Also, "hop-by-hop" header fields are required to appear in the Connection header field; just because they're defined as hop- by-hop in this specification doesn't exempt them."

it doesn't actually say any headers are hop-by-hop anywhere else. Instead it mandates some headers should be added to Connection.

But wait: Transfer-Encoding is not to be listed in the Connection header, as it's not hop-by-hop. At least, not anymore. I've seen it described as 'hopX-by-hopY', but not in the RFC. This is, I think, because a HTTP proxy could let these through without having to remove them. But not for a WSGI over HTTP proxy: it MUST remove Transfer-Encoding, as WSGI applications have no such concept.

I think the WSGI PEP should be updated in terms of the new HTTP RFC. It should make explicit that some headers such as Transfer-Encoding must not be specified by a WSGI app, and that no headers that must be listed in Connection can be specified by a WSGI app, or something like that.

Relevant mailing list thread:

http://lists.w3.org/Archives/Public/ietf-http-wg/2014JulSep/thread.html#msg1710

Against "contrib"

It's pretty common for an open source project to have a "contrib" directory as part of its project structure. This contains useful code donated to the project by outsiders. It seems innocuous. A contrib section, why not?

I don't like contrib. A contrib directory gives the signal that "yes, we carry this source code around, but it's not really part of our project". What does that mean? Why is it even part of your project at all then? Why isn't this code distributed in library form instead? I'd much prefer the project to be smaller instead, as in that case I wouldn't have to worry about the contrib code at all.

Perhaps in the case of your project, placing code in contrib doesn't really mean "it's not really part of our project". Perhaps the code in contrib is meant to be a fully supported part of project's codebase. If so, why use the name "contrib" at all? It doesn't signal anything functional -- it only signals something about origins, which is why people should suspect any claim that it's a fully integral part of the project. Projects, instead of dumping something in contrib, just put that code in its appropriate place and really own it.

Arguments for contrib

One argument for a contrib section is that by placing code there, the tests are automatically run for it each time you run the tests in the core code. This way a project is in a position to fix obvious breakages in this code before release.

There's a problem with this approach: more subtle breakages run the risk of being undetected, and nobody is clearly in charge of guarding against that, because the code isn't really owned by the project or the contributor anymore. It's in this weird contrib half-way house.

Besides, we have plenty of experience as an open source community with developing extension code that lives outside of a project. Making sure extensions don't break and get fixed when they do requires communication between core library authors and extension authors. I think it's mostly an illusion that by placing the code in contrib you could do away with such communication -- if a project really wants to do away with communication, really own the code.

Placing code in contrib is not a substitute for communication.

That's not to say the current infrastructure cannot be improved to help communication. For instance, in the Python world the devpi project is exploring ways to automatically run the tests for dependent projects to see whether you caused any breakage in them.

Another argument for a contrib section has to do with discovery. As a user of your project I can look through contrib for anything useful there. I don't have to go and google for it instead. Of course googling is really easy anyway, but...

If you want to make discovery easy, then add a list of useful extensions to your project to the project's documentation. Many projects with a contrib directory do this anyway. But that already takes care of discovery; no reason to add the code to "contrib".

And again, infrastructure can to help support this -- it is useful to be able to discover what projects depend on a project. Linux package managers generally can tell you this, but I can see how language-specific ecosystems can offer more support for this too. For a Python specific example, it would be useful if PyPI had an easy way to discover all projects that depend on another one.

Effects on contribution

As an open source project developer you should want to attract contributions to your project. When you add code to "contrib", you tell a contributor "your contribution is not a full and equal part of this project". That's not a great way to expand your project's list of core contributors...

And you are a new contributor who wants to improve something in the contrib of a project, who do you even talk to? You might be worried that the project owner will say: sorry, that code is in contrib, I don't care about improving it. Since people are less confident that the project even cares about code in "contrib", that discourages them from trying to contribute to that code

Summary

Don't add code to a "contrib" section of your project. "contrib", paradoxically, can have a chilling effect on contribution. Either maintain that code externally entirely, or make your project really own that code.

On Naming In Open Source

Here are some stories on how you can go wrong with naming, especially in open source software.

Easy

Don't use the name "easy" or "simple" in your software as it won't be and people will make fun of it.

Background

People tend to want to use the word 'easy' or 'simple' when things really are not, to describe a facade. They want to paper over immense complexity. Inevitably the facade will be a leaky abstraction, and developers using the software are exposed to it. And now you named it 'easy', when it's anything but not. Just don't give in to the temptation in the first place, and people won't make fun of it.

Examples

easy_install is a Python tool to easily and automatically install Python packages, similar to JavaScript npm or Ruby gems. pip is a more popular tool these days that does the same. easy_install hides, among many other complicated things, a full-fledged web scraper that follows links onto arbitrary websites to find packages. It's "easy" until it fails, and it will fail at one point or another.

SimpleItem is an infamous base class in Zope 2 that pulls in just about every aspect of Zope 2 as mixin classes. It's supposed to make it easy to create a new content type for Zope. The amount of methods made available is truly intimidating and anything but simple.

Demo

Don't use the word "demo" or "sample" in your main codebase or people will depend on it and you will be stuck with it forever.

Background

It's tempting in some library or framework consisting of many parts to want to expose an integrated set of pieces, just as an example, within that codebase itself. Real use of it will of course have the developers integrating those pieces themselves. Except they won't, and now you have people using Sample stuff in real world code.

The word Sample or Demo is fine if the entire codebase is a demo, but it's not fine as part of a larger codebase.

Examples

SampleContainer was a part of Zope 3 that serves as the base class of most actual container subclasses in real world code. It was just supposed to demonstrate how to do the integration.

Rewrite

Don't reuse the name of software for an incompatible rewrite, unless you want people to be confused about it.

Background

Your software has a big installed base. But it's not perfect. You decide to create a new, incompatible version, without a clear upgrade path. Perhaps you handwave the upgrade path "until later", but that then never happens.

Just name the new version something else. Because the clear upgrade path may never materialize, and people will be confused anyway. They will find documentation and examples for the old system if they search for the new one, and vice versa. Spare your user base that confusion.

The temptation to do this is great; you want to benefit from popularity of the name of the old system and this way attract users to the shiny new system. But that's exactly the situation where doing this is most confusing.

Examples

Zope 3: there was already a very popular Zope 2 around, and then we decide to completely rewrite it and named it "Zope 3". Some kind of upgrade path was promised but conveniently handwaved. Immense confusion arose. We then landed pieces of Zope 3 in the old Zope 2 codebase, and it took years to resolve all the confusion.

Company name

If you want a open source community, don't name the software after your company, or your company after the software.

Background

If you have a piece of open source software and you want an open source community of developers for it, then don't name it after your company. You may love your company, but outside developers get a clear indication that "the Acme Platform" is something that is developed by Acme. They know that as outside developers, they will never gain as much influence on the development of that software as developers working at Acme. So they just don't contribute. They go to other open source software that isn't so clearly allied to a single business and contribute there. And you are left to wonder why developers are not attracted to work on your software.

Similarly, you may have great success with an open source project and now want to name your own company after it. That sends a powerful signal of ownership to other stakeholders, and may deter them from contributing.

Of course naming is only a part of what makes an open source project look like something a developer can safely contribute to. But if you get the naming bit wrong, it's hard to get the rest right.

Add the potential entanglement into trademark politics on top of it, and just decide not to do it.

Examples

Examples omitted so I won't get into trouble with anyone.

My visit to EuroPython 2014

I had a fun time at EuroPython 2014 in Berlin last week. It was a very well organized conference and I enjoyed meeting old friends again as well as meeting new people. Before I went I was a bit worried with the amount of attendees it'd feel too massive; I had that experience at a PyCon in the US a few years ago. But I was pleasantly surprised it didn't -- it felt like a smaller conference, and I liked it.

Another positive thing that stood out was a larger diversity; there seemed to be more people from central and eastern Europe there than before, and most of all, there were more women. It was underscored by a 13 year old girl giving a lightning talk -- that was just not going to happen at EuroPython 5 years ago.

This is a very positive trend and I hope it continues. I know it takes a lot of work on the part of the organizers to get this far.

I gave a talk at EuroPython myself this year, and I think it went well:

Morepath 0.4.1 released (with Python 3 fixes)

I just released Morepath 0.4.1. This fixes a regression with Python 3 compatibility and has a few other minor tweaks to bring test coverage back up to 100%.

I had broken Python 3 support in Morepath 0.4. I'm still not in the habit of running 'tox' before a release, so I find out about these problems too late.

I'll go into a bit of detail about this issue, as it's a mildly amusing example of writing Python code being more complicated than it should be.

Morepath 0.4 broke in Python 3 because I introduced a metaclass for the morepath.App class. I usually avoid metaclasses as they are a source of unpredictability and complexity, but the best solution I saw here was one. It's a very limited one.

One task of the metaclass is to attach to the class with Venusian. Venusian is a library that lets you write decorators that don't execute during import time but later. This is nice as import time side effects can be a source of trouble.

Venusian also lets you attach a callback to a Python object (such as a class) outside of a decorator. That's what I was doing; attaching to a class, in my metaclass.

Venusian determines in what context the decorator was called, such as module-level and class-level, so you can use that later. For this it inspects the Python stack frame of its caller.

My first attempt to make the metaclass work in Python 3 was to use the with_metaclass functionality from the future compatibility layer. I am using this library anyway in Reg, which is a dependency of Morepath, so using it would not introduce a new dependency for Morepath.

Unfortunately after making that change my tests broke in both Python 2 and Python 3. That's not an improvement over having the tests being broken in just Python 2!

It appears that with_metaclass introduces a new stack frame into the mix somewhere, which breaks Venusian's assumptions. Now Venusian's attach has a depth argument to determine where in the stack to check, so I increased the stack depth by one and ran the tests again. Less tests broke than before, but quite a few still did. I think the cause is that the stack depth of with_metaclass is just not consistent for whatever reason.

Digging around in the future package I saw it includes a copy of six, another compatibility layer project. six has a name close to my heart -- long ago I originated the Five project for compatibility between Zope 2 and Zope 3.

That copy of six had another version of with_metaclass. I tried using future.util.six.with_metaclass, and hey, it all started working suddenly. All tests passed, in both Python 2 and Python 3. Yay!

Okay then, I figured, I don't want to depend on a copy of six that just happens to be lying about in future. It's not part of its public API as far as I understand. So I figured I should introduce a new dependency for Morepath after all, on six. It's not a big deal; Morepath's testing dependencies include WebTest, and this already has a dependency on six.

But when I pulled in six proper, I got a newer version of it than the one in future.util.six, and it caused the same test breakages as with future. Argh!

So I copied the code from old-six into Morepath's compat module. It's a two-liner anyway. It works for me. Morepath 0.4.1 done and released.

But I don't know why six had to change its version, and why future's version is different. It worries me -- they probably have good reasons. Are those reasons going to break my code at some point in the future?

Being a responsible open source citizen, I left bug reports about my experiences in the six and future issue trackers:

https://bitbucket.org/gutworth/six/issue/83/with_meta-and-stack-frame-issues#comment-11125428

https://github.com/PythonCharmers/python-future/issues/75

I much prefer writing Python code. Polyglot is an inferior programming language as it introduces complexities like this. But Polyglot is what we got.

Morepath 0.4 and breaking changes

I've just released Morepath 0.4!

Morepath 0.4 is a Python web framework that's small ("micro") and packs a lot of power. There are a lot of facilities for application reuse. And as opposed to most web frameworks, it actually has some intelligence about generating hyperlinks to objects.

Morepath 0.4 has a breaking change to the way application reuse works. Don't worry, you can fix your code by making a few minor changes. In short, Morepath application objects are now classes, not instances, and you can instantiate this class to get a WSGI object. See the CHANGES for a lot of details on what happened and what you need to do.

The big win is that application reuse in Morepath has become Python subclassing, and that making a WSGI application (even a parameterized one) is just instantiating the class.

The other win is that Morepath gained even more extensibility features, namely the ability for Morepath extension to introduce new Morepath directives (the decorators you see everywhere in Morepath examples). But I can't talk too much about that until I document them properly.

Along with the new Morepath, I've also made the initial release of BowerStatic (announcement). BowerStatic is the WSGI framework that lets you easily include bower-installed resources in your web page and do the right thing with caching (forever, thank you, but on a separate URL for each version).

How does that relate to Morepath, you may ask? Well, today I've also released the Morepath integration for BowerStatic, more.static. I've described in the Morepath documentation what to do to get it working in your Morepath project. The reason Morepath 0.4 had the breaking change was in part to support more.static, which needed the ability to introduce a new Morepath directive among other things.

Announcing BowerStatic

I've been working on something new these last few weeks that I'd like to share with you: BowerStatic.

BowerStatic is Python WSGI framework for easily serving and including components that you manage by using Bower.

Let's unpack that:

  • BowerStatic serves static resources (JS, CSS, etc) for you. It makes sure those resources are cached when they can be, and that the cache is automatically busted when you change them.

  • BowerStatic can serve third-party components you can easily install using bower install. It can also serve locally developed components for you.

  • BowerStatic can automatically create inclusions for static resources in your web page; it inserts <script> and <link> tags for you automatically

  • It is a WSGI-based framework, meaning that you should be able to integrate it easily into any WSGI-based Python web application. So if you use Pyramid, Flask or Morepath: BowerStatic is for you!

Look at the BowerStatic website for much more information about all this. And check out the source. Contributions are welcome!

BowerStatic was heavily inspired by another project that I helped create: Fanstatic. For more information on how things led up to BowerStatic, read the history section in the docs.

I am not done yet. I still need to work on some of the cache busting behavior, improve error reporting. But soon it will be time for the first release.

Deeper integration with the Morepath web framework is also coming. I have a good idea of what know what it will look like. Look out for more.static soon!

Morepath 0.3 released!

Today I've released Morepath 0.3!

Morepath is a Python web framework that lets you write simple, flexible code for the modern web, and makes linking to things as easy as it should be.

New in Morepath 0.3 is the ability to write path directives that absorb the rest of the path. This is useful when you want to integrate a web server with a frontend that does its own client-side routing using the HTML 5 history API. In that case the server-side code needs an easy way to absorb all sub-paths so it can pass them to the frontend router.

Another feature is the ability to mark views as internal. This allows you to write a view that is not published on the web but is for reuse within other views.

Besides this Morepath 0.3 contains a number of bugfixes and documentation improvements. See CHANGES for more details.

If you like Morepath, please get in touch with me, either through the Morepath issue tracker or the #morepath IRC channel on Freenode. Let me know if there's interest in starting a mailing list too!

Morepath 0.2

I've just released Morepath 0.2! Morepath is your friendly neighborhood Python web microframework with super powers.

Major changes include:

And various smaller tweaks. You can also read the changelog.

What makes Morepath special? You can read about its superpowers and read how it compares to other web frameworks, but here are some highlights:

  • Route to model, then look up views. This leads to better link generation, better generic views, better reusable code.

  • Have apps that can extend other apps, combine apps together. Extend and override anything, even Morepath behavior itself, but don't affect other apps in the same runtime.

It's lots of power contained in a small codebase.

(If you followed the Morepath link above, ignore the 0.1 in the title bar there: those are the 0.2 docs, not 0.1. I've now fixed the doc building configuration to include the version number from the package itself.)

Morepath Python 3 support

Thanks to an awesome contribution by Alec Munro, Morepath, your friendly neighborhood Python micro framework with super powers, has just gained Python 3 support!

Developing something new while juggling the complexities of Python 2 and Python 3 in my head at the same time was not something I wanted to do -- I wanted to focus on my actual goals, which was to create a great web framework.

So then I had to pick one version of Python or the other. Since my direct customer use cases involves integrating it with Python 2 code, picking Python 2 was the obvious choice.

But now that Morepath has taken shape, taking on the extra complexity of supporting Python 3 is doable. The Morepath test coverage is quite comprehensive, and I had already configured tox (so I could test it with PyPy). Adding Python 3.4 meant patiently going through all the code and adjusting it, which is what Alec did. Thank you Alec, this is great!

Morepath's dependencies (such as WebOb) already had Python 3 support, so credit goes to their maintainers too (thanks Chris McDonough in particular!). This includes the Reg library, which I polyglotted to support Python 3 myself a few months ago.

All this doesn't take away from my opinion that we need to do more to support the large Python 2 application codebases. They are much harder to transition to Python 3 than well-tested libraries and frameworks, for which the path was cleared in the last 5 years or so.

[update: this is still in git; the Morepath 0.1 release is Python 2 only. But it will be included in the upcoming Morepath 0.2 release]