The Call of Python 2.8

Introduction

Guido recently felt he needed to re-emphasize that there will be no Python 2.8. The Python developers have been very clear for years that there will never be a Python 2.8.

http://legacy.python.org/dev/peps/pep-0404/

At the Python language summit there were calls for a Python 2.8. Guido reports:

We (I) still don't want to do a 2.8 release, and I don't want to accelerate 3.5, but I do think we should make things better for people who have to straddle Python 2 and 3 in a single codebase, by developing more tools, and by security and possibly installer updates to 2.7 (PEP 466).

In his keynote at PyCon, he said it again.

Something very good did happen, in recognition of the reality that Python 2.7 is still massively popular: Guido changed the end-of-life date for Python 2.7 from 2015 to 2020. In the same change he felt he should repeat that there will be no Python 2.8:

+There will be no Python 2.8.

The call for Python 2.8 is strong. Even Guido feels it!

People talk about a Python 2.8, and are for it, or, like Guido, against it, but rarely talk about what it should be. So let's actually have that conversation.

Why talk about something that will never be? Because we can't call for something, or reject it, if we don't know what it is.

What is Python 2.8 for?

Python 2.8 could be different things. It could be a Python 2.x release that reduces some pain points and adds features for Python 2 developers independently of what's going on in Python 3. It makes sense, really: there has been no new Python 2 feature release since 2010. Those of us with existing large Python 2 codebases haven't benefited from the work the language developers have done in those years. Even polyglot libraries that support both Python 2 and 3 can't use the new features, so they are also stuck with a 2010 Python. Before Python 2.7, Python saw a new compatible feature release every two years or less. The reality of Python for many of its users is that there has been no feature update of the language for years now.

But I don't want to talk about that. I want to talk about Python 2.8 as an incremental upgrade path to Python 3. If we are going to add features to Python 2, let's take them from Python 3. I want to talk about bringing Python 2.x closer to Python 3. Python 2 might never quite reach Python 3 parity, but it could still help a lot if it can get closer incrementally.

Why an incremental upgrade?

In discussions about Python 3 there is a lot of talk about the need to port Python libraries to Python 3. This is indeed important if you want the ability to start new projects on Python 3. But many of us in the trenches are working on large Python 2 code bases. This isn't just maintenance. A large code base is alive, so we're building new features in Python 2.

Such a large Python codebase:

  • is important to some organization. Important enough for people to actually pay developers money to work on Python code.

  • cannot easily be ported to Python 3 in one giant step, even if all external open source libraries are ported.

  • would see no functional gain from porting, so the organization won't see porting as a worthwhile investment.

  • would suffer bugs and breakages from porting, which is exactly what the organization wants to avoid.

You can argue that I'm overstating the risks of porting. But we need to face it: many codebases written in Python 2 have low automatic test coverage. We don't like to talk about it because we think everybody else is better at automated testing than we are, but it's the reality in the field.

We could say, fine, they can stay on Python 2 forever then! Well, at least until 2020. I think this would be unwise, as these organizations are paying a lot of developers money to work on Python code. This has an effect on the community as a whole. It contributes to the gravity of Python 2.

Those organizations, and thus the wider Python community, would be helped if there was an incremental way to upgrade their code bases to Python 3, with easy steps to follow. I think we can do much more to support such incremental upgrades than Python 2.7 offers right now.

Python 2.8 for polyglot developers

Besides helping Python 2 code bases go further step by step, Python 2.8 can also help those of us who are maintaining polyglot libraries, which work in both Python 2 and Python 3.

If Python 2.8 backported Python 3 features, polyglot authors could start using those features in their libraries as soon as they drop Python 2.7 support, without giving up Python 2 compatibility. Python 2.8 would then actually encourage those with Python 2.7 codebases to move towards Python 3, so they can use the library upgrades.

Of course dropping Python 2.x support entirely for a polyglot library will also make that possible. But I think it'll be feasible to drop Python 2.7 support in favor of Python 2.8 much faster than it is possible to drop Python 2 support entirely.

But what do we want?

I've seen Python 3 developers say: but we've done all we could with Python 2.7 already! What do you want from a Python 2.8?

And that's a great question. It's gone unanswered for far too long. We should get a lot more concrete.

What follows are just ideas. I want to get them out there, so other people can start thinking about them. I don't intend to implement any of it myself; just blogging about it is already breaking my stress-reducing policy of not worrying about Python 3.

Anyway, I might have it all wrong. But at least I'm trying.

Breaking code

Here's a paradox: I think that in order to make an incremental upgrade possible for Python 2.x we should actually break existing Python 2.x code in Python 2.8! Some libraries will need minor adjustments to work in Python 2.8.

I want to do what the from __future__ pattern was introduced for in the first place: introduce a new incompatible feature in a release but make it optional, and then later make the incompatible feature the default.

The Future is Required

Python 2.7 lets you write from __future__ import something to make the interpreter behave a bit more like Python 3. In Python 2.8, those should be the default behavior.

In order to encourage this and make it really obvious, we may want to consider requiring these imports in Python 2.8. That means that the interpreter raises an error for any module that lacks them.

If we go for that, it means you have to have these at the top of all your Python modules in Python 2.8:

  • from __future__ import division

  • from __future__ import absolute_import

  • from __future__ import print_function
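Such a header already works today: Python 2.7 accepts all three imports, and Python 3 accepts them as no-ops, so a module written this way behaves the same on both. A minimal sketch of what the mandatory header looks like in practice; the two functions are hypothetical examples:

```python
# The three imports the proposal would make mandatory in Python 2.8.
# Python 2.7 accepts them; in Python 3 they are accepted and change nothing.
from __future__ import absolute_import, division, print_function


def average(values):
    # True division: 7 / 2 gives 3.5, even under Python 2.7 with the import.
    return sum(values) / len(values)


def floor_average(values):
    # Use // when floor division is really what you want.
    return sum(values) // len(values)


# print is a function, so keyword arguments such as sep work.
print("average", average([3, 4]), sep=": ")
```

absolute_import has no visible effect in a single file; it matters when a package contains a module whose name shadows a top-level module.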

absolute_import appears to be uncontroversial, but I've seen people complain about both division and print_function. If people reject Python 3 for those reasons, I want to make clear I'm not in the same camp. I believe that confuses what is at most a minor inconvenience with a dealbreaker. I think discussion about these is pretty pointless, and I'm not going to engage in it.

I've left out unicode_literals. This is because I've seen both Nick Coghlan and Armin Ronacher argue against them. I have a different proposal. More below.

What do we gain by this measure? It's ugly! Yes, but we've made the upgrade path a lot more obvious. If an organisation wants to upgrade to Python 2.8, they have to review their imports and divisions and change their print statements to function calls. That should be doable enough, even in large code bases, and is an upgrade path a developer can do incrementally, maybe even without having to convince their bosses first. Compare that to an upgrade to Python 3.

from __future3__ import new_classes

We can't do everything with the old future imports. We want to allow more incremental upgrading. So let's introduce a new future import.

New-style classes, that is classes that derive from object, were introduced in Python 2 many years ago, but old-style classes are still supported. Python 3 only has new-style classes. Python 2.8 can help here by making new-style classes the default. If you write from __future3__ import new_classes at the top of your module, any class definition in that module that looks like this:

class Foo:
    pass

is interpreted as a new-style class.

This might break the contract of the module, as people may subclass from this class and expect an old-style class, and in some (rare) cases this can break code. But at least those problems can be dealt with incrementally. And the upgrade path is really obvious.
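Until something like new_classes exists, polyglot code gets the same effect by spelling out the base class. In Python 2.7 a bare class statement creates an old-style class (its type is classobj), while deriving from object gives a new-style class; in Python 3 both spellings give a new-style class. A small sketch with hypothetical class names, showing one feature (super()) that needs new-style classes in Python 2:

```python
class Base(object):
    """Explicitly new-style: behaves the same in Python 2.7 and Python 3."""

    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name):
        # In Python 2, super() raises TypeError for old-style classes.
        super(Child, self).__init__(name.upper())


def is_new_style(cls):
    # New-style classes are instances of type; Python 2 old-style ones are not.
    return isinstance(cls, type)
```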

__future3__?

Why did I write __future3__ and not __future__? Because otherwise we can't write polyglot code that is compatible with both Python 2 and Python 3.

Python 3.4 doesn't support from __future__ import new_classes. We don't want to wait for Python 3.5 or Python 3.6 to support this, even if there were any interest in supporting it among the Python language developers at all. Because, after all, there won't be a Python 2.8.

That problem doesn't exist for __future3__. We can easily fake a __future3__ module in Python 3 without being dependent on the language developers. So polyglot code can safely use this.
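Here is a minimal sketch of such a fake, assuming that under Python 3 every proposed __future3__ feature is simply already the default, so the module only needs to make the names importable. The attribute names mirror the imports proposed in this post; the shim itself is hypothetical, and a real project would more likely ship it as a __future3__.py file on sys.path:

```python
import sys
import types

# Hypothetical shim: under Python 3 the proposed features are already the
# default behaviour, so these names only need to exist and be importable.
_future3 = types.ModuleType("__future3__")
_future3.new_classes = True        # plain class statements are new-style already
_future3.explicit_literals = True  # a shim cannot enforce it; a no-op here
_future3.str = str                 # Python 3 str is the target type
_future3.bytes = bytes             # Python 3 bytes is the target type
sys.modules.setdefault("__future3__", _future3)

# Polyglot modules can now use the proposed imports under Python 3.
from __future3__ import bytes, new_classes, str  # noqa: E402
```

Only __future__ is special to the compiler; __future3__ is an ordinary module name, which is exactly why this works without help from the language developers.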

from __future3__ import explicit_literals

Back to the magic moment of Nick Coghlan and Armin Ronacher agreeing.

Let's have a from __future3__ import explicit_literals.

This forces the author to be entirely explicit with string literals in the module that imports it. "foo" and 'foo' are now errors; the module won't import. Instead the module has to be explicit and use b'foo' and u'foo' everywhere.

What does that get us? It forces a developer to think about string literals everywhere, and that helps the codebase become incrementally more compatible with Python 3.
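The proposed import doesn't exist, but a rough approximation of what it would reject can be checked today with the standard tokenize module. This is a hypothetical lint helper running under Python 3, not part of the proposal. It also flags docstrings and raw strings that lack a b or u prefix; whether explicit_literals should exempt docstrings is a question the proposal leaves open:

```python
import io
import tokenize


def bare_literals(source):
    """Return (line, literal) for each string literal lacking a b or u prefix."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # The prefix is whatever comes before the first quote character.
        first_quote = min(
            pos for pos in (tok.string.find("'"), tok.string.find('"')) if pos >= 0
        )
        prefix = tok.string[:first_quote].lower()
        if "b" not in prefix and "u" not in prefix:
            found.append((tok.start[0], tok.string))
    return found


example = 'a = "foo"\nb = b"bar"\nc = u\'baz\'\n'
print(bare_literals(example))
```

Running something like this over a module, file by file, gives a to-do list of literals to make explicit before turning the import on.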

from __future3__ import str

This import line does two things:

  • you get a str function that creates a Python 3 str. This string has unicode text in it and cannot be combined with either Python 2 style bytes or Python 3 style bytes without error (which I'll discuss later).

  • if from __future3__ import explicit_literals is in effect, a bare literal now creates a Python 3 str. Or maybe explicit_literals is a prerequisite and from __future3__ import str should error if it isn't there.

I took this idea from the Python future module, which makes Python 3 style str and bytes (and much more) available in Python 2.7. I've modified the idea as I have the imaginary power to change the interpreter in Python 2.8. Of course anything I got wrong is my own fault, not the fault of Ed Schofield, the author of the future module.

from __past__ import bytes

To ensure you still have access to Python 2 bytes (really str) just in case you still need it, we need an additional import:

from __past__ import bytes as oldbytes

oldbytes can be called with Python 2 str, Python 2 bytes and Python 3 bytes. It rejects a Python 3 str. I'll talk about why it may be needed in a bit.

Yes, __past__ is another new namespace we can safely support in Python 3. It would be more involved in Python 3: it would contain a forward port of the Python 2 bytes object. Python 3 bytes have fewer features than Python 2 bytes, and this has been a pain point for some developers who need to work with bytes a lot. Having a more capable bytes object in Python 3 would not hurt existing Python 3 code, as combining it with a Python 3 string would still result in an error. It's just an alternative implementation of bytes with more methods on it.

from __future3__ import bytes

This is the equivalent import for getting the Python 3 bytes object.

Combining Python 3 str/bytes with Python 2 unicode/str

So what happens when we somehow combine a Python 3 str/bytes with a Python 2 str/bytes/unicode? Let's think about it.

The future module by Ed Schofield forbids py3bytes + py2unicode, but supports other combinations and upcasts them to their Python 3 version. So, for instance, py3str + py2unicode -> py3str. This is a consequence of the way it tries to make Python 2 string literals work a bit like they're Python 3 unicode literals. There is a big drawback to this approach; a Python 3 bytes is not fully compatible with APIs that expect a Python 2 str, and a library that tried to use this approach would suffer API breakage. See this issue for more information on that.

I think since we have the magical power to change the interpreter, we can do better. We can make real Python 3 string literals exist in Python 2 using __future3__.

I think we need these rules:

  • py3str + py2unicode -> py3str

  • py3str + py2str: UnicodeError

  • py3bytes + py2unicode: TypeError

  • py3bytes + py2str: TypeError

So while we upcast existing Python 2 unicode strings to Python 3 str, we refuse any other combination.

Why not let people combine Python 2 str/bytes with Python 3 bytes? Because the Python 3 bytes object is not compatible with the Python 2 bytes object, and we should refuse to guess and immediately bail out when someone tries to mix the two. We require an explicit Python 2 str call to convert a Python 3 bytes to a str.
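To make these rules concrete, here is a small Python 3 model of them. The four classes are hypothetical stand-ins for the four string types (both text types modeled as str subclasses, both byte types as bytes subclasses), and concat is an illustrative function, not a real interpreter hook:

```python
class Py2Str(bytes):
    """Hypothetical stand-in for a Python 2 str (a byte string)."""


class Py2Unicode(str):
    """Hypothetical stand-in for a Python 2 unicode string."""


class Py3Str(str):
    """Hypothetical stand-in for a Python 3 str used inside Python 2.8."""


class Py3Bytes(bytes):
    """Hypothetical stand-in for a Python 3 bytes used inside Python 2.8."""


def concat(left, right):
    """Apply the proposed mixing rules to left + right."""
    kinds = {type(left), type(right)}
    if kinds == {Py3Str, Py2Unicode}:
        # Upcast: existing Python 2 unicode text becomes Python 3 str.
        return Py3Str(str(left) + str(right))
    if kinds == {Py3Str, Py2Str}:
        # Refuse to guess an encoding for the Python 2 byte string.
        raise UnicodeError("cannot combine py3str with py2str")
    if Py3Bytes in kinds and kinds & {Py2Str, Py2Unicode}:
        # Python 3 bytes never mixes implicitly with Python 2 strings.
        raise TypeError("cannot combine py3bytes with a Python 2 string")
    if len(kinds) == 1:
        # Same kind on both sides: ordinary concatenation, type preserved.
        return type(left)(left + right)
    raise NotImplementedError("combination not covered by these rules")
```

The only implicit conversion left is the upcast from Python 2 unicode to Python 3 str, regardless of operand order; everything else fails loudly.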

This is assuming that the Python 3 str is compatible with Python 2 unicode. I think we should aim for making a Python 3 string behave like a subclass of a Python 2 unicode.

What have we gained?

We can now start using Python 3 str and Python 3 bytes in our Python 2 codebases, incrementally upgrading, module by module.

Libraries could upgrade their internals to use Python 3 str and bytes entirely, and start using Python 3 str objects in any public API that returns Python 2 unicode strings now. If you're wrong and the users of your API actually do expect str-as-bytes instead of unicode strings, you can go deal with these issues one by one, in an incremental fashion.

For compatibility you can't return Python 3 bytes where Python 2 str-as-bytes is expected, so judicious use of __past__.bytes would be needed at the boundaries in these cases.

After Python 2.8

People who have ported their code to Python 2.8 and have turned on all the __future3__ imports incrementally will be in a better place to port their code to Python 3. But to offer a more incremental step, we can have a Python 2.9 that requires the __future3__ imports introduced by Python 2.8. And by then we might have thought of some other ways to smoothen the upgrade path.

Summary

  • There will be no Python 2.8. There will be no Python 2.8! Really, there will be no Python 2.8.

  • Large code bases in Python need incremental upgrades.

  • The upgrade from Python 2 to Python 3 is not incremental enough.

  • A Python 2.8 could help smoothen the way.

  • A Python 2.8 could help polyglot libraries.

  • A Python 2.8 could let us drop support for Python 2.7 with an obvious upgrade path in place that brings everybody closer to Python 3.

  • The old __future__ imports are mandatory in Python 2.8 (except unicode_literals).

  • We introduce a new __future3__ in Python 2.8. __future3__ because we can support it in Python 3 today.

  • We introduce from __future3__ import new_classes, mandating new-style classes for plain class statements.

  • We introduce from __future3__ import explicit_literals, str, bytes to support a migration to use Python 3 style str and bytes.

  • We introduce from __past__ import bytes to be able to access the old-style bytes object.

  • A forward port of the Python 2 bytes object to Python 3 would be useful. It would error if combined with a Python 3 str, just like the Python 3 bytes does.

  • A future Python 2.9 could introduce more incremental upgrade steps. But there will be no Python 2.9.

  • I'm not going to do the work, but at least now we have something to talk about.

Morepath 0.1 released!

I've just released Morepath 0.1! This is Morepath's first ever proper release. Hurray! If you've been waiting for a release before trying Morepath, now is the time to stop waiting!

Morepath is your friendly neighborhood Python web framework with super powers.

The docs for 0.1 are here: http://morepath.readthedocs.org/en/0.1/

It includes a quickstart, and installation docs, and much much more.

The docs are quite complete in describing Morepath's abilities and how to use them, though undoubtedly things could be improved. Let me know if you find any sections where things are unclear or missing!

The Extra Bits

There are still some docs floating around that I intend to integrate into the main documentation. One bit involves the permission and authentication docs; I previously blogged a description.

You can integrate Morepath with SQLAlchemy or the ZODB using more.transaction. I've just made a 0.1 release of that too. See this blog entry for more information on what's going on with that, and here is example code.

Talking about Morepath

There is a #morepath channel on freenode IRC.

If you have an issue or feature request, you can use the issue tracker on Github:

https://github.com/morepath/morepath/issues?milestone=1&state=open

You can also contact me directly; contact information is at the bottom of this weblog. I'm happy to hear from you!

Onward to 0.2!

Morepath is now transitioning into a phase where its actual use will drive its further development. I expect others (you?) will come up with new and interesting ways to use the facilities and abstractions that Morepath offers. I'm looking forward to discovering new and better ways to do web development with you!

WebOb and Werkzeug compared

Yesterday I wrote an article discussing why Morepath switched from the Werkzeug library to the WebOb library. I promised a followup with some feedback on WebOb and Werkzeug, and here it is.

Morepath is the friendly neighborhood Python web framework with super powers that I'm working on.

Let me start by stating that Werkzeug and WebOb are extremely similar libraries. There are some minor differences in the details of the Request and the Response object API, but the capabilities are pretty equivalent. It was easy to switch from one to the other.

I am primarily interested in the Request and Response wrappers for WSGI, and my second interest is in the lower-level APIs to handle HTTP.

Lower-level APIs

Werkzeug exposes and documents lower level APIs for HTTP processing. WebOb does not have so much layering, and does not expose a low-level HTTP API.

I like Werkzeug better here. I noticed this lack in WebOb at one point in Morepath: dealing with basic authentication. While Werkzeug exposed an API for parsing the authorization header (parse_authorization_header), WebOb did not, and I had to steal code from Pyramid. It would be nice if WebOb included more of such lower-level utility APIs.
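For illustration, this is roughly what such a lower-level utility has to do for HTTP Basic authentication. It is a hypothetical, standard-library-only sketch, not the code in Werkzeug or Pyramid; in a WSGI application the header value would come from environ["HTTP_AUTHORIZATION"]:

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from an HTTP Basic Authorization header.

    Hypothetical helper. Returns None for missing, non-Basic or malformed
    headers.
    """
    if not header_value:
        return None
    scheme, _, credentials = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        raw = base64.b64decode(credentials.strip(), validate=True)
    except (binascii.Error, ValueError):
        return None
    # Latin-1 is the historical default; RFC 7617 also allows charset="UTF-8".
    username, sep, password = raw.decode("latin-1").partition(":")
    if not sep:
        return None
    return username, password


header = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")
print(parse_basic_auth(header))
```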

Routing

Werkzeug contains a routing implementation, which always bothered me a little; I have my own routing implementation in Morepath and do not want Werkzeug's. WebOb focuses on just request and response handling and is a better fit for Morepath here.

Testing Tools

During my testing I ran into a point where I wanted to test cookies. Werkzeug offers handy test utilities that take care of this. WebOb does not do it out of the box. But I quickly found WebObToolkit, which does offer a Client class with the same capabilities as the one in Werkzeug. This let me easily convert my tests from Werkzeug to WebOb, with only a test dependency on WebObToolkit.

args versus GET, form versus POST

Werkzeug exposes URL parameters in an args attribute of the Request object, and WebOb instead offers a GET attribute.

I find the name of the GET attribute slightly wrong, as you can have URL parameters for a POST request as well.

Werkzeug exposes a parsed form in its form attribute, whereas WebOb uses the POST attribute for this. This is also confusing, as it contains the parsed body for other HTTP methods as well, such as PUT. In addition, with form I get the immediate clue that it's a parsed form, whereas with POST I don't get this clue, and in fact I had to check the manual to verify that it can only contain form data.

WebOb also offers params, which is GET and POST combined, but Morepath needs specific access, not this combined one. Werkzeug calls this values.
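Stripped of either library's naming, the two things being named are the parsed query string and the parsed request body. A minimal standard-library sketch over a raw WSGI environ, with hypothetical function names, makes the naming problem visible: the request below is a POST that also carries a URL parameter, and nothing in the body parsing depends on the method being POST:

```python
import io
from urllib.parse import parse_qs


def url_params(environ):
    """Query-string parameters: Werkzeug's request.args, WebOb's request.GET."""
    return parse_qs(environ.get("QUERY_STRING", ""))


def form_params(environ):
    """Parsed urlencoded body: Werkzeug's request.form, WebOb's request.POST.

    Simplified sketch: multipart bodies are not handled, and the body
    stream is consumed, so call this only once per request.
    """
    content_type = environ.get("CONTENT_TYPE", "").split(";")[0].strip()
    if content_type != "application/x-www-form-urlencoded":
        return {}
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("ascii")
    return parse_qs(body)


# A POST request that also carries a URL parameter.
body = b"name=alice"
environ = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "page=2",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": str(len(body)),
    "wsgi.input": io.BytesIO(body),
}
args = url_params(environ)
form = form_params(environ)
print(args, form)
```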

It's easy enough to learn this and only a minor annoyance. Still, I wonder whether it'd be worth it for WebOb to introduce args and form as aliases for GET and POST and then perhaps deprecate the old style.

Performance

As discussed, WebOb is a bit faster for my use cases than Werkzeug. I suspect a lot of the performance in WebOb has to do with the optimization efforts by Chris McDonough, who uses WebOb in Pyramid.

Werkzeug's performance issues may be a regression due to compatibility code for Python 3 -- much of it seems to be due to an excessive amount of isinstance calls that probably have to do with string processing.

Python 3

Both WebOb and Werkzeug are Python 3 compatible, though the way WebOb introduced this compatibility evidently avoided performance issues.

Pyramid Compatibility

While Morepath looks like Flask, it is quite similar to Pyramid under the hood in many details.

When I announced the switch to WebOb from Werkzeug I got some positive feedback -- of course I might've gotten equivalent positive feedback from the Flask folk if I'd switched the other way around; it's impossible to say. I do know that in the Pyramid world there seems to be a bit more of a culture of sharing generic libraries than there is in the Flask world.

People already expressed interest in sharing code between Pyramid libraries and Morepath libraries, and this should now be easier as the request and response objects are shared.

This should in particular make it easier to write tweens in such a way that they work in Pyramid and Morepath both. Tweens are an idea I took from Pyramid and a tween function has the same API in Pyramid and Morepath -- it takes a request and returns a response.

This would involve some refactoring of tween factory code however (or a compatibility layer), as the way tweens are created is different.
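A framework-neutral tween is just a function from request to response, so sharing one mostly means writing thin factory adapters. The sketch below assumes Pyramid's tween factory signature of (handler, registry); the Response class and the header name are hypothetical stand-ins, and a Morepath adapter would wrap the same function in Morepath's own factory form, which I don't reproduce here:

```python
class Response(object):
    """Minimal hypothetical stand-in for a WebOb-style response."""

    def __init__(self, body=b""):
        self.body = body
        self.headers = {}


def add_header_tween(handler):
    """Framework-neutral tween: wraps a request -> response function."""

    def tween(request):
        response = handler(request)
        response.headers["X-Example-Tween"] = "seen"
        return response

    return tween


def pyramid_tween_factory(handler, registry):
    # Pyramid tween factories receive (handler, registry); adapt the neutral one.
    return add_header_tween(handler)


def hello_view(request):
    return Response(b"Hello")


wrapped = pyramid_tween_factory(hello_view, registry=None)
response = wrapped(request=None)
```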

Mixins

One thing that bothered me with Werkzeug is the many mixins provided that you can include in the Request and Response objects. It was never quite clear to me which mixins Morepath should be using exactly, except in one case, where I had to include CommonResponseDescriptorsMixin to make sure the content_type header got set properly on the response, which I found out only after some debugging.

I don't really see the point of all these mixins; in theory you could include just the functionality you need, but in practice the extra functionality does not really hurt on the original Request and Response objects themselves, and I just get confused as to what I should use.

WebOb does offer BaseRequest versus Request, where Request adds the AdhocAttrMixin, which seems to maintain all non-webob attributes on the Request in the WSGI environment for some reason. Once I saw the performance drawback that brought, I quickly started using BaseRequest instead.

Debug Server

Werkzeug has a built-in debug server with some interesting capabilities. WebOb does not. I hadn't used the debug server myself with Morepath yet, though I had integrated it, so I didn't feel terrible in replacing it with the server of wsgiref for development purposes. Still, I should look around in WSGI/WebOb land to see whether I can find something similar. Anyone have any ideas?

HTTP Exceptions

Werkzeug implements HTTP exception classes, and WebOb does too. This means you can raise a HTTP exception and have the framework catch it and use it as a HTTP response. Very convenient, and I use it in Morepath.
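The general pattern is easy to sketch without either library: an exception class that can also serve as a WSGI application, so the framework can catch it and return it as the response. The class and function names below are hypothetical stand-ins, not the WebOb or Werkzeug classes:

```python
class HTTPException(Exception):
    """Hypothetical base class: an exception that can render itself."""

    code = 500
    title = "Internal Server Error"

    def __call__(self, environ, start_response):
        # Acting as a WSGI application lets the framework return it directly.
        status = "{0} {1}".format(self.code, self.title)
        body = status.encode("ascii")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]


class HTTPNotFound(HTTPException):
    code = 404
    title = "Not Found"


def view(environ):
    if environ.get("PATH_INFO", "/") != "/":
        raise HTTPNotFound()
    return b"Hello"


def application(environ, start_response):
    try:
        body = view(environ)
    except HTTPException as exc:
        # The framework catches the exception and uses it as the response.
        return exc(environ, start_response)
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```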

But Werkzeug actually documents the HTTP exception classes available, and I can link to them with the Morepath documentation using intersphinx.

WebOb does not offer API documentation for its exception classes, and I had to look at the source. It would be nice if WebOb included API documentation for these.

Conclusion

The two frameworks are pretty equivalent. There are not really very strong reasons to pick one over the other.

Werkzeug does a bit more, which is sometimes nice and sometimes more than I need. Werkzeug also has better API documentation. On the other hand, it offers a complex system of mixins.

WebOb is faster, and a bit closer to the goldilocks zone for the purposes of Morepath: not too much, and only rarely too little. It should also not be hard to improve WebOb in the areas where Werkzeug is nicer. There's also the hope of more code sharing with the Pyramid ecosystem.

Hopefully this article will be helpful to those trying to figure out what WSGI request/response implementation to use, and also to the maintainers of Werkzeug and WebOb themselves.

Let me know what you think!

Morepath: from Werkzeug to WebOb

Today I changed over Morepath to use WebOb instead of Werkzeug as its request and response implementation.

Morepath is your friendly neighborhood Python web micro framework with super powers.

In this post I'd like to explain what led me there.

Having now gained quite a bit of experience with both Werkzeug and WebOb, I also have some points of comparison and feedback that may be useful to improve Werkzeug and WebOb both, but I've put those in a followup post.

Performance Testing

Two weeks ago I gave a talk about Morepath in Singapore, for the Python Singapore User Group. When preparing the talk I ran into a blog post describing a performance comparison between web frameworks called Python Fastest Web Framework.

Now Morepath is not striving to be Python's fastest web framework. It's striving to be fast enough, and offer a lot of power and flexibility to developers in a small package. Morepath offers some special features such as linking and application reuse.

A performance comparison between web frameworks implies functional equivalence between them, but they are not equivalent: some web frameworks like Morepath have powers that others don't have. Using those powers may allow you to build applications more quickly, and also to organize them in ways that make them faster in the end than is easy to accomplish with other, less versatile frameworks.

In addition we know that real world web applications typically have so much overhead doing other things (such as dealing with databases) that simple things like request handling are a minimal contribution to performance in the end.

All that aside, I still wondered how Morepath did compared to other web frameworks. Of course I did! It's nice to be able to say your web framework is fast. More subtly, benchmarking can also say something about the amount of work a web framework does to serve a request, and the less work, arguably the easier it is to understand the web framework and to debug it.

So I plugged Morepath into the "hello world" page benchmark and found Morepath was about as fast as Flask, but that Flask was one of the slower of the lot compared:

             msec    rps      tcalls funcs
django       10605   9430     183     89
flask        14611   6844     257    119
falcon        1355  73825      29     25
morepath     15967   6263     314    122
pyramid       3417  29269      64     48
tornado      10073   9928     188     67
wheezy.web    1222  81847      25     23

msec here is the number of milliseconds needed to run all 100,000 requests in the benchmark, rps is the number of requests per second, tcalls is the total number of function calls needed to handle a single request, and funcs is the number of distinct functions called during request handling.
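The rps column follows from msec: with 100,000 requests, rps = 100000 / (msec / 1000). Recomputing from the published msec values reproduces the table closely; falcon, for example, comes out at about 73,800 rather than 73,825, presumably because msec is rounded to whole milliseconds. A quick sketch of the arithmetic, using only the numbers from the table:

```python
REQUESTS = 100000  # total requests per benchmark run

# msec column from the table above.
msec = {
    "django": 10605,
    "flask": 14611,
    "falcon": 1355,
    "morepath": 15967,
    "pyramid": 3417,
    "tornado": 10073,
    "wheezy.web": 1222,
}


def requests_per_second(total_msec, requests=REQUESTS):
    # rps = requests / seconds = requests / (msec / 1000)
    return requests / (total_msec / 1000.0)


for name, value in sorted(msec.items(), key=lambda item: item[1]):
    print("{0:<12} {1:>8.0f} rps".format(name, requests_per_second(value)))
```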

Morepath in this benchmark is about the same speed as Flask. Morepath is slower at this benchmark than Django and Tornado. Pyramid does pretty well too, which is not a surprise given its focus on performance. Morepath is hopelessly slower at this benchmark than speed monsters such as Falcon or wheezy.web, the web framework by the author of the blog entry.

That evening when I gave my presentation, someone actually referenced this benchmark and wheezy.web and asked how fast Morepath is. Having done my research, I could answer that Morepath is about as fast as Flask, and I also gave the caveats concerning performance above.

Still, could Morepath not do better? Morepath is about as fast as Flask in this benchmark. Perhaps the Werkzeug library that both Morepath and Flask used was the common factor dragging performance down?

wheezy.http

wheezy.web is fast, so I took a look at this. I discovered wheezy.web is built on wheezy.http, which is a library that abstracts request and response from WSGI much like WebOb and Werkzeug do.

After coming back from Singapore to the Netherlands, I looked up the author of wheezy.web and wheezy.http on IRC, and had a nice conversation with him. He pointed out that his benchmarking system has a knob that shows profiler statistics. I turned it on and this is what I saw:

         31200017 function calls (29700017 primitive calls) in 25.958 seconds

   Ordered by: internal time
   List reduced from 134 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1700000    1.728    0.000    5.079    0.000 lookup.py:136(all)
800000/300000    1.530    0.000   16.120    0.000 mapply.py:5(mapply)
900000/400000    1.036    0.000   18.370    0.000 lookup.py:104(call)
  1000000    1.015    0.000    6.367    0.000 lookup.py:54(component)
900000/400000    0.934    0.000   19.425    0.000 generic.py:44(wrapper)
  1000000    0.853    0.000    1.147    0.000 compose.py:83(all)
  1000000    0.853    0.000    1.606    0.000 generic.py:29(get_lookup)
  3100002    0.851    0.000    0.851    0.000 {isinstance}
   100000    0.685    0.000    3.577    0.000 wrappers.py:733(__init__)
   200000    0.628    0.000    5.895    0.000 core.py:37(traject_consume)

Lots of this stuff I recognize as the internals of Reg, the generic function call library that Morepath is built on and that is already a known candidate for optimization efforts, but that will have to wait until later. We care about the request/response implementation now.
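The listing above has the shape of Python's standard profiler output. Independently of the benchmark's own knob, whose implementation I won't guess at, a similar report sorted by internal time and trimmed to the top 10 can be produced with cProfile and pstats. The handler here is a hypothetical stand-in, not Morepath code:

```python
import cProfile
import io
import pstats


def handle_request():
    # Hypothetical stand-in for serving one "hello world" request; the
    # isinstance checks mimic the kind of calls that showed up above.
    parts = [str(i) for i in range(50)]
    return "".join(p for p in parts if isinstance(p, str))


profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    handle_request()
profiler.disable()

output = io.StringIO()
stats = pstats.Stats(profiler, stream=output)
# "tottime" is reported as "Ordered by: internal time"; 10 limits the list.
stats.sort_stats("tottime").print_stats(10)
report = output.getvalue()
print(report)
```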

Werkzeug shows up twice in the top 10. First there's response object generation:

100000    0.685    0.000    3.577    0.000 wrappers.py:733(__init__)

Second, and harder to recognize, is this one:

3100002    0.851    0.000    0.851    0.000 {isinstance}

This is an enormous amount of calls to isinstance(). I recognized this as due to Werkzeug as the profile for Flask showed the exact same number of calls (3100002), strongly suggesting Werkzeug as the cause.

I bit the bullet and experimentally changed Morepath to use wheezy.http as its request/response implementation instead of Werkzeug. This caused the request/response implementation to completely disappear from the top 10 most expensive functions. The isinstance stuff was gone too.

Morepath was 47% faster on helloworld now than with Werkzeug!

Armin Ronacher responded to this result on Twitter, and said the isinstance business is likely a performance regression due to Python 3 compatibility in Werkzeug...

Switching to wheezy.http?

I wondered now whether I should switch Morepath to wheezy.http. It is certainly an attractive library, along with some of the other wheezy.* libraries.

My main trouble with it is that wheezy.http has seen much less real-world battle testing than either Werkzeug or WebOb. Looking at its source code the request and response wrappers were very simple indeed, which made them a lot easier to read than the equivalent implementations in Werkzeug and WebOb. That is certainly attractive. But they also seemed to do rather little with encodings. And later I heard from Chris McDonough that wheezy.http will have trouble dealing with non-ascii URLs.

WebOb

There was an obvious candidate sitting around that I hadn't tried yet: WebOb.

I had initially deliberately avoided using WebOb for Morepath for two reasons:

  • when I had to do some other WSGI work I found that Werkzeug had a nicer lower-level API exposed that let me work with raw WSGI better.

  • Pyramid is already using WebOb, and I figured since Morepath was already similar enough to Pyramid anyway I could try Werkzeug for a change. Perhaps using it would benefit Morepath in some ways I could not foresee.

The second reason wasn't very good, except for one thing: I learned more about Flask and could model aspects of the Morepath documentation after it. Otherwise Werkzeug and WebOb are pretty interchangeable. And I'm confident Morepath is different enough from Pyramid anyway.

Now I had a strong reason to try WebOb: performance. I knew that Chris McDonough had been working on WebOb a lot and that he cares a lot about performance, so I figured I should give it a shot.

So I swapped in WebOb and tried the benchmark again. The first result was disappointing:

             msec    rps      tcalls funcs
flask        14638   6832     257    119
morepath     15089   6627     289     95

Morepath was only a little bit faster than before, and still slower than Flask. What's going on here? Turning on the profiler showed me:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1700000    1.829    0.000    5.270    0.000 lookup.py:136(all)
800000/300000    1.495    0.000   13.850    0.000 mapply.py:5(mapply)
   800000    1.047    0.000    1.980    0.000 request.py:1405(__setattr__)

WebOb's request.py __setattr__ was showing up at number 3. I discovered that WebOb's request object has some magic that observes attributes. I also discovered that WebOb has a BaseRequest that doesn't include this magic.
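A per-attribute hook shows up in a profile because every single assignment on the request pays for a Python-level function call. The sketch below compares a plain class with one that overrides __setattr__. It only loosely mimics the kind of magic described above and is not WebOb's actual code. The 'example.attrs' environ key is made up for illustration.

```python
import timeit


class BaseRequest(object):
    """Plain request: attribute assignment uses Python's default machinery."""
    def __init__(self, environ):
        self.environ = environ


class ObservingRequest(BaseRequest):
    """Hypothetical request that intercepts every attribute assignment,
    loosely mimicking the kind of magic described above (not WebOb's code)."""
    def __setattr__(self, name, value):
        if name != 'environ':
            # also record ad-hoc attributes in the WSGI environ
            self.environ.setdefault('example.attrs', {})[name] = value
        object.__setattr__(self, name, value)


def assign_attributes(request_class):
    # a few attribute assignments, as a framework might do per request
    request = request_class({})
    request.matched = True
    request.view_name = 'index'
    return request


if __name__ == '__main__':
    for cls in (BaseRequest, ObservingRequest):
        seconds = timeit.timeit(lambda: assign_attributes(cls), number=100000)
        print('%-16s %.3f s' % (cls.__name__, seconds))
```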

WebOb: the results

So I tried things again using BaseRequest instead:

             msec    rps      tcalls funcs
flask        14741   6784     257    119
morepath     12236   8173     245     92

That makes Morepath 30% faster with WebOb than with Werkzeug, and faster than Flask.

Not as good as with wheezy.http, but WebOb is much more battle-tested, so not bad. This gets Morepath closer to Django and Tornado on this benchmark. Once I optimize Reg I think I can get closer still.

Also note that the number of function calls during a request dropped from 314 to 245, and the number of distinct functions used dropped from 122 to 92.

Switching Morepath to WebOb

If I wanted to switch something as big as a request/response implementation, now was the time: before a Morepath release. So I made the switch.

It wasn't difficult; the APIs are very similar. The most work was actually porting the Morepath tests, but that got a lot easier once I discovered the webobtoolkit library.

Another benefit of switching to WebOb is that it may eventually allow more code sharing between the Morepath and Pyramid ecosystems. I suspect the easiest candidate for code sharing would be Tweens, as Morepath and Pyramid now have the same basic Tween API.

I've followed up this post with some feedback about Werkzeug and WebOb in general.

Racing the Morepath: SQLAlchemy Integration

I felt like I was racing on the Morepath today. My goal was to see how to integrate Morepath with a database. To make this goal practical, I looked into integrating Morepath with SQLAlchemy. To go faster, I borrowed ideas and code liberally from Pyramid.

Tweens

This morning I borrowed the idea of tweens from Pyramid. A tween is much like a WSGI middleware component, but one that knows about the web framework: it gets the web framework's request, and it can send the web framework's response, among other things. Now you can write this with Morepath:

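import morepath

app = morepath.App()

@app.tween_factory()
def get_tween_factory(app, handler):
    # wrap the original handler so every response gets an extra header
    def wrapping_handler(request, mount):
        response = handler(request, mount)
        response.headers['foo'] = 'bar'
        return response
    return wrapping_handler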

You can plug in a function that generates a wrapper around the original request handler.
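The idea behind tween factories can be shown without any framework at all. Each factory receives the next handler and returns a wrapper, and the framework chains them around its innermost handler. This is a simplified sketch. Its factories take only a handler and its handlers take only a request, whereas Morepath's take app and mount as well. Request, Response, publish and build_handler are hypothetical names.

```python
class Request(object):
    """Hypothetical minimal request object."""
    def __init__(self, path):
        self.path = path


class Response(object):
    """Hypothetical minimal response object."""
    def __init__(self, body, status=200):
        self.body = body
        self.status = status
        self.headers = {}


def publish(request):
    # innermost handler: the framework's own request handling
    return Response('Hello from %s' % request.path)


def header_tween_factory(handler):
    # a tween factory receives the next handler and returns a wrapper
    def wrapping_handler(request):
        response = handler(request)
        response.headers['foo'] = 'bar'
        return response
    return wrapping_handler


def error_tween_factory(handler):
    # turns uncaught exceptions into a 500 response
    def wrapping_handler(request):
        try:
            return handler(request)
        except Exception:
            return Response('Internal Server Error', status=500)
    return wrapping_handler


def build_handler(tween_factories, handler=publish):
    # wrap from the inside out, so the first factory ends up outermost
    for factory in reversed(tween_factories):
        handler = factory(handler)
    return handler


if __name__ == '__main__':
    handle = build_handler([error_tween_factory, header_tween_factory])
    response = handle(Request('/documents/1'))
    print(response.status, response.body, response.headers)
```

Because each tween sees both the request going in and the response coming out, it can do things a view can't easily do alone, like catching errors from everything inside it.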

more.transaction

This allows all sorts of neat stuff, including, for Pyramid, pyramid_tm, which integrates Pyramid with the transaction module.

So I ported over this code to Morepath in the form of more.transaction.

To use it in your Morepath app, simply do:

from more.transaction import transaction_app

app = morepath.App(extends=[transaction_app])

What happens now? Morepath will automatically commit the transaction if there is no error (such as a 500 error) during request handling, and abort it otherwise.

This means that Morepath now has integration with databases that use the transaction module, such as the ZODB and also SQLAlchemy (using zope.sqlalchemy for transaction integration).

[update after someone blindly complains after seeing the word "zope" in a package name... Please don't do that.]

Moreover, you can use multiple such databases in the same application. You can modify both the ZODB and a relational database in an application and be sure that if anything fails during request handling, neither database will be changed: both transactions will be aborted. There's a lot of goodness in the transaction module.
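The commit-or-abort behaviour can be sketched as a tween over several resources. This is a toy model, not more.transaction or the transaction module. FakeDatabase, Response and the simplified tween signature are hypothetical. The sketch treats a status of 500 or above as failure, whereas the real decision is made by the commit_veto setting. Unlike the transaction module it does no two-phase commit, so a failure halfway through committing could still leave the resources out of step.

```python
class Response(object):
    """Hypothetical minimal response with just a status code."""
    def __init__(self, status=200):
        self.status = status


class FakeDatabase(object):
    """Hypothetical resource that stages writes until commit."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.data.update(self.pending)
        self.pending = {}

    def abort(self):
        self.pending = {}


def transaction_tween_factory(handler, resources):
    # commit every joined resource on success; abort all of them if the
    # handler raises or returns a server error (status >= 500)
    def wrapping_handler(request):
        try:
            response = handler(request)
        except Exception:
            for resource in resources:
                resource.abort()
            raise
        if response.status >= 500:
            for resource in resources:
                resource.abort()
        else:
            for resource in resources:
                resource.commit()
        return response
    return wrapping_handler
```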

Morepath settings infrastructure

It turns out pyramid_tm is configurable in various ways. It allows you to set the number of attempts it will try to commit in the face of conflicts, for instance. To support this, I had to build Morepath's settings infrastructure; just the part where you can have settings at all, not loading them from a config file -- that's for later.

Here's an example of the settings for the transaction app in more.transaction:

@app.setting_section(section='transaction')
def get_transaction_settings():
    return {
        'attempts': 1,
        'commit_veto': default_commit_veto
        }

These are the defaults defined by more.transaction, but they can be easily overridden in your own app (by writing the same code as above with different values for the configuration).
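One way to picture how an extending app's settings relate to the defaults is a layered lookup. This is a rough model, not how Morepath implements it (Morepath builds on its config engine). It assumes overrides are resolved key by key, and the settings dicts are hypothetical. Here None stands in for default_commit_veto.

```python
from collections import ChainMap

# defaults as a base app such as transaction_app might declare them
# (None stands in for default_commit_veto in this sketch)
TRANSACTION_APP_SETTINGS = {
    'transaction': {'attempts': 1, 'commit_veto': None},
}

# a hypothetical extending app that only changes the number of attempts
MY_APP_SETTINGS = {
    'transaction': {'attempts': 3},
}


def settings_section(name, *layers):
    # look keys up through the layers, most specific app first, so
    # anything the extending app doesn't restate falls back to the base
    return ChainMap(*(layer.get(name, {}) for layer in layers))


transaction_settings = settings_section(
    'transaction', MY_APP_SETTINGS, TRANSACTION_APP_SETTINGS)
```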

When I started to write Morepath's settings infrastructure I wrote quite a bit of code involving combining and extending settings, only to throw it away by replacing it with much shorter code that builds on Morepath's config engine that I already use for its other directives. Nice!

morepath_sqlalchemy

Now I had all the pieces I needed, so I put them together to demonstrate SQLAlchemy integration: morepath_sqlalchemy.

This uses more.transaction, zope.sqlalchemy and SQLAlchemy together. The full code is in that project, but I'll summarize the important integration bits here:

from sqlalchemy.orm import scoped_session, sessionmaker
# SQLAlchemy session
Session = scoped_session(sessionmaker())

from zope.sqlalchemy import register
# register session with transaction module
register(Session)

import morepath
from more.transaction import transaction_app

# create our app, extending transaction_app.
# this means we get Morepath transaction integration
# and default settings for it.
app = morepath.App(extends=[transaction_app])

Quite a day

It's all still rough, needs polishing and documenting, but the foundations are now there. Quite the day of coding! I couldn't have done it without the Pyramid project from which I could borrow many an idea and piece of code.

Now I know Morepath can be integrated with any kind of database -- if transaction module integration is there, like for SQLAlchemy and the ZODB, it is very easy: just use more.transaction. But even if not, it should now be possible to write a tween that does the trick.

Tweens allow other neat things too: I believe Pyramid's custom error view system is based on a tween, and I still need custom error views in Morepath...

Interested? Hope to hear from you! Join #morepath on freenode IRC, drop me an email, or leave a comment on the issue tracker.

The Centre Cannot Hold

Things fall apart; the centre cannot hold

(The Second Coming by Yeats)

Last time I talked about how I went back to the center of the Zope project. Over the course of the year following, we managed to refactor the Zope Toolkit, clean up the dependency structure, and we could drop many of its unwanted dependencies.

I had hoped that with new leadership, a steering group, we could also reinvigorate the Zope project itself. Could we get together and do exciting new things again?

The answer was no.

Chris McDonough

In late March 2009, I visited PyCon in Chicago. There I talked with Chris McDonough, who was working on what was to become the Pyramid web framework. We discussed the dependency cleanup project I had started, which had been making waves on the zope-dev mailing list. Some beer was involved, and some miscommunication. Chris was skeptical that the cleanup project would succeed within the year, which confused me a bit, as we had already made a lot of progress.

But I was talking about cleaning up circular dependencies and the ability to lose a lot of the code. Chris was talking about making libraries with a clear purpose and documentation.

And while we got better dependencies at the end of the year, I failed.

Burning out on Zope

The details of the straw that broke the camel's back (though I hardly have the stamina of a camel) are immaterial. Suffice it to say that when practical disagreements happened our steering group did not function.

So in early 2010 I realized I was putting more into the project than I was getting out of it. I was running out of steam. It was costing me emotionally. So I stopped going to the Zope mailing lists. While we had made progress on matters where there was consensus, maintenance seemed the only consensus that could still be found.

Consensus on the boring stuff is not enough when the web is changing. And the web was changing, as it always is. A lot of the innovation on the web was happening on the client-side, in JavaScript, but Zope had no client-side story. People had moved into different directions, and the community had fractured.

The dependency cleanup was just about the only progress being made -- what about my personal goals? Where was the creativity, the getting together to do new interesting things? It wasn't there. Instead we had a bunch of opinionated people who couldn't agree enough to get anything but basic maintenance work done, and stumbled doing even that.

I'd been involved in Zope heavily, also serving on the Zope Foundation board for some years, including as its chairman. I decided I had to pull back from all of it.

The Zope Summit

In September 2010 I found myself at a Zope Developer Summit, which had been organized in part because of my urging. I had been heavily invested in Zope, and had been for more than 10 years. I had used Zope, benefited from Zope, contributed to Zope, redefined Zope and built on Zope. I had learned from Zope. I had been first board member and then chairman of the Zope Foundation.

I had hoped that a summit could get things moving again. Talk about cool new things that we might do together.

Zope was in trouble. The codebase would live on. It still does. But it is in maintenance mode - it doesn't do much that is new.

I came to the Zope Summit with a gloomy heart. That did not help my mood at the summit -- sorry.

I think most of us left the summit with the feeling not enough had been accomplished. The future of Zope was disappearing. Zope had lost its power to adapt. The web changes, but Zope didn't anymore.

This was the end of Zope, for me. I still use it in the shape of Grok to this day, but it is not the same.

Life After Zope

But this is not the end. The code still continues, and is being used, though it is mostly in maintenance mode.

And there is life after Zope. Next I will talk a bit about what came after.

This blog entry is part of a series on Zope and my involvement with it.

Breaking Morepath Changes

I'm slowly heading to a first release of the Morepath web framework, but right now I can still change anything without breaking any significant code. So I took the opportunity to do so.

What's Morepath? Morepath is your friendly neighborhood web framework with superpowers. Read more here.

These changes are in fact less big than some refactorings I do to Morepath frequently, but they break the public API of Morepath, so they're big in that sense.

These are the changes:

  • The @app.model directive was renamed to @app.path, as I realized the directive is describing a path more than a model. Here's what it looks like now:

    @app.path(model=Document, path='documents/{id}')
    def get_document(id):
        return query_document(id)
    

    The name is justified as such: just like the view directive describes a view, the path directive describes a path. Paths and views are related to each other by model class. The word model was rather overused in Morepath anyway.

  • The @app.view directive decorates a function that's a view. It used to take request, model parameters. I've changed this to self, request, reversing the order. This is to make it clearer to people that a view is really much like a method, and to free up the word model some more. Here's what it looks like:

    @app.view(model=Document)
    def document_default(self, request):
        return "Hello document: %s" % self.id
    

    self is not a reserved word in Python, so I figured this was a good place to use it, even though document_default really is a function, not a method. But since it's a generic function it's like a method anyway.

    The lookup of the view is still done giving request a greater weight than model, like in Pyramid. That's mostly an implementation detail in Morepath. In Pyramid this matters a lot more, but in Morepath there really isn't anything done yet with different request classes.

  • The @app.root directive is now gone. It wasn't pulling its weight anymore, as it had become just an alias for @app.path with a path parameter of "/". This is what it looks like now:

    @app.path(model=Root, path='/')
    def get_root():
        return the_root
    

(by the way, if the docs on the Morepath website don't update for you, do a shift reload. I'm not sure how long it takes for the cache to expire.)
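The self-first view signature reads naturally once you think of a view as dispatching on the model's class, the way method lookup does. The sketch below uses the standard library's functools.singledispatch to show the idea. Morepath itself uses Reg, not singledispatch. default_view and SpecialDocument are illustrative names.

```python
from functools import singledispatch


class Document(object):
    def __init__(self, id):
        self.id = id


class SpecialDocument(Document):
    pass


@singledispatch
def default_view(self, request):
    # fallback when no view is registered for the model's class
    raise LookupError('no view for %s' % type(self).__name__)


@default_view.register(Document)
def document_default(self, request):
    # dispatch happens on the first argument, just like a method's self
    return "Hello document: %s" % self.id
```

A SpecialDocument gets the Document view through inheritance, without a registration of its own, much as a subclass inherits a method.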

Want to talk to me about Morepath? Leave a comment here, drop me an email, use the Morepath issue tracker or join the #morepath IRC channel on freenode. Hope to hear from you!

Morepath Update

I've been hard at work on Morepath recently, and I thought I'd give an update. Morepath is your friendly neighborhood Python web framework with super powers.

Models and linking

URL parameters are now a first-class citizen in Morepath model-link generation, meaning that you can generate a link like:

/document?id=100

as easily as a link like this:

/document/100

with the same invocation:

request.link(doc)

It just depends on how you register the document.

Morepath can now also automatically decode the string "100" into the int 100 and encode it back again, and this is extensible to other kinds of objects.
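The point that the same link call can produce either URL form, depending only on registration, can be sketched with the standard library. This does not show Morepath's converter API. Converter, make_link and document_from_query are hypothetical names, and the int converter simply decodes with int and encodes with str.

```python
from urllib.parse import parse_qs, urlencode


class Converter(object):
    """Hypothetical converter: decode a URL string, encode a value back."""
    def __init__(self, decode, encode):
        self.decode = decode
        self.encode = encode


INT_CONVERTER = Converter(decode=int, encode=str)


class Document(object):
    def __init__(self, id):
        self.id = id


def make_link(path, id_in_query):
    # the link function depends only on how Document was registered
    def link(doc):
        value = INT_CONVERTER.encode(doc.id)
        if id_in_query:
            return '/%s?%s' % (path, urlencode({'id': value}))
        return '/%s/%s' % (path, value)
    return link


def document_from_query(query_string):
    # decode the string '100' back into the int 100
    raw = parse_qs(query_string)['id'][0]
    return Document(INT_CONVERTER.decode(raw))


path_link = make_link('document', id_in_query=False)
query_link = make_link('document', id_in_query=True)
```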

Much more about this can be found in the model and linking section in the Morepath documentation.

Views

Views also have had their own powers and capabilities for a while.

Writing a generic view is as easy as writing any view in Morepath. You can extend view matching with new predicates, and composing views programmatically is as easy as calling request.view().

What's new are the docs. Read much more about views.

Comparing Morepath with other web frameworks

There are so many web frameworks, why should you even consider Morepath? I attempted to give a bit of background on how Morepath is the same and how it is different, and why, in the Comparison with other Web Frameworks document.

Feedback

Even a little bit of feedback can be enormously helpful to me, so I invite everybody again to leave a comment, drop me a mail, or join us on the tiny but growing #morepath IRC channel on freenode!

Join the Morepath in the early days and gain "I was there at the time" bragging rights!

How to do REST with Morepath

Another day, another piece of Morepath documentation. Morepath is your friendly neighborhood Python microframework with super powers.

Morepath lets you write RESTful web services using the same tools you'd use to write any web application, not with a separate framework. That's how it should be: traditional multi-page web applications tend to be eminently RESTful, after all. I think it's a good idea to make what's behind your modern rich client-side applications in JavaScript RESTful too: that's what I've been doing for some years now.

Lots of people claim to do REST, and they're all wrong! Okay, well, not all of them, but many of them are. They do all the rest of REST, except the very important bit called HATEOAS, which has the scariest acronym ever, I can barely even spell it, but basically means you link your REST resources to each other, just like you'd link web pages. Morepath does HATEOAS out of the box, because Morepath is good at paths. It's in the name.
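As a rough illustration of what linking resources means for a JSON representation, the sketch below embeds URLs to related resources instead of bare ids. link is a hypothetical stand-in for what request.link() does in Morepath. The 'url' and 'related' keys and the example.com base are assumptions, not a Morepath convention.

```python
import json


class Document(object):
    def __init__(self, id, title, related_ids):
        self.id = id
        self.title = title
        self.related_ids = related_ids


def link(doc_id, base='http://example.com'):
    # hypothetical stand-in for request.link(): maps a document id to its URL
    return '%s/documents/%s' % (base, doc_id)


def document_json(doc):
    # the representation carries links to related resources, so a client
    # follows URLs instead of building them from ids itself
    return {
        'url': link(doc.id),
        'title': doc.title,
        'related': [link(related_id) for related_id in doc.related_ids],
    }


if __name__ == '__main__':
    print(json.dumps(document_json(Document(1, 'Intro', [2, 3])), indent=2))
```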

Join me on the Morepath and read how to do REST with it!

http://morepath.readthedocs.org/en/latest/rest.html

As always feedback is very welcome. Even a little bit of feedback can mean a lot to me. It gives me ideas. Leave a comment here, or join the ever burgeoning #morepath IRC channel on Freenode. Seriously. I counted like 5 people there today! Including me. I'm totally not lonely!

Have fun reading!

Morepath Security

In this post I'd like to discuss the security infrastructure of the Morepath web framework. I hope that I can showcase some of the interesting features of Morepath security, generate a bit of feedback for me, and also serve as the start of documentation on this topic. Finally, it's also useful for me to get an overview of where gaps in my implementation still exist.

What it should do

So what do I mean by "security"? I mean infrastructure in the Morepath web framework that helps you make sure that the web resources published by your application are only accessible to those people your application allows. If you're not allowed, you'll get an HTTP error.

Right now that error is HTTP Unauthorized (401) but I think I'll shortly change it to make it HTTP Forbidden (403), and make HTTP Unauthorized optional when HTTP Basic Authentication is in use. But I need to add a bit of infrastructure to make that possible first.

What does this look like in practice?

First we need to make sure that the application has an identity policy: a system that takes the HTTP request and establishes a claimed identity for it. For basic authentication, for instance, it will extract the username and password. The claimed identity can be accessed by looking at the identity attribute on the request object.

This is how you install an identity policy into a Morepath app:

from morepath.security import BasicAuthIdentityPolicy

@app.identity_policy()
def get_identity_policy():
    return BasicAuthIdentityPolicy()
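The internals of BasicAuthIdentityPolicy aren't shown here. As a rough idea of what extracting a claimed identity from basic auth involves, here is a stdlib-only sketch that reads the WSGI HTTP_AUTHORIZATION header. BasicIdentity and identify_basic_auth are hypothetical names. Note that nothing is checked against a user database: the identity is only claimed.

```python
import base64


class BasicIdentity(object):
    """Hypothetical claimed identity: nothing has been verified yet."""
    def __init__(self, userid, password):
        self.userid = userid
        self.password = password


def identify_basic_auth(environ):
    # return a claimed identity from the Authorization header, or None
    header = environ.get('HTTP_AUTHORIZATION', '')
    scheme, _, credentials = header.partition(' ')
    if scheme.lower() != 'basic' or not credentials:
        return None
    try:
        decoded = base64.b64decode(credentials).decode('utf-8')
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueErrors
        return None
    userid, sep, password = decoded.partition(':')
    if not sep:
        return None
    return BasicIdentity(userid, password)
```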

Let's say we want two permissions in our application, view and edit. We define those as plain Python classes:

class ViewPermission(object):
    pass

class EditPermission(object):
    pass

Since they're classes they could even inherit from each other and form some kind of permission hierarchy, but we'll keep things simple here.

Now we can protect views with those permissions. Let's say we have a Document model that we can view and edit:

@app.html(model=Document, permission=ViewPermission)
def document_view(request, model):
    return "<p>The title is: %s</p>" % model.title

@app.html(model=Document, permission=EditPermission)
def document_edit(request, model):
    return "some kind of edit form"

This says that we only want to allow access to document_view if the identity has ViewPermission on the Document model, and only allow access to document_edit if the identity has EditPermission on the Document model.

But how do we establish permissions for models at all? We can do this with the permission directive. Let's give absolutely everybody view permission:

@app.permission(model=Document, permission=ViewPermission)
def document_view_permission(identity, model, permission):
    return True

We can implement any permission policy we like. Let's say a user has EditPermission on Document if it's in a list allowed_users on that document:

@app.permission(model=Document, permission=EditPermission)
def document_edit_permission(identity, model, permission):
    return identity.userid in model.allowed_users

Morepath Super Powers Go!

What if we don't want to have to define permissions on a per-model basis? In our application, we may have a generic way to check for the edit permission. We can easily do it with Morepath, as it knows about inheritance:

@app.permission(model=object, permission=EditPermission)
def has_edit_permission(identity, model, permission):
    return has_generic_edit_permission(identity, model)

This permission function is registered for model object, so will be valid for all models in our application. We can of course still make exceptions for particular models:

@app.permission(model=SpecialModel, permission=EditPermission)
def special_model_edit_permission(identity, model, permission):
    return exceptional_edit_permission_check(identity, model)

Variation is not only on model, but can also be on permission or identity. For instance, we can register a permission policy for when the user has not logged in yet, i.e. has no claimed identity:

@app.permission(model=object, permission=EditPermission, identity=None)
def has_edit_permission_not_logged_in(identity, model, permission):
    return False

This permission check works in addition to the ones we specified above. In short you can be as generic or specific as you like.
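The way rules for object cover every model, while more specific rules take precedence, can be modelled with a small registry that walks the model's method resolution order. This is a hypothetical sketch, not Morepath's Reg-based implementation. permission_rule and ANY_IDENTITY are made-up names. The sketch also assumes one particular precedence: it considers the identity=None rule first when nobody is logged in, then the model class from most to least specific.

```python
class ViewPermission(object):
    pass


class EditPermission(object):
    pass


class Identity(object):
    def __init__(self, userid):
        self.userid = userid


class Document(object):
    def __init__(self, allowed_users):
        self.allowed_users = allowed_users


class SpecialDocument(Document):
    pass


ANY_IDENTITY = object()  # hypothetical marker: rule applies to any identity
_rules = {}


def permission_rule(model, permission, identity=ANY_IDENTITY):
    # hypothetical stand-in for @app.permission: key rules on model class,
    # permission class, and either None (not logged in) or any identity
    def register(func):
        _rules[(model, permission, identity)] = func
        return func
    return register


def permits(identity, model, permission):
    # this sketch matches identity first, then walks the model's MRO so a
    # rule registered for a base class (even object) covers subclasses
    identity_keys = [None, ANY_IDENTITY] if identity is None else [ANY_IDENTITY]
    for identity_key in identity_keys:
        for cls in type(model).__mro__:
            rule = _rules.get((cls, permission, identity_key))
            if rule is not None:
                return rule(identity, model, permission)
    return False  # no matching rule: deny


@permission_rule(Document, ViewPermission)
def document_view_permission(identity, model, permission):
    return True


@permission_rule(Document, EditPermission)
def document_edit_permission(identity, model, permission):
    return identity.userid in model.allowed_users


@permission_rule(object, EditPermission, identity=None)
def edit_not_logged_in(identity, model, permission):
    return False
```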

Login forms

We've tackled a lot, but not yet how a user actually logs in to establish an identity in the first place. We need a view that gets triggered when the user logs in, for instance by submitting a login form. I'll sketch out some code here to give you an idea:

import morepath

@app.root()
class Root(object):
    pass

@app.html(model=Root, name='login')
def login_attempt(request, model):
    username = request.form['username']
    password = request.form['password']
    # check whether user has password, using password hash and database
    if not user_has_password(username, password):
        return "Sorry, login failed"
    # okay, user is known, so modify response remembering that user
    @request.after
    def remember_security(response):
        # create identity object
        identity = morepath.Identity(username)
        # modifies response, setting identity
        morepath.remember(response, request, identity)
    # redirect to original page after successful login attempt
    return morepath.redirect(request.form['came_from'])

This would work well for session, cookie or ticket-based authentication: after the identity is established once it is remembered on the response object, so that it is automatically sent with each new request.

For HTTP basic authentication this form-based authentication approach wouldn't work, of course. We'd instead need to issue an HTTP Unauthorized response, causing the browser to ask the user for a username/password combination, which it would then send along with each subsequent request. The remember operation is a no-op for basic auth.

Note: Currently this is in fact broken code as I haven't enabled the implicit generic function implementation lookup yet for Morepath implementations. It's an easy fix as I have the infrastructure for it in Reg, but I need to add it to my todo list.

Theory: Identity/permits

Security in Morepath has two parts:

  • establish someone's claimed identity.

  • use that claimed identity to see whether access to a Morepath view is permitted.

This looks very much like, but isn't quite, a separation between authentication and authorization. There are subtle differences.

Establishing identity only establishes a claimed identity. It does not verify (with, say, a database) that this identity is in fact still recognized by the system (the user may have been removed), or even that the identity is what it claims to be.

In the case of some identity policies and some applications this is in fact enough: if someone manages to claim an identity, then it is the identity, without the need to access a database. This is for instance the case with cryptographic ticket-based authentication systems such as the one implemented by mod_auth_tkt. If someone comes along with a correctly signed auth tkt cookie, we know that's the identity we gave to this user earlier when they first logged in. No database verification check is needed.
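The reason a signed ticket needs no database lookup is that only the server knows the secret used to sign it. Below is a minimal stdlib sketch of that idea using HMAC. It assumes a single hypothetical server secret. It is not the mod_auth_tkt cookie format, and a real ticket should also expire.

```python
import hashlib
import hmac

SECRET = b'change-me'  # hypothetical server-side secret


def make_ticket(userid, secret=SECRET):
    # sign the userid so the server can later trust it without a database
    signature = hmac.new(secret, userid.encode('utf-8'), hashlib.sha256).hexdigest()
    return '%s!%s' % (signature, userid)


def identify_ticket(ticket, secret=SECRET):
    # return the userid if the signature matches, else None
    signature, sep, userid = ticket.partition('!')
    if not sep:
        return None
    expected = hmac.new(secret, userid.encode('utf-8'), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return userid
```

Tampering with the userid invalidates the signature, so a valid ticket is enough to establish the identity. Whether that user still exists is a separate question.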

Then there's the next step: to see whether a claimed identity is permitted to access a view on a model. This can be seen as the authorization step, and will normally have to access some kind of database to do so. It may however do something in addition when accessing the database: verify that the claimed identity is still valid. This is done in this step because it can sometimes be established whether someone has access and whether the claimed identity is valid by a single database query.

What does this design accomplish? Establishing a claimed identity can be done without touching a database; only checking whether that identity is permitted to do something has to touch a database, possibly only once.

I'm grateful to Chris McDonough, creator of the Pyramid web framework, who wrote a very useful postmortem on Pyramid's design and what he'd do differently if he could change things. That was extremely useful when I came up with Morepath's security system, so thanks again, Chris! Also inspirational was my long familiarity with Zope's security system, and I hope to have distilled some of it down to the minimum. Of course anything I got wrong is my own fault.

Identity Verification?

I think that in the case of Morepath it would in fact be easy to let users specify an app-specific verify function, which would be called with the identity and could use the database to establish whether a claimed identity is valid. This would be easy to integrate into Morepath, and doesn't require the miniframeworks that bothered Chris in his postmortem. It could look like this (for something like basic auth):

@app.function(generic.verify, object)
def verify_identity(identity):
    return validate_password(identity.userid, identity.password)

I could add a bit of sugar for this in the form of a directive:

@app.verify()
def verify_identity(identity):
    return validate_password(identity.userid, identity.password)

The default verify would just return True, so it would still work for identity policies and applications that have no need for identity verification. Identity verification would not be part of the identity policy, but part of Morepath's security infrastructure, as it would be entirely application-specific.
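Here is a small sketch of how a default verify and an app-specific one could slot into an access check. Verification runs only when access is being decided, never while establishing the claimed identity. All names here (Identity, PASSWORDS, password_verify, check_access) are hypothetical, and a real application would compare password hashes rather than plain strings.

```python
class Identity(object):
    """Hypothetical claimed identity carrying basic-auth credentials."""
    def __init__(self, userid, password):
        self.userid = userid
        self.password = password


# hypothetical user store; real code would store and compare password hashes
PASSWORDS = {'alice': 'secret'}


def default_verify(identity):
    # default: trust every claimed identity (fine for signed tickets)
    return True


def password_verify(identity):
    # app-specific verify: check the claimed credentials against the store
    return PASSWORDS.get(identity.userid) == identity.password


def check_access(identity, verify, has_permission):
    # verification only runs once a permission check is needed, so
    # establishing the claimed identity itself never touches the store
    if identity is not None and not verify(identity):
        return False
    return has_permission(identity)
```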

More identity policies

Right now Morepath implements only a single identity policy, BasicAuthIdentityPolicy for HTTP basic auth. I hope to find time in the future to port some of the more interesting functionality from Pyramid's authentication module; AuthTktAuthenticationPolicy looks particularly interesting. These could then be made available in a Morepath extension module. If you are interested in helping with this porting effort, I'd be thrilled; please get in touch with me!

Conclusion

I'm pretty pleased with the flexibility of the Morepath security system: you can be fine-grained or generic when you need to. It's all built on top of the Morepath foundations of directive-based configuration and Reg generic functions, and it fits.

The identity system should be able to cope well with any kind of authentication system you can throw at it, allowing you to write generic code and swapping in a different one with ease. The only oddball is the perennial exception of HTTP Basic Auth, but at least Morepath can deal with it.

There may seem to be many concepts involved at first, but I think in fact they've been kept down to an amount that you can still keep in your head after you get used to it.

I hope to get feedback on this, and I also hope people will play with it, so we can smooth out any kinks quickly. Please let me know what you think! Join the #morepath IRC channel on freenode!