Framework Patterns

A software framework is code that calls your (application) code. That's how we distinguish a framework from a library. Libraries can have framework-like aspects, so there is a gray area.

My friend Christian Theune puts it like this: a framework is a text where you fill in the blanks. The framework defines the grammar, you bring some of the words. The words are the code you bring into it.

If you as a developer use a framework, you need to tell it about your code. You need to tell the framework what to call, when. Let's call this configuring the framework.

There are many ways to configure a framework. Each approach has its own trade-offs. I will describe some of these framework configuration patterns here, with brief examples and mention of some of the trade-offs. Many frameworks use more than a single pattern. I don't claim this list is exhaustive -- there are more patterns.

The patterns I describe are generally language agnostic, though some depend on specific language features. Some of these patterns make more sense in object-oriented languages. Some are easier to accomplish in one language than in another. Some languages have rich run-time introspection abilities, which makes certain patterns a lot easier to implement. A language with a powerful macro facility makes other patterns easier to implement.

Where I give example code, I will use Python. I give some abstract code examples, and try to supply a few real-world examples as well. The examples show the framework from the perspective of the application developer.

Pattern: Callback function

The framework lets you pass in a callback function to configure its behavior.

Fictional example

This is a Form class where you can pass in a function that implements what should happen when you save the form.

from framework import Form

def my_save(data):
    ... application code to save the data somewhere ...

my_form = Form(save=my_save)

Real-world example: Python map

A real-world example: map is a (nano)framework that takes a (pure) function:

>>> list(map(lambda x: x * x, [1, 2, 3]))
[1, 4, 9]

You can go very far with this approach. Functional languages do. If you glance at React in a certain way, it's configured with a whole bunch of callback functions called React components, along with more callback functions called event handlers.

Trade-offs

I am a big fan of this approach as the trade-offs are favorable in many circumstances. In object-oriented languages this pattern is sometimes ignored because people feel they need something more complicated, like passing in a fancy object or using inheritance, but I think callback functions should in fact be your first consideration.

Functions are simple to understand and implement. The contract is about as simple as it can be for code. Anything you may need to implement your function is passed in as arguments by the framework, which limits how much knowledge you need to use the framework.

Configuration of a callback function can be very dynamic in run-time -- you can dynamically assemble or create functions and pass them into the framework, based on some configuration stored in a database, for instance.
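
For instance, here is a minimal sketch of that kind of dynamism, using the fictional Form class from above; the table name stands in for configuration that might come from a database:

from framework import Form  # the fictional framework from above

def make_save(table_name):
    # the closure captures configuration that could have been read from a database
    def save(data):
        print(f"pretend to save {data!r} into table {table_name}")
    return save

# assemble the callback at run time and hand it to the framework
my_form = Form(save=make_save("customers"))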

Configuration with callback functions doesn't really stand out, which can be a disadvantage -- it's easier to spot that someone subclasses a base class or implements an interface, and language-integrated methods of configuration can stand out even more.

Sometimes you want to configure multiple related functions at once, in which case an object that implements an interface can make more sense -- I describe that pattern below.

It helps if your language has support for function closures. And of course your language needs to actually support first class functions that you can pass around -- Java for a long time did not.

Pattern: Subclassing

The framework provides a base-class which you as the application developer can subclass. You implement one or more methods that the framework will call.

Fictional example

from framework import FormBase

class MyForm(FormBase):
    def save(self, data):
        ... application code to save the data somewhere ...

Real-world example: Django REST Framework

Many frameworks offer base classes - Django offers them, and Django REST Framework even more.

Here's an example from Django REST Framework:

class AccountViewSet(viewsets.ModelViewSet):
    """
    A simple ViewSet for viewing and editing accounts.
    """
    queryset = Account.objects.all()
    serializer_class = AccountSerializer
    permission_classes = [IsAccountAdminOrReadOnly]

A ModelViewSet does a lot: it implements a lot of URLs and request methods to interact with them. It integrates with Django's ORM so that you get a REST API that you can use to create and update database objects.

Subclassing questions

When you subclass a class, this is what you might need to know:

  • What base classes are there?
  • What methods can you override?
  • When you override a method, can you call other methods on self (this) or not? Is there a particular order in which you are allowed to call these methods?
  • Does the base class provide an implementation of this method, or is it really empty?
  • If the base class provides an implementation already, you need to know whether it's intended to be supplemented, or overridden, or both.
  • If it's intended to be supplemented, you need to make sure to call this method on the superclass in your implementation.
  • If you can override a method entirely, you may need to know what methods to use to play a part in the framework -- perhaps other methods that can be overridden.
  • Does the base class inherit from other classes that also let you override methods? When you implement a method, can it interact with methods on those other classes?

Trade-offs

Many object-oriented languages support inheritance as a language feature. You can make the subclasser implement multiple related methods. It seems obvious to use inheritance as a way to let applications use and configure the framework.

It's not surprising then that this design is very common for frameworks. But I try to avoid it in my own frameworks, and I am often frustrated when a framework forces me to subclass.

The reason for this is that you as the application developer have to start worrying about many of the questions above. If you're lucky they are answered by documentation, though it can still take a bit of effort to understand it. But all too often you have to guess or read the code yourself.

And then even with a well designed base class with plausible overridable methods, it can still be surprisingly hard for you to do what you actually need because the contract of the base class is just not right for your use case.

Languages like Java and TypeScript offer the framework implementer a way to give you guidance (private/protected/public, final). The framework designer can put hard limits on which methods you are allowed to override. This takes away some of these concerns, as with sufficient effort on the part of the framework designer, the language tooling can enforce the contract. Even so such an API can be complex for you to understand and difficult for the framework designer to maintain.

Many languages, such as Python, Ruby and JavaScript, don't have the tools to offer such guidance. You can subclass any base class. You can override any method. The only guidance is documentation. You may feel a bit lost as a result.

A framework tends to evolve over time to let you override more methods in more classes, and thus grows in complexity. This complexity doesn't grow just linearly as methods get added, as you have to worry about their interaction as well. A framework that has to deal with a variety of subclasses that override a wide range of methods can expect less from them. Too much flexibility can make it harder for the framework to offer useful features.

Base classes also don't lend themselves very well to run-time dynamism - some languages (like Python) do let you generate a subclass dynamically with custom methods, but that kind of code is difficult to understand.

I think the disadvantages of subclassing outweigh the advantages for a framework's external API. I still sometimes use base classes internally in a library or framework -- base classes are a lightweight way to do reuse there. In this context many of the disadvantages go away: you are in control of the base class contract yourself and you presumably understand it.

I also sometimes use an otherwise empty base class to define an interface, but that's really another pattern which I discuss next.

Pattern: interfaces

The framework provides an interface that you as the application developer can implement. You implement one or more methods that the framework calls.

Fictional example

from framework import Form, IFormBackend

class MyFormBackend(IFormBackend):
    def load(self):
        ... application code to load the data here ...

    def save(self, data):
        ... application code to save the data somewhere ...

my_form = Form(MyFormBackend())

Real-world example: Python iterable/iterator

The iterable/iterator protocol in Python is an example of an interface. If you implement it, the framework (in this case the Python language) will be able to do all sorts of things with it -- print out its contents, turn it into a list, reverse it, etc.

import random

class RandomIterable:
    def __iter__(self):
        return self

    def __next__(self):
        if random.choice(["go", "stop"]) == "stop":
            raise StopIteration
        return 1
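
Since RandomIterable implements the protocol, generic code can consume it right away -- for example (the exact output varies between runs, as the length is random):

>>> list(RandomIterable())
[1, 1]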

Faking interfaces

Many typed languages offer native support for interfaces. But what if your language doesn't do that?

In a dynamically typed language you don't really need to do anything: any object can implement any interface. It's just that you don't get a lot of guidance from the language. What if you want a bit more?

In Python you can use the standard library abc module, or zope.interface. You can also use the typing module with base classes, or, since Python 3.8, PEP 544 protocols.
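
For instance, a minimal sketch of the form backend interface from the fictional example expressed as a PEP 544 protocol:

from typing import Protocol

class FormBackend(Protocol):
    def load(self) -> dict: ...
    def save(self, data: dict) -> None: ...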

But let's say you don't have all of that or don't want to bother yet as you're just prototyping. You can use a simple Python base class to describe an interface:

class IFormBackend:
    def load(self):
        "Load the data from the backend. Should return a dict with the data."
        raise NotImplementedError()

    def save(self, data):
        "Save the data dict to the backend."
        raise NotImplementedError()

It doesn't do anything, which is the point - it just describes the methods that the application developer should implement. You could supply one or two with a simple default implementation, but that's it. You may be tempted to implement framework behavior on it, but that brings you into base class land.

Trade-offs

The trade-offs are quite similar to those of callback functions. This is a useful pattern to use if you want to define related functionality in a single bundle.

I go for interfaces if my framework offers a more extensive contract that an application needs to implement, especially if the application needs to maintain its own internal state.

The use of interfaces can lead to clean composition-oriented designs, where you adapt one object into another.

You can use run-time dynamism like with functions where you assemble an object that implements an interface dynamically.

Many languages offer interfaces as a language feature, and any object-oriented language can fake them. Some, like Python, arguably have too many ways to do it.

Pattern: imperative registration API

You register your code with the framework in a registry object.

When you have a framework that dispatches on a wide range of inputs, and you need to plug in application specific code that handles it, you are going to need some type of registry.

What gets registered can be a callback or an object that implements an interface -- it therefore builds on those patterns.

The application developer needs to call a registration method explicitly.

Frameworks can have specific ways to configure their registries that build on top of this basic pattern -- I will elaborate on that later.

Fictional Example

from framework import form_save_registry

def save(data):
   ... application code to save the data somewhere ...

# we configure what save function to use for the form named 'my_form'
form_save_registry.register('my_form', save)

Real-world example: Falcon web framework

A URL router such as in a web framework uses some type of registry. Here is an example from the Falcon web framework:

import falcon

class QuoteResource:
    def on_get(self, req, resp):
        ... user code ...

api = falcon.API()
api.add_route('/quote', QuoteResource())

In this example you can see two patterns go together: QuoteResource implements an (implicit) interface, and you register it with a particular route.

Application code can register handlers for a variety of routes; the framework then uses the registry to match a request's URL against a route, and calls into user code to generate a response.

Trade-offs

I use this pattern a lot, as it's easy to implement and good enough for many use cases. It has a minor drawback: you can't easily see that configuration is taking place when you read code. Sometimes I expose a more sophisticated configuration API on top of it: a DSL or language integrated registration or declaration, which I discuss later. But this is foundational.

Calling a method on a registry is the simplest and most direct way to register things. It's easy to implement, typically based on a hash map, though you can also use other data structures, such as trees.
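
A minimal sketch of what the fictional form_save_registry could look like on the inside -- not any particular framework's implementation:

class FormSaveRegistry:
    def __init__(self):
        # a plain dict serves as the hash map
        self._save_functions = {}

    def register(self, name, save):
        self._save_functions[name] = save

    def get(self, name):
        return self._save_functions[name]

form_save_registry = FormSaveRegistry()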

The registration order can matter. What happens if you make the same registration twice? Perhaps the registry rejects the second registration. Perhaps it allows it, silently overriding the previous one. There is no general system to handle this, unlike patterns which I describe later.

Registration can be done anywhere in the application which makes it possible to configure the framework dynamically. But this can also lead to complexity and the framework can offer fewer guarantees if its configuration can be updated at any moment.

In a language that supports import-time side effects, you can do your registrations during import time. That makes the declarations stand out more. This is simple to implement, but it's also difficult to control and understand the order of imports. This makes it difficult for the application developer to do overrides. Doing a lot of work during import time in general can lead to hard to predict behavior.

Pattern: convention over configuration

The framework configures itself automatically based on your use of conventions in application code. Configuration is typically driven by particular names, prefixes, and postfixes, but a framework can also inspect other aspects of the code, such as function signatures.

This is typically layered over the procedural registration pattern.

Ruby on Rails made this famous. Rails will automatically configure the database models, views and controllers by matching up names.

Fictional example

# the framework looks for things prefixed form_save_. It hooks this
# up with `myform` which is defined elsewhere in a module named `forms`
def form_save_myform(data):
   ... application code to save the data somewhere ...
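
Here is a rough sketch of how a framework might discover such functions, assuming it layers the convention on an imperative registry like the fictional form_save_registry from the previous pattern:

import inspect

def scan_forms(module):
    # look for functions whose names follow the form_save_ convention
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("form_save_"):
            form_name = name[len("form_save_"):]
            form_save_registry.register(form_name, func)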

Real-world example: pytest

pytest uses convention over configuration to find tests. It looks for modules and functions prefixed by test_.

pytest also goes further and inspects the arguments to functions to figure out more things.

def test_ehlo(smtp_connection):
    response, msg = smtp_connection.ehlo()
    assert response == 250
    assert 0  # for demo purposes

In this example, pytest knows that test_ehlo is a test, because it is prefixed with test_. It also knows that the argument smtp_connection is a fixture and looks for one in the same module (or in its package).

Django uses convention over configuration in places, for instance when it looks for the variable urlpatterns in a specially named module to figure out what URL routes an application provides.

Trade-offs

Convention over configuration can be great. It allows the user to type code and have it work without any ceremony. It can enforce useful norms that make code easier to read -- it makes sense to prefix tests with test_ anyway, as that allows the human reader to recognize them.

I like convention over configuration in moderation, for some use cases. For more complex use cases I prefer other patterns that allow registration with minimal ceremony by using features integrated into the language, such as annotation or decorator syntax.

The more conventions a framework has, the more disadvantages show up. You have to learn the rules, their interactions, and remember them. You may sometimes accidentally invoke them even though you don't want to, just by using the wrong name. You may want to structure your application's code in a way that would be very useful, but doesn't really work with the conventions.

And what if you wanted your registrations to be dynamic, based on database state, for instance? Convention over configuration is a hindrance here, not a help. The developer may need to fall back to a different, imperative registration API, and this may be ill-defined and difficult to use.

It's harder for the framework to implement some patterns -- what if registrations need to be parameterized, for instance? That's easy with functions and objects, but here the framework may need more special naming conventions to let you influence that. That may lead the framework designer to use classes over functions, as in many languages these can have attributes with particular names.

Static type checks are of little use with convention over configuration -- I don't know of a type system that can enforce you implement various methods if you postfix your class with the name View, for instance.

If you have a language with enough run-time introspection capabilities such as Ruby, Python or JavaScript, it's pretty easy to implement convention over configuration. It's a lot harder for languages that don't offer those features, but it may still be possible with sufficient compiler magic. But those same languages are often big on being explicit, and convention over configuration's magic doesn't really fit well with that.

Pattern: metaclass based registration

When you subclass a framework-provided baseclass, it gets registered with the framework.

Some languages, such as Python and Ruby, offer meta-classes. These let you do two things: change the behavior of classes in fundamental ways, and trigger side effects when a class is defined, which typically happens at import time. You can do things during class declaration that you normally can only do during instantiation.

A framework can exploit these side-effects to do some registration.

Fictional example

from framework import FormBase

class MyForm(FormBase):
    def save(self, data):
        ... application code to save the data somewhere ...

# the framework now knows about MyForm without further action from you
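
A minimal sketch of how the framework side of FormBase could pull this off with a metaclass; the dict is a stand-in for whatever registry the framework really keeps:

form_registry = {}

class FormMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # skip FormBase itself; register only subclasses
            form_registry[name] = cls

class FormBase(metaclass=FormMeta):
    def save(self, data):
        raise NotImplementedError()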

Real-world example: Django

When you declare a Django model by subclassing from its Model base class, Django automatically registers it so that a relational database table can be created for it.

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

Trade-offs

I rarely use these because they are so hard to reason about and because it's so easy to break assumptions for the person who subclasses them.

Meta-classes are notoriously hard to implement correctly. If they're not, they can lead to surprising behavior that you may need to deal with when you use the framework. Basic assumptions you may have about the way a class behaves can go out the window.

Import-time side-effects are difficult to control -- in what order does this happen?

Python also has a simpler way to attach side effects to class declarations: class decorators.

A base-class driven design for configuration may lead the framework designer towards meta-classes, further complicating the way the framework is used.

Many languages don't support this pattern. It can be seen as a special case of language integrated registration, discussed next.

Pattern: language integrated registration

You configure the application by using framework-provided annotations for code. Registrations happen immediately.

Many programming languages offer some syntax aid for annotating functions, classes and more with metadata. Java has annotations. Rust has attributes. Python has decorators which can be used for this purpose as well.

These annotations can be used as a way to drive configuration in a registry.

Fictional example

from framework import form_save_registry

# we define and configure the function at the same time
@form_save_registry.register('my_form')
def save(data):
   ... application code to save the data somewhere ...

Real-world example: Flask web framework

A real-world example is the @app.route decorator of the Flask web framework.

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

Trade-offs

I use this method of configuring software sometimes, but I'm also aware of its limitations -- I tend to go for language integrated declaration, discussed below, which looks identical to the end user but is more predictable.

I'm warier than most about exposing this as an API to application developers, but I am happy to use it inside a library or codebase, much like base classes. The ad-hoc nature of import-time side effects makes me reach for more sophisticated patterns of configuration when I have to build a solid API.

This pattern is lightweight to implement, at least in Python -- it's not much harder than a plain registry. Your mileage will vary depending on the language. Unlike convention over configuration, configuration is explicit and stands out in code, but the amount of ceremony is kept to a minimum. The configuration information is co-located with the code that is being registered.
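
To illustrate just how lightweight: a sketch of how the fictional @form_save_registry.register decorator could be built as a thin layer over a plain registry:

class FormSaveRegistry:
    def __init__(self):
        self._save_functions = {}

    def register(self, name):
        # return a decorator that does the registration as a side effect
        def decorator(func):
            self._save_functions[name] = func
            return func  # hand the function back unchanged
        return decorator

form_save_registry = FormSaveRegistry()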

Unlike convention over configuration, there is a natural way to parameterize registration with metadata.

In languages like Python this is implemented as a possibly significant import-time side-effect, and may have surprising import order dependencies. In a language like Rust this is done by compiler macro magic -- I think the Rocket web framework is an example, but I'm still trying to understand how it works.

Pattern: DSL-based declaration

You use a DSL (domain specific language) to configure the framework. This DSL offers some way to hook in custom code. The DSL can be an entirely custom language, but you can also leverage JSON, YAML or (shudder) XML.

You can also combine these: I've helped implement a workflow engine that's configured with JSON, and expressions in it are a subset of Python expressions with a custom parser and interpreter.

It is typically layered over some kind of imperative registration system.

Fictional example

{
   "form": {
     "name": "my_form",
     "save": "my_module.save"
   }
}

We have a custom language (in this case done with JSON) that lets us configure the way our system works. Here we plug in the save behavior for my_form by referring to the function save in some Python module my_module.
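
A sketch of how a framework might resolve such a dotted name into the actual function, using Python's importlib; the resolve helper is just for illustration:

import importlib

def resolve(dotted_name):
    # split "my_module.save" into a module part and an attribute part
    module_name, attr_name = dotted_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

save_function = resolve("my_module.save")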

Real-world example: Plone CMS framework

Pyramid and Plone are both descendants of Zope, and you can use ZCML, an XML-derived configuration language, with either of them.

Here is some ZCML from Plone:

<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:browser="http://namespaces.zope.org/browser"
    i18n_domain="my.package">

  <!-- override folder_contents -->
  <configure package="plone.app.content.browser">
      <browser:page
          for="Products.CMFCore.interfaces._content.IFolderish"
          class="my.package.browser.foldercontents.MyFolderContentsView"
          name="folder_contents"
          template="folder_contents.pt"
          layer="my.package.interfaces.IMyPackageLayer"
          permission="cmf.ListFolderContents"
      />
  </configure>
</configure>

This demonstrates a feature offered by a well-designed DSL: a way to do a structured override of behavior in the framework.

Trade-offs

Custom DSLs are a very powerful tool if you actually need them, and you do need them at times. But they are also a lot more heavyweight than the other methods discussed, and that's a drawback.

A custom DSL is thorough: a framework designer can build it with very clean boundaries, with a clear grammar and hard checks to see whether code conforms to this grammar. If you build your DSL on JSON or XML, you can implement such checks pretty easily using one of the various schema implementations.
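
For instance, with the JSON configuration from the fictional example, such a check might look like this -- a sketch using the third-party jsonschema package:

from jsonschema import validate

# the structure we expect: a "form" object with "name" and "save" strings
schema = {
    "type": "object",
    "properties": {
        "form": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "save": {"type": "string"},
            },
            "required": ["name", "save"],
        },
    },
    "required": ["form"],
}

config = {"form": {"name": "my_form", "save": "my_module.save"}}
validate(instance=config, schema=schema)  # raises ValidationError if it doesn't conform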

A custom DSL gives the potential for non-developers to configure application behavior. At some point in a DSL there is a need to interface with user code, but this may be abstracted away quite far. It lets non-developers reuse code implemented by developers.

A DSL can be extended with a GUI to make it even easier for non-developers to configure it.

Since code written in a DSL can be stored in a database, you can store complex configuration in a database.

A DSL can offer certain security guarantees -- you can ensure that DSL code can only reach into a limited part of your application.

A DSL can implement a declaration engine with sophisticated behavior -- for instance the general detection of configuration conflicts (you try to configure the same thing in conflicting ways in multiple places), and structured, safe overrides that are independent of code and import order. A DSL doesn't have to offer such sophistication, but a framework designer who designs a DSL is naturally led in that direction.

A drawback of DSL-based configuration is that it is quite distant from the code that it configures. That is fine for some use cases, but overkill for others. A DSL can cause mental overhead -- the application developer not only needs to read the application's code but also its configuration files in order to understand the behavior of an application. For many frameworks it can be much nicer to co-locate configuration with code.

A DSL also provides little flexibility during run-time. While you could generate configuration code dynamically, that's a level of meta that's quite expensive (lots of generate/parse cycles) and it can lead to headaches for the developers trying to understand what's going on.

DSL-based configuration is also quite heavy to implement compared to many other more lightweight configuration options described.

Pattern: imperative declaration

You use a declaration engine like in a DSL, but you drive it from programming language code in an imperative way, like imperative registration. In fact, an imperative declaration system can be layered over an imperative registration system.

The difference from imperative registration is that the framework implements a deferred configuration engine, instead of making registrations immediately. Configuration commands are first collected in a separate configuration phase, and only after collection is complete are they executed, resulting in actual registrations.

Fictional example

from framework import Config

def save(data):
   ... application code to save the data somewhere ...

config = Config()
config.form_save('my_form', save)
config.commit()

The idea here is that configuration registries are only modified when config.commit() happens, and only after the configuration has been validated.
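
A minimal sketch of what such a deferred Config could look like, layered over the fictional form_save_registry from the imperative registration pattern; real configuration engines do a lot more:

class Config:
    def __init__(self):
        self._pending = []

    def form_save(self, name, func):
        # only record the intent; nothing is registered yet
        self._pending.append((name, func))

    def commit(self):
        # validate first: detect conflicting registrations for the same name
        seen = set()
        for name, _ in self._pending:
            if name in seen:
                raise ValueError(f"conflicting form_save registrations for {name!r}")
            seen.add(name)
        # only now touch the actual registry
        for name, func in self._pending:
            form_save_registry.register(name, func)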

Real-world example: Pyramid web framework

From the Pyramid web framework:

from pyramid.config import Configurator
from pyramid.response import Response

def hello_world(request):
    return Response('Hello World!')

with Configurator() as config:
    config.add_route('hello', '/')
    config.add_view(hello_world, route_name='hello')

This looks very similar to a plain registry, but inside something else is going on: it first collects all registrations, and then generically detects whether there are conflicts, and generically applies overrides. Once the code exits the with statement, config is complete and committed.

Trade-offs

This brings some of the benefits of a configuration DSL to code. Like a DSL, the configuration system can detect conflicts (the route name 'hello' is registered twice), and it allows sophisticated override patterns that are not dependent on the vagaries of registration order or import order.

Another benefit is that configuration can be generated programmatically, so this allows for a certain amount of run-time dynamism without some of the costs that a DSL would have. It is still good to avoid such dynamism as much as possible, though, as it can make for very difficult to comprehend code.

The code that is configured may still not be co-located with the configuration, but at least it's all code, instead of a whole new language.

Pattern: language integrated declaration

You configure the application by using framework-provided annotations for code. This configuration is declarative and does not immediately take place.

Language integrated declaration looks like language integrated registration, but uses a configuration engine as with imperative declaration.

Fictional example

from framework import Config

config = Config()

# we define and configure the function at the same time
@config.form_save('my_form')
def save(data):
   ... application code to save the data somewhere ...

# elsewhere before application starts
config.commit()

Real-world example: Morepath web framework

My own Morepath web framework is configured this way.

import morepath

class App(morepath.App):
    pass

@App.path(path='/hello')
class Hello(object):
    pass

@App.view(model=Hello)
def view_get(self, request):
    return "Hello world!"

Here two things happen: an instance of Hello is registered for the route /hello, and a GET view is registered for such instances. You can supply these decorators in any order in any module -- the framework will figure it out. If you subclass App, and re-register the /hello path, you have a new application with new behavior for that path, but the same view.

Trade-offs

I like this way of configuring code very much, so I built a framework for it.

This looks very similar to language-integrated registration but the behavior is declarative.

It's more explicit than convention over configuration, but still low on ceremony, like language-integrated registration. It co-locates configuration with code.

It eliminates many of the issues with the more lightweight language-integrated registration while retaining many of its benefits. It imposes a lot of structure on how configuration works, and this can lead to useful properties: conflict detection and overrides, for instance.

It's a lot more heavyweight than just passing in a callback or an object that implements an interface -- for many frameworks those are more than enough, and nothing beats how easy they are to implement and test.

You can't store it in a database or give it to a non-programmer: for that, use a DSL.

But if you want a configuration language that's powerful and friendly, this is a good way to go.

It's a lot more difficult to implement though, which is a drawback. If you use Python, you're in luck: I've implemented a framework to help you build this, called Dectate. My Morepath web framework is built on it.

In Dectate, import-time side-effects are minimized: when the decorator is executed the parameters are stored, but registration only happens when commit() is executed. This means there is no dependence on run-time import order, and conflict detection and overrides are supported in a general way.

Conclusion

I hope this helps developers who have to deal with frameworks understand the decisions made by those frameworks better. If you have a problem with a framework, perhaps I gave you some arguments that let you express it better as well.

And if you design a framework -- which you should do, as larger applications need frameworks to stay coherent -- you now hopefully have some more concepts to work with to help you make better design decisions.

Secret Weblog Highlights

This is an old blog by now. I started it in 2005. But I'm not old! No way!

Over the years I wrote a lot of stuff. Sprinkled throughout are entries that I think are still relevant. So if you'd like, join me in my little journey through the history of my secret weblog. Warning: it's mostly about software development in one way or another.

In a few places I will brag about my uncanny ability to invent future web development trends just in time -- around the same time other people are more successfully inventing them.

What is Pythonic?

What is Pythonic? from 2005 is one of the earliest entries in my blog, and one of the most popular ones. That's because it answers a question many who learn Python will ask: what the heck does "Pythonic" mean?

Programming

Under-engineering, over-engineering, right-engineering talks about the careful balancing act we have to do as developers: what's the Goldilocks zone of engineering complexity for a particular problem?

Debugging Strategy: easy stuff first gives the following advice: even though the bug can't possibly be caused by a thing, if that thing is quick and easy to check, check anyway.

The Story of None is a series of posts about how to deal with None/null/undefined in software. It touches on guard clauses, validation and normalization.

Life at the Boundaries: Conversion and Validation goes into application layering and what we do at the boundaries. A lot of creative software development is establishing and guarding boundaries in applications and frameworks.

Punctuated Equilibrium in Software is a case study of the conceptual changes that happened over the course of a few years in one of the software libraries I wrote.

And just recently I wrote Refactoring to Multiple Exit Points.

Python 3 transition

Last year (2018), I read an article on lwn that said the following:

The switch from Python 2 to 3 is a huge job; one might guess that it is orders of magnitude larger than anyone had anticipated back in the heady days of Python 3000 (around 2007, say).

Anyone, you say? In 2007, I wrote Brief Python 3000 thoughts. The Greek myth of Cassandra resonates with me now. I am glad the end of this transition now appears to be in sight at last.

I think the Gravity of Python 2 (from 2014) is the most insightful article of the bunch I wrote on this topic - it talks about the invisible hands that were holding back Python 3 adoption. I think these forces apply in any widely used foundational software that breaks compatibility.

Modern client-side times

In an otherwise not very relevant article in 2009, I wrote this:

If I can count techniques I've been trying to pioneer myself: Template-driven development where the web browser renders the templates. This along with the notion of client-side views can lead to surprisingly clean rich client-side apps.

Client-side templates and views are now a huge thing on the web. But when I wrote the article, Backbone and Angular were almost a year away still.

In 2011, I wrote about my experiment in 2003, when I first tried to build a client-side template language:

I told other developers about it, and they all asked "why?". My answer was something like "I don't know man, it's just cool!"

(my apologies for the gendered language)

In the article I discuss how client-side frameworks affect the architecture of web applications.

In Modern Client-Side Times I explore what the client-side revolution means for the backend web framework.

I like writing retrospectives. If you like reading them, you can read my Seven Years: A Very Personal History of the Web, from 2017.

Open Source

In The importance of communication I tell the tale of how making lots of noise helped me in open source. I do make a lot of noise. I admit it can be a bit much. But it has also benefited me and it may benefit you.

I discuss how to handle ideas when they are offered to an open source project.

In a massive multi-part retrospective on the epic story of the rise and fall of Zope, I describe the evolution of that ancient web framework and my involvement in it. It's a story that's Dan Abramov Approved (tm)!

Names are important. I tell you how not to name software.

I also explain why I think open source projects shouldn't have "contrib" directories.

In 2015 I wrote about the history of reselect, a popular JS library I accidentally helped call into existence.

Miscellaneous topics

On occasion I've also written about topics on my blog that aren't about software development.

In 2014's They say something I don't like so they must be lying! I observe how human behavior can make communities fight and how to compensate. The article ends like this:

Not everyone is well-intentioned. There are real liars, trolls, manipulators and psychopaths out there. There are those among us who want to try to fan the flames for their own amusement. I think being generous to others in our interpretations can reduce their power to do so. Maybe I'll talk a bit more about this in the future.

I haven't, yet. But it is, unfortunately, relevant.

Finally, in The Incredible Drifting Cyber I go into the wild evolution of the prefix cyber, and how it became unfun.

Conclusion

And this concludes the tour. I hope you enjoy reading some of my articles!

Refactoring to Multiple Exit Points

Introduction

Functions should have only a single entry point. We all agree on that. But some people also argue that functions should have a single exit point that returns the value. Still more people don't seem to care much about how their functions are organized at all. I think that makes functions a lot more complicated than they have to be. So let's talk about function organization and how multiple exit points can help.

I'm going to use Python in the examples, but these examples apply to many other languages such as JavaScript and Ruby as well, so do keep reading.

Starting point

Let's consider the following function:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            if item.match == "A":
                result = item.payload
            elif item.match == "B":
                continue
            else:
                if item.other == "C":
                    result = item.override
                else:
                    result = bar
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

It's a silly function, it's a hypothetical function, but there are plenty of functions with this kind of structure. They might not be born this way, but they've certainly grown into it. I find them difficult to follow. You can recognize them by one symptom already: quite a bit of indentation. You can also recognize them by trying to trace what happens in them; notice how your working memory fills up quickly.

Extract function from loop body

How would we go about refactoring it? The first step I would take is to extract the loop body into a separate function. You may say, why do so? Objections could be:

  • The loop body isn't reused in multiple places, so why should it be a function?
  • You have to manage function parameters, whereas before everything was conveniently available in the body of process_items.

That is all so, but let's do it anyway and see what happens, and then get back to this in the end:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

def process_item(item, bar):
    if item.match == "A":
        result = item.payload
    elif item.match == "B":
        result = None
    else:
        if item.other == "C":
            result = item.override
        else:
            result = bar
    return result

We've had to extract two parameters - item and bar. It turns out process_item doesn't care about default. We've had to convert the continue to a result = None to keep things working properly, as now we always run into the if result is not None check whereas before we did not.

Multiple exit points

We notice that result is only touched once in each code path in process_item. This means we can convert the function to use multiple exit points with the return statement, so let's do that:

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    elif item.match == "B":
        return None
    else:
        if item.other == "C":
            return item.override
        else:
            return bar

Convert to guard clauses

That's still more complicated than it should be. Since we have early exit points, we can get rid of the elif and else clauses:

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.match == "B":
        return None
    if item.other == "C":
        return item.override
    else:
        return bar

Some indentation is gone, which is a good sign. And we see another else we can get rid of now:

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.match == "B":
        return None
    if item.other == "C":
        return item.override
    return bar

Pay attention to None

I think the return None case is special, so let's move that up. That's safe as A and B for item.match are mutually exclusive and this function has no side effects:

def process_item(item, bar):
    if item.match == "B":
        return None
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar

This function is now a lot more regular. If you read it past return None you can forget about the case where item.match == "B", and then forget about the case where item.match == "A", and then forget about the case where item.other == "C". In the original version that was a lot harder to see.

Why pay attention to None?

This last reorganization of the guard clauses may seem like a useless action. But I pay special attention to None (or null or undefined or however your language may name the absence of value). If you organize the guard clauses that deal with None to come earlier, it makes your functions more regular and thus more easy to read.

It also triggers you to consider whether perhaps item.match == "B" is something you can handle at the call site, which can lead to further refactorings. Later we'll consider that further in a bonus refactoring.

Languages that have an Option or Maybe type, such as Haskell and Rust, make this more obvious and have special ways to handle these cases -- the language forces you. TypeScript also tracks null/undefined in its type system. But in many other languages, such as Python, we're on our own -- and we certainly still have to pay attention to None.

See also my series The Story of None.

Back to process_items

Now let's look at the process_items function again:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

Multiple exit points

Let's first transform this so we return early when we can:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    else:
        return "No bar"
    if result is None:
        return default
    return result

Flip condition to create a guard

We can see clearly that "No bar" is returned if bar is None, so let's flip that condition:

def process_items(items, bar, default):
    result = None
    if bar is None:
        return "No bar"
    else:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    if result is None:
        return default
    return result

We can now see the else clause is not needed anymore, so let's unindent the for loop. We also move result = None below that guard clause for bar is None, as it's not needed until that point:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    result = None
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            break
    if result is None:
        return default
    return result

So it turns out in the rest of the function we can completely forget about bar being None. That's good. Maybe that guard can even be removed if we can somehow guarantee the non-None nature of bar at the call site. But we can't determine that in this limited example. Let's go on refactoring this function a bit more.

Turn loop break into early return

We take a look at the break. If result is not None, we break. Then after that we check if result is None. This can only happen if the loop never hit the break. If the loop did break, we end up returning result.

So we can just as well do the return result immediately in the loop:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    result = None
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    if result is None:
        return default
    return result

Let's look at the bit of code past the end of the loop again. We know that result has to be None if it reaches there. It's initialized to None and the loop returns early if it's ever not None. So why do we even check whether result is None anymore? We can simply always return default:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    result = None
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default

We have no more business setting result to None before the loop starts. It's a local variable within the loop body now:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default

In review

Let's look at where we started and ended.

We started with this:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            if item.match == "A":
                result = item.payload
            elif item.match == "B":
                continue
            else:
                if item.other == "C":
                    result = item.override
                else:
                    result = bar
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

And we ended with this:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default

def process_item(item, bar):
    if item.match == "B":
        return None
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar

The second version is much easier to follow, I think. (it's also a few lines less code, but that's not that important.)

In defense of single-use functions

So we created a process_item function even though we only use it in one place. Earlier I asked why you would do such a thing. What benefits does that have?

  • We could convert the function to use guard clauses, removing a level of nesting and letting us come up with followup refactoring steps that simplified our code.
  • It's clearer to see what actually really matters in the loop and what doesn't, as it's spelled out in the parameters of the function.
  • We gave what happens in the for loop a name. process_item doesn't say much in this case, but in a real-world code base your function name can help you read your code more easily.
  • Maybe we'll end up reusing it after all!

It also can lead to interesting future refactorings as it's easier to see patterns. If you do OOP for instance, you may end up with a group of functions that all share the same set of arguments and this would suggest creating a class with methods. But let's leave OOP be and consider None.

A possible followup refactoring

We know bar cannot be None when process_item is called -- see our guard clause. If we know (or find a way to guarantee) that item.payload and item.override can never be None either, we can do this:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    for item in items:
        if item.match != "B":
            return process_item(item, bar)
    return default

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar

Which then leads to the question of whether we should filter items with item.match != "B" before they even reach process_items in the first place -- another potential refactoring.
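
That filtering could be as simple as something like this at the call site -- a hypothetical sketch; whether it's actually an improvement depends on the surrounding code:

relevant_items = [item for item in items if item.match != "B"]
result = process_items(relevant_items, bar, default)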

All of these refactorings require knowledge of what's impossible in the code and the data -- its invariants. We don't know this in this contrived example. But in a real code base, you can find out. A static type system can help make these invariants explicit, but that doesn't mean that in a dynamically typed language we should forget about them.

Yes, I'm saying the same as what I said about None before -- whether something is nullable is an important example of an invariant.

Conclusion

It's sometimes claimed that not only should a function have a single entry point, but that it should also have a single exit. One could argue for that from a sense of mathematical purity. But unless you work in a programming language that combines mathematical purity with convenience (compile-time checked match expressions help), that point seems moot to me. Many of us do not. (And no, we can't easily switch either.)

Another argument for single exit points comes from languages like C, where you have to free memory you allocated in the end before you exit a function, and you want to have a single place where you do the cleanup. But again that's irrelevant to many of us that use languages with automated garbage collection.

I hope to have shown you that for many of us, in many languages, multiple exit points can make code a lot clearer. They help to expose invariants and potential invariants, which can then lead to followup refactorings.

P.S. If you like this content, consider following @faassen on Twitter. That's me! Besides many other things, I sometimes talk about code there too.

mstform: a form library for mobx-state-tree

Introduction

So I've written a new web form library: mstform.

The first form library I ever wrote was called ZFormulator and I released it in 1999. My first frontend form library was Obviel Forms, which I wrote in 2012 or so. I've been at it for a while.

Much has changed in web development since 1999, and I've learned a thing or two along the way as well. So a little while ago, almost against my will (though I admit I also enjoy it a lot), I helped to create another form library, one for the frontend again. It integrates with mobx-state-tree. You'd probably use React to render the form, though you don't have to.

A whirlwind introduction to mobx-state-tree

So what, you may ask, is mobx-state-tree, and why should I want to use it in my React application? mobx-state-tree is a state management library - it manages the state you show in your frontend UI. It's built on top of another React state management library named MobX. MobX is magic. You define your model classes, and you mark a few properties and methods so that MobX can observe them. You also mark your React components as observers. It's easy. After that when you change your model instances your React UI will automatically and efficiently update itself.

As I said, mobx-state-tree (MST to make it less of a mouthful) is built on MobX. The way you mark your React components as observers is identical. What it then adds is the following: it forces you to define your models in a very particular way and then it gives you a ton of features in return.

Here's a tiny example:

const Animal = types
  .model("Animal", {
    name: types.string,
    hunger: types.integer
  })
  .views(self => ({
    get isSated() {
      return self.hunger === 0;
    }
  }))
  .actions(self => ({
    feed(amount) {
      self.hunger -= amount;
      if (self.hunger < 0) {
        self.hunger = 0;
      }
    }
  }));

Here we have a model Animal. It has two properties, name and hunger. There's also a special view property we have defined with a getter, isSated. We've also defined an action which manipulates the properties.

MST then gives you some interesting features:

  • You get run-time checks if you put in the wrong type of value. With TypeScript you also get compile-time checks.
  • You can automatically serialize this to JSON and deserialize it from JSON.
  • You can install hooks to monitor any changes to properties.
  • There is a mechanism for defining references between models, and the JSON serializer works with it.

mstform builds on MST to help you create web forms.

Rendering a web form: you're on your own

In the past, for a library like Formulator or Grok or Obviel Forms, I made sure it could render your form. That's nice to have when you have simple forms - you just write a form description or perhaps a schema, and boom, the system automatically shows a form. Django's form system works like that too.

The problem with automatically rendering a form is that any complex form will have special UI. Perhaps you want to display a piece of text between two widgets. Perhaps you want to render repeating sub-forms in a certain way. Perhaps you want to lay out the form so that it responds to the width of the browser window. I don't know. If you automatically render a form and that's the default way of using the form library, the form library has to supply concepts and hooks so that you can influence the form rendering process. These hooks are typically difficult for the developer to use and are incomplete, as it's difficult to anticipate everything.

With mstform, I let all of that go. It doesn't try to render forms. It isn't in control of rendering your form at all. It would be fairly straightforward to build something on top of mstform that did this for you, but it's not a priority. You have React. You have your own form components or use some UI library that already provides them. mstform instead makes it easy to integrate with these components.

What does mstform do then?

If mstform doesn't render your form, what does it do then? It manages the form contents - the form state. In fact, an earlier name for this library was FormState, until I discovered there already was another library out there with that name.

mstform in its essence lets you define a form that represents a MST model. Here is an example for the Animal model we defined above:

import { Form, Field, converters } from "mstform";

const form = new Form(Animal, {
    name: new Field(converters.string),
    hunger: new Field(converters.integer)
});

You can then use this form to manage the state of instances of that model:

// use MST to create an instance of an animal
const elephant = Animal.create({name: "Elephant", hunger: 3});

// now make a form state for elephant
const formState = form.state(elephant);

You can access fields from the form state:

const nameField = formState.field('name');

Once you have a field, you can render it with React. Here is how you could render an input text:

import React, { Component } from "react";
import { observer } from "mobx-react";

@observer
class Input extends Component {
    render() {
        const { type, field } = this.props;
        return <input type={type} {...field.inputProps} />;
    }
}

You can do a lot of other things with the field accessor too: get its error message with field.error, check whether the field is required with field.required, check whether the field is empty with field.isEmpty and so on.

What does that give you?

Why should I use MST with mstform instead of rolling my own, you may ask?

Here we come to features of mstform:

Convert form input
The form input is often just a bunch of strings, but you have a data model underneath. For instance, if I enter an integer in a text input, it arrives as a string. mstform converts this into an integer.
Show edit forms
Say you already have form content, for instance because you just loaded it from the backend and deserialized it; you need to convert that content back to its display state. For instance, render an integer as a string so it can be displayed in a text input.
Client-side validation
Even after the conversion was successful, you want to validate that the value fulfills some criteria, such as being within a certain range. You still need to do validation on the backend to be sure, but this way you can show errors right away.
Error handling
An easy way to show errors, both in conversion and validation. We can also show backend-generated errors, which we will get to in a bit.
Repeated forms and sub forms
You can express nested and repeated structures in your models easily in the form.
Backend integration
You can define how your form is saved to the backend. You can only submit a form if it's valid. The backend can return validation errors and warnings which can be displayed in the form.
Server-side validation
You can also set it up so that your backend generates validation errors during user interaction. This way the backend can remain in charge of validation as the single source of truth, while you still dynamically display errors right away.
Form access states
Make your fields required, disabled, hidden and read-only. Disable individual fields or a whole sub-form at once. You have to write rendering code that can handle these states as mstform doesn't do rendering, but once you have it your forms become very flexible indeed.
Modify the underlying object
You can modify the underlying MST model instance with your own code and the form automatically updates itself. Modify a field value, add a new repeated item, the works. It's just that simple.
Derived values between fields
A field can have a default value that is based on the values filled in for other fields.
Support for different kinds of React input components
Some components pass the new value directly to onChange, others pass an event and you read the value from event.target.value. Some components display their value with value, others with checked. mstform converters have defaults, but you can override them to suit your needs.
Support for internationalized decimal input
Decimal numbers tend to have quite a few differences between countries. For instance, in the US you use periods for the fraction, and commas for the thousands, but in the Netherlands it's the other way around. mstform has a special decimal parser built-in which takes care of that.
TypeScript support
mstform is written in TypeScript and exports TypeScript type definitions for your development convenience.

For more, consult the mstform documentation.

Conclusion

Setting up a form with mstform takes a bit of effort, though it's not difficult. Once you do, a lot of code you might write as custom and failure-prone React components goes away, while you retain a lot of flexibility. Your users edit a form, and a proper MST instance sits under it. You can immediately send its JSON to the backend. You can load up new form contents from the backend just as easily.

Is this the last form library I will ever write? Experience teaches me not to make such promises. But mstform does pack a lot of my experience with web forms in it. I hope you'll give it a try!

Credits

mstform was created by myself and a team of people at a customer of mine named ISProjects. I am very grateful to ISProjects for their great support in this and quite a few other interesting projects I got to work on for them. And they're looking for developers!

Seven Years: A Very Personal History of the Web

Introduction

Humans are storytellers. As anyone who knows me can confirm, I definitely enjoy the activity of telling stories. We process and communicate by telling stories. So let me tell you all a story about my life as a web developer over the last 7 years. It may just be self-reflection, but hopefully it's also useful to some others. Perhaps we can see my story as a highly personal history of the web.

I always say that what pulls me to software development most is creativity. I am creative; I can't help myself. I enjoy thinking about creativity. So this is also going to be about my creative trajectory over the last 7 years.

Why 7 years? Because in early 2010 I decided to withdraw myself from a software development community I had been involved in for about 12 years previously, and I took new paths after that. It is now early 2017; time to take stock.

Letting go of an intense involvement with a software development project, up to the point where it became part of my identity, was difficult. Perhaps that's a geeky thing to say, but so it is. I needed to process it.

Life after Zope

Zope was a web framework before that concept existed. The modern web framework only materialized sometime in the year 2005, but Zope had been ahead of its time. I was involved with the Zope community from when it was first released, in 1998. I learned a lot from Zope and its community. I made many valued connections with people that last until this day.

Zope helped shape who I am and how I think, especially as a web developer. In 2013 I wrote a retrospective that went into the history of Zope and my involvement with it.

But I did not just process it by writing blog posts. I also processed it creatively.

Frameworks

So I am a web developer. Back in 2010 I saw some people argue that the time of the web framework had passed. Instead, developers should just gather together a collection of building blocks and hook it up to a server using a standard API (WSGI in the case of Python). This would provide more flexibility than a framework ever could.

In a "X considered Y" style post I argued that web frameworks should be considered useful. Not that many people needed convincing, but hey, why not?

I wrote:

The burden of assembling and integrating best of breed components can be shared: that's what the developers of a framework do. And if you base your work on a pre-assembled framework, it's likely to be less work for you to upgrade, because the framework developers will have taken care that the components in the framework work together in newer versions. There is also a larger chance that people will write extensions and documentation for this same assembly, and that is very likely something you may benefit from in the future.

I follow the ReactJS community. It is definitely a community that lets you self-assemble a framework out of many parts. This gives that ecosystem flexibility and encourages creativity -- new approaches can be tried and adopted quickly. I like it a lot.

But figuring out how to actually start a React-based project had become a major effort. To get a good development platform, you needed to learn not only about React but also about a host of packaging and build tools: CommonJS and Webpack and npm and Babel and so on. That's quite intimidating and plain work.

So some React developers realized this and created create-react-app which makes it easy to start a working example, with minimal boilerplate code, and with suggestions on how to expand from there. It's a build framework for React that gathers good software in one place and makes it easy to use. It demonstrates how frameworks can make life easier for developers. It even goes a step further and allows you to opt out of the framework once you need more control. Now that's an interesting idea!

Client-side JS as Servant to the Server UI

So frameworks are useful. And in late 2010 I had an idea for a new one. But before I go into it, I will go on a digression on the role of client-side JavaScript on the web.

This is how almost all JavaScript development used to be done and how it's still done today in many cases: the server framework generates the HTML for the UI, and handles all the details of UI interaction in request/response cycles. But sometimes this is not enough. More dynamic behavior is needed on the client side. You then write a little bit of JavaScript to do it, but only when absolutely necessary.

This paradigm makes JavaScript the ugly stepsister of whatever server programming language is used; a minor servant of Python, Ruby, Java, PHP or whatever. The framework is on the server. JavaScript is this annoying thing you have to use; a limited, broken language. As a web developer you spend as little time as possible writing it.

In short, in this paradigm JavaScript is the servant to the server, which is in charge of the UI and does the HTML generation.

But JavaScript had been gathering strength. The notion of HTTP-based APIs had attracted wide attention through REST. The term AJAX had been coined by 2005. Browsers had become a lot more capable. To exploit all this, more and more JavaScript needed to be written.

jQuery was first released in 2006. jQuery provided better APIs over sucky browser APIs, and hid incompatibilities between them. Its core concept is the selector: you select things in the web page so you can implement your dynamic behavior. Selectors fit the server-dominant paradigm very well.

Client-side JS as Master of the UI

By 2010, the notion of the single-page web application (SPA) was in the air. SPAs promised more powerful and responsive UIs than server-side development could accomplish. The backend is an HTTP API.

This is a paradigm shift: the server framework lets go of its control of the UI entirely, and client-side JavaScript becomes its new master. It encourages a strong separation between UI on the client and business logic on the server. This brings a big benefit: the unit of UI reuse is on the client, not spread between client and server. This makes reuse of UI components a lot easier.

By 2010, I had played with client-side template languages already. I was about to build a few large web applications, and I wanted them to be dynamic and single page. But client-side JavaScript could easily become a mess. I wanted something that would help organize client-side code better. I wanted a higher-level framework.

The idea

So there we finally get to my idea: to create a client side web framework, by bringing over concepts from server frameworks to the client, and to see what happens to them. Cool things happen! We started with templates and then moved to MVC. We created a notion of components you could compose together. We created a client-side form library based on that. In 2011 we released this as Obviel.

For a little while in 2010, early 2011, I thought I was the only one with this cool idea of a client-side web framework. It turns out that I was not: it was a good idea, and so many people had the same idea at about the same time. Even before we released Obviel, I started to hear about Backbone. Ember and Angular soon followed.

I continued working on Obviel for some time. I created a template language with a built-in i18n system for it, and a client-side router. Almost nobody seemed to care.

In 2011 and 2012 we built a lot of stuff with Obviel. In the beginning of 2013 those projects were done. Obviel didn't get any traction in the wider community. It was a project I learned a lot from, so I don't regret it. I can claim deep experience in the area of client-side development.

React

I went to my first JS conference in September of 2013. I had originally submitted a talk about Obviel to it, but it wasn't accepted. Everybody was promoting their shiny new client-side framework by that time.

So was Facebook. Pete Hunt gave a talk about React. This was in fact only the second time React had been introduced to a wider audience. Apparently it went over a lot better than the first time. It certainly made an impression on me: there were some fascinating new ideas in it. The React community became a ferment of new ideas. At the conference I talked to people about another idea I'd had: a client framework that helps coordinate client/server communication; maybe sort of like a database, with transactions that commit UI state to the server? Nobody seemed to know of any at the time. Uh oh. If nobody has the idea at the same time, then it might be a bad one?

Then from the React community came Flux and Redux and Relay and Mobx. I let go of Obviel and started to use React. There is a little irony there: my move to client-side frameworks had started with templates, but React actually let go of them.

The Server in Modern Client-side times

In early 2013 I read an interesting blog post which prompted me to write Modern Client-Side Times, in which I considered the changed role of the server web framework if it was to be the servant to JavaScript instead of its master.

I wrote a list of what tasks remain for the server framework:

What remains is still significant, however:

  • serving up JSON on URLs with hyperlinks to other URLs
  • processing POSTs of JSON content (this may include parts of form validation)
  • traversal or routing to content
  • integrating with a database (SQLAlchemy in this case)
  • authentication - who are you?
  • authorization - who can access this URL?
  • serve up an initial web page pulling in a whole bunch of static resources (JS, CSS)

I also wrote:

Much of what was useful to a server side web framework is still useful. The main thing that changes is that what goes over the wire from server to client isn't rendered HTML anymore. This is a major change that affects everything, but much does stay the same nonetheless.

I didn't know at the time of writing that I would be working on just such a server web framework very soon.

On the Morepath

In 2013 I put some smaller pieces I had been playing with for a while together and created Morepath, a Python server web framework. I gave an over-long keynote at PyCON DE that year to describe the creative processes that had gone into it. I gave a more focused talk at EuroPython 2014 that I think works better as an introduction.

I announced Morepath on my blog:

For a while now I've been working on Morepath. I thought I'd say a bit about it here.

Morepath is a Python web micro-framework with super powers. It looks much like your average Python micro-framework, but it packs some serious power beneath the hood.

One of the surprises of Morepath was the discovery that a web framework that tries to be good at being a REST web server actually works very well as a server web framework too. That does make sense in retrospect: Morepath is good at letting you build REST services, therefore it needs to be good at HTTP, and any HTTP application benefits from that, no matter whether it renders its UI on the client or the server. Still, it was only in early 2015 that Morepath gained official support for server templates.

2014 was full of Morepath development. I announced it at EuroPython. Development slowed down a little in 2015, then picked up speed again in 2016.

I'm proud that Morepath is micro in implementation, small in its learning surface, but macro in power. The size of Morepath is another surprise: Morepath itself is currently a little over 2000 lines of Python code, but it does a lot, helped by the powerful Reg (<400 lines) and Dectate (<750 lines) libraries. Morepath offers composable, overridable, extensible applications, an extensible configuration system, an extensible view dispatch system, automated link generation, a powerful built-in permission rule system, and lots more. Morepath is like Flask, but with a nuclear fusion generator inside. Seriously. Take a look.
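If you haven't seen Morepath before, here is a minimal sketch in the spirit of its quickstart. The Greeting model and the URLs are made up for illustration; the App.path and App.json directives and morepath.run are the documented basics:

import morepath


class App(morepath.App):
    pass


# route a URL path to model instances
@App.path(path='greetings/{name}')
class Greeting(object):
    def __init__(self, name):
        self.name = name


# a JSON view for Greeting; request.link generates a link back to the model
@App.json(model=Greeting)
def greeting_default(self, request):
    return {'greeting': 'Hello, %s!' % self.name,
            'link': request.link(self)}


if __name__ == '__main__':
    morepath.run(App())

Run this and /greetings/world responds with JSON that includes a link to itself; overriding or extending these registrations in a subclassed App is where the composability comes in.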

The Future?

Over the last few years Morepath has become a true open source project; we have a small team of core contributors now. And in late 2016 Morepath started to gain a bit more attention in the wider world. I hope that continues. Users that turn into contributors are invaluable for an open source project.

There was a mention of Morepath in an Infoworld article, I was interviewed about it for Podcast.__init__, and I was also interviewed about it for an upcoming episode of Talk Python to Me.

Ironically I've been writing some Django code lately. I'm new to Django (sort of). I have been reintroduced to the paradigm I started to leave behind 7 years ago. With standard Django, the server rules and JavaScript is this adjunct that you use when you have to. The paradigm works, and for some projects it may be the best approach, but it's definitely not my preferred way to work anymore. But I get to help with architecture and teach a bit, so I'll happily take Django on board.

The Django management UI is cool. It makes me want to implement the equivalent for Morepath with PonyORM and React and Mobx. Or something. Want to help?

I've been itching to do something significant on the client-side again. It's been a little while since I got to do React. I enjoyed attending React Europe 2015 and React Europe 2016. I played with React Native for a bit last year. I want to work with that stuff again.

The space where client and server interact is fertile with creative potential. That's what I've found with Obviel and React on the client, and with Morepath on the server side. While GraphQL replaces the REST paradigm that Morepath is based around (oops!), I'd enjoy working with it too.

Where might the web be going? I like to think that by being creative I sometimes get to briefly peek into its near future. I hope I can continue to be creative for the next 7 years, as I really enjoy it.

I'm a freelancer, so the clients I work for in part shape my creative future. Hint. Let me know if you have something interesting for me to work on.

Looking for new challenges

Fair warning: In this blog post I aim to sell myself. I'm looking for an exciting and challenging new freelance engagement.

I'm a software developer and I have been one professionally for about 20 years now. I have deep experience, and I continue to learn and create. I know what real-world codebases look like, and I know software development is also about people. I think I can offer a lot of value. I can develop software for you, and I can also help you improve the way you develop software.

I see myself as a creative developer -- I want to invent things to improve life for myself and others. Creativity is what attracts me to software development the most. Creativity is transformative. If you need a bit of transformation in your software, talk to me.

I'm a web application developer. I've focused on web development since the late 90s. In that period I've seen the web grow from static websites with a few server-side Perl scripts thrown in, to the dynamic application platform it is today. I started developing web applications on the server side, the only game in town then, but over the last 10 years I've shifted more and more to the client side and JavaScript, where much of the creativity is today. I am however still very much at home on the server as well. I've done React, I've done REST, I've done hypermedia APIs, I've dug into GraphQL. If you need a web developer with deep insight into the whole stack, look no further.

I've focused on Python for the better part of my software development career. I came to Python early; I have seen Python grow from a small language with no name recognition to the enormously popular language it is today. I greatly enjoy using it. I've also criticized it where I felt it was needed, painful as it was. If you need a very experienced Python developer, contact me.

But in the last decade I've started to use JavaScript more and more. Thanks to the React community I've learned a few new things about functional programming; it's exciting to see it become relevant to the web. I haven't stopped looking at interesting new languages. I'm a developer who can look beyond a single programming language.

I have repeatedly demonstrated I can take a large piece of software and transform it:

  • Back in the day, I rejuvenated the Zope web framework with Five, a technology that is still in use by the Plone CMS today.
  • I took libxml2, a huge C library that was difficult to use for Python developers, and created the most powerful XML library in the Python world: lxml. I used the predecessor to Cython to do this (in 2004!).
  • I created a simpler, better way to use Zope with Grok. I've helped add web capabilities to existing codebases with a deep history.

I can help you open up your codebase to new possibilities.

I can build on other people's foundations, but I can also build new foundations. In the last few years I've created Morepath, a web framework that compresses the power I expect into a few thousand lines of Python code.

I'm not afraid to say I've also created many a thing that went nowhere. In 2010 I created Obviel, a client-side web framework. I took concepts like model/view separation and templates and a form library and brought them to JavaScript and the client. I thought I was doing something new in 2010, something that people hadn't really thought of yet. It turns out everybody had the same idea at about the same time. Backbone burst upon the scene and Obviel never got any traction. Now I'm a frontend web framework hipster; I was doing them before it was cool. It was worth it, because I gained deep insights into what makes a frontend framework tick.

These days I prefer to use React for front-end application development. React is awesome. With React Native a mere web developer such as myself can even build a real mobile app. Want a developer that loves React but is also tempered by experience? Look no further.

I'm looking for a new challenge. I want to help build, create and transform. I work from home (in the Netherlands) so I can enjoy my family and garden, but I can certainly come visit you on a regular basis. I write code, I write docs, I write tests, I give talks, I give training, and I review your code. I am open, constructively critical and honest. I contribute a bit of insight here and there. My services are not cheap, but they are worth it.

In summary, here I am: a very experienced, creative web developer who looks a little bit beyond what's in front of him.

Think you have an interesting challenge for me? Please drop me a mail at faassen@startifact.com and let's talk.

Morepath 0.16 released!

I'm proud to announce the release of Morepath 0.16. Morepath is a Python web framework that is easy to use and lightweight, but grows with you when your project demands more.

Morepath 0.16 is one of the biggest releases of Morepath in a while. I want to discuss a few of the highlights of this release here.

Reg Rewritten

Morepath uses the predicate dispatch library Reg for its view lookup system and other behavior. We've rewritten Reg once again. For most Morepath users nothing changes, except that Reg is faster which also makes Morepath faster. If you want to use Reg directly, the new registration API makes it easier to use.

With Reg you can control the context in which dispatch takes place: this allows multiple separate configurations of dispatch in the same runtime. To control context, previously we used an implicit global lookup object, or an explicit but not very Pythonic lookup argument. Those are all gone. If you need multiple dispatch contexts in an application, you can define dispatch methods which derive their context from their class. This change allowed us to simplify Reg considerably and increase its performance.
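To give an impression of the new style, here is a minimal sketch. The Document class and the view method are made up for illustration, but the dispatch method pattern itself follows the new Reg API:

import reg


class Document(object):
    def __init__(self, content):
        self.content = content


class Context(object):
    # the dispatch method derives its context from this class
    @reg.dispatch_method(reg.match_instance('obj'))
    def view(self, obj):
        raise NotImplementedError()


# register an implementation for Document instances on the method
@Context.view.register(obj=Document)
def document_view(self, obj):
    return "<p>%s</p>" % obj.content


# calling the method dispatches within the context of its class;
# subclassing Context starts a fresh context with its own registrations
print(Context().view(Document("hello")))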

This work was done by Stefano Taschini in collaboration with myself. Thanks Stefano!

New pip-based build system

This only affects us Morepath developers, but it's a significant change, so I want to highlight it here. After all, we have a nice core team of contributors now and I hope we can attract more.

I've been a happy buildout user over the years, so of course I used it for Morepath's development setup as well. But for a Python-only project like Morepath, pip can now do what buildout does. Since many more Python programmers are familiar with pip, and we want to make it as easy as possible for someone to start contributing, we've taken the plunge and entirely replaced buildout with pip. Even a buildout guy such as myself has been appreciating the results.

We've updated our developer documentation to reflect the changes, so it's easy to find out how to do common things. The build environments for the Reg and Dectate libraries were updated to use pip as well.

This work was done by Henri Hulski. Thanks Henri!

Other significant changes

  • I took a good look at Traject's routing system with an eye on performance and refactored it.
  • We realized that the directive directive was a bit too magic for its own good. I changed Dectate so that new Morepath configuration directives are now defined directly on the App class using the dectate.directive function (see the sketch after this list). This breaks some code if you define new directives, but it's easy to fix.
  • Our extensive documentation has had a reorganization of its table of contents.
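For those who do define their own directives, here is a rough sketch of the new style. It assumes Dectate's documented Action API; the RegisterWidgetAction class and the widgets configuration are made up for illustration:

import dectate
import morepath


class RegisterWidgetAction(dectate.Action):
    # the configuration this directive contributes to
    config = {
        'widgets': dict
    }

    def __init__(self, name):
        self.name = name

    def identifier(self, widgets):
        # registrations with the same identifier conflict
        return self.name

    def perform(self, obj, widgets):
        widgets[self.name] = obj


class App(morepath.App):
    # the directive is now defined directly on the App class
    register_widget = dectate.directive(RegisterWidgetAction)


@App.register_widget('sidebar')
def sidebar_widget():
    return '<div>sidebar</div>'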

Look at the detailed changelog for much more information, including upgrade notes.

Performance increase

I benchmarked Morepath quite frequently during this development cycle. To make benchmarking easier, I created a new benchmarking tool called howareyou. It can not only benchmark Morepath, but also other web frameworks -- Michael Merickel has in fact been using it already to help optimize Pyramid. You can find the howareyou tool here. The origins of this tool ultimately go back to work by wheezy.web creator Andriy Kornatskyy.

Morepath uses Webob for its request and response implementation. I learned quite a lot about Webob performance characteristics during this development cycle. This allowed me to make performance tweaks in Morepath.

It also let me detect that Webob's development version had some performance regressions that affected both Pyramid and Morepath. I'm very grateful to Bert Regeer for picking up so quickly and thoroughly on my reports of performance problems in Webob, and the Webob development version is currently actually slightly faster than release 1.6.1.

I talked about Morepath's performance history recently in my article Is Morepath Fast yet?. There we had peaked at about 19000 requests per second (on a synthetic benchmark) for the development version. I am happy to announce that we've managed to increase performance even more in our 0.16 release. It's now more than 28000 requests per second!

[Figure: Morepath performance over time]

Let's compare Morepath with some other carefully selected frameworks:

[Figure: Morepath performance compared]

Cool, Morepath 0.16 is actually faster than Pyramid at this point in time! I don't expect it to last long given that the Pyramid devs are already using howareyou to optimize Pyramid, but it's nice to have such a moment. And I deliberately didn't include Falcon, Bottle or wheezy.web in this comparison, as that would rather spoil the effect. Do remember these are somewhat silly, synthetic benchmarks. It's rare indeed that Python web framework overhead is going to affect real-world performance, but at least Morepath isn't the slowest one, right?

Enjoy!

I hope you all enjoy the fresh new release. Do get in touch with us!

Is Morepath Fast Yet?

Morepath is a Python web framework. But is it fast enough for your purposes?

Does performance matter?

Performance is one of the least important criteria you should use when you pick a Python web framework. You're using Python for web development after all; you are already compromising performance for ease of use.

But performance makes things seem easy. It boils down the whole choice between web frameworks to a single, seemingly easy to understand variable: how fast is this thing? All web frameworks are the same anyway, right? (Wrong.) We don't want the speed of our application to be dragged down by the web framework. So we should just pick the one that is fastest. Makes total sense.

It makes total sense until you take a few minutes to think about it. Performance, sure, but performance doing what? Performance is notoriously difficult to measure. Sending a single "hello world" response? Parsing complex URLs with multiple variables in them? HTML template rendering? JSON serialization? Link generation? What aspect of performance matters to you depends on the application you're building. Why do we worry so much about performance and not about features, anyway?

Choosing a web framework based on performance makes no sense for most people. For most applications, application code dominates the CPU time spent. Pulling stuff out of a database can take vastly more time than rendering a web response.

What matters

So it makes sense to look at other factors when picking a web framework. Is there documentation? Can it do what I need it to do? Will it grow with me over time? Is it flexible? Is it being maintained? What's the community like? Does it have a cool logo?

Okay, I'm starting to sound like someone who doesn't want to admit the web framework I work on, Morepath, is atrociously slow. I'm giving you all kinds of reasons why you should use it despite its performance, which you would guess is pretty atrocious. It's true that the primary selling point of Morepath isn't performance -- it's flexibility. It's a micro-framework that is easy to learn but that doesn't let you down when your requirements become more complex.

Morepath performance

I maintain a very simple benchmark tool that measures just one aspect of performance: how fast a web framework can generate simple "hello world" responses at the Python WSGI level.

https://github.com/faassen/welcome-bench

I run it against Morepath once in a while to see how we're doing with performance. I actually care more about what Morepath is doing when it generates the response than I care about the actual requests per second it can generate. I want Morepath's underlying implementation to be relatively simple. But since performance is so easy to think about, I take advantage of that as a shortcut: I treat performance as an approximation of implementation complexity. Plus it's cool when your framework is fast, I admit it.
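To be concrete about what "at the Python WSGI level" means: the benchmark asks each framework to do the moral equivalent of this bare WSGI application (a sketch, not the actual benchmark code):

def application(environ, start_response):
    # the entire job: a fixed plain text "hello world" response
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello world!\n']

Everything a framework does on top of that -- routing, request objects, dispatch -- shows up as overhead in the numbers below.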

The current Morepath development version takes about 5200 ms per 100,000 requests, which translates to about 19200 requests per second. Let's see how that compares to some of the friendly competition:

                ms     rps  tcalls  funcs
django       10992    9098     190     85
flask        15854    6308     270    125
morepath      5204   19218     109     80

So at this silly benchmark, Morepath is more than twice as fast as Django and more than three times faster than Flask!

Let me highlight that for marketing purposes and trick those who aren't reading carefully:

Morepath is more than 2 times faster than Django and more than 3 times faster than Flask

Yay! End of story. Well, I gave a bit of a selective view just now. Here are some other web frameworks:

                ms     rps  tcalls  funcs
bottle        2172   46030      53     31
falcon        1539   64961      26     24
morepath      5204   19218     109     80
pyramid       3920   25509      72     57
wheezy.web    1201   83247      25     23

I'm not going to highlight that Bottle is more than two times faster at this silly benchmark nor that Falcon is more than three times faster. Let's not even think about wheezy.web.

I think this performance comparison actually highlights my point that in practice web framework performance is usually irrelevant. People aren't flocking to wheezy.web just because it's so fast. People aren't ignoring Flask because it's comparatively slow. I suspect many are surprised that Flask is one of the slowest frameworks in this benchmark, as it's a lightweight framework.

Flask's relatively slow performance hasn't hurt its popularity. This demonstrates my point that web framework performance isn't that important overall. I don't fully understand why Flask is relatively slow, but I know part of the reason is werkzeug, its request/response implementation. Morepath is actually doing a lot more sophisticated stuff underneath than Flask and it's still faster. That Pyramid is faster than Morepath is impressive, as what it needs to do at runtime is similar to what Morepath does.

Let's look at the tcalls column: how many function calls get executed during a request. There is a strong correlation between how many Python function calls there are during a request and requests per second. This is why performance is a decent approximation of implementation complexity. It's also a clear sign we're using an interpreted language.

How Morepath performance has changed

So how has Morepath's performance evolved over time? Here's a nice graph:

[Figure: Morepath performance over time]

So what does this chart tell us? Before its 0.1 release when it still used werkzeug, Morepath was actually about as slow as Flask. After we switched to webob, Morepath became faster than Flask, but was still slower than Django.

By release 0.4.1 a bunch of minor improvements had pushed performance slightly beyond Django's -- but I don't have a clear idea of the details. I also don't understand exactly why there's a performance bump for 0.7, though I suspect it has to do with a refactor of application mounting behavior I did around that time -- while that code isn't exercised in this benchmark, it's possible it simplified a critical path.

I do know what caused the huge bump in performance in 0.8. This marked the switch to Reg 0.9, which is a dispatch library that is used heavily by Morepath. Reg 0.9 got faster, as this is when Reg switched to a more flexible and efficient predicate dispatch approach.

Performance was stable again until version 0.11, when it went down again. In 0.11 we introduced a measure to make the request object sanitize potentially dangerous input, and this cost us some performance. I'm not sure what caused the slight performance drop in 0.14.

And then there's a vast performance increase in current master. What explains this? Two things:

  • We've made some huge improvements to Reg again. Morepath benefits because it uses Reg heavily.
  • I cheated. That is, I found work that could be skipped in the case no URL parameters are in play, as in this benchmark.

Skipping unnecessary work was a legitimate improvement of Morepath. The code now avoids accessing the relatively expensive GET attribute on the webob request, and also avoids a for loop through an empty list and a few if statements. In Python, performance is sensitive to even a few extra lines of code on the critical path.
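In rough, hypothetical code (the names here are made up; the real implementation differs), the fast path amounts to something like this:

def extract_url_parameters(self, request):
    # hypothetical sketch: when the route declares no URL parameters,
    # skip everything -- never touch the relatively expensive request.GET
    # and never loop over an empty converter list
    if not self.converters:
        return {}
    get = request.GET
    return {name: convert(get.get(name))
            for name, convert in self.converters.items()}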

But when you do have URL parameters, Morepath's feature that lets you convert and validate them automatically is pretty nice -- in almost all circumstances you should be glad to pay the slight performance penalty for the convenience. Features are usually worth their performance cost.

So is Morepath fast yet?

Is Morepath fast yet? Probably. Does it matter? It depends. What benchmark? But for those just skimming this article, I'll do another sneaky highlight:

Morepath is fast. Morepath outperforms the most popular Python web frameworks while offering a lot more flexibility.

Introducing Bob Strongpinion

I posted an article about programming yesterday (punctuated equilibrium in software development). In it I try to share some insights I've had about software development, and illustrate them with a library I work on that I think is interesting. I posted it to reddit /r/programming, obviously hoping to get a bit of attention with it. I think the topic is interesting to other developers, and writing it took a bit of time. Besides, I'm human enough to want positive attention.

My reddit post quickly sank from sight with nary an upvote. That got me thinking about what kind of posts I could make that would draw more attention. I joked a bit that having strong opinions boldly spoken gets more attention -- but I should blame the topic and my writing more than anything else. To blame the outside world makes you stop learning from it, and I want to learn.

In engineering there are always trade-offs, and it's important to be aware of them. It is also nicer to be respectful of work that people have put a lot of time and effort into. And if you disregard that work you are also less likely to learn and grow. So I won't go out there and say Morepath, the Python web framework I work on, is better than, say, Flask or Django or Pyramid. It depends on what you're doing. When I compare Morepath to other web frameworks, I probably don't excite you; it may in fact be sleep inducing.

I think Morepath is great. I think it solves problems that many Flask and Django developers don't even fully realize they actually have. That's in fact part of the problem I have in explaining it. But I also recognize that in many circumstances other frameworks are the better choice, for a variety of reasons.

But sometimes I wish I was more bold in expressing my opinions than I usually am. Sometimes I wish I was more like Bob Strongpinion.

Who is Bob Strongpinion?

Bob Strongpinion has strong opinions and conviction. He would blog saying all other web frameworks suck compared to Morepath. He'd have a lot of blog posts with eye-catching titles:

Django was yesterday. The future belongs to Morepath.

10 Reasons Morepath is just plain better.

You're doing configuration wrong. This library gets it right. (Dectate)

Single dispatch is over. Get ready for predicate dispatch. (Reg)

Why routing to the model instead of the view is winning.

Bob Strongpinion posts a lot of articles with lists, and they're all 10 items explaining that what he likes is plain better. Bob Strongpinion doesn't believe in trade-offs in engineering. There's better, and there's worse, and what he likes is just plain better. Bob Strongpinion knows there's ONE tool that's right for EVERY job.

If someone doesn't agree with Bob Strongpinion's choice of tools, they're either stupid, or more likely, evil. Bob Strongpinion may not have more experience in a problem domain than you do, but it's the RIGHT experience so he's still right.

When Bob Strongpinion makes a pronouncement in a comment thread, the case is closed. Disagree with him and incur his righteous wrath.

Engineering projects with Bob Strongpinion on it always succeed as he picks the right tools. And when they don't, it's someone else's fault. When Bob Strongpinion doesn't get to use his favorite tools it's no wonder the project failed.

When Bob Strongpinion uses an operating system, it's because it's the best one for everyone, unless they're too wimpy, so he's still elite. Bob Strongpinion definitely believes systemd is a great conspiracy to destroy Linux.

Conclusion

I think Bob Strongpinion would get a lot more upvotes on reddit. He'd get attention; some of it negative, but a lot of it would be positive.

"Yeah, Strongpinion! You tell the truth about software development!"

And he's always right. Obvious comparisons to certain public figures come to mind. You can see why his mindset would be comfortable, so it's understandable why Bob Strongpinion lives among us.

I sometimes wish I could be more like Bob Strongpinion when I promote my own work. As you can see from the above, I can channel him pretty well. I snuck him in while rejecting him, how sneaky! The Strongpinion must be strong in me. I've even done a list of 10 bullet points before. It got some upvotes.

But in the end I keep choosing not to be him.

Punctuated Equilibrium in Software

Punctuated equilibrium

Punctuated equilibrium is a concept in evolution theory. It was developed to explain a feature of the fossil record: biological species appear quite suddenly and then tend to be in relative stasis for a long period, only undergoing very gradual changes. The species is in equilibrium with its environment. Then suddenly this stasis is punctuated: there is a relatively brief period where a large series of changes occur. This results in the evolution of a new species. The rapid changes can be brought on by changes in the environment, or by a lucky mutation in a single individual that opens up a whole set of possibilities for subsequent changes.

I've noticed that software too can evolve with a pattern of punctuated equilibrium. I'll illustrate this using a Python library that I work on: Reg. To explain how it evolved I need to go into some detail about it. Luckily, Reg is quite an interesting little library.

Reg is a predicate dispatch implementation for Python. It didn't start out that way, but that's what it is now. The Morepath web framework, which I also work on, uses Reg to enable some powerful features. I'll refer to Morepath in a few places in this article as it provides use cases for Reg.

Reg ancestry

The ancestor of Reg is the zope.interface library, which was created around the year 2002 by Jim Fulton. It is still in very active use by large projects like the Pyramid web framework and the Plone CMS.

zope.interface lets you define interfaces for Python objects. It's similar to the Python abc module, though zope.interface came earlier and is more general.

zope.interface also implements a registry that maps interface (or type) to interface to support an adapter lookup pattern: you can adapt an object of an interface (or type) to an object with an interface you want.

In a web application you could for instance look up an HTML view interface (API) for a given content object such as a document or someone's address, or whatever other type of content object you may have in your system. We'll give an example of this in code when we get to Reg.

The genesis of zope.interface took a few years and involved a predecessor project. Like Reg itself it underwent evolution by punctuated equilibrium in its early years. I describe this a bit in the Reg history document.

I try to keep history documents describing the evolution of various projects I work on, as I think they can provide insight into a project beyond what the rest of the documentation can bring. If you like software history, see Morepath history, Dectate history and Reg history. (The Reg history overlaps with this article, so if you're curious to learn more, do check it out later.)

After 2002 zope.interface became stable: its API hasn't changed much, and neither has its implementation. There were a few small tweaks here and there, in particular to add Python 3 compatibility, but that's it. At some point around 2009 I made some proposals to improve its API, but got nowhere. That's when I started playing around with the idea to reimplement it for myself.

The genesis of Reg

It often takes a period of experimentation and play to create something new. It's important during this phase not to think too much about immediate practical goals. Focus on a few core features that interest you; don't worry about it covering everything. Banish any thoughts about backwards compatibility and how to upgrade existing large code bases; that would be detrimental to the spirit of playfulness.

"Why are you reimplementing zope.interface, Martijn?"

"Just for fun, I don't expect anyone to use this."

After a few years of occasional play with various ideas I had concerning zope.interface, they finally started to come together in 2013. The goal of Reg at the time was straightforward: it was to be like zope.interface, but with an implementation I could understand, and with a streamlined API.

I'm going to show sample code now. Be aware that the sample code in this article may be somewhat fictional for educational purposes.

Reg initially worked like this:

# the view API
class IView(reg.Interface):
    def __call__(self):
        "If you call this, you get a web representation."

# register implementation of the view API for Document and
# HTTP Request classes
@IView.register(Document, Request)
class DocumentToViewAdapter:
    def __init__(self, doc, request):
        self.doc = doc
        self.request = request

    def __call__(self):
        return "<p>%s</p>" % self.doc.content

# can register other implementations, for example for Address and
# Request

# create instances we can look up for
doc = Document()
request = Request()

# look up the view adapter for a specific object, in this case a document
# The specific implementation you find depends on the class of doc and
# request arguments
view = IView.adapt(doc, request)
# and get the representation
html = view()

Here we define an IView interface. You can then register adapters for this interface that take parameters (the object and an HTTP request) and turn them into an object that can create an HTML representation of those parameters.

Major transition to generic functions

I worked with the zope.interface version of Reg for a while in the context of the Morepath web framework. This gave me some practical experience. I also talked about Morepath and Reg in public and got some feedback. Even minimal feedback is great; it sets thoughts into motion. I quickly realized that Reg's API could be simplified if I centered it around generic functions with multiple dispatch instead of interfaces and adapters. Something like this:

# generic function definition instead of interface
@reg.generic
def view(obj, request):
    """"If you call this, you get a web representation."""
    raise NotImplemented

# an implementation for Document and Request
@view.register(Document, Request)
def document_view(obj, request):
    return "<p>%s</p>" % obj.content

# get representation for document by calling view()
html = view(doc, request)

This looks simpler. The interface definition is gone, the adapter class is gone. We just have a function that dispatches on the type of its arguments.

Reg worked like this for over a year. It was stable. I didn't foresee any more large changes.

Major transition to predicate dispatch

Meanwhile...

I created a predicate registry implementation. This lived inside of a module in Reg, but it could as well have been in a totally different library: the code was unrelated to the rest of Reg.

The use case for this predicate registry was Morepath view lookup. The predicate system let you register objects by a selection of keys. You could look up a view based on the request_method attribute (HTTP GET, POST, etc) of the request object, for instance, not just by type.

Two things came together in the fall of 2014:

  • I realized that it was annoying that the multiple dispatch system automatically dispatched on all arguments to the function -- in many cases that wasn't required.
  • I needed the predicate system to understand about types and inheritance. The multiple dispatch system in Reg understood types but not predicates, and the predicate dispatch system understood predicates but not types.

Then I realized that if I generalized Reg and turned it into a predicate dispatch system, I could actually unify the two systems. The dialectic (thesis, antithesis, synthesis) is a strong force for creativity in software development.

With predicate dispatch you can dispatch on any aspect of the arguments to a function: not just their class but also any of their attributes. You can still do multiple dispatch as before: dispatch on the type of an argument is now just a special case. Since arguments now needed a description of what predicate they dispatch on, I could also have arguments that are ignored by the dispatch system altogether.

This is when I finally understood some of the reasoning behind the PEAK-Rules library, which is a Python predicate dispatch implementation that predates Reg by many years. Almost everything has been implemented before, but with reimplementation you gain understanding.

With that insight, the equilibrium was punctuated, and Reg underwent rapid change again. Now it looked like this:

# dispatch function instead of generic function
# note how we explicitly name the arguments we want to match on
# (obj, request) in the predicates, and how we match on the
# request_method attribute. match_instance still matches on type.
@reg.dispatch(reg.match_instance('obj'),
              reg.match_key('request_method',
                            lambda request: request.request_method))
def view(obj, request):
    raise NotImplementedError()

# an implementation for GET requests of documents
@view.register(obj=Document, request_method='GET')
def document_view(obj, request):
    return "<p>%s</p>" % obj.content

# get representation for document by calling view()
html = view(doc, request)

When we define a dispatch function, we can now precisely describe on what aspects of the arguments we want to do dispatch. When we register an implementation, we can use more than just the types of the arguments. We can also have arguments that do not play a role in dispatch at all.

This system allows Morepath to have views looked up by the type of the instance being represented, the last name in the path (/edit, /details), the HTTP request method (GET, POST, etc), and the type of object in the POST body, if any.
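In Morepath terms that comes down to view registrations like these; a sketch with a made-up Document model, using the name and request_method view arguments:

import morepath


class App(morepath.App):
    pass


@App.path(path='documents/{id}')
class Document(object):
    def __init__(self, id):
        self.id = id


# looked up for a GET of /documents/{id}/edit
@App.html(model=Document, name='edit', request_method='GET')
def document_edit_form(self, request):
    return "<form>...</form>"


# same name, but only for POST requests: the request method is one of
# the predicates that takes part in the dispatch
@App.html(model=Document, name='edit', request_method='POST')
def document_edit_submit(self, request):
    return "<p>saved</p>"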

I succeeded in making predicates participate in the cache I already had to speed up multiple dispatch, so this change both simplified the implementation and increased performance.

Major transition to dispatch methods

After this huge change, Reg was stable again for almost 2 years. I didn't think it needed further major changes. I was wrong.

The trigger was clear this time, as it was a person: Stefano Taschini, who started contributing to the Morepath web framework project. Stefano's a very smart guy, so I'm doing my best to learn from him. Listen hard, even if your impulse, like mine, is to defend your design decisions. I was lucky that Stefano started to think about Reg. So while Reg seemed outwardly stable, the pressure for change was slowly building up.

In the summer of 2016 Stefano and I had a lot of discussions and created a few branches of Reg and Morepath. All that work has now landed in master of Reg and Morepath. The implementation of Reg is simpler and more explicit, and its performance has been increased. Yet again we had a major punctuation in the evolution of Reg.

I mentioned before how the code samples in this article are somewhat fictional. One fiction is the way you register implementations in Reg. It didn't actually work this way until now. Instead of this:

@view.register(obj=Document, request_method='GET')
def document_view(obj, request):
    return "<p>%s</p>" % obj.content

until very recently, you'd write something like this:

r = reg.Registry()

def document_view(obj, request):
    return "<p>%s</p>" % obj.content

r.register(view, document_view, obj=Document, request_method='GET')

So, we used to have an explicit registry object in Reg. This was there because of a use case of Reg that I haven't told you about yet: we need it to support separate dispatch contexts in the same run-time. Morepath uses this to let you compose a larger web application out of multiple smaller ones, each with their own context.

To control which context Reg used you could pass in a special magic lookup parameter to each dispatch function call:

view(doc, request, lookup=registry)

Dispatch implementations needed access to the context too. They could get to it by defining a magic lookup argument in their signature:

def document_view(obj, request, lookup):
    return "<p>%s</p>" % process_content(obj.content, lookup=lookup)

If you didn't specify the lookup, an implicit thread-local lookup was used.

All this wasn't ideal. During the creation of Reg I was fully aware of Python's mantra "explicit is better than implicit", but I didn't know a better way to make context-specific dispatch calls work. I tried my best to at least isolate the implicit behavior in a well-controlled portion of Reg, and to allow a fully explicit option with lookup arguments, but the machinery to support all this was more complex than I'd wish.

When Stefano and I discussed this we came up with the following ideas:

  • Remove multiple registries. Instead allow simple registration on dispatch functions as we've already seen in the examples above. Each function keeps its own private registry. Stefano pushed hard for this while I was resistant, but he was right.
  • To control context, introduce the notion of dispatch methods. Build dispatch methods on dispatch functions.

A dispatch method is associated with a context class:

class Context:
    @reg.dispatch_method(reg.match_instance('obj'))
    def foo(self, obj):
        raise NotImplementedError()

You can register implementations with the method:

@Context.foo.register(obj=Document)
def implementation(self, obj):
    ...

When you call the dispatch method you call it in its context:

c = Context()
c.foo(doc)

Each subclass of the context class creates a new context, with a fresh set of registrations:

# a completely new and clean context
class NewContext(Context):
    pass

Instead of a magic lookup argument to call a generic function in a particular context, you simply use self as the instance of the context. This fits Python a lot better and is faster as well. Magical lookup arguments were gone. Thread-local implicit context was gone too. All is well. With Stefano on board now, Reg's bus factor has doubled too.

A new period of stasis?

Large changes create room for further changes. We've already seen a lot of follow-on changes, especially in the area of performance, and I think we haven't seen the end of this yet. I am starting to understand now why PEAK-Rules has AST manipulation code. We may not quite have reached a point of equilibrium yet.

But after that performance engineering, surely Reg won't need any further drastic changes? I can't think of any. But I've been here several times before. zope.interface is assumed to be done; Reg isn't. If you assume a project is done, that could become a self-fulfilling prophecy and cause the project to stagnate before its time.

Dare to change

Reg is a piece of software that sits at the lower levels of our software stack. Morepath is on top of it, and applications built with it are on top of that. I've been impressed by how much of the underlying codebase of Morepath we've been able to change without breaking Morepath applications much.

Of course the amount of code written with Morepath is insignificant compared to that written with web frameworks like Django or Flask or Pyramid, so we can still afford to be bold -- now, when the community is still small, before many more people join us, is the time to make changes. That is why we can play with a cool technique like predicate dispatch that, while not new, is still unfamiliar to many. It is also a creative challenge to make the unfamiliar approachable.

If you're interested in any of this and want to talk to us, the Morepath devs are one click away.

Self-serving mercenary statement: if you need a developer and like what you hear, talk to me -- I'm on the lookout for interesting projects.