Build a better batching UI with Morepath and Jinja2

Introduction

This post is the first in what I hope will be a series on neat things you can do with Morepath. Morepath is a Python web micro framework with some very interesting capabilities. What we'll look at today is what you can do with Morepath's link generation in a server-driven web application. While Morepath is an excellent fit for creating REST APIs, it also works well for server applications. So let's look at how Morepath can help you create a batching UI.

On the special occasion of this post we also released a new version of Morepath, Morepath 0.11.1!

A batching UI is a UI where you have a larger amount of data available than you want to show to the user at once. You instead partition the data into smaller batches, and you let the user navigate through these batches by clicking a previous and next link. If you have 56 items in total and the batch size is 10, you first see items 0-9. You can then click next to see items 10-19, then items 20-29, and so on until you see the last few items 50-55. Clicking previous will take you backwards again.

In this example, a URL to see a single batch looks like this:

http://example.com/?start=20

To see items 20-29. You can also approach the application like this:

http://example.com/

to start at the first batch.
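
To make the arithmetic concrete, here is a minimal sketch of how the batches fall out for 56 items and a batch size of 10. The helper name batch_ranges is made up for this illustration and is not part of the example project.

```python
def batch_ranges(total, batch_size):
    """Return (first, last) item indexes for each batch, both inclusive."""
    return [(start, min(start + batch_size, total) - 1)
            for start in range(0, total, batch_size)]


ranges = batch_ranges(56, 10)
print(ranges[0])   # (0, 9)
print(ranges[-1])  # (50, 55)
```

The first number of each pair is the value that ends up in ?start= in the URL.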

I'm going to highlight the relevant parts of the application here. The complete example project can be found on Github. I have included instructions on how to install the app in the README.rst there.

Model

First we need to define a few model classes to define the application. We are going to go for a fake database of fake persons that we want to batch through.

Here's the Person class:

class Person(object):
    def __init__(self, id, name, address, email):
        self.id = id
        self.name = name
        self.address = address
        self.email = email

We use the neat fake-factory package to create some fake data for our fake database; the fake database is just a Python list:

from faker import Faker

fake = Faker()

def generate_random_person(id):
    return Person(id, fake.name(), fake.address(), fake.email())

def generate_random_persons(amount):
    return [generate_random_person(id) for id in range(amount)]

person_db = generate_random_persons(56)

So far nothing special. But next we create a special PersonCollection model that represents a batch of persons:

class PersonCollection(object):
    def __init__(self, persons, start):
        self.persons = persons
        if start < 0 or start >= len(persons):
            start = 0
        self.start = start

    def query(self):
        return self.persons[self.start:self.start + BATCH_SIZE]

    def previous(self):
        if self.start == 0:
            return None
        start = self.start - BATCH_SIZE
        if start < 0:
            start = 0
        return PersonCollection(self.persons, start)

    def next(self):
        start = self.start + BATCH_SIZE
        if start >= len(self.persons):
            return None
        return PersonCollection(self.persons, self.start + BATCH_SIZE)

To create an instance of PersonCollection you need two arguments: persons, which is going to be our person_db we created before, and start, which is the start index of the batch.

We define a query method that queries the persons we need from the larger batch, based on start and a global constant, BATCH_SIZE. Here we do this by simply taking a slice. In a real application you'd execute some kind of database query.

We also define previous and next methods. These give back the previous PersonCollection and next PersonCollection. They use the same persons database, but adjust the start of the batch. If there is no previous or next batch as we're at the beginning or the end, these methods return None.

There is nothing directly web related in this code, though of course PersonCollection is there to serve our web application in particular. But as you notice there is absolutely no interaction with request or any other parts of the Morepath API. This makes it easier to reason about this code: you can for instance write unit tests that just test the behavior of these instances without dealing with requests, HTML, etc.
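
As a rough sketch of what such a unit test could look like: the class is repeated (slightly condensed) so the example runs on its own, BATCH_SIZE = 10 is an assumption based on the batch size used in the introduction, and plain integers stand in for Person objects.

```python
BATCH_SIZE = 10  # assumed value; the introduction uses batches of 10


class PersonCollection(object):
    # Same logic as the class above, repeated so this sketch runs on its own.
    def __init__(self, persons, start):
        self.persons = persons
        if start < 0 or start >= len(persons):
            start = 0
        self.start = start

    def query(self):
        return self.persons[self.start:self.start + BATCH_SIZE]

    def previous(self):
        if self.start == 0:
            return None
        return PersonCollection(self.persons, max(self.start - BATCH_SIZE, 0))

    def next(self):
        start = self.start + BATCH_SIZE
        if start >= len(self.persons):
            return None
        return PersonCollection(self.persons, start)


def test_batching():
    # Plain integers stand in for Person objects; the batching logic
    # never looks inside the items.
    persons = list(range(56))

    first = PersonCollection(persons, 0)
    assert first.query() == list(range(10))
    assert first.previous() is None
    assert first.next().start == 10

    last = PersonCollection(persons, 50)
    assert last.query() == [50, 51, 52, 53, 54, 55]
    assert last.next() is None
    assert last.previous().start == 40

    # An out-of-range start falls back to the first batch.
    assert PersonCollection(persons, 999).start == 0


test_batching()
```

No request object, no HTML and no Morepath import is needed to exercise the batching rules.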

Path

Now we expose these models to the web. We tell Morepath what models are behind what URLs, and how to create URLs to models:

@App.path(model=PersonCollection, path='/')
def get_person_collection(start=0):
    return PersonCollection(person_db, start)

@App.path(model=Person, path='{id}',
          converters={'id': int})
def get_person(id):
    try:
        return person_db[id]
    except IndexError:
        return None

Let's look at this in more detail:

@App.path(model=PersonCollection, path='/')
def get_person_collection(start=0):
    return PersonCollection(person_db, start)

This is not a lot of code, but it actually tells Morepath a lot:

  • When you go to the root path / you get the instance returned by the get_person_collection function.
  • This URL takes a request parameter start, for instance ?start=10.
  • This request parameter is optional. If it's not given it defaults to 0.
  • Since the default is a Python int object, Morepath rejects any requests with request parameters that cannot be converted to an integer as a 400 Bad Request. So ?start=11 is legal, but ?start=foo is not.
  • When asked for the link to a PersonCollection instance in Python code, as we'll see soon, Morepath uses this information to reconstruct it.
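
The conversion rule from the bullets above can be pictured without any framework. The following is only an illustration of the described behaviour, not Morepath's actual implementation; parse_start and BadRequest are hypothetical names invented for this sketch.

```python
from urllib.parse import parse_qs


class BadRequest(Exception):
    """Stands in for a 400 Bad Request response in this sketch."""


def parse_start(query_string, default=0):
    """Read ?start=... the way the bullets above describe it."""
    values = parse_qs(query_string).get('start')
    if not values:
        return default          # parameter missing: use the default
    try:
        return int(values[0])   # the default is an int, so convert to int
    except ValueError:
        raise BadRequest('start must be an integer')


print(parse_start(''))          # 0
print(parse_start('start=11'))  # 11
```

Calling parse_start('start=foo') raises BadRequest, which corresponds to the 400 response described above.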

Now let's look at get_person:

@App.path(model=Person, path='{id}',
          converters={'id': int})
def get_person(id):
    try:
        return person_db[id]
    except IndexError:
        return None

This uses a path with a parameter in it, id, which is passed to the get_person function. It explicitly sets the system to expect an int and reject anything else, but we could've used id=0 as a default parameter instead here too. Finally, get_person can return None if the id is not known in our Python list "database". Morepath automatically turns this into a 404 Not Found for you.

View & template for Person

While PersonCollection and Person instances now have a URL, we didn't tell Morepath yet what to do when someone goes there. So for now, these URLs will respond with a 404.

Let's fix this by defining some Morepath views. We'll do a simple view for Person first:

@App.html(model=Person, template='person.jinja2')
def person_default(self, request):
    return {
        'id': self.id,
        'name': self.name,
        'address': self.address,
        'email': self.email
    }

We use the html decorator to indicate that this view delivers data of Content-Type text/html, and that it uses a person.jinja2 template to do so.

The person_default function itself gets a self and a request argument. The self argument is an instance of the model class indicated in the decorator, so a Person instance. The request argument is a WebOb request instance. We give the template the data returned in the dictionary.

The template person.jinja2 looks like this:

<!DOCTYPE html>
<html>
  <head>
    <title>Morepath batching demo</title>
  </head>
  <body>
    <p>
      Name: {{ name }}<br/>
      Address: {{ address }}<br/>
      Email: {{ email }}<br />
    </p>
  </body>
</html>

Here we use the Jinja2 template language to render the data to HTML. Morepath out of the box does not support Jinja2; it's template language agnostic. But in our example we use the Morepath extension more.jinja2 which integrates Jinja2. Chameleon support is also available in more.chameleon in case you prefer that.

View & template for PersonCollection

Here is the view that exposes PersonCollection:

@App.html(model=PersonCollection, template='person_collection.jinja2')
def person_collection_default(self, request):
    return {
        'persons': self.query(),
        'previous_link': request.link(self.previous()),
        'next_link': request.link(self.next()),
    }

It gives the template the list of persons that is in the current PersonCollection instance so it can show them in a template as we'll see in a moment. It also creates two URLs: previous_link and next_link. These are links to the previous and next batch available, or None if no previous or next batch exists (this is the first or the last batch).

Let's look at the template:

<!DOCTYPE html>
<html>
  <head>
    <title>Morepath batching demo</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Name</th>
        <th>Email</th>
        <th>Address</th>
      </tr>
      {% for person in persons %}
      <tr>
        <td><a href="{{ request.link(person) }}">{{ person.name }}</a></td>
        <td>{{ person.email }}</td>
        <td>{{ person.address }}</td>
      </tr>
      {% endfor %}
    </table>
    {% if previous_link %}
    <a href="{{ previous_link }}">Previous</a>
    {% endif %}
    {% if next_link %}
    <a href="{{ next_link }}">Next</a>
    {% endif %}
  </body>
</html>

A bit more is going on here. First it loops through the persons list to show all the persons in a batch in a HTML table. The name in the table is a link to the person instance; we use request.link() in the template to create this URL.

The template also shows a previous and next link, but only if they're not None, so when there is actually a previous or next batch available.

That's it

And that's it, besides a few details of application setup, which you can find in the complete example project on Github.

There's not much to this code, and that's how it should be. I invite you to compare this approach to a batching UI to what an implementation for another web framework looks like. Do you put the link generation code in the template itself? Or as ad hoc code inside the view functions? How clear and concise and testable is that code compared to what we just did here? Do you give back the right HTTP status codes when things go wrong? Consider also how easy it would be to expand the code to include searching in addition to batching.

Do you want to try out Morepath now? Read the very extensive documentation. I hope to hear from you!

GraphQL and REST

Introduction

There is a new trend in open source that I'm not sure I like very much: big companies announce that they are going to open source something, but the release is nowhere in sight yet. Announcing something invites feedback, especially if it's announced as open source. When the software in question is available already as closed source for people to play with I don't really mind as feedback is possible, though it makes me wonder what the point is of holding back on a release.

It's a bit more difficult to give feedback on something when the thing announced is still heavily in development with no release in sight. How can one give feedback? But since you announced you'd open source it, I guess we should not be shy and give you feedback anyway.

Facebook has been doing this kind of announcement a lot recently: they announced React Native before they released it. They also announced Relay and GraphQL, but both have not yet been released. They've given us some information in a few talks and blog posts, and the rest is speculation. If you want to learn more about Relay and GraphQL I highly recommend you read these slides by Laney Kuenzel.

From the information we do have, Relay looks very interesting indeed. I think I'd like to have the option to use Relay in the future. That means I need to implement GraphQL somehow, or at least something close enough to it, as it's the infrastructure that makes Relay possible.

React is a rethink of how to construct a client-side web application UI. React challenges MVC and bi-directional data binding. GraphQL is a rethink of the way client-side web applications talk to their backend. GraphQL challenges REST.

REST

So what was REST again? Rich frontend applications these days typically talk to a HTTP backend. These backends follow some basic REST patterns: resources are on URLs and you can interact with them using HTTP verbs. Some resources represent single items:

/users/faassen

If you issue a GET request to it you get its representation, typically in JSON. You can also issue a PUT request to it to overwrite its representation, and DELETE to delete it.

In a HTTP API we typically also have a way to access collections:

/users

You can issue GET to this too, possibly with some HTTP query parameters to do a filter, to get a representation of the users known to the application. You can also issue POST to this to add a new user.

Real proper REST APIs, also known as Hypermedia APIs, go beyond this: they have hyperlinks between resources. I wrote a web framework named Morepath which aims to make it easier to create complex hypermedia APIs, so you can say I'm pretty heavily invested in REST.

Challenging REST

GraphQL challenges REST. The core idea is that the code that best knows what data is needed for a UI is not on the server but on the client. The UI component knows best what data it wants.

REST delivers all the data a UI might need about a resource and it's up to the client to go look for the bits it actually wants to show. If that data is not in a resource it already has, the client needs to go off to the server and request some more data from another URL. With GraphQL the UI gets exactly the data it needs instead, in a shape handy for the UI.

As we can see here, a GraphQL query looks like this:

{
  user(id: 4000) {
    id,
    name,
    isViewerFriend,
    profilePicture(size: 50) {
      uri,
      width,
      height
    }
  }
}

You get back this:

{
  "user" : {
    "id": 4000,
    "name": "Some name",
    "isViewerFriend": true,
    "profilePicture": {
      "uri": "http://example.com/pic.jpg",
      "width": 50,
      "height": 50
    }
  }
}

This says you want an object of type user with id 4000. You are interested in its id, name and isViewerFriend fields.

You also want another object it is connected to: the profilePicture. You want the uri, width and height fields of this. While there is no public GraphQL specification out there yet, I think that size: 50 means to restrict the subquery for a profile picture to only those of size 50. I'm not sure what happens if no profilePicture of this size is available, though.

To talk to the backend, there is only a single HTTP end-point that receives all these queries, or alternatively you use a non-HTTP mechanism like web sockets. Very unRESTful indeed!

REST and shaping data

Since I'm invested in REST, I've been wondering about whether we can bring some of these ideas to REST APIs. Perhaps we can even map GraphQL to a REST API in a reasonably efficient way. But even if we don't implement all of GraphQL, we might gain enough to make our REST APIs more useful to front-end developers.

As an exercise, let's try to express the query above as a REST query for a hypothetical REST API. First we take this bit:

user(id: 4000) {

We can express this using a path:

/users/4000

The client could construct this path by using a URI template (/users/{id}) provided by the server, or by following a link provided by the server, or by doing the least RESTful thing of them all: hardcode the URL construction in client code.

How do we express with HTTP what fields a user wants? REST of course does have a mechanism that can be used to shape data: HTTP query parameters. So this bit:

id,
name,
isViewerFriend,

could become these query parameters:

?field=id&field=name&field=isViewerFriend

And the query would then look like this:

/users/4000?field=id&field=name&field=isViewerFriend

That is pretty straightforward. It needs server buy-in, but it wouldn't be very difficult to implement in the basic case. The sub-query is more tricky. We need to think of some way to represent it in query parameters. We could do something like this (multi-line for clarity):

?field=profilePicture&
  filter:profilePicture.size=50&
  field=profilePicture.uri&
  field=profilePicture.width&
  field=profilePicture.height

The whole query now looks like this:

/users/4000?
  field=id&
  field=name&
  field=isViewerFriend&
  field=profilePicture&
  filter:profilePicture.size=50&
  field=profilePicture.uri&
  field=profilePicture.width&
  field=profilePicture.height

The result of this query would look the same as in the GraphQL example. It's important to notice that this REST API is not fully normalized -- the profilePicture data is not behind a separate URL that the client then needs to go to. Instead, the object is embedded in the result for the sake of convenience and performance.
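
To give an idea of the server side, here is a minimal sketch of how repeated field parameters could be used to shape a resource. Everything in it is hypothetical: the shape function, the stored user record and its email field are invented for illustration, and the filter:profilePicture.size parameter is left out entirely.

```python
from urllib.parse import parse_qs


def shape(resource, fields):
    """Return a copy of resource holding only the requested fields.

    Dotted names such as 'profilePicture.uri' select a field of an
    embedded object.
    """
    result = {}
    for field in fields:
        head, _, rest = field.partition('.')
        if head not in resource:
            continue  # unknown fields are silently skipped in this sketch
        value = resource[head]
        if isinstance(value, dict):
            # Embedded object: make sure it is present, then add sub-fields.
            sub = result.setdefault(head, {})
            if rest:
                sub.update(shape(value, [rest]))
        elif not rest:
            result[head] = value
    return result


# Made-up stored record; it has more data than the client asks for.
user = {
    'id': 4000,
    'name': 'Some name',
    'isViewerFriend': True,
    'email': 'someone@example.com',
    'profilePicture': {
        'uri': 'http://example.com/pic.jpg',
        'width': 50,
        'height': 50,
        'size': 50,
    },
}

query = ('field=id&field=name&field=isViewerFriend&field=profilePicture'
         '&field=profilePicture.uri&field=profilePicture.width'
         '&field=profilePicture.height')
fields = parse_qs(query)['field']
shaped = shape(user, fields)
print(sorted(shaped))                    # ['id', 'isViewerFriend', 'name', 'profilePicture']
print(sorted(shaped['profilePicture']))  # ['height', 'uri', 'width']
```

The email and size values never reach the client because nobody asked for them.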

I'd be tempted to make the server send back some JSON-LD to help here: each object (the user object and the profilePicture subobject) can have an @id for its canonical URL and a @type as a type designator. A client-side cache could exploit this @id information to store information about the objects it already knows about. Client-side code could also use this information to deal with normalized APIs transparently: it can automatically fetch the sub-object for you if it's not embedded, at the cost of performance.

Does this REST API have the same features as the GraphQL example? I have no idea. It probably depends especially on how queries of related objects work. GraphQL does supply caching benefits, which you wouldn't have without more work. On the other hand you might be able to exploit HTTP-level caching mechanisms with this REST-based approach. Then again, this has more HTTP overhead, which GraphQL can avoid.

Let's briefly get back to the idea to automatically map GraphQL to a REST API. What is needed is a way to look up a URI template for a GraphQL type. So, for the user type we could connect it to a URI template /users/{id}. The server could supply this map of GraphQL type to URI template to the client, so the client can make the translation of the GraphQL to the REST API.
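
A minimal sketch of such a lookup, assuming the map is a plain dictionary. URI_TEMPLATES and rest_path are hypothetical names, and str.format is used as a simplified stand-in for real URI template expansion.

```python
# Hypothetical map of GraphQL type name to URI template.
URI_TEMPLATES = {
    'user': '/users/{id}',
}


def rest_path(graphql_type, **arguments):
    """Translate a GraphQL type plus its arguments into a REST path."""
    template = URI_TEMPLATES[graphql_type]
    return template.format(**arguments)


print(rest_path('user', id=4000))  # /users/4000
```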

Further speculation

What about queries for multiple objects? We could use some kind of collection URL with a filter:

/users?filter:startsWith=a

It is normal in REST to shape collection data this way already, after all. Unfortunately I have no clear idea what a query for a collection of objects looks like in GraphQL.

I've only vaguely started thinking about mutations. If you can access the object's URL in a standard way such as with an @id field, you can then get a handle on the object and send it POST, PUT and DELETE requests.

Conclusion

All this is wild speculation, as we don't really know enough about GraphQL yet to fully understand its capabilities. It's quite possible that I'm throwing away some important property of GraphQL by mapping it to a REST API. Scalability, for instance. Then again, usually my scalability use cases aren't the same as Facebook's, so I might not care as long as I get Relay's benefits to client-side code development.

It's also possible that it's actually easier to implement a single GraphQL-based endpoint than to write or adapt a REST API to support these patterns. Who knows.

Another question I don't have the answer to is what properties a system should have to make Relay work at all. Does it need to be exactly GraphQL, or would a subset of its features work? Which properties of GraphQL are essential?

Thank you, GraphQL and Relay people, for challenging the status quo! Though it makes me slightly uncomfortable, I greatly appreciate it. Hopefully my feedback wasn't too dumb; luckily you can't blame me too much for that as I can legitimately claim ignorance! I'm looking forward to learning more.

Server Templating in Morepath 0.10

Introduction

I just released Morepath 0.10 (CHANGES)! Morepath is a modern Python web framework that combines power with simplicity of use. Morepath 0.10's biggest new feature is server-side templating support.

Most Python web frameworks were born at a time when server-side templating was the only way to get HTML content into a web browser. Templates in the browser did not yet exist. Server templating was a necessity for a server web framework, built-in from day 1.

The web has changed and much more can be done in the browser now: if you want a web page, you can accomplish it with client-side JavaScript code, helped by templates, or embedded HTML-like snippets in JavaScript, like what the React framework does. Morepath is a web framework that was born in this new era.

Morepath could take a more leisurely approach to server templating. We recommend that users rely on client-side technology to construct a UI -- something that Morepath is very good at supporting. For many web applications, this approach is fine and leads to more responsive user interfaces. It also has the benefit that it supports a strong separation between user interface and underlying data. And you could still use server template engines with Morepath, but with no help from the framework.

But there is still room for server templates. Server-generated HTML has its advantages. It's the easiest way to create a bookmarkable traditional web site -- no client-side routing needed. For more dynamic web applications it can also sometimes make sense to send a server-rendered HTML page to the client as a starting point, and only switch to client-side dynamic code later. This is useful in those cases where you want the end-user to see a web page as quickly as possible: in that case sending HTML directly from the server can still be faster, as there is no need for the browser to load and process JavaScript in order to display some content.

So now Morepath has, at last, gained server template support, in version 0.10. We took our time. We prototyped a bit first. We worked out the details of the rest of the framework. As we will see, it's nice we had the chance to spend time on other aspects of Morepath first, as that infrastructure now also makes template language integration very clean.

The basics

Say you want to use Jinja2, the template language used by Flask, in Morepath. Morepath does not ship with Jinja2 or any other template language by default. Instead you can install it as a plugin in your own project. The first thing you do is modify your project's setup.py and add more.jinja2 to install_requires:

install_requires=[
  'more.jinja2',
],

Now when you install your project's dependencies, it pulls in more.jinja2, which also pulls in the Jinja2 template engine itself.

Morepath's extension system works through subclassing. If you want Jinja2 support in your Morepath application, you need to subclass your Morepath app from the Jinja2App:

from more.jinja2 import Jinja2App

class App(Jinja2App):
    pass

The App class is now aware of Jinja2 templates.

Next you need to tell your app what directory to look in for templates:

@App.template_directory()
def get_template_directory():
    return 'templates'

This tells your app to look in the templates directory next to the Python module you wrote this code in, so the templates subdirectory of the Python package that contains your code.

Now you can use templates in your code. Here's a HTML view with a template:

@App.html(model=Customer, template='customer.jinja2')
def customer_default(self, request):
    return {
      'name': self.name,
      'street': self.street,
      'city': self.city,
      'zip': self.zip_code
    }

The view returns a dictionary. This dictionary contains the variables that should go into the customer.jinja2 template, which should be in the templates directory. Note that you have to use the jinja2 extension, as Morepath recognizes how to interpret a template by its extension.

You can now write the customer.jinja2 template that uses this information:

<html>
<body>
  <p>Customer {{name}} lives on {{street}} in {{city}}.</p>
  <p>The zip code is {{zip}}.</p>
</body>
</html>

You can use the usual Jinja2 constructs here.

When you access the view above, the template gets rendered.

Chameleon

What if you want to use Chameleon (ZPT) templates instead of Jinja2 templates? We've provided more.chameleon that has this integration. Include it in install_requires in setup.py, and then do this to integrate it into your app:

from more.chameleon import ChameleonApp

class App(ChameleonApp):
    pass

You can now set up a template directory and put in .pt files, which you can then refer to from the template argument to views.

You could even subclass both ChameleonApp and Jinja2App apps and have an application that uses both Chameleon and Jinja2 templates. While that doesn't seem like a great idea, Morepath does allow multiple applications to be composed into a larger application, so it is nice that it is possible to combine an application that uses Jinja2 with another one that uses Chameleon.

Overrides

Imagine there is an application developed by a third party that has a whole bunch of templates in it. Now without changing that application directory you want to override a template in it. Perhaps you want to override a master template that sets up a common look and feel, for instance.

In Morepath, template overrides can be done by subclassing the application (just like you can override anything else):

class SubApp(App):
    pass

@SubApp.template_directory()
def get_template_directory_override():
    return 'override_templates'

That template_directory directive tells SubApp to look for templates in override_templates first before it checks the templates directory that was set up by App.

If we want to override master.jinja2, all we have to do is copy it from templates into override_templates and change it to suit our purposes. Since a template with that name is found in override_templates first, it is found instead of the one in templates. The original App remains unaffected.
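
The lookup order can be pictured with a few lines of plain Python. This is only an illustration of "first directory that has the file wins", not the code Morepath or Jinja2 actually runs; find_template is a made-up helper.

```python
import os


def find_template(name, directories):
    """Return the path of name in the first directory that contains it."""
    for directory in directories:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None  # not found in any directory


# SubApp would search the override directory before the original one.
search_path = ['override_templates', 'templates']
print(find_template('master.jinja2', search_path))
```

With master.jinja2 present in both directories this prints the path inside override_templates; for a template that exists only in templates, the original path is returned.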

Implementation

In the introduction we mentioned that the code that integrates a template language into Morepath is clean. You should be able to integrate other template engines easily too. Here is the code that integrates Jinja2 into Morepath:

import os
import morepath
import jinja2


class Jinja2App(morepath.App):
    pass


@Jinja2App.setting_section(section='jinja2')
def get_setting_section():
    return {
        'auto_reload': False,
    }


@Jinja2App.template_loader(extension='.jinja2')
def get_jinja2_loader(template_directories, settings):
    config = settings.jinja2.__dict__.copy()

    # we always want to use autoescape as this is about
    # HTML templating
    config.update({
        'autoescape': True,
        'extensions': ['jinja2.ext.autoescape']
    })

    return jinja2.Environment(
      loader=jinja2.FileSystemLoader(template_directories),
      **config)


@Jinja2App.template_render(extension='.jinja2')
def get_jinja2_render(loader, name, original_render):
    template = loader.get_template(name)

    def render(content, request):
        variables = {'request': request}
        variables.update(content)
        return original_render(template.render(**variables), request)
    return render

The template_loader directive sets up an object that knows how to load templates from a given list of template directories. In the case of Jinja2 that is the Jinja2 environment object.

The template_render directive then tells Morepath how to render individual templates: get them from the loader first, and then construct a function that given content returned by the view function and request, uses the template to render it.
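
To see the shape of these two pieces without installing anything, here is a framework-free sketch that uses string.Template from the standard library as a stand-in template engine. DictLoader, get_render and the customer.txt template name are invented for this illustration; they are not Morepath or Jinja2 APIs. Only the idea is taken from the code above: look the template up once, then return a function that renders the view's content.

```python
from string import Template


class DictLoader(object):
    """Hypothetical loader that keeps template sources in a dict."""
    def __init__(self, sources):
        self.sources = sources

    def get_template(self, name):
        return Template(self.sources[name])


def get_render(loader, name, original_render):
    # Look the template up once, when the render function is created.
    template = loader.get_template(name)

    def render(content, request):
        # Make the request available next to the data from the view.
        variables = {'request': request}
        variables.update(content)
        return original_render(template.substitute(variables), request)
    return render


loader = DictLoader({'customer.txt': 'Customer $name lives in $city.'})
# original_render would normally turn the text into a response; here it
# simply passes the rendered text through.
render = get_render(loader, 'customer.txt', lambda text, request: text)
print(render({'name': 'Alice', 'city': 'Utrecht'}, request=None))
# Customer Alice lives in Utrecht.
```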

Documentation

For more documentation, see the Morepath documentation on templates.

Let us know what you think!

10 reasons to check out the Morepath web framework in 2015

Happy new year everybody! Last year we made a lot of progress on the Morepath web framework for Python. It will go quite a lot further in 2015 as well. Here are 10 reasons why you should check out Morepath this year:

  1. Knows about HTTP status codes. When you write a "Hello World" application it does not matter that there are other status codes besides 200 OK, but in real world applications you want your application to know about 404 Not Found, and 405 Method Not Allowed, and so on.

    Morepath does not make you write verbose and failure-prone special cased code to handle status codes. Instead, Morepath does HTTP status codes correctly right away.

  2. Morepath makes hyperlinks to objects. In a typical routing web framework, to make a URL, you need to remember the name of a route, the parameters that go into the route, and how to get that information from the object to which you are making the route. This leads to duplicated code and hardcodes route names everywhere. Since it does so little, it encourages you to skip it entirely and write even more hardcoded URL generation code everywhere.

    Morepath makes it easier to do the right thing. Morepath lets you link to Python objects. Morepath also understands URL parameters can be part of URLs too, and can create a link with them in there for you too.

  3. Built-in permission system. Morepath does not leave something as important as security entirely up to extensions. The core framework knows that views can be guarded with permissions. Who has what permission for what object is then up to you, and Morepath lets you define permissions powerfully and succinctly.

  4. Compose applications. If you have a project application and a wiki application, you can mount the wiki application into the project application. You can develop and test applications independently, and then combine them later. These are true coarse-grained components. This way, Morepath lets you build large applications out of smaller ones.

  5. All views are reusable. Morepath does not have a separate sub-framework to let you write more reusable and generic views than the normal ones. Instead any view you create in Morepath is already reusable. And remember - you don't have to hardcode route names, which makes views more generic by default.

    Views in Morepath are true fine-grained reusable components, without extra work. Morepath gives you the tools to build a generic UI. You can reuse views with ease with Morepath.

  6. Subclass applications. Morepath does not have a separate sub-framework to let you write reusable blueprints for applications. Instead, any application you create in Morepath is already reusable in that way. In the real world, applications evolve into frameworks all the time, and Morepath does not stand in your way with special cases.

    You can subclass any Morepath app to add new routes and views, or override existing ones.

  7. Made for the modern web. Many modern web applications feature a REST backend with a rich, client-side frontend written in JavaScript. Morepath is written to support REST from the ground up - it's not a layer over something else, but it's not a constraining HTTP API generation tool either.

  8. Extensible framework. Morepath lets you extend the web framework in the same way it lets you extend applications written in it. You can write new Morepath directives and framework applications. As examples, more.static adds advanced browser static resource handling to Morepath, more.transaction integrates Morepath with transaction based databases such as SQLAlchemy and the ZODB, and more.forwarded adds HTTP Forwarded header support.

  9. Micro framework with macro ambitions. Morepath is a micro framework; it's not a lot of code. It's easy to get an overview of what's going on, and it's easy to embed in a larger application. Morepath packs a lot more punch in a small amount of code than your typical Python micro web framework.

    All this does not come at the cost of performance. When the primary selling point of a Python web framework seems to be performance, perhaps it's not doing enough for you. But Morepath has more than adequate performance - on "Hello world" at least Morepath outpaces some very popular Python web frameworks comfortably.

  10. Documentation. Some Python micro frameworks also have micro documentation. Instead, Morepath has lots of docs.

Enjoy 2015!

A Review of the Web and how Morepath fits in

I've spent a bit of time writing the somewhat grandiosely titled A Review of the Web, and I think it's a neat addition to the documentation of the Morepath web framework.

It attempts to do two things:

  • Help those developers who aren't as familiar yet with the details of web technology to get a handle on various concepts surrounding web frameworks and web services.
  • Show to developers who are more familiar with these concepts how Morepath fits in as a web framework.

Does this document fulfill either purpose? Let me know!

Morepath 0.9 released!

Yesterday I released Morepath 0.9 (CHANGES)!

What is Morepath? Morepath is a Python web framework. It tries to be especially good at implementing modern, RESTful backends. It is very good at creating hyperlinks. It is easy to use, but still lets you write flexible, maintainable and reusable code. Morepath is very extensively documented.

This release doesn't involve earth-shaking changes like the 0.7 and 0.8 releases did, but it still has an interesting change I'd like to discuss.

Proxy support

Morepath by default doesn't obey the HTTP Forwarded header in link generation, which is a good thing, as it would allow various link hijacking attacks if it did. But if you're behind a trusted proxy that generates the Forwarded header you do want Morepath to take it into account. To do so, you install the more.forwarded extension and subclass your (root) application from it:

from more.forwarded import ForwardedApp

class MyApp(ForwardedApp):
    pass

We don't yet support the old-style X-Forwarded-Host and X-Forwarded-Proto headers that the Forwarded header replaces; we're open to contributions to more.forwarded!
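
To make the header a bit more concrete, here is a minimal sketch of pulling the host and protocol out of a Forwarded header value and turning them into a link prefix. This is plain Python and not how more.forwarded is implemented; the names parse_forwarded and link_prefix_from_forwarded are made up for illustration, and the parsing only handles simple key=value pairs, not every quoting rule the header allows.

```python
def parse_forwarded(value):
    """Parse the first element of a Forwarded header value into a dict.

    Hypothetical helper for illustration only: it handles simple
    key=value pairs separated by ';' and ignores any later proxies
    that follow after a ','.
    """
    first_element = value.split(',')[0]
    result = {}
    for pair in first_element.split(';'):
        if '=' not in pair:
            continue
        key, _, val = pair.partition('=')
        result[key.strip().lower()] = val.strip().strip('"')
    return result


def link_prefix_from_forwarded(value, default='http://localhost'):
    """Build a prefix such as 'https://example.com', or fall back."""
    info = parse_forwarded(value)
    if 'host' not in info:
        return default
    return '%s://%s' % (info.get('proto', 'http'), info['host'])


prefix = link_prefix_from_forwarded(
    'for=192.0.2.60;proto=https;host=example.com')
```

Only do something like this for headers set by a proxy you trust, for the reason given above.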

Linking to external applications

Now we come to a very interesting capability of Morepath: the ability to model and link to external applications.

Let's consider a hypothetical external application. It's hosted on the ubiquitous http://example.com. It has documents listed on URLs like this:

http://example.com/documents/foo

We could of course simply create links to it by concatenating http://example.com/documents and the document id, foo. For such a simple external application that is probably the best way to go. So what I'm going to describe next is total overkill for such a simple example, but I have to use a simple example to make it comprehensible at all.

Here's how we'd go about modeling the external site:

class ExternalDocumentApp(morepath.App):
    pass

class ExternalDocument(object):
    def __init__(self, id):
        self.id = id

@ExternalDocumentApp.path(model=ExternalDocument, path='/documents/{id}')
def get_external_document(id):
    return ExternalDocument(id)

We don't declare any views for ExternalDocument as our code is not going to create representations for the external document, just create links to it. We need to mount it into our actual application code so that we can use it:

@App.mount(path='external_documents', app=ExternalDocumentApp)
def mount_external_document_app():
    return ExternalDocumentApp()

Now we set up the link_prefix for ExternalDocumentApp to point to http://example.com:

@ExternalDocumentApp.link_prefix()
def external_link_prefix(request):
    return 'http://example.com'

As you can see, we've hardcoded http://example.com in it. Now if you're in some view code for your App, you can create a link to an ExternalDocument like this:

@App.json(model=SomeModel)
def some_model_default(self, request):
    return {
        'link': request.link(
            ExternalDocument('foo'),
            app=request.app.child('external_documents'))
    }

This will generate the correct link to the external document foo:

http://example.com/documents/foo

Simplification

You can make this simpler by using a defer_links directive for your App (introduced in Morepath 0.7):

@App.defer_links(model=ExternalDocument)
def defer_document(app, obj):
    return app.child('external_documents')

We've now told Morepath that any ExternalDocument objects need to have their link generated by the mounted external_documents app. This allows you to write link generation code that's a lot simpler:

@App.json(model=SomeModel)
def some_model_default(self, request):
    return {
        'link': request.link(ExternalDocument('foo'))
    }

In review

As I said previously, this is total overkill for an external application as simple as the hypothetical one I described. But this technique of modeling an external application can be very useful in specific circumstances:

  • This is declarative code. If you are dealing with a lot of different kinds of links to an external application, it can be worthwhile to properly model it in your application, instead of spreading more failure-prone link construction code all over the place.

  • If you have to deal with an external application that for some reason is expected to change its structure (or hostname) in the future. By explicitly modeling what you link to, you can easily adjust all the outgoing links in your application when that change happens.

  • Consider a Morepath application that has a sub-application, mounted into it in the same process. You now decide to run this sub-application in a separate process, with a separate hostname. To do this you break the code out into its own project so you can run it separately.

    In this case you already have declarative link generation to it. In the original project, you create a hollowed-out version of the sub-application that just has the path directives that describe the link structure. You then hardcode the new hostname using link_prefix.

    The code that links to it in the original application will now automatically update to point to the sub-application on the new host.

    This way you can break a larger application into multiple separate pieces pretty easily!
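
Stripped of Morepath, the idea behind these points is that the URL structure lives in one declared place instead of in string concatenation scattered through your code. Here's a toy sketch of that in plain Python; LINK_PREFIX, PATH_TEMPLATES and link are names I made up for the illustration, and Morepath's own link generation does a lot more than this.

```python
class ExternalDocument(object):
    def __init__(self, id):
        self.id = id


# Hypothetical registry: the one place that knows the external
# application's hostname and URL structure.
LINK_PREFIX = 'http://example.com'
PATH_TEMPLATES = {ExternalDocument: '/documents/{id}'}


def link(obj):
    """Build a link from the declared template and the object's attributes."""
    template = PATH_TEMPLATES[type(obj)]
    return LINK_PREFIX + template.format(**vars(obj))


url = link(ExternalDocument('foo'))
```

If the external application moves to a new hostname or changes its path structure, only the two declarations change; every call to link picks that up.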

Conclusion

If you've read all the way to the end, I hope you've enjoyed that and aren't completely overwhelmed by these options! Just remember: these are advanced use cases. Morepath grows with your application. It is simple for simple things, but is there for you when you do have more complex requirements.

Better REST with Morepath 0.8

Today I released Morepath 0.8 (CHANGES). In this release Morepath has become faster, simpler and more powerful at the same time. I like it when I can do all three in a release!

I'll get faster and simpler out of the way fast, so I can go into the "more powerful", which is what Morepath is all about.

Faster

I run this simple benchmark every once in a while to make sure Morepath's performance is going in the right direction. The benchmark does almost nothing: it just sends the text "Hello world" back to the browser from a view on a path.

It's still useful to try such a small benchmark, as it can help show how much your web framework is doing to send something that basic back to the browser. In July when I presented Morepath at EuroPython, I measured it. It was about as fast as Django then at this task, and was already significantly faster than Flask.

I'm pleased to report that Morepath 0.8 is 50% faster than in July. At raw performance on this benchmark, we have now comfortably surpassed Django and are leaving Flask somewhere in the distance.

Morepath is not about performance -- it's fast enough anyway, other work will dominate in most real-world applications, but it's nice to know.

Performance is relative of course: Pyramid for instance is still racing far ahead on this benchmark, and so is wheezy.web, the web framework from which I took this benchmark before hacking it up.

Simpler

Morepath 0.8 is running on a new engine: a completely refactored Reg library. Reg was originally inspired by zope.interface (which Pyramid uses), but it has since evolved almost beyond recognition into a powerful generic dispatch system.

In Reg 0.9, the dispatch system has been simplified and generalized to also let you dispatch on the value of arguments as well as their classes. Reg 0.9 also lifts the restriction that you have to dispatch on all non-key keyword arguments. Reg could already cache lookups to make things go faster, and this now also works for the new non-class-based dispatch.

Much of Morepath's flexibility and power is due to Reg. Morepath 0.8's view lookup system has been rewritten to make use of the new powers of Reg, making it both faster and more powerful.
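
If the difference between dispatching on the class of an argument and dispatching on its value sounds abstract, here's a small sketch using only the Python standard library. It does not use Reg and doesn't reflect Reg's API; describe, handler and handle are made-up names that just show the two kinds of lookup side by side.

```python
from functools import singledispatch


class Document(object):
    pass


class Image(object):
    pass


# Dispatch on the *class* of the first argument.
@singledispatch
def describe(obj):
    return 'unknown'


@describe.register(Document)
def describe_document(obj):
    return 'document'


@describe.register(Image)
def describe_image(obj):
    return 'image'


# Dispatch on the *value* of an argument: a lookup table keyed on the
# request method, with a fallback when nothing is registered.
_handlers = {}


def handler(method):
    def register(func):
        _handlers[method] = func
        return func
    return register


@handler('GET')
def handle_get():
    return 'reading'


@handler('POST')
def handle_post():
    return 'creating'


def handle(method):
    func = _handlers.get(method)
    if func is None:
        return 'method not allowed'
    return func()
```

A view lookup needs both kinds at once: the class of the model, and the values of things like the view name and the request method.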

Enough abstract talk: let's look at what implementing a REST web service looks like in Morepath 0.8.

The Power of Morepath: REST in Morepath

Scenario

Here's the scenario we are going to implement.

Say you're implementing a REST API (also known as a hypermedia API).

You want to support the URL (hostname info omitted):

/customers/{id}

When you access it with a GET request, you get JSON describing the customer with the given id, or if it doesn't exist, 404 Not Found.

There's also the URL:

/customers

This represents a collection of customers. You want to be able to GET it and get some JSON information about the customers back.

Moreover, you want to POST JSON to it that represents a new customer, to add a customer to the collection.

The customer JSON at /customers/{id} looks like this:

{
  "@id": "/customers/0",
  "@type": "Customer",
  "name": "Joe Shopper"
}

What's this @id and @type business? They're just conventions (though I took them from the JSON-LD standard). @id is a link to the customer itself, which also uniquely identifies this customer. @type describes the type of this object.

The customer collection JSON at /customers looks like this:

{
  "@id": "/customers",
  "@type": "CustomerCollection",
  "customers": ["/customers/0", "/customers/1"],
  "add": "/customers"
}

When you POST a new customer @id is not needed, but it gets added after POST. The response to a POST should be JSON representing the new customer we just POSTed, but now with the @id added.

Implementing this scenario with Morepath

First we define a class Customer that defines the customer. In a real-world application this is backed by some database, perhaps using an ORM like SQLAlchemy, but we'll keep it simple here:

class Customer(object):
    def __init__(self, name):
        self.id = None  # we will set it after creation
        self.name = name

Customer doesn't know anything about the web at all; it shouldn't have to.

Then there's a CustomerCollection that represents a collection of Customer objects. Again in the real world it would be backed by some database, and implemented in terms of database operations to query and add customers, but here we show a simple in-memory implementation:

class CustomerCollection(object):
     def __init__(self):
         self.customers = {}
         self.id_counter = 0

     def get(self, id):
         return self.customers.get(id)

     def add(self, customer):
         self.customers[self.id_counter] = customer
         # here we set the id
         customer.id = self.id_counter
         self.id_counter += 1
         return customer

customer_collection = CustomerCollection()
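
Before wiring this into the web, it can help to poke at these two classes in plain Python. The sketch below repeats the definitions so it runs on its own; the second customer name is made up. Note that get returns None for an id that doesn't exist.

```python
class Customer(object):
    def __init__(self, name):
        self.id = None  # set by the collection when the customer is added
        self.name = name


class CustomerCollection(object):
    def __init__(self):
        self.customers = {}
        self.id_counter = 0

    def get(self, id):
        return self.customers.get(id)

    def add(self, customer):
        self.customers[self.id_counter] = customer
        customer.id = self.id_counter
        self.id_counter += 1
        return customer


collection = CustomerCollection()
joe = collection.add(Customer('Joe Shopper'))
jane = collection.add(Customer('Jane Buyer'))  # made-up second customer

found = collection.get(0)       # the same object as joe
missing = collection.get(1000)  # None: no such customer
```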

We register this collection at the path /customers:

@App.path(model=CustomerCollection, path='/customers')
def get_customer_collection():
    return customer_collection

We register Customer at the path /customers/{id}:

@App.path(model=Customer, path='/customers/{id}',
          converters={'id': int})
def get_customer(id):
    return customer_collection.get(id)

See the converters bit we did there? This makes sure that the {id} variable is converted from a string into an integer for you automatically, as internally we use integer ids.

We now register a dump_json that can transform the Customer object into JSON:

@App.dump_json(model=Customer)
def dump(self, request):
    return {
        '@type': 'Customer',
        # @id is a link to the customer, as in the example JSON above
        '@id': request.link(self),
        'name': self.name
    }

Now we are ready to implement a GET (the default) view for Customer, so that /customers/{id} works:

@App.json(model=Customer)
def customer_default(self, request):
    return self

That's easy! It can just return self and let dump_json take care of making it be JSON.

Now let's work on the POST of new customers on /customers.

We register a load_json directive that can transform JSON into a Customer instance:

@App.load_json()
def load(json, request):
    if json['@type'] == 'Customer':
        return Customer(name=json['name'])
    return json

We now can register a view that handles the POST of a new Customer to the CustomerCollection:

@App.json(model=CustomerCollection,
          request_method='POST',
          body_model=Customer)
def customer_collection_post(self, request):
    return self.add(request.body_obj)

This calls the add method we defined on CustomerCollection before. body_obj is a Customer instance, converted from the incoming JSON. It returns the resulting Customer instance which is automatically transformed to JSON.

For good measure let's also define a way to transform the CustomerCollection into JSON:

@App.dump_json(model=CustomerCollection)
def dump_customer_collection(self, request):
    return {
        '@id': request.link(self),
        '@type': 'CustomerCollection',
        'customers': [
            request.link(customer) for customer in self.customers.values()
        ],
        'add': request.link(self),
    }

request.link automatically creates the correct links to Customer instances and the CustomerCollection itself.

We now need to add a GET view for CustomerCollection:

@App.json(model=CustomerCollection)
def customer_collection_default(self, request):
    return self

We're done with our implementation. Check out a working example on Github. To try it out you could use a command-line tool like wget or curl, or Chrome's Postman extension, for instance.

What about HTTP status codes?

A good REST API sends back the correct HTTP status codes when something goes wrong. There's more to HTTP status codes than just 200 OK and 404 Not Found.

Now with a normal Python web framework, you'd have to go through your implementation and add checks for various error conditions, and then return or raise HTTP errors in lots of places.

Morepath is not a normal Python web framework.

Morepath does the following:

/customers and /customers/1

200 OK (if customer 1 exists)

Well, of course!

/flub

404 Not Found

Yeah, but other web frameworks do this too.

/customers/1000

404 Not Found (if customer 1000 doesn't exist)

Morepath automates this for you if you return None from the @App.path directive.

/customers/not_an_integer

400 Bad Request

Oh, okay. That's nice!

PUT on /customers/1

405 Method Not Allowed

You know about this status code, but does your web framework?

POST on /customers of JSON that does not have @type Customer

422 Unprocessable Entity

Yes, 422 Unprocessable Entity is a real HTTP status code, and it's used in REST APIs -- the Github API uses it for instance. Other REST APIs use 400 Bad Request for this case. You can make Morepath do this as well.

Under the hood

Here's the part of the Morepath codebase that implements much of this behavior:

@App.predicate(generic.view, name='model', default=None, index=ClassIndex)
def model_predicate(obj):
    return obj.__class__


@App.predicate_fallback(generic.view, model_predicate)
def model_not_found(self, request):
    raise HTTPNotFound()


@App.predicate(generic.view, name='name', default='', index=KeyIndex,
               after=model_predicate)
def name_predicate(request):
    return request.view_name


@App.predicate_fallback(generic.view, name_predicate)
def name_not_found(self, request):
    raise HTTPNotFound()


@App.predicate(generic.view, name='request_method', default='GET',
               index=KeyIndex, after=name_predicate)
def request_method_predicate(request):
    return request.method


@App.predicate_fallback(generic.view, request_method_predicate)
def method_not_allowed(self, request):
    raise HTTPMethodNotAllowed()


@App.predicate(generic.view, name='body_model', default=object,
               index=ClassIndex, after=request_method_predicate)
def body_model_predicate(request):
    return request.body_obj.__class__


@App.predicate_fallback(generic.view, body_model_predicate)
def body_model_unprocessable(self, request):
    raise HTTPUnprocessableEntity()

Don't like 422 Unprocessable Entity when body_model doesn't match? Want 400 Bad Request instead? Just override the predicate_fallback for this in your own application:

class MyApp(morepath.App):
    pass

@MyApp.predicate_fallback(generic.view, body_model_predicate)
def body_model_unprocessable_overridden(self, request):
    raise HTTPBadRequest()
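
To get a feel for how an ordered chain of predicates with fallbacks produces these status codes, here's a toy version in plain Python. It is not Morepath's or Reg's implementation: the VIEWS table, the lookup function and the use of strings in place of classes are all made up for this sketch.

```python
# Hypothetical view registry keyed on
# (model, view name, request method, body model).
# None as body model means "matches any body".
VIEWS = {
    ('Customer', '', 'GET', None): 'customer_default',
    ('CustomerCollection', '', 'GET', None): 'customer_collection_default',
    ('CustomerCollection', '', 'POST', 'Customer'): 'customer_collection_post',
}

# Fallback status per predicate, in the order the predicates are checked:
# model, name, request_method, body_model.
FALLBACKS = [404, 404, 405, 422]


def lookup(model, name, request_method, body_model=None,
           fallbacks=FALLBACKS):
    """Return (status, view name) for the given predicate values."""
    wanted = (model, name, request_method, body_model)
    candidates = list(VIEWS)
    for index, status in enumerate(fallbacks):
        candidates = [
            key for key in candidates
            if key[index] is None or key[index] == wanted[index]
        ]
        if not candidates:
            # The first predicate that fails to match decides the status.
            return status, None
    return 200, VIEWS[candidates[0]]


# Overriding a fallback: 400 instead of 422 for a body that doesn't match.
overridden = lookup('CustomerCollection', '', 'POST', 'dict',
                    fallbacks=[404, 404, 405, 400])
```

The order matters: an unknown model gives 404 before the request method is even looked at, which is why PUT on a path that doesn't exist is 404 and not 405.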

Want to have views respond to the HTTP Accept header? Add a new predicate that handles this to your app.

Now what are you waiting for? Try out Morepath!

Morepath 0.7: new inter-app linking

I've just released Morepath 0.7!

What is Morepath? Morepath is a Python web framework. It tries to be especially good at implementing modern, RESTful backends. It is very good at creating hyperlinks. It is easy to use, but still lets you write flexible, maintainable and reusable code. Morepath is very extensively documented.

So what's new in Morepath 0.7? The CHANGES doc as usual has the details, but I'll give an overview here.

New features for JSON handling

Morepath 0.7 introduces new ways to deal with JSON. There are two new directives, dump_json and load_json. By using these you can teach Morepath how to automatically convert incoming JSON to a Python object, and outgoing Python objects to JSON. See the JSON and objects documentation for more.

Mounting and linking

The big change in Morepath 0.7 however involves the way mounting works, and how you can link between applications. This introduces a few breaking changes if you were using these features before. The CHANGES document provides documentation that will tell you how to adjust your code.

I'm very happy with the new change. It cleans up several APIs. I believe this makes them both easier to understand while at the same time significantly cleaning up the implementation. It also introduces a powerful new feature for inter-app linking: deferred links.

In brief, Morepath lets you mount one application into another:

import morepath

class RootApp(morepath.App):
    pass

class SubApp(morepath.App):
    pass

@RootApp.mount(path='sub', app=SubApp)
def mount_subapp():
    return SubApp()

Now the SubApp application, which can be its own whole different thing (its instance is a WSGI application), is mounted under the RootApp application. When you go to /sub, SubApp takes over.

This doesn't work just for simple sub-paths like sub, but also for parameterized paths. Consider this:

class WikiApp(morepath.App):
    def __init__(self, wiki_id):
        self.wiki_id = wiki_id

@UserApp.mount(path='/users/{username}/wiki', app=WikiApp)
def mount_wiki(username):
    return WikiApp(wiki_id=wiki_id_for_username(username))

Here we've mounted a wiki app into a user app. When you go to /users/foo/wiki, the wiki app for user foo takes over, with its own routes, views, and the like. The wiki app doesn't need to know about the user app, and the user app just needs to know how to mount the wiki app.

Morepath is very good at linking: it knows how to construct a link to an object instance. So, if you want to link to a particular WikiPage instance from within the wiki app, you'd simply write this:

request.link(some_wiki_page)

What if you wanted to create a link to a wiki page from the user app? Just linking to the wiki page will fail, as the user app doesn't know how to create links to wiki pages. But you can tell it to create a link to an object in the wiki app explicitly:

wiki_app = request.app.child(WikiApp, username='foo')
request.link(some_wiki_page, app=wiki_app)

If you are going to write a lot of such links, this can get boring. Morepath introduces a new defer_links directive to help automate this:

@UserApp.defer_links(model=WikiPage)
def defer_links_wiki_page(app, obj):
    return app.child(WikiApp(obj.wiki_id))

You have told Morepath that when it wants to create a link to a wiki page it should consult a mounted wiki app. Which wiki id to use is determined by inspecting the wiki page object -- it's assumed it knows in which wiki it belongs in this example.

Now you can just write this in the user app to link to wiki pages:

request.link(some_wiki_page)

Read the nesting applications documentation for more details.

They say something I don't like so they must be lying!

Assuming deceit

People frequently conclude inaccurately that someone who criticizes their community in some way is not only wrong, but also actively malicious; a liar, a manipulator, a troll. This faulty conclusion can lead to a lot of trouble.

I ran into such reasoning again recently, and I thought I'd try to explain to myself how I think normal human impulses can lead to such faulty conclusions. In case they prove useful to anyone else, I'll share my thoughts here.

So let's sketch out how I could react when someone criticizes my community in some way.

  • Someone says something negative about my community. It doesn't have to be extremely negative. Perhaps they just say that some people in that community are nasty to them, or that there's room for some improvement.
  • But I like my community! I've had pleasant experiences! This community helps define my identity. This hits me hard!
  • I don't want to believe these negative things! Cognitive dissonance kicks in.
  • This person must be wrong. What they are saying is false.
  • I need to convince myself and others of this. So I look for evidence. Since I've already drawn the conclusion that this person is wrong, I'm going to be easily convinced by this evidence.
  • I just need to find one flaw. I focus on one thing in all the things they say. Perhaps it seems implausible to me. Maybe it has no evidence to back it up. Or maybe I can find some evidence against it.
  • In fact, I don't need to go looking for this flaw myself. Someone else in my community probably has found it already. I can just go with that.
  • I try to focus any discussion concerning this criticism on this one flaw I perceive. I ignore all the rest.
  • In time my community comes up with a whole list of perceived flaws. I focus on these details, not on the whole, and not on those details that do seem to be true.
  • We should be on guard against manipulators, psychopaths, and trolls!
  • Why would someone spread such falsehoods?
  • These flaws are evidence of their deliberate attack on my community!
  • I conclude that this person is lying to maliciously attack my community! Or they are lying to garner sympathy! Or both!

Most of us want to be in a community, we want to preserve and defend it, we want to think well of ourselves and our community, we are on guard against deception, we want the truth and we are inclined to believe people in our community before we believe outsiders. It's very human to go down this path, and people go down it all the time.

But these normal human impulses can lead us to conclude that a whistleblower or a victim or a constructive critic is lying. We conclude that this lie is part of a malicious attack. It is often easier to believe this than to deal with the criticism in another way.

We could instead choose to accept the criticism as partially or completely valid. We do not have to accept the criticism however: even when we reject it we could still have a calm discussion about the topic. We could also conclude that this person has very strange notions without thinking that they are lying or malicious. But unfortunately it's often easier to conclude the other person is a liar or a troll.

Negative consequences

This is obviously a bad outcome for the whistleblower or critic -- and even worse for a victim. It's also a bad outcome for our community -- this defensive behavior can block growth and improvement.

This is bad enough. If only it ended there. But this defensive reaction can then give additional reasons to criticize the community, and this then generates more defensive reactions. A vicious cycle has started.

A community can then build up a convincing list of apologetic arguments over time, entrenching it far more in one position than where it was when the whole thing started. The reactions back and forth can generate a whole lot more "evidence" that people are indeed viciously attacking the community. By that stage, there will be real vicious attacks if there weren't already before.

We end up with a lot of people badly hurt, bitter enemies, broken communities, maybe actual violence.

Compensate

I think there are a number of things we as individuals can do to compensate against these human impulses, to prevent us from going down this path. One useful principle is Postel's Law, adapted from electronic to human communication:

Be careful in what you say, be liberal in what you accept from others.

Another useful notion is Hanlon's razor:

Never attribute to malice that which is adequately explained by stupidity.

I'll give some variations here:

  • Treat criticism as intended to be constructive, even when it doesn't look constructive to you.
  • Before you conclude attack, try context, misinterpretation, perspectives, experiences, disagreement, hyperbole, mistake and being wrong.
  • Don't assume malicious conspiracy. Look for other reasons why people would all act in a certain way.
  • Assume that the attack isn't one, until the evidence for attack becomes vastly overwhelming.
  • When faced with criticism, err on the side of the positive interpretation.
  • When confronted with falsehood, assume a mistake before lying.
  • Do not judge a person before you perceive them.
  • Consider walking away instead of fighting.
  • Your community is strong enough to do without your defense. Your community becomes stronger by constructive endeavors.

There is a possible problem with this strategy. Not everyone is well-intentioned. There are real liars, trolls, manipulators and psychopaths out there. There are those among us who want to try to fan the flames for their own amusement. I think being generous to others in our interpretations can reduce their power to do so. Maybe I'll talk a bit more about this in the future.

Life at the Boundaries: Conversion and Validation

In software development we deal with boundaries between systems.

Examples of boundaries are:

  • Your application code and a database.
  • Your application code and the file system.
  • A web server and your server-side application code.
  • A client-side application and the browser DOM.
  • A client-side application in JavaScript and the web server.

It's important to recognize these boundaries. You want to do things at the boundaries of your application, just after input has arrived into your application across an outer boundary, and just before you send output across an inner boundary.

If you read a file and what's in that file is a string representing a number, you want to convert the string to a number as soon as possible after reading it, so that the rest of your codebase can forget about the file and the string in it, and just deal with the number.

Because if you don't and pass a filename around, you may have to open that file multiple times throughout your codebase. Or if you read from the file and leave the value as a string, you may have to convert it to a number each time you need it. This means duplicated code, and multiple places where things can go wrong. All that is more work, more error prone, and less fun.
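
Here's a tiny sketch of what converting at the boundary looks like for the file example. The names read_count and double are made up; the point is that only read_count knows about the file and the string in it, and everything after it just gets an integer.

```python
import os
import tempfile


def read_count(path):
    """Boundary: read the file once and convert its content to an int."""
    with open(path) as f:
        return int(f.read().strip())


def double(count):
    """The rest of the code only ever sees a number."""
    return count * 2


# Demonstration with a temporary file standing in for real input.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write('42\n')

count = read_count(path)
os.remove(path)
result = double(count)
```

If the file does not contain a number, int raises a ValueError right there in read_count, which is the one place you'd have to handle it.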

Boundaries are our friends. So much so that programming languages give us tools like functions and classes to create new boundaries in software. With a solid, clear boundary in place in the middle of our software, both halves can be easier to understand and easier to manage.

One of the most interesting things that happen on the boundaries in software is conversion and validation of values. I find it very useful to have a clear understanding of these concepts during software development. To understand each other better it's useful to share this understanding out loud. So here is how I define these concepts and how I use them.

I hope this helps some of you see the boundaries more clearly.

Following a HTML form submit through boundaries

Let's look at an example of a value going across multiple boundaries in software. In this example, we have a web form with an input field that lets the user fill in their date of birth as a string in the format 'DD-MM-YYYY'.

I'm going to give examples based on web development. I also give a few tiny examples in Python. The web examples and Python used here only exist to illustrate concepts; similar ideas apply in other contexts. You shouldn't need to understand the details of the web or Python to understand this, so don't go away if you don't.

Serializing a web form to a request string

In a traditional non-HTML 5 HTTP web form, the input type for dates is text. This means that the dates are in fact not interpreted by the browser as dates at all. It's just a string to the browser, just like adfdafd. The browser does not know anything about the value otherwise, unless it has loaded JavaScript code that checks whether the input is really a date and shows an error message if it's not.

In HTML 5 there is a new input type called date, but for the sake of this discussion we will ignore it, as it doesn't change all that much in this example.

So when the user submits a form with the birth date field, the inputs in the form are serialized to a longer string that is then sent to the server as the body of a POST request. This serialization happens according to what's specified in the form tag's enctype attribute. When the enctype is multipart/form-data, the request to the server will be a string that looks a lot like this:

POST /some/path HTTP/1.1
Content-type: multipart/form-data; boundary=AaB03x

--AaB03x
content-disposition: form-data; name="birthdate"

21-10-1985
--AaB03x--

Note that this serialization of form input to the multipart/form-data format cannot fail; serialization always succeeds, no matter what form data was entered.
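
To illustrate why this step cannot fail, here's a simplified sketch of such a serialization in Python. The browser does this for you, of course; serialize_form is a made-up function that only handles plain text fields and does no escaping of field names. It never looks at what the values mean, so a nonsense date goes through just as easily as a real one.

```python
def serialize_form(fields, boundary='AaB03x'):
    """Serialize a dict of text fields to a multipart/form-data body.

    Simplified for illustration: text fields only, no escaping.
    """
    lines = []
    for name, value in fields.items():
        lines.append('--' + boundary)
        lines.append('content-disposition: form-data; name="%s"' % name)
        lines.append('')
        lines.append(value)
    lines.append('--' + boundary + '--')
    return '\r\n'.join(lines) + '\r\n'


good = serialize_form({'birthdate': '21-10-1985'})
nonsense = serialize_form({'birthdate': 'adfdafd'})  # succeeds just the same
```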

Converting the request string to a Request object

So now this request arrives at the web server. Let's imagine our web server is in Python, and that there's a web framework like Django or Flask or Pyramid or Morepath in place. This web framework takes the serialized HTTP request, that is, the string, and then converts it into a request object.

This request object is much more convenient to work with in Python than the HTTP request string. Instead of having one blob of a string, you can easily check individual aspects of the request -- what request method was used (POST), what path the request is for, what the body of the request was. The web framework also recognizes multipart/form-data and automatically converts the request body with the form data into a convenient Python dictionary-like data structure.

Note that the conversion of HTTP request text to request object may fail. This can happen when the client did not actually format the request correctly. The server should then return a HTTP error, in this case 400 Bad Request, so that the client software (or the developer working on the client software) knows something went wrong.

The potential that something goes wrong is one difference between conversion and serialization; both transform the data, but conversion can fail and serialization cannot. Or perhaps better said: if serialization fails it is a bug in the software, whereas conversion can fail due to bad input. This is because serialization goes from known-good data to some other format, whereas conversion deals with input data from an external source that may be wrong in some way.
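
As a small illustration of a conversion that can fail on bad input, here's a sketch that converts just the first line of an HTTP request into a dictionary. Real web frameworks do far more than this; parse_request_line and the BadRequest exception are made up for the example.

```python
class BadRequest(Exception):
    """Made-up exception standing in for a 400 Bad Request response."""
    status = 400


def parse_request_line(line):
    """Convert 'POST /some/path HTTP/1.1' into a dict, or raise BadRequest."""
    parts = line.split(' ')
    if len(parts) != 3 or not parts[2].startswith('HTTP/'):
        raise BadRequest('malformed request line: %r' % line)
    method, path, version = parts
    return {'method': method, 'path': path, 'version': version}


request_line = parse_request_line('POST /some/path HTTP/1.1')

try:
    parse_request_line('this is not HTTP')
    status = 200
except BadRequest as e:
    status = e.status
```

The failure here is not a bug in the server: the input came from outside and was wrong, so the right response is to tell the client about it.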

Thanks to the web framework's parsing of web form into a Python data structure, we can easily get the field birthdate from our form. If the request object was implemented by the Webob library (like for Pyramid and Morepath), we can get it like this:

>>> request.POST['birthdate']
'21-10-1985'

Converting the string to a date

But the birthdate at this point is still a string 21-10-1985. We now want to convert it into something more convenient to Python. Python has a datetime library with a date type, so we'd like to get one of those.

This conversion could be done automatically by a form framework -- these are very handy as you can declaratively describe what types of values you expect and the framework can then automatically convert incoming strings to convenient Python values accordingly. I've written a few web form frameworks in my time. But in this example we'll do it manually, using functionality from the Python datetime library to parse the date:

>>> from datetime import datetime
>>> birthdate = datetime.strptime(request.POST['birthdate'], '%d-%m-%Y').date()
>>> birthdate
datetime.date(1985, 10, 21)

Since this is a conversion operation, it can fail if the user gave input that is not in the right format or is not a proper date. Python will raise a ValueError exception in this case. We need to write code that detects this and then signals to the HTTP client that there was a conversion error. The client needs to update its UI to inform the user of this problem. All this can get quite complicated, and here again a form framework can help you with this.
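A minimal sketch of detecting the failure could look like this. The `ConversionError` class and the `convert_birthdate` function are hypothetical names chosen for this example; how the error is then reported back to the client depends on your framework.

```python
from datetime import datetime


class ConversionError(Exception):
    """Hypothetical error: raw input could not be converted."""


def convert_birthdate(raw):
    """Convert a 'DD-MM-YYYY' string into a date, or raise ConversionError."""
    try:
        return datetime.strptime(raw, '%d-%m-%Y').date()
    except ValueError:
        # Wrong format ('hello') or impossible date ('31-02-1985').
        raise ConversionError("Not a valid DD-MM-YYYY date: %r" % raw)
```

The rest of the application then only has to deal with one kind of exception at the boundary, instead of knowing which parsing function was used underneath.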

It's important to note that we should isolate this conversion to one place in our application: the boundary where the value comes in. We don't want to pass the birth date string around in our code and only convert it into a date when we need to do something with it that requires a date object. Doing conversion "just in time" like that has a lot of problems: code duplication is one of them, but even worse is that we would need to worry about conversion errors everywhere instead of in one place.

Validating the date

So now that we have the birth date, our web application may want to do some basic checking to see whether it makes sense. For example, we probably don't expect time travellers to fill in the form, so we can safely reject any birth dates set in the future as invalid.

We've already converted the birth date from a string into a convenient Python date object, so validating that the date is not in the future is now easy:

>>> from datetime import date
>>> birthdate <= date.today()
True

Validation needs the value to be in a convenient form, so validation happens after conversion. Validation does not transform the value; it only checks whether the value is valid according to additional criteria.

There are a lot of possible validations:

  • validate that required values are indeed present.
  • check that a value is in a certain range.
  • relate the value to another value elsewhere in the input or in the database. Perhaps the birth date is not supposed to be earlier than some database-defined value, for instance.
  • etc.

If the input passes validation, the code just continues on its merry way. Only when the validation fails do we want to take special action. The minimum action that should be taken is to reject the data and do nothing, but it could also involve sending information about the cause of the validation failure back to the user interface, just like for conversion errors.
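As a rough sketch, a validator for our birth date could collect messages that are suitable for sending back to the user interface. The function name, the messages and the optional `today` parameter (handy for testing) are assumptions made for this example.

```python
from datetime import date


def validate_birthdate(birthdate, today=None):
    """Return a list of error messages; an empty list means the value is valid."""
    if today is None:
        today = date.today()
    errors = []
    if birthdate is None:
        # A required value is missing.
        errors.append("Birth date is required.")
    elif birthdate > today:
        # No time travellers, please.
        errors.append("Birth date cannot be in the future.")
    return errors
```

Note that the validator only inspects the value: it receives an already converted date and never transforms it.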

Validation should be done just after conversion, at the boundary of the application, so that after that we can stop worrying about all this and just trust the values we have as valid. Our life is easier if we do validation early on like this.

Serialize the date into a database

Now the web application wants to store the birth date in a database. The database sits behind a boundary. This boundary may be clever and allow you to pass in straight Python date objects and do a conversion to its internal format afterward. That would be best.

But imagine our database is dumb and expects our dates to be in a string format. Now the task is up to our application: we need to transform the date to a string before the database boundary.

Let's say the database layer expects date strings in the format 'YYYY-MM-DD'. We then have to serialize our Python date object to that format before we pass it into the database:

>>> birthdate.strftime('%Y-%m-%d')
'1985-10-21'

This is serialization and not conversion because this transformation always succeeds.

Concepts

So we have:

Transformation:
Transform data from one type to another. Transformation by itself cannot fail, as it is assumed to always get correct input. It is a bug in the software if it does not. Conversion and serialization both do transformation.
Conversion:
Transform input across a boundary into a more convenient form inside that boundary. Fails if the input cannot be transformed.
Serialization:
Transform valid data as output across a boundary into a form convenient to the outside. Cannot fail if there are no bugs in the software.
Validation:
Check whether input across a boundary that is already converted to convenient form is valid inside that boundary. Can fail. Does not transform.

Reuse

Conversion just deals with converting one value to another and does not interact with the rest of the universe. The implementation of a converter is therefore often reusable between applications.

The behavior of a converter typically does not depend on state or configuration. If conversion behavior does depend on application state, for instance because you want to parse dates as 'MM-DD-YYYY' instead of 'DD-MM-YYYY', it is often a better approach to just swap in a different converter based on the locale than to have the converter itself be aware of the locale.
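One way to sketch this swapping of converters is a lookup table keyed by locale. The locale identifiers and the names `make_date_converter`, `DATE_CONVERTERS` and `converter_for` are assumptions made for this example.

```python
from datetime import datetime


def make_date_converter(fmt):
    """Build a converter that knows one date format and nothing else."""
    def convert(raw):
        return datetime.strptime(raw, fmt).date()
    return convert


# The application picks a converter; the converters stay locale-unaware.
DATE_CONVERTERS = {
    'en_US': make_date_converter('%m-%d-%Y'),
    'nl_NL': make_date_converter('%d-%m-%Y'),
}


def converter_for(locale):
    return DATE_CONVERTERS[locale]
```

Each converter remains a small function of its input only, so it stays easy to test and to reuse in another application.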

Validation is different. While some validations are reusable across applications, a lot of them will be application specific. Validation success may depend on the state of other values in the input or on application state. Reusable frameworks that help with validation are still useful, but they do need additional information from the application to do their work.

Serialization and parsing

Serialization is transformation of data to a particular type, such as a string or a memory buffer. These types are convenient for communicating across the boundary: storing on the file system, storing data in a database, or passing data through the network.

The opposite of serialization is deserialization and this is done by parsing: this takes data in its serialized form and transforms it into a more convenient form. Parsing can fail if its input is not correct. Parsing is therefore conversion, but not all conversion is parsing.

Parsing extracts information and checks whether the input conforms to a grammar in one step, though if you treat the parser as a black box you can view these as two separate phases: input validation and transformation.

There are transformation operations in an application that do not serialize but also cannot fail. I don't have a separate word for these besides "transformation", but they are quite common. Take for instance an operation that takes a Python object and transforms it into a dictionary convenient for serialization to JSON: such a dictionary can only consist of dicts, lists, strings, ints, floats, bools and None.
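Here is a small sketch of such a transformation. The `Registration` class and the `registration_to_dict` function are hypothetical and only serve to illustrate the idea.

```python
import json
from datetime import date


class Registration(object):
    """Hypothetical application object holding already validated data."""
    def __init__(self, name, birthdate):
        self.name = name
        self.birthdate = birthdate


def registration_to_dict(registration):
    """Transform the object into JSON-friendly values only."""
    return {
        'name': registration.name,
        # A date is not a JSON type, so we turn it into a string here.
        'birthdate': registration.birthdate.strftime('%Y-%m-%d'),
    }
```

The resulting dictionary can be handed to `json.dumps` as is; the actual serialization to a string is a separate step.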

Some developers argue that data should always be kept in such a format instead of in objects, as it can encourage a looser coupling between subsystems. This idea is especially prevalent in Lisp-style homoiconic language communities, where even code is treated as data. It is interesting to note that JSON has made web development go in the direction of more explicit data structures as well. Perhaps it is as they say:

Whoever does not understand LISP is doomed to reinvent it.

Input validation

We can pick apart conversion and find input validation inside. Conversion does input validation before transformation, and serialization (and plain transformation) does not.

Input validation is very different from application-level validation. Input validation is conceptually done just before the convenient form is created, and is an inherent part of the conversion. In practice, a converter typically parses data, doing both in a single step.

I prefer to reserve the term "validation" for application-level validation and discuss input validation only when we talk about implementing a converter.

But sometimes conversion from one perspective is validation from another.

Take the example above where we want to store a Python date in a database. What if this operation does not work for all Python date objects? The database layer could accept dates in a different range than the one supported by the Python date object. The database may therefore be offered a date that is outside of its range and reject it with an error.

We can view this as conversion: the database converts a date value that comes in, and this conversion may fail. But we can also view this in another way: the database transforms the date value that comes in, and then there is an additional validation that may fail. The database is a black box and both perspectives work. That comes in handy a little bit later.

Validation and layers

Consider a web application with an application-level validation layer, and another layer of validation in the database.

Maybe the database also has a rule to make sure that the birth date is not in the future. It gives an error when we give a date in the future. Since validation errors can now occur at the database layer, we need to worry about properly handling them.

But transporting such a validation failure back to the user interface can be tricky: we are on the boundary between application code and database at this point, far from the boundary between application and user interface. And often database-level validation failure messages are in a form that is not very informative to a user; they speak in terms of the database instead of the user.

We can make our life easier. What we can do is duplicate any validation the database layer does at the outer boundary of our application, the one facing the web. Validation failures there are relatively simple to propagate back to the user interface. Since any validation errors that can be given by the database have already been detected at an earlier boundary before the database is ever reached, we don't need to worry about handling database-level validation messages anymore. We can act as if they don't exist, as we've now guaranteed they cannot occur.

We treat the database-level validation as an extra sanity check guarding against bugs in our application-level code. If validation errors occur on the database boundary, we have a bug, and this should not happen, and we can just report a general error: on the web this is a 500 internal server error. That's a lot easier to do.

The general principle is: if we do all validations that the boundary to a deeper layer already needs at a higher layer, we can effectively treat the inner boundary as not having any validations. The validations in the deeper layer then only exist as extra checks that guard against bugs in the validations at the outer boundary.
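The layering can be sketched without any framework or real database. Everything here is a stand-in: `DatabaseError`, `store_birthdate` and `handle` are hypothetical names, and the 422 status used for a validation failure is just one possible choice.

```python
from datetime import date


class DatabaseError(Exception):
    """Stand-in for a validation error raised by the database layer."""


def validate_at_boundary(birthdate, today):
    """Outer boundary: return messages that make sense to a user."""
    if birthdate > today:
        return ["Birth date cannot be in the future."]
    return []


def store_birthdate(birthdate, today):
    """Pretend database that enforces the same rule on its own."""
    if birthdate > today:
        raise DatabaseError("constraint violated: birthdate <= today")
    return birthdate.strftime('%Y-%m-%d')


def handle(birthdate, today, validate=validate_at_boundary):
    """Return a (status, body) pair for an already converted birth date."""
    errors = validate(birthdate, today)
    if errors:
        # Expected failure: report it to the user interface in detail.
        return 422, errors
    try:
        return 200, store_birthdate(birthdate, today)
    except DatabaseError:
        # Only reachable if the outer validation has a bug.
        return 500, "Internal Server Error"
```

With a correct outer validator the `except` branch is never taken. Passing in a deliberately broken validator, such as `lambda birthdate, today: []`, shows the database check acting as the safety net.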

We can also apply this to conversion errors: if we already make sure we clean up the data with validations at an outer boundary before it reaches an inner boundary that needs to do conversions, the conversions cannot fail. We can treat them as transformations again. This works because, viewed as a black box, any conversion can be treated as a combination of transformation and validation.

Validation in the browser

In the end, let's return to the web browser.

We've seen that doing validation at an outer boundary can let us ignore validation done deeper down in our code. We do validation once when values come into the web server, and we can forget about doing them in the rest of our server code.

We can go one step further. We can lift our validation out of the server, into the client. If we do our validation in JavaScript when the user inputs values into the web form, we are in the right place to give really accurate user interface feedback in the easiest way possible. Validation failure information has to cross from JavaScript to the browser DOM and that's it. The server is not involved.

We cannot always do this. If our validation code needs information on the server that cannot be shared securely or efficiently with the client, the server is still involved in validation, but at least we can still do all the user interface work in the client.

Even if we do not need server-side validation for the user interface, we cannot ignore doing server-side validation altogether, as we cannot guarantee that our JavaScript program is the only program that sends information to the server. Through that route, or because of bugs in our JavaScript code, we can still get input that is potentially invalid. But now if the server detects invalid information, it does not need to do anything complicated to report validation errors to the client. Instead it can just generate an internal server error.

If we could somehow guarantee that only our JavaScript program is the one that sends information to the server, we could forgo doing validation on the server altogether. Someone more experienced in the arts of encryption may be able to say whether this is possible. I suspect the answer will be "no", as it usually is with JavaScript in web browsers and encryption.

In any case, we may in fact want to encourage other programs to use the same web server; that's the whole idea behind offering HTTP APIs. If this is our aim, we need to handle validation on the server as well, and give decent error messages.