Life at the Boundaries: Conversion and Validation    Posted: 2014-09-26 13:50


In software development we deal with boundaries between systems.

Examples of boundaries are:

  • Your application code and a database.
  • Your application code and the file system.
  • A web server and your server-side application code.
  • A client-side application and the browser DOM.
  • A client-side application in JavaScript and the web server.

It's important to recognize these boundaries. You want to do things at the boundaries of your application: just after input has arrived in your application across an outer boundary, and just before you send output across an inner boundary.

If you read a file and what's in that file is a string representing a number, you want to convert the string to a number as soon as possible after reading it, so that the rest of your codebase can forget about the file and the string in it, and just deal with the number.

Because if you don't and pass a filename around, you may have to open that file multiple times throughout your codebase. Or if you read from the file and leave the value as a string, you may have to convert it to a number each time you need it. This means duplicated code, and multiple places where things can go wrong. All that is more work, more error prone, and less fun.
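As a minimal sketch of this in Python, the read and the conversion can live together in one function at the boundary. The file name count.txt and the helper read_count are made up for illustration:

```python
import os
import tempfile


def read_count(path):
    """Read a file that holds a number and convert it right at the boundary."""
    with open(path) as f:
        text = f.read()
    # int() tolerates surrounding whitespace such as a trailing newline,
    # and raises ValueError if the text is not a whole number.
    return int(text)


# Demonstration with a throwaway file; 'count.txt' is a hypothetical name.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'count.txt')
    with open(path, 'w') as f:
        f.write('42\n')
    count = read_count(path)

# From here on the rest of the code only sees a number,
# not a filename and not a string.
doubled = count * 2
```

Only read_count knows about the file and the string in it; a conversion error can only surface in that one place.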

Boundaries are our friends. So much so that programming languages give us tools like functions and classes to create new boundaries in software. With a solid, clear boundary in place in the middle of our software, both halves can be easier to understand and easier to manage.

One of the most interesting things that happen on the boundaries in software is conversion and validation of values. I find it very useful to have a clear understanding of these concepts during software development. To understand each other better it's useful to share this understanding out loud. So here is how I define these concepts and how I use them.

I hope this helps some of you see the boundaries more clearly.

Following a HTML form submit through boundaries

Let's look at an example of a value going across multiple boundaries in software. In this example, we have a web form with an input field that lets the user fill in their date of birth as a string in the format 'DD-MM-YYYY'.

I'm going to give examples based on web development. I also give a few tiny examples in Python. The web examples and Python used here only exist to illustrate concepts; similar ideas apply in other contexts. You shouldn't need to understand the details of the web or Python to understand this, so don't go away if you don't.

Serializing a web form to a request string

In a traditional non-HTML 5 HTTP web form, the input type for dates is text. This means that the dates are in fact not interpreted by the browser as dates at all. It's just a string to the browser, just like adfdafd. The browser does not know anything about the value otherwise, unless it has loaded JavaScript code that checks whether the input is really a date and shows an error message if it's not.

In HTML 5 there is a new input type called date, but for the sake of this discussion we will ignore it, as it doesn't change all that much in this example.

So when the user submits a form with the birth date field, the inputs in the form are serialized to a longer string that is then sent to the server as the body of a POST request. This serialization happens according to what's specified in the form tag's enctype attribute. When the enctype is multipart/form-data, the request to the server will be a string that looks a lot like this:

POST /some/path HTTP/1.1
Content-type: multipart/form-data; boundary=AaB03x

--AaB03x
content-disposition: form-data; name="birthdate"

21-10-1985
--AaB03x--

Note that this serialization of form input to the multipart/form-data format cannot fail; serialization always succeeds, no matter what form data was entered.

Converting the request string to a Request object

So now this request arrives at the web server. Let's imagine our web server is in Python, and that there's a web framework like Django or Flask or Pyramid or Morepath in place. This web framework takes the serialized HTTP request, that is, the string, and then converts it into a request object.

This request object is much more convenient to work with in Python than the HTTP request string. Instead of having one blob of a string, you can easily check individual aspects of the request -- what request method was used (POST), what path the request is for, what the body of the request was. The web framework also recognizes multipart/form-data and automatically converts the request body with the form data into a convenient Python dictionary-like data structure.

Note that the conversion of HTTP request text to request object may fail. This can happen when the client did not actually format the request correctly. The server should then return a HTTP error, in this case 400 Bad Request, so that the client software (or the developer working on the client software) knows something went wrong.

The potential that something goes wrong is one difference between conversion and serialization; both transform the data, but conversion can fail and serialization cannot. Or perhaps better said: if serialization fails it is a bug in the software, whereas conversion can fail due to bad input. This is because serialization goes from known-good data to some other format, whereas conversion deals with input data from an external source that may be wrong in some way.
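The difference can be sketched with a toy example. The functions below are hypothetical and handle only the first line of a request; a real web framework parses far more than this. The point is only that one direction starts from known-good values and the other from untrusted text:

```python
def serialize_request_line(method, path):
    """Serialization: known-good values in, string out."""
    return '%s %s HTTP/1.1' % (method, path)


def convert_request_line(line):
    """Conversion: untrusted string in, structured values out, or ValueError."""
    parts = line.split(' ')
    if len(parts) != 3 or parts[2] != 'HTTP/1.1':
        raise ValueError('malformed request line: %r' % line)
    method, path, _version = parts
    return method, path


def status_for(line):
    """Map a conversion failure to the HTTP status the server returns."""
    try:
        convert_request_line(line)
    except ValueError:
        return 400  # Bad Request: the input was wrong, not our software
    return 200
```

If serialize_request_line ever produced something convert_request_line rejects, that would be a bug in our own code rather than bad input.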

Thanks to the web framework's parsing of web form into a Python data structure, we can easily get the field birthdate from our form. If the request object was implemented by the Webob library (like for Pyramid and Morepath), we can get it like this:

>>> request.POST['birthdate']
'21-10-1985'

Converting the string to a date

But the birthdate at this point is still a string 21-10-1985. We now want to convert it into something more convenient to Python. Python has a datetime library with a date type, so we'd like to get one of those.

This conversion could be done automatically by a form framework -- these are very handy as you can declaratively describe what types of values you expect and the framework can then automatically convert incoming strings to convenient Python values accordingly. I've written a few web form frameworks in my time. But in this example we'll do it manually, using functionality from the Python datetime library to parse the date:

>>> from datetime import datetime
>>> birthdate = datetime.strptime(request.POST['birthdate'], '%d-%m-%Y').date()
>>> birthdate
datetime.date(1985, 10, 21)

Since this is a conversion operation, it can fail if the user gave input that is not in the right format or is not a proper date. Python will raise a ValueError exception in this case. We need to write code that detects this and then signals to the HTTP client that there was a conversion error. The client needs to update its UI to inform the user of this problem. All this can get quite complicated, and here again a form framework can help you with this.
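Done by hand, the detection part could look roughly like this. This is a sketch, not what any particular form framework does: ConversionError, convert_birthdate and handle_form are hypothetical names, the form is assumed to be a plain dictionary, and the choice of 400 as status is an assumption too.

```python
from datetime import datetime


class ConversionError(Exception):
    """Hypothetical error type for input that cannot be converted."""


def convert_birthdate(text):
    """Convert a 'DD-MM-YYYY' string to a date, or raise ConversionError."""
    try:
        return datetime.strptime(text, '%d-%m-%Y').date()
    except ValueError:
        raise ConversionError('not a date in DD-MM-YYYY format: %r' % text)


def handle_form(form):
    """Return (status, payload) for a submitted form dictionary."""
    try:
        birthdate = convert_birthdate(form['birthdate'])
    except ConversionError as e:
        # Report the problem so the client can show it next to the field.
        return 400, {'birthdate': str(e)}
    return 200, {'birthdate': birthdate}
```

Both a string in the wrong format (adfdafd) and a string in the right format that is not a real date (31-02-1985) end up in the same except branch, because strptime raises ValueError for both.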

It's important to note that we should isolate this conversion to one place in our application: the boundary where the value comes in. We don't want to pass the birth date string around in our code and only convert it into a date when we need to do something with it that requires a date object. Doing conversion "just in time" like that has a lot of problems: code duplication is one of them, but even worse is that we would need to worry about conversion errors everywhere instead of in one place.

Validating the date

So now that we have the birth date our web application may want to do some basic checking to see whether it makes sense. For example, we probably don't expect time travellers to fill in the form, so we can safely reject any birth dates set in the future as invalid.

We've already converted the birth date from a string into a convenient Python date object, so validating that the date is not in the future is now easy:

>>> from datetime import date
>>> birthdate <= date.today()
True

Validation needs the value to be in a convenient form, so validation happens after conversion. Validation does not transform the value; it only checks whether the value is valid according to additional criteria.

There are a lot of possible validations:

  • validate that required values are indeed present.
  • check that a value is in a certain range.
  • relate the value to another value elsewhere in the input or in the database. Perhaps the birth date is not supposed to be earlier than some database-defined value, for instance.
  • etc.

If the input passes validation, the code just continues on its merry way. Only when the validation fails do we want to take special action. The minimum action that should be taken is to reject the data and do nothing, but it could also involve sending information about the cause of the validation failure back to the user interface, just like for conversion errors.
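A validator for the future-date rule could be sketched as follows. ValidationError and validate_birthdate are hypothetical names; the today parameter is only there so the check can be exercised with a fixed date.

```python
from datetime import date


class ValidationError(Exception):
    """Hypothetical error type for values that converted but are not acceptable."""


def validate_birthdate(birthdate, today=None):
    """Check an already converted date; returns nothing and changes nothing."""
    if today is None:
        today = date.today()
    if birthdate > today:
        raise ValidationError('birth date cannot be in the future')
```

Note that the function takes a date, not a string: it relies on conversion having happened already, and it hands nothing back on success.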

Validation should be done just after conversion, at the boundary of the application, so that after that we can stop worrying about all this and just trust the values we have as valid. Our life is easier if we do validation early on like this.

Serialize the date into a database

Now the web application wants to store the birth date in a database. The database sits behind a boundary. This boundary may be clever and allow you to pass in straight Python date objects and do a conversion to its internal format afterward. That would be best.

But imagine our database is dumb and expects our dates to be in a string format. Now the task is up to our application: we need to transform the date to a string before the database boundary.

Let's say the database layer expects date strings in the format 'YYYY-MM-DD'. We then have to serialize our Python date object to that format before we pass it into the database:

>>> birthdate.strftime('%Y-%m-%d')
'1985-10-21'

This is serialization and not conversion because this transformation always succeeds.

Concepts

So we have:

Transformation:
Transform data from one type to another. Transformation by itself cannot fail, as it is assumed to always get correct input. It is a bug in the software if it does not. Conversion and serialization both do transformation.
Conversion:
Transform input across a boundary into a more convenient form inside that boundary. Fails if the input cannot be transformed.
Serialization:
Transform valid data as output across a boundary into a form convenient to outside. Cannot fail if there are no bugs in the software.
Validation:
Check whether input across a boundary that is already converted to convenient form is valid inside that boundary. Can fail. Does not transform.

Reuse

Conversion just deals with converting one value to another and does not interact with the rest of the universe. The implementation of a converter is therefore often reusable between applications.

The behavior of a converter typically does not depend on state or configuration. If conversion behavior does depend on application state, for instance because you want to parse dates as 'MM-DD-YYYY' instead of 'DD-MM-YYYY', it is often a better approach to just swap in a different converter based on the locale than to make the converter itself aware of the locale.
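One way to sketch the swapping approach: each converter knows exactly one format, and a lookup outside the converters picks the right one. The locale names and the DATE_CONVERTERS mapping are assumptions made for this example.

```python
from datetime import datetime


def make_date_converter(fmt):
    """Build a converter for one fixed strptime format."""
    def convert(text):
        return datetime.strptime(text, fmt).date()
    return convert


# Hypothetical mapping from locale name to converter.
DATE_CONVERTERS = {
    'nl_NL': make_date_converter('%d-%m-%Y'),  # DD-MM-YYYY
    'en_US': make_date_converter('%m-%d-%Y'),  # MM-DD-YYYY
}


def convert_date(text, locale):
    """Pick the converter for the locale; the converters stay locale-unaware."""
    return DATE_CONVERTERS[locale](text)
```

Each converter stays a simple, reusable function of its input, and the application-specific decision is confined to the lookup.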

Validation is different. While some validations are reusable across applications, a lot of them will be application specific. Validation success may depend on the state of other values in the input or on application state. Reusable frameworks that help with validation are still useful, but they do need additional information from the application to do their work.

Serialization and parsing

Serialization is transformation of data to a particular type, such as a string or a memory buffer. These types are convenient for communicating across the boundary: storing on the file system, storing data in a database, or passing data through the network.

The opposite of serialization is deserialization and this is done by parsing: this takes data in its serialized form and transforms it into a more convenient form. Parsing can fail if its input is not correct. Parsing is therefore conversion, but not all conversion is parsing.

Parsing extracts information and checks whether the input conforms to a grammar in one step, though if you treat the parser as a black box you can view these as two separate phases: input validation and transformation.

There are transformation operations in an application that do not serialize but can also not fail. I don't have a separate word for these besides "transformation", but they are quite common. Take for instance an operation that takes a Python object and transforms it into a dictionary convenient for serialization to JSON: it can only consist of dicts, lists, strings, ints, floats, bools and None.
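As a small illustration of such a transformation, here is a sketch with a made-up Person class. The date becomes a string because JSON has no date type; the 'YYYY-MM-DD' form used here is just one possible choice.

```python
import json
from datetime import date


class Person(object):
    """Hypothetical application object."""
    def __init__(self, name, birthdate):
        self.name = name
        self.birthdate = birthdate


def person_to_dict(person):
    """Transform a Person into plain data that json.dumps accepts."""
    return {
        'name': person.name,
        # A date is not a JSON type, so it is turned into a string here.
        'birthdate': person.birthdate.isoformat(),
    }


data = person_to_dict(Person('Alice', date(1985, 10, 21)))
# The actual serialization to a string is a separate, later step.
text = json.dumps(data, sort_keys=True)
```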

Some developers argue that data should always be kept in such a format instead of in objects, as it can encourage a looser coupling between subsystems. This idea is especially prevalent in Lisp-style homoiconic language communities, where even code is treated as data. It is interesting to note that JSON has made web development go in the direction of more explicit data structures as well. Perhaps it is as they say:

Whoever does not understand LISP is doomed to reinvent it.

Input validation

We can pick apart conversion and find input validation inside. Conversion does input validation before transformation, and serialization (and plain transformation) does not.

Input validation is very different from application-level validation. Input validation is conceptually done just before the convenient form is created, and is an inherent part of the conversion. In practice, a converter typically parses data, doing both in a single step.

I prefer to reserve the term "validation" for application-level validation and discuss input validation only when we talk about implementing a converter.

But sometimes conversion from one perspective is validation from another.

Take the example above where we want to store a Python date in a database. What if this operation does not work for all Python date objects? The database layer could accept dates in a different range than the one supported by the Python date object. The database may therefore be offered a date that is outside of its range and reject it with an error.

We can view this as conversion: the database converts a date value that comes in, and this conversion may fail. But we can also view this in another way: the database transforms the date value that comes in, and then there is an additional validation that may fail. The database is a black box and both perspectives work. That comes in handy a little bit later.

Validation and layers

Consider a web application with an application-level validation layer, and another layer of validation in the database.

Maybe the database also has a rule to make sure that the birth date is not in the future. It gives an error when we give a date in the future. Since validation errors can now occur at the database layer, we need to worry about properly handling them.

But transporting such a validation failure back to the user interface can be tricky: we are on the boundary between application code and database at this point, far from the boundary between application and user interface. And often database-level validation failure messages are in a form that is not very informative to a user; they speak in terms of the database instead of the user.

We can make our life easier. What we can do is duplicate any validation the database layer does at the outer boundary of our application, the one facing the web. Validation failures there are relatively simple to propagate back to the user interface. Since any validation errors that can be given by the database have already been detected at an earlier boundary before the database is ever reached, we don't need to worry about handling database-level validation messages anymore. We can act as if they don't exist, as we've now guaranteed they cannot occur.

We treat the database-level validation as an extra sanity check guarding against bugs in our application-level code. If validation errors occur on the database boundary, we have a bug, and this should not happen, and we can just report a general error: on the web this is a 500 internal server error. That's a lot easier to do.

The general principle is: if we do all validations that the boundary to a deeper layer already needs at a higher layer, we can effectively treat the inner boundary as not having any validations. The validations in the deeper layer then only exist as extra checks that guard against bugs in the validations at the outer boundary.
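The principle can be sketched with a stand-in for the database. Everything here is hypothetical: store_birthdate pretends to be a database layer that enforces the future-date rule itself, and using 400 for the outer validation failure is an assumption of this example.

```python
from datetime import date


class ValidationError(Exception):
    """Raised at the outer, web-facing boundary (hypothetical name)."""


class DatabaseError(Exception):
    """Raised by the stand-in database layer (hypothetical name)."""


def validate_birthdate(birthdate, today):
    if birthdate > today:
        raise ValidationError('birth date cannot be in the future')


def store_birthdate(birthdate, today):
    """Stand-in for a database that enforces the same rule itself."""
    if birthdate > today:
        raise DatabaseError('constraint failed: birthdate')
    return birthdate.isoformat()


def handle(birthdate, today, validate=validate_birthdate):
    try:
        validate(birthdate, today)
    except ValidationError as e:
        # Expected failure: tell the user what is wrong.
        return 400, str(e)
    try:
        store_birthdate(birthdate, today)
    except DatabaseError:
        # Only reachable if the outer validation has a bug.
        return 500, 'Internal Server Error'
    return 200, 'OK'
```

The validate argument exists only so a buggy outer validation can be simulated: passing one that accepts everything makes the database check fire, and the result is the generic 500 rather than a message for the user.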

We can also apply this to conversion errors: if we already make sure we clean up the data with validations at an outer boundary before it reaches an inner boundary that needs to do conversions, the conversions cannot fail. We can treat them as transformations again. We can do this as in a black box we can treat any conversion as a combination of transformation and validation.

Validation in the browser

In the end, let's return to the web browser.

We've seen that doing validation at an outer boundary can let us ignore validation done deeper down in our code. We do validation once when values come into the web server, and we can forget about doing them in the rest of our server code.

We can go one step further. We can lift our validation out of the server, into the client. If we do our validation in JavaScript when the user inputs values into the web form, we are in the right place to give really accurate user interface feedback in the easiest way possible. Validation failure information has to cross from JavaScript to the browser DOM and that's it. The server is not involved.

We cannot always do this. If our validation code needs information on the server that cannot be shared securely or efficiently with the client, the server is still involved in validation, but at least we can still do all the user interface work in the client.

Even if we do not need server-side validation for the user interface, we cannot ignore doing server-side validation altogether, as we cannot guarantee that our JavaScript program is the only program that sends information to the server. Through that route, or because of bugs in our JavaScript code, we can still get input that is potentially invalid. But now if the server detects invalid information, it does not need to do anything complicated to report validation errors to the client. Instead it can just generate an internal server error.

If we could somehow guarantee that only our JavaScript program is the one that sends information to the server, we could forgo doing validation on the server altogether. Someone more experienced in the arts of encryption may be able to say whether this is possible. I suspect the answer will be "no", as it usually is with JavaScript in web browsers and encryption.

In any case, we may in fact want to encourage other programs to use the same web server; that's the whole idea behind offering HTTP APIs. If this is our aim, we need to handle validation on the server as well, and give decent error messages.

BowerStatic 0.4 released!    Posted: 2014-09-08 16:10


What's BowerStatic? It's a little WSGI framework application for Python that is easy to plug into any WSGI web framework. What you can do with it is declare in Python code that you want some Bower package included in the web page. It knows about dependencies and such. Like Fanstatic but for Bower.

http://bowerstatic.readthedocs.org/en/latest/

I've released BowerStatic 0.4. This fixes a bug and clears away some technical debt that's been accumulating in BowerStatic for a little while and was causing bugs. Goes to show that instead of doing workarounds, be bold and refactor things a bit more heavily -- life gets better and refactoring doesn't have to take a lot of time, especially if you have proper automated tests.

BowerStatic has Morepath integration in the form of more.static.

Morepath 0.6 released!    Posted: 2014-09-08 15:55


What's Morepath? Morepath is your friendly neighborhood web framework with super powers. It lets you easily create links between resources, and offers a range of mechanisms that allow you to better organize and reuse code. Morepath is geared towards this modern age of the web where more and more UI logic is moving into JavaScript, into the browser -- it does this by being great at creating RESTful hypermedia APIs.

Besides a few documentation fixes, Morepath 0.6 has a minor improvement and a major improvement.

Both improvements have to do with a relatively obscure use case that I ran into lately. Application composition should be an important feature in a modern web framework, and so should linking, but we only rarely see things like this. That we run into use cases like this goes to show just how far Morepath is ahead in exploring this area. See nesting applications and linking to things in other apps for more information on these subsystems of Morepath.

The major improvement is the ability to link to other applications by the name under which they've been mounted into their parent. By default the name is the path under which they were mounted. Imagine you have the following URL space:

/v1/
/v1/a
/v1/b

You can model this as two applications, A and B, that are mounted under a core application mounted at v1. That would look like this in Morepath:

import morepath

class V1(morepath.App):
    pass

# makes a root object exist under /v1
@V1.path(path='/v1')
class Root(object):
    pass

class A(morepath.App):
    pass

class B(morepath.App):
    pass

# mounts everything in app A under /v1/a
@V1.mount(app=A, path='a')
def a_context():
    return {}

# mounts everything in app B under /v1/b
@V1.mount(app=B, path='b')
def b_context():
    return {}

Consider how you'd make a link from app A to a resource in app B given this setup. In Morepath before 0.6, you'd have to write:

request.parent.child(B).link(obj)

This would create a link to whatever obj is (which depends on its path), for instance:

/v1/b/items/3

The minor improvement is that we realized the .parent.child combination happens a lot and we've introduced a new sibling method to combine them in one step:

request.sibling(B).link(obj)

Now consider what happens when a new incompatible version of your overall API arises, because you've changed something fundamentally in app B. Perhaps items appear on a /foos path instead of an /items path, like:

/v2/b/foos/3

You've not changed anything in app A though. What you'd like to do is mount the new B and the old A into a V2 app and have everything work as expected:

class V2(morepath.App):
    pass

# mounts everything in app A under /v2/a
@V2.mount(app=A, path='a')
def a_context():
    return {}

# mounts everything in app NewB under /v2/b
@V2.mount(app=NewB, path='b')
def b_context():
    return {}

But this is problematic, as we have a hardcoded dependency on app B in app A in the link generation code. Now we'd like to link to app NewB instead of B. But we'd want the original v1 URLs to still work as before, so we can't just modify app A to include a link to NewB. So in /v1/a we'd like links to look like this:

/v1/b/items/3

But in /v2/a we'd like links to go to the new place in NewB:

/v2/b/foos/3

The solution is the new ability to find mounted applications by name instead of by class. By default the name is the same as the path argument you give in the mount directive.

If you write linking code in app A to read like this:

request.sibling('b').link(obj)

there is no more hardcoded dependency on app B. Instead the system now relies on the sibling app mounted under b to create the link, whatever it may be. And if A is mounted under /v1 the sibling will be B, but if it's mounted under /v2 the sibling will be NewB. So the links will be correct in both cases, and we're saved!
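To make the idea concrete without depending on Morepath itself, here is a toy model of lookup by mount name. ToyApp and link_from_a are invented for this sketch and are not Morepath's API or implementation; they only mimic the shape of the sibling lookup described above.

```python
class ToyApp(object):
    """Toy stand-in for a mounted application; not Morepath's implementation."""
    def __init__(self, name, items_path=None):
        self.name = name
        self.items_path = items_path
        self.parent = None
        self.children = {}

    def mount(self, child):
        # Register the child under its name, as a mount by path would.
        child.parent = self
        self.children[child.name] = child
        return child

    def sibling(self, name):
        # Look up by mount name in whatever parent we happen to live under.
        return self.parent.children[name]

    def base(self):
        prefix = self.parent.base() if self.parent is not None else ''
        return '%s/%s' % (prefix, self.name)

    def link(self, item_id):
        return '%s/%s/%s' % (self.base(), self.items_path, item_id)


def link_from_a(a, item_id):
    """Linking code in app A: refers to 'b' by name only."""
    return a.sibling('b').link(item_id)


v1 = ToyApp('v1')
a_in_v1 = v1.mount(ToyApp('a'))
v1.mount(ToyApp('b', items_path='items'))   # plays the role of B

v2 = ToyApp('v2')
a_in_v2 = v2.mount(ToyApp('a'))
v2.mount(ToyApp('b', items_path='foos'))    # plays the role of NewB
```

The same link_from_a code yields a /v1/b/items/... link in one tree and a /v2/b/foos/... link in the other, because the name 'b' is resolved relative to wherever A is mounted.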

Morepath 0.5(.1) and friends released!    Posted: 2014-08-28 17:10


I've just released a whole slew of things, the most important of which is Morepath 0.5, your friendly neighborhood Python web framework with superpowers!

What's new?

There are a bunch of new things in the documentation, in particular:

Also available is @reg.classgeneric. This depends on a new feature in the Reg library.

There are a few bug fixes as well.

For more details, see the full changelog.

Morepath mailing list

I've documented how to get in touch with the Morepath community. In particular, there's a new Morepath mailing list!

Please do get in touch!

Other releases

I've also released:

  • Reg 0.8. This is the generic function library behind some of Morepath's flexibility and power.
  • BowerStatic 0.3. This is a WSGI framework for including static resources in HTML pages automatically, using components installed with Bower.
  • more.static 0.2. This is a little library integrating BowerStatic with Morepath.

Morepath videos!

You may have noticed I linked to Morepath 0.5.1 before, not Morepath 0.5. This is because I had to make another release: I was using a new YouTube extension that gave me a bit too much trouble on readthedocs. I replaced that with raw HTML, which works better. The Morepath docs now include two videos.

  • On the homepage is my talk about Morepath at EuroPython 2014 in July. It's a relatively short talk, and gives a good idea on what makes Morepath different.
  • If you're interested in the genesis and history behind Morepath, and general ideas on what it means to be a creative developer, you can find another, longer, video on the Morepath history page. This was taken last year at PyCon DE, where I had the privilege to be invited to give a keynote speech.

New HTTP 1.1 RFCs versus WSGI    Posted: 2014-08-19 12:35


Recently new HTTP 1.1 RFCs were published that obsolete the old HTTP 1.1 RFCs. They are extensively rewritten.

Unfortunately the WSGI PEP 3333 refers to something only made explicit in the old version of the RFCs, but which is much harder to find in the new versions of the RFCs. I thought I'd leave a report of my investigations here so that others who may run into this in the future can find it.

WSGI is a protocol that's like HTTP but isn't quite HTTP. In particular WSGI defines its own iterator-based way to send larger responses out in smaller parts. It therefore cannot deal with so-called "hop-by-hop" headers, which try to control this behavior on a HTTP level. The WSGI spec says a WSGI application must not generate such headers.

This is relevant when you're dealing with a WSGI-over-HTTP proxy. This is a special WSGI application that talks to an underlying HTTP server. It presents itself as a normal WSGI application.

The underlying HTTP server could very well be sending out headers such as Transfer-Encoding: chunked. The WSGI spec does not allow a WSGI application to send them out though, so a WSGI proxy must strip these headers out.

So what headers are to be stripped out? The WSGI spec refers to section 13.5.1 in now-obsolete RFC 2616.

This nicely lists hop-by-hop headers:

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • TE
  • Trailers
  • Transfer-Encoding
  • Upgrade

That RFC also says:

"All other headers defined by HTTP/1.1 are end-to-end headers."

and then confusingly:

"Other hop-by-hop headers MUST be listed in a Connection header, (section 14.10) to be introduced into HTTP/1.1 (or later)."

Which one is it, HTTP 1.1? I guess that's one of the reasons this text got rewritten.

In the new rewritten version of HTTP 1.1, this list is gone. Instead it specifies for some headers (such as TE and Upgrade) that these should be added to the Connection field. A HTTP proxy can then strip out the headers listed in Connection, and then also strip out Connection itself.

Confusingly, while the new RFC 7230 refers to the concept of 'hop-by-hop' early on, and also says this in the change notes in A.2:

"Also, "hop-by-hop" header fields are required to appear in the Connection header field; just because they're defined as hop-by-hop in this specification doesn't exempt them."

it doesn't actually say any headers are hop-by-hop anywhere else. Instead it mandates some headers should be added to Connection.

But wait: Transfer-Encoding is not to be listed in the Connection header, as it's not hop-by-hop. At least, not anymore. I've seen it described as 'hopX-by-hopY', but not in the RFC. This is, I think, because a HTTP proxy could let these through without having to remove them. But not for a WSGI over HTTP proxy: it MUST remove Transfer-Encoding, as WSGI applications have no such concept.
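For a WSGI-over-HTTP proxy, one conservative reading of all this is to strip both the old RFC 2616 list and whatever the upstream response names in its Connection header. The sketch below assumes headers arrive as a list of (name, value) pairs, as in a WSGI response; strip_hop_by_hop is a made-up helper and this combined rule is my interpretation, not something either spec spells out in this form.

```python
# Hop-by-hop header names from section 13.5.1 of RFC 2616, lowercased.
RFC2616_HOP_BY_HOP = frozenset([
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailers', 'transfer-encoding', 'upgrade',
])


def strip_hop_by_hop(headers):
    """Return a copy of a (name, value) list without hop-by-hop headers."""
    # Collect the names that the upstream response lists in Connection.
    nominated = set()
    for name, value in headers:
        if name.lower() == 'connection':
            for token in value.split(','):
                nominated.add(token.strip().lower())
    nominated.discard('')
    blocked = RFC2616_HOP_BY_HOP | nominated
    # Header names are case-insensitive, so compare in lowercase.
    return [(name, value) for name, value in headers
            if name.lower() not in blocked]
```

Because Transfer-Encoding is in the fixed list, it is removed even when the upstream server does not list it in Connection.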

I think the WSGI PEP should be updated in terms of the new HTTP RFC. It should make explicit that some headers such as Transfer-Encoding must not be specified by a WSGI app, and that no headers that must be listed in Connection can be specified by a WSGI app, or something like that.

Relevant mailing list thread:

http://lists.w3.org/Archives/Public/ietf-http-wg/2014JulSep/thread.html#msg1710

Against "contrib"    Posted: 2014-08-11 12:55


It's pretty common for an open source project to have a "contrib" directory as part of its project structure. This contains useful code donated to the project by outsiders. It seems innocuous. A contrib section, why not?

I don't like contrib. A contrib directory gives the signal that "yes, we carry this source code around, but it's not really part of our project". What does that mean? Why is it even part of your project at all then? Why isn't this code distributed in library form instead? I'd much prefer the project to be smaller instead, as in that case I wouldn't have to worry about the contrib code at all.

Perhaps in the case of your project, placing code in contrib doesn't really mean "it's not really part of our project". Perhaps the code in contrib is meant to be a fully supported part of the project's codebase. If so, why use the name "contrib" at all? It doesn't signal anything functional -- it only signals something about origins, which is why people should suspect any claim that it's a fully integral part of the project. Projects, instead of dumping something in contrib, should just put that code in its appropriate place and really own it.

Arguments for contrib

One argument for a contrib section is that by placing code there, the tests are automatically run for it each time you run the tests in the core code. This way a project is in a position to fix obvious breakages in this code before release.

There's a problem with this approach: more subtle breakages run the risk of being undetected, and nobody is clearly in charge of guarding against that, because the code isn't really owned by the project or the contributor anymore. It's in this weird contrib half-way house.

Besides, we have plenty of experience as an open source community with developing extension code that lives outside of a project. Making sure extensions don't break and get fixed when they do requires communication between core library authors and extension authors. I think it's mostly an illusion that by placing the code in contrib you could do away with such communication -- if a project really wants to do away with communication, it should really own the code.

Placing code in contrib is not a substitute for communication.

That's not to say the current infrastructure cannot be improved to help communication. For instance, in the Python world the devpi project is exploring ways to automatically run the tests for dependent projects to see whether you caused any breakage in them.

Another argument for a contrib section has to do with discovery. As a user of your project I can look through contrib for anything useful there. I don't have to go and google for it instead. Of course googling is really easy anyway, but...

If you want to make discovery easy, then add a list of useful extensions to your project to the project's documentation. Many projects with a contrib directory do this anyway. But that already takes care of discovery; no reason to add the code to "contrib".

And again, infrastructure can help support this -- it is useful to be able to discover what projects depend on a project. Linux package managers generally can tell you this, but I can see how language-specific ecosystems can offer more support for this too. For a Python-specific example, it would be useful if PyPI had an easy way to discover all projects that depend on another one.

Effects on contribution

As an open source project developer you should want to attract contributions to your project. When you add code to "contrib", you tell a contributor "your contribution is not a full and equal part of this project". That's not a great way to expand your project's list of core contributors...

And if you are a new contributor who wants to improve something in the contrib of a project, who do you even talk to? You might be worried that the project owner will say: sorry, that code is in contrib, I don't care about improving it. Since people are less confident that the project even cares about code in "contrib", that discourages them from trying to contribute to that code.

Summary

Don't add code to a "contrib" section of your project. "contrib", paradoxically, can have a chilling effect on contribution. Either maintain that code externally entirely, or make your project really own that code.


On Naming In Open Source    Posted: 2014-07-29 16:30


Here are some stories on how you can go wrong with naming, especially in open source software.

Easy

Don't use the name "easy" or "simple" in your software as it won't be and people will make fun of it.

Background

People tend to want to use the word 'easy' or 'simple' when things really are not, to describe a facade. They want to paper over immense complexity. Inevitably the facade will be a leaky abstraction, and developers using the software are exposed to it. And now you named it 'easy', when it's anything but. Just don't give in to the temptation in the first place, and people won't make fun of it.

Examples

easy_install is a Python tool to easily and automatically install Python packages, similar to JavaScript npm or Ruby gems. pip is a more popular tool these days that does the same. easy_install hides, among many other complicated things, a full-fledged web scraper that follows links onto arbitrary websites to find packages. It's "easy" until it fails, and it will fail at one point or another.

SimpleItem is an infamous base class in Zope 2 that pulls in just about every aspect of Zope 2 as mixin classes. It's supposed to make it easy to create a new content type for Zope. The number of methods made available is truly intimidating and anything but simple.

Demo

Don't use the word "demo" or "sample" in your main codebase or people will depend on it and you will be stuck with it forever.

Background

It's tempting in some library or framework consisting of many parts to want to expose an integrated set of pieces, just as an example, within that codebase itself. Real use of it will of course have the developers integrating those pieces themselves. Except they won't, and now you have people using Sample stuff in real world code.

The word Sample or Demo is fine if the entire codebase is a demo, but it's not fine as part of a larger codebase.

Examples

SampleContainer was a part of Zope 3 that ended up serving as the base class of most actual container subclasses in real world code. It was just supposed to demonstrate how to do the integration.

Rewrite

Don't reuse the name of software for an incompatible rewrite, unless you want people to be confused about it.

Background

Your software has a big installed base. But it's not perfect. You decide to create a new, incompatible version, without a clear upgrade path. Perhaps you handwave the upgrade path "until later", but that then never happens.

Just name the new version something else. Because the clear upgrade path may never materialize, and people will be confused anyway. They will find documentation and examples for the old system if they search for the new one, and vice versa. Spare your user base that confusion.

The temptation to do this is great; you want to benefit from the popularity of the old system's name and this way attract users to the shiny new system. But that's exactly the situation where doing this is most confusing.

Examples

Zope 3: there was already a very popular Zope 2 around, and then we decided to completely rewrite it and named it "Zope 3". Some kind of upgrade path was promised but conveniently handwaved. Immense confusion arose. We then landed pieces of Zope 3 in the old Zope 2 codebase, and it took years to resolve all the confusion.

Company name

If you want an open source community, don't name the software after your company, or your company after the software.

Background

If you have a piece of open source software and you want an open source community of developers for it, then don't name it after your company. You may love your company, but outside developers get a clear indication that "the Acme Platform" is something that is developed by Acme. They know that as outside developers, they will never gain as much influence on the development of that software as developers working at Acme. So they just don't contribute. They go to other open source software that isn't so clearly allied to a single business and contribute there. And you are left to wonder why developers are not attracted to work on your software.

Similarly, you may have great success with an open source project and now want to name your own company after it. That sends a powerful signal of ownership to other stakeholders, and may deter them from contributing.

Of course naming is only a part of what makes an open source project look like something a developer can safely contribute to. But if you get the naming bit wrong, it's hard to get the rest right.

Add the potential entanglement into trademark politics on top of it, and just decide not to do it.

Examples

Examples omitted so I won't get into trouble with anyone.


My visit to EuroPython 2014    Posted: 2014-07-28 12:00


I had a fun time at EuroPython 2014 in Berlin last week. It was a very well organized conference and I enjoyed meeting old friends again as well as meeting new people. Before I went I was a bit worried that with the number of attendees it'd feel too massive; I had that experience at a PyCon in the US a few years ago. But I was pleasantly surprised it didn't -- it felt like a smaller conference, and I liked it.

Another positive thing that stood out was the greater diversity; there seemed to be more people from central and eastern Europe there than before, and most of all, there were more women. It was underscored by a 13-year-old girl giving a lightning talk -- that was just not going to happen at EuroPython 5 years ago.

This is a very positive trend and I hope it continues. I know it takes a lot of work on the part of the organizers to get this far.

I gave a talk at EuroPython myself this year, and I think it went well.


Morepath 0.4.1 released (with Python 3 fixes)    Posted: 2014-07-08 12:55


I just released Morepath 0.4.1. This fixes a regression with Python 3 compatibility and has a few other minor tweaks to bring test coverage back up to 100%.

I had broken Python 3 support in Morepath 0.4. I'm still not in the habit of running 'tox' before a release, so I find out about these problems too late.

I'll go into a bit of detail about this issue, as it's a mildly amusing example of writing Python code being more complicated than it should be.

Morepath 0.4 broke in Python 3 because I introduced a metaclass for the morepath.App class. I usually avoid metaclasses as they are a source of unpredictability and complexity, but the best solution I saw here was one. It's a very limited one.

One task of the metaclass is to attach to the class with Venusian. Venusian is a library that lets you write decorators that don't execute during import time but later. This is nice as import time side effects can be a source of trouble.

Venusian also lets you attach a callback to a Python object (such as a class) outside of a decorator. That's what I was doing; attaching to a class, in my metaclass.
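The deferral idea can be sketched with the standard library alone. This is not Venusian's implementation or API; `attach`, `register` and `scan` below are hypothetical names chosen for illustration, and the sketch only shows the general pattern: decorating (or attaching to) an object records a callback, and the callback runs later when you explicitly ask for it.

```python
# Hypothetical, simplified stand-in for deferred decoration.
# Nothing here is Venusian's real API.

_callbacks = []  # (object, callback) pairs waiting to be run


def attach(obj, callback):
    """Remember a callback for obj; do no real work at import time."""
    _callbacks.append((obj, callback))
    return obj


def register(registry):
    """Decorator whose effect is postponed until scan() is called."""
    def decorator(obj):
        def callback(ob):
            registry.append(ob.__name__)
        return attach(obj, callback)
    return decorator


def scan():
    """Run all deferred callbacks, in the order they were attached."""
    while _callbacks:
        obj, callback = _callbacks.pop(0)
        callback(obj)


registry = []


@register(registry)
def view():
    return "ok"


class Model(object):
    pass


# Attaching to a class outside of any decorator works the same way.
attach(Model, lambda ob: registry.append(ob.__name__))

before = list(registry)  # nothing has been registered yet
scan()
after = list(registry)   # both callbacks have now run
```

The point of the pattern is that defining `view` and `Model` has no side effect on `registry`; only the explicit `scan()` does.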

Venusian determines in what context the decorator was called, such as module-level and class-level, so you can use that later. For this it inspects the Python stack frame of its caller.

My first attempt to make the metaclass work in Python 3 was to use the with_metaclass functionality from the future compatibility layer. I am using this library anyway in Reg, which is a dependency of Morepath, so using it would not introduce a new dependency for Morepath.

Unfortunately after making that change my tests broke in both Python 2 and Python 3. That's not an improvement over having the tests being broken in just Python 3!

It appears that with_metaclass introduces a new stack frame into the mix somewhere, which breaks Venusian's assumptions. Now Venusian's attach has a depth argument to determine where in the stack to check, so I increased the stack depth by one and ran the tests again. Fewer tests broke than before, but quite a few still did. I think the cause is that the stack depth of with_metaclass is just not consistent for whatever reason.
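To see why one extra stack frame matters, here is a small standard-library sketch of depth-based caller inspection. It is an illustration only: this `attach` is a hypothetical stand-in that merely reports a function name, not Venusian's `attach`, and `sys._getframe` is a CPython implementation detail.

```python
import sys


def attach(depth=1):
    """Hypothetical stand-in: name the function `depth` frames above us."""
    # depth=0 would be attach itself; depth=1 is whoever called attach.
    return sys._getframe(depth).f_code.co_name


def direct_user():
    # The code we care about calls attach directly.
    return attach()


def wrapped_user(depth=1):
    # A helper in between (think: a compatibility layer) adds one frame.
    def helper():
        return attach(depth=depth)
    return helper()
```

With the default depth, `direct_user()` finds `"direct_user"`, but `wrapped_user()` finds `"helper"` instead of the function we meant. Passing `depth=2` compensates, though only as long as the number of intermediate frames stays the same on every code path.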

Digging around in the future package I saw it includes a copy of six, another compatibility layer project. six has a name close to my heart -- long ago I originated the Five project for compatibility between Zope 2 and Zope 3.

That copy of six had another version of with_metaclass. I tried using future.util.six.with_metaclass, and hey, it all started working suddenly. All tests passed, in both Python 2 and Python 3. Yay!

Okay then, I figured, I don't want to depend on a copy of six that just happens to be lying about in future. It's not part of its public API as far as I understand. So I figured I should introduce a new dependency for Morepath after all, on six. It's not a big deal; Morepath's testing dependencies include WebTest, and this already has a dependency on six.

But when I pulled in six proper, I got a newer version of it than the one in future.util.six, and it caused the same test breakages as with future. Argh!

So I copied the code from old-six into Morepath's compat module. It's a two-liner anyway. It works for me. Morepath 0.4.1 done and released.
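For readers who haven't seen such a helper, here is a minimal sketch of a two-line `with_metaclass` in the style of older six releases. It is an assumption that this resembles what was copied; it is not quoted from Morepath's compat module. The `Registering` metaclass is made up purely to show the effect.

```python
def with_metaclass(meta, *bases):
    """Create a throwaway base class whose metaclass is `meta`."""
    return meta("NewBase", bases, {})


class Registering(type):
    """Illustrative metaclass that records every class it creates."""
    created = []

    def __init__(cls, name, bases, namespace):
        # Explicit super() arguments keep this valid on Python 2 and 3.
        super(Registering, cls).__init__(name, bases, namespace)
        Registering.created.append(name)


# Same spelling works on Python 2 and Python 3.
class App(with_metaclass(Registering, object)):
    pass
```

Subclassing the temporary `NewBase` is what gives `App` its metaclass without using either the Python 2 `__metaclass__` attribute or the Python 3 `metaclass=` keyword. A side effect of this simple form is that `NewBase` stays in `App`'s base classes and the metaclass runs for it as well.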

But I don't know why six had to change its version, and why future's version is different. It worries me -- they probably have good reasons. Are those reasons going to break my code at some point in the future?

Being a responsible open source citizen, I left bug reports about my experiences in the six and future issue trackers:

https://bitbucket.org/gutworth/six/issue/83/with_meta-and-stack-frame-issues#comment-11125428

https://github.com/PythonCharmers/python-future/issues/75

I much prefer writing Python code. Polyglot is an inferior programming language as it introduces complexities like this. But Polyglot is what we got.


Morepath 0.4 and breaking changes    Posted: 2014-07-07 16:15


I've just released Morepath 0.4!

Morepath is a Python web framework that's small ("micro") and packs a lot of power. There are a lot of facilities for application reuse. And as opposed to most web frameworks, it actually has some intelligence about generating hyperlinks to objects.

Morepath 0.4 has a breaking change to the way application reuse works. Don't worry, you can fix your code by making a few minor changes. In short, Morepath application objects are now classes, not instances, and you can instantiate this class to get a WSGI object. See the CHANGES for a lot of details on what happened and what you need to do.

The big win is that application reuse in Morepath has become Python subclassing, and that making a WSGI application (even a parameterized one) is just instantiating the class.
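The general shape of "reuse is subclassing, a WSGI app is an instance" can be sketched without Morepath at all. The classes below are hypothetical and use none of Morepath's API; they only illustrate the pattern, with `call_wsgi` as a small helper for trying an app without a server.

```python
class App(object):
    """Hypothetical base application: instances are WSGI callables."""
    greeting = "Hello"

    def __init__(self, name="world"):
        # Constructor arguments parameterize one particular WSGI app.
        self.name = name

    def __call__(self, environ, start_response):
        body = ("%s, %s!" % (self.greeting, self.name)).encode("utf-8")
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]


class FriendlyApp(App):
    """Reuse by subclassing: override only what differs."""
    greeting = "Hi"


def call_wsgi(app):
    """Call a WSGI app once with a minimal environ; return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Here `FriendlyApp(name="Morepath")` is a parameterized WSGI application, and `App()` is another one built from the base class; both can be handed to any WSGI server.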

The other win is that Morepath gained even more extensibility features, namely the ability for a Morepath extension to introduce new Morepath directives (the decorators you see everywhere in Morepath examples). But I can't talk too much about that until I document them properly.

Along with the new Morepath, I've also made the initial release of BowerStatic (announcement). BowerStatic is a WSGI framework that lets you easily include bower-installed resources in your web page and do the right thing with caching (forever, thank you, but on a separate URL for each version).

How does that relate to Morepath, you may ask? Well, today I've also released the Morepath integration for BowerStatic, more.static. I've described in the Morepath documentation what to do to get it working in your Morepath project. The reason Morepath 0.4 had the breaking change was in part to support more.static, which needed the ability to introduce a new Morepath directive among other things.


Contents © 2005-2014 Martijn Faassen | Twitter | Github | Gittip