under-engineering, over-engineering, right-engineering

I just ran into a post in a series called "Tools of The Effective Developer". This one is called "Make It Work - First!".

The posting contains very good advice. I call the two extremes described "under-engineering" and "over-engineering". It's in part a personality trait as the original posting says, but in part it's also something that changes (hopefully!) as a developer matures.

Inexperienced developers who are still learning the basics of the craft tend to under-engineer. They're already happy when they get something working. They frequently don't even realize there are better ways to do it.

Under-engineering frequently results in buggy, crufty, overcomplicated or over-verbose code that's hard to read and hard to test. It works, but it makes developers miserable, including most likely the developer who originated it.

More experienced developers who are comfortable with their craft frequently end up over-engineering. This problem is compounded if the developer is fairly isolated from having actual users. These developers are comfortable with the details of programming, and they're aware of (at least dogmas of) right and wrong ways of doing things. They therefore have an interest in powerful, flexible code, and a tendency to anticipate future features.

Over-engineering frequently results in code that overanticipates tremendously. It's full of all kinds of interesting features and pluggability points that end up not being needed in the future at all. On the other hand, features that may be essential to the task at hand might not be very well done, as they got lost between all the other ones. Frequently even the (over)anticipated features end up not working properly should you ever need them in the future. This is because there was no real world application to actually exercise them yet when they were being written, causing them to be buggy, or slightly-off.

This stage is probably an essential phase that good developers have to go through. With even more experience these developers can hopefully reach a balance: right-engineering. This is the ability to know when to keep things simple, and to know when not to. These developers really understand the value of good practices like high-feedback iterations and test-driven development. As the original post recommends, typically the right approach is indeed to go for the solution that works first: a simple one. I must note here that 'simple' is a relative concept - a good developer can make things simple that are definitely complicated for others.

I'll also note that sometimes customers affect this balance. Customers might be of a "just make it work" mindset, leaving developers little time to get things right. This is a well-known phenomenon, much complained about by developers. Less well-known is that in other cases, customers might also overanticipate future needs, complicating the design of a piece of software before they even run it and actually gain more experience with their real world needs. A good developer tries to steer both types of customers to the right place somewhere in the middle.

Right-engineering is difficult. "Make it work first" is indeed one of the most important ingredients to reach success. Right-engineering is more the application of good development strategies than it is pure coding skill. You can't "become" a right-engineer after some period of learning and growth, resulting in you always finding the right balance between over-engineering and under-engineering for the job. What can happen is that a developer will adopt strategies that increase the chances of finding that balance.

A developer should recognize that the right amount of engineering (and what to actually engineer) is highly dependent on how the software is actually going to be used. The best way to find out how software is going to be used is often to actually use it: use your api in automated (doc)tests, get the customer to use prototypes and development versions, and get the customer to put the software in production when it's good enough, instead of when it's optimal. Software development is frequently software evolution. Get feedback and go back and improve your code.

Happy birthday Grok!

Grok the codebase is 1 year old this week! One year ago, from the 12th of October until the 17th of October 2006, we started the Grok project in Halle, Germany. The design discussions for Grok are older - they go back to EuroPython. And the Zope 3 project, the base without which Grok would be nothing, started development in 2001.

But one has to start somewhere, and in that sprint Grok really came together. Four hackers sat together. A beamer projected our code on a wall. We started to hammer out the design and code of Grok. At the beginning of the sprint there was a rough sketch of a design and no code. At the end of that sprint, we had Grok. Grok's caveman theme was born at this sprint. In case you missed it, here is my report on that first sprint.

Over the last year, much has happened. We've had more sprints. We've presented Grok at conferences. We have given Grok tutorials. We built up a thriving Grok community centered around our grok-dev mailing list. We have had a successful summer of code with two projects to improve Grok. To our delight, many clever contributors joined us to help improve Grok, which has come a very long way. Two weeks ago, a group of sprinters even visited the "spiritual home" of Grok: Neanderthal in Germany.

Grok is far more than just a codebase - it's a community and a mission: we want to make the powerful Zope 3 technologies appealing to "cavemen" - by which I mean myself and other people who want to have power without having their heads explode.

Some features that Grok has today:

  • Lively, responsive development community. We're here, we have a mission and we're not going to stop.
  • Grok is documented: on http://grok.zope.org we have a tutorial and howtos, and a lot more coming.
  • Store your Python objects in a powerful, industry strength object database, the ZODB. (or a relational database, should you prefer)
  • Grok has a powerful UI that allows you to browse Grok-based applications and introspect their objects to see what their APIs are and where they are implemented.
  • Powerful schema-driven form generation system.
  • Index your Python objects for fast search, including full-text facilities.
  • Zope 3's powerful security infrastructure, optimized for usability.
  • Support for XMLRPC and JSON out of the box.
  • Upwards compatibility with Zope 3: the huge range of powerful Zope 3 components will work with Grok.
  • Easy to install using grokproject. This is not just an installation tool, but sets up a reproducible application buildout for you. This means no long INSTALL.txt files to install a Grok-based application (or Grok itself). It's just a few commands to get up and running, and the system downloads what's needed from PyPI or other Python package indexes.
  • Use the powerful Zope 3 component architecture to create flexible, pluggable code without writing a lot of separate ZCML XML code to hook things together.
  • Declarative Python code without a lot of meta-class magic. Just define a class and it's registered in the system.
  • Well-defined separate library for extending Grok and other Python applications: Martian.
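
To give a feel for what "just define a class and it's registered" can mean, here is a minimal sketch in plain Python. It is not Grok's or Martian's actual implementation: the `View` base class, the `grok_namespace` helper and the registry layout are all hypothetical names made up for illustration. The point is only that registration can happen by scanning for subclasses after the fact, with no metaclass involved.

```python
class View:
    """Hypothetical base class, standing in for a framework base class."""


class Index(View):
    pass


class Edit(View):
    pass


def grok_namespace(namespace, base, registry):
    """Scan a namespace and register every subclass of `base` found in it.

    The classes themselves stay ordinary Python classes; registration is
    done by this scan, not by a metaclass.
    """
    for name, obj in namespace.items():
        is_class = isinstance(obj, type)
        if is_class and issubclass(obj, base) and obj is not base:
            registry[name.lower()] = obj
    return registry


# Usage: in real code the namespace would be something like vars(module).
registry = grok_namespace(
    {"View": View, "Index": Index, "Edit": Edit, "answer": 42}, View, {}
)
```

After the scan, `registry` maps `"index"` and `"edit"` to their classes, while the base class and the non-class entry are skipped.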

Some features that are currently in the works:

  • A new website, and then a lot more documentation.
  • WSGI support.
  • REST support for Grok.
  • Pluggable template languages for Grok, starting with Genshi.
  • Downwards compatibility with Zope 3: use Grok-based code in other Zope 3 projects.
  • In general, we will continue on our mission to expose more powerful Zope 3 technology in a developer-friendly way. Zope 3's component software story works: There is a lot of Zope 3 technology available already, and more is written every day!

So, happy birthday Grok! It was a very good first year, and we'll have a great next year too, I'm sure of it. Come and join us if you'd like.

report from the Neanderthal Grok sprint (day 1-3)

Hello from the fourth Grok sprint! I wanted to write this report on Tuesday but didn't get around to it. The Neanderthal Grok sprint is now more than halfway over: Monday and Tuesday were full sprint days. Wednesday was our day off visiting Neanderthal itself, but we did do some more sprinting in the evening at the hotel.

So what have we worked on so far?

  • The grok.zope.org website. We have a team working on reimplementing this website on top of Plone 3, so we lose the current maintenance bottlenecks in getting new Grok tutorials published and modified. We've put the contents in a Plone 3 website, using PloneHelpCenter, and we've also been working on integrating a layout for grok.zope.org into a Plone skin. Hopefully our experiences with grok.zope.org will also be helpful in the efforts to update the wider zope.org website, which we're keeping track of. Having a good web presence is really important for Grok so I'm thrilled with the progress we've made.
  • megrok.kss is KSS integration for Grok. KSS makes it really easy to make your web pages dynamic without writing javascript. megrok.kss makes it really easy to use KSS with Grok. There is more work to do (it needs to be released, for instance), but using KSS can now be accomplished with very little code.
  • We've made a lot of progress on the reference documentation for Grok. Both the content itself as well as the method to maintain and publish it were improved. JW, also here at the sprint, already discussed this extensively in his blog entry about the sprint.
  • JW also mentions a smaller task: Grok now doesn't grok modules that are called 'ftests' and 'tests' by default. Generally one doesn't want the test setup to be grokked when running a live application, after all.
  • Work is being done to extend and improve the Grok admin UI, in particular to show information about which views are available for a content object. Also one of our conclusions of these discussions is that "admin UI" is not a very good name. It's really a UI dedicated to serving developers while they develop a Grok-based application. We haven't figured out a good new name for this yet.
  • Genshi integration. Grok uses Zope Page Templates (ZPT) as its main template language. Genshi is a templating language quite a bit like ZPT, but with a lot of improvements and new ideas. We think it could be a good fit for Grok, so we're working on ways to make it easy to use Genshi with Grok. This involves making Grok's templating system more pluggable and investigating how to make our templating use cases (include images, reuse templates, etc) work with Genshi.
  • Some work was done to investigate making grokkers produce Zope 3 configuration actions. This needs to be done to make it possible to use Grok-based code with pure Zope 3 projects (of course we can already use all of Zope 3 with Grok, but we need to be a good Zope 3 citizen).
  • Finally, the most critically important task of the sprint: pinning down the versions of the packages that Grok depends on, so that when we release the next version of Grok, grokproject and buildouts will work and continue working no matter what new releases of our dependencies appear. We really need to make the process of installing Grok reliable.

As I mentioned, yesterday was our day off. We went to the nice Neanderthal museum in Neanderthal. We had a great time. Our patient tour guide probably got an unusual amount of observations and comments from us bunch of geeks (me...) on human evolution. :)

We have a day and a half of sprint left at this time, which is good, as there's lots to work on still!

the challenges of version management in an eggified world

Over the last few months, Zope 3 and Grok have been switching to a brave new eggified world of installation. The idea is that you compose your Zope application from a large number of smaller packages, each providing its own components. I've sometimes described this Zope as an integrated megaframework. Zope is an integrated framework where packages follow common coding conventions, and the component architecture defines a way for packages to work with each other. Grok tries to step up by aiming for an integrated feel for developers. At the same time, Zope is a megaframework, allowing you to swap in best of breed components as they come available. Don't like zope.formlib? Swap in z3c.form for your form generation needs instead.

So how does eggified Zope work? You use zc.buildout to manage your development project. This buildout gathers eggs together in one place, looking at requirements in the setup.py of the various packages, and sets up other programs such as a server start script, a test runner, and so on. Eggs that aren't installed locally yet will be downloaded from the Python cheeseshop and other locations. Eggs aren't installed system-wide, keeping the system Python nice and clean. What's more, different projects can easily use different versions of the same library. Since zc.buildout is easily extensible with new recipes, many complicated needs can be covered. To make initial installation of a buildout easier, Philipp von Weitershausen has developed zopeproject and grokproject to help you set up new Zope 3 or Grok (pick your flavor) projects.

Being at the forefront with eggs and buildouts means we also have some interesting challenges. I'll describe one that has been biting the Grok project more than once recently. This post describes the various concerns that we have with version management, and a proposed solution.

So, what is our problem? A while back, we made the 0.10 release of Grok. Grok is a framework and depends on many Zope 3 packages (among others). This is specified in the setup.py of Grok, like this:

install_requires=[
   ... long list ...
  'zope.proxy',
   ... more dependencies ...
   ]

Unfortunately, this approach has a problem. If someone releases a new version of, for instance, zope.proxy to the cheeseshop, newer installations of Grok will try to use this version instead of the version that we tested Grok with.

This is asking for trouble: we have made a release, but what people actually install keeps changing! No wonder that we've had several breakages of Grok 0.10 as people accidentally broke backwards compatibility, or mistakenly released broken eggs. Since these packages are also used by other Zope 3 applications besides Grok, we cannot ask these people to stop making such releases - this is a megaframework and individual packages should be allowed to evolve at a different pace.

How to go about fixing this problem? The simplest approach would be, whenever we release a new version of Grok, to hardcode a full list of the packages we depend on with exact version numbers in the install_requires section of Grok's setup.py, like this:

install_requires=[
   ... long list ...
  'zope.proxy == 3.4.0',
   ... more long list ...
   ]

Doing this would mean that anyone who installed Grok would get exactly those versions, nothing else. If someone tells us: I used Grok 0.10, we know exactly what that means.

Unfortunately it also locks application developers into exactly those versions. If a bugfix release of zope.proxy comes out, the application developer that uses Grok cannot start using this new version, but instead will need to wait for a new release of Grok that depends on this newer version of zope.proxy. While that's often a good approach anyway, hardcoding the version dependencies does limit the developers that build their applications (or frameworks) on top of Grok.

There's another problem. Grok isn't the only Zope 3 package that uses these packages. zope.component for instance depends on zope.interface. If zope.component hardcodes a dependency on a particular version of zope.interface, the Grok developers would need to wait for a new release of zope.component in order to get a bugfix in zope.interface too. And remember where we came from: the whole idea of our megaframework approach was to have the flexibility to recombine components, and this would be blocking it.

Components, and frameworks, ideally should have weak dependency requirements to be maximally usable, allowing individual developers or framework developers to use the versions they want to. But on the other hand, if someone uses a framework, it should continue to work tomorrow. If someone releases a framework, it should remain installable tomorrow. If someone communicates to someone else about framework versions (important in open source software), they shouldn't have to give a list of 50 version numbers, but just one.

We therefore have two different requirements pulling in different directions. On the one hand you don't want to lose flexibility, on the other hand if you want to have a community working and reusing chunks of software, you want to be able to rely on stability, and frequently you even want to count on bug for bug compatibility.

To allow flexibility, instead of hardcoding version numbers in install_requires in setup.py, you only loosely specify them. You say, for instance, that zope.component requires zope.interface, but not which version. If you know that your version of zope.component needs a feature that's only in zope.interface 3.2 or later, you'd write zope.interface >= 3.2.

Now we're back at our original problem, however: we gained flexibility, but damaged stability. What if someone makes a new release of some dependency of Grok?
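
The trade-off can be made concrete with a small sketch. This is not how setuptools actually resolves requirements; `parse_version` and `resolve` are hypothetical helpers, and the 3.3.0 and 3.4.1 releases are invented for the example (only zope.proxy 3.4.0 comes from the text above). It only shows that a loose requirement follows whatever is newest on the index, while an exact pin stays put.

```python
def parse_version(text):
    """Turn '3.4.1' into (3, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(requirement, released):
    """Pick the newest released version that satisfies the requirement.

    `requirement` is 'name', 'name >= X' or 'name == X'.
    `released` lists the version strings currently on the index.
    """
    parts = requirement.split()
    candidates = sorted(released, key=parse_version)
    if len(parts) == 3:
        _, operator, wanted = parts
        wanted = parse_version(wanted)
        if operator == "==":
            candidates = [v for v in candidates if parse_version(v) == wanted]
        elif operator == ">=":
            candidates = [v for v in candidates if parse_version(v) >= wanted]
        else:
            raise ValueError("unsupported operator: %s" % operator)
    if not candidates:
        raise LookupError("nothing satisfies %r" % requirement)
    return candidates[-1]


# The index on the day of the framework release, and some time later.
index_at_release = ["3.3.0", "3.4.0"]
index_later = ["3.3.0", "3.4.0", "3.4.1"]  # hypothetical new release

loose_then = resolve("zope.proxy >= 3.4", index_at_release)
loose_now = resolve("zope.proxy >= 3.4", index_later)
pinned_now = resolve("zope.proxy == 3.4.0", index_later)
```

Here `loose_then` and `loose_now` differ even though the framework release did not change, which is exactly the instability described above; `pinned_now` stays at the tested version but can no longer pick up the bugfix.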

zc.buildout has a feature that can help us pin versions down for our particular application. We could ask all the people who use Grok for their applications to put the following section in their application's buildout.cfg:

[buildout]
...
versions = grok-0.10

[grok-0.10]
... long list ...
zope.proxy = 3.4.0
... more dependency specifications ...

This can be made to work well if you don't use a framework like Grok but instead develop an application from scratch that uses a long list of components. But in the case of Grok we want the framework to specify these dependencies. We don't want to require all application developers to replicate a long list of dependencies in their buildout.cfg. It's easy to make mistakes, it's hard to communicate about such lists to everybody, and what do people do when they need to perform an upgrade to a newer version of Grok? They'd need to get a new long list and edit their buildout.cfg again. It'd be a lot nicer if they only had to deal with the change of a single version number instead.

Zope 3 has a culture where making developers figure things out for themselves is okay if this leads to maximum flexibility for everybody. Grok has a different approach: it tries to make things easy, too. Telling developers to maintain long lists of version numbers is not good enough for Grok, and probably not good enough for other frameworks built on Zope 3 either.

So, we need a way to have the Grok framework developers maintain this list of versions in a central place, and allow all application developers that use Grok to reuse this list.

zc.buildout does have a feature to help us here too: it can include bits of external buildouts into your own, using a URL. You can use a pattern like this in your buildout.cfg:

extends = http://grok.zope.org/releases/grok-0.11.cfg

What this would mean is that developers that use Grok place this in their own buildout.cfg, and we maintain the list of versions under that URL. When a new release of Grok happens, we create a new URL and ask people who want to upgrade to update their buildout.cfg to reuse that:

extends = http://grok.zope.org/releases/grok-0.12.cfg

That should be the only modification they need to make if they didn't hardcode any dependencies in their setup.py. And if they want to override a version number they can still do so in their own versions section, so we retain flexibility.
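
To illustrate the override idea, here is a rough sketch that mimics the merge with Python's standard configparser. It does not use zc.buildout itself, and it assumes (as I understand the extends mechanism) that options in the extending file win over those in the extended one. The `[versions]` section name, the zope.interface pin and the 3.4.1 override are made up for the example.

```python
import configparser

# Stand-in for the file published under the release URL (contents invented,
# apart from the zope.proxy pin used earlier in this post).
RELEASE_CFG = """
[versions]
zope.proxy = 3.4.0
zope.interface = 3.4.0
"""

# Stand-in for the application developer's own buildout.cfg.
LOCAL_CFG = """
[versions]
zope.proxy = 3.4.1
"""


def read_versions(text):
    """Return the [versions] section of a config text as a plain dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep package names exactly as written
    parser.read_string(text)
    return dict(parser["versions"])


def effective_versions(release_text, local_text):
    """Start from the framework's pins, then let local pins override them."""
    pins = read_versions(release_text)
    pins.update(read_versions(local_text))
    return pins


pins = effective_versions(RELEASE_CFG, LOCAL_CFG)
```

The application gets the framework's whole list for free and only spells out the one version it wants to differ on.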

This is likely the approach we are going to use in the near future. It's pretty good, but not ideal, so I'm going to go into some of the drawbacks of this next.

For one, this doesn't work in locations where you don't have internet access, such as on a train. Now this problem exists for egg downloads as well, but in typical buildout setups, you'd have a lot of previously downloaded eggs available in your home directory, so there's a good chance you can still create a new Grok project even while you're on a laptop on the train.

Another problem is that the release managers of Grok will have to deal with two release artifacts instead of one: besides the usual, easy, automatic package upload using python setup.py sdist upload, which places the new version on the cheeseshop, we will now also need to maintain a list of dependencies somewhere and create a new URL whenever we release Grok. We also need to communicate this new URL to our userbase, and this is different from the usual Python dependency mechanism, which is defined in setup.py. This isn't a major problem, but it makes the release process more cumbersome nonetheless, so it's less than ideal.

There is another potential drawback to this approach. Dependency relationships form a tree. Grok may depend on zope.component, which in turn depends on zope.interface. In order to pin down the version for zope.interface, we would need to do this inside Grok. This is not a big problem for such a small dependency, but when frameworks start to depend on frameworks it will start to be cumbersome to create a unified list of dependencies. This may sound theoretical, but in the Zope world it's common to have frameworks that depend on frameworks: Plone depends on CMF which depends on Zope, for instance. If someone were to write a CMS in Grok, they would need to maintain and publish their own list of version requirements for that CMS, which would include the entire list for Grok. It'd be nicer if they could just say: here's my list, and for the rest, please reuse Grok's list.

I think many of these problems could be resolved if we could specify this list in a package's setup.py instead of on a URL. A package would have an optional extra section in its setup.py besides install_requires, perhaps something like install_recommends. This section would contain version number recommendations that have been known to work with this package. Tools like buildout could then choose to make use of this information, but the developer is also free to ignore it. This would solve our "URLs cannot be accessed on a train" problem. It would allow us to do simple release management again with a single release artifact: all the information will be available in the egg instead of on a separate URL. It would also open the door for smart tools which can combine version recommendations from various packages into a longer list.
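
As a thought experiment, here is what such a "smart tool" might do. Everything in this sketch is hypothetical: install_recommends is only a proposal, no packaging tool is assumed to understand it, and the package names, version numbers and the "closest to the root wins" rule are assumptions chosen for illustration.

```python
# Hypothetical per-package recommendations, as they might be read from each
# package's metadata. "mycms" is an invented CMS built on Grok.
RECOMMENDS = {
    "mycms": {"grok": "0.11", "zope.proxy": "3.4.1"},
    "grok": {"zope.component": "3.4.0", "zope.proxy": "3.4.0"},
    "zope.component": {"zope.interface": "3.4.0"},
}


def combined_recommendations(package, recommends):
    """Combine recommendations from a whole dependency tree into one list.

    The tree is walked breadth-first, and the first recommendation seen for
    a package wins, so packages closer to the root take precedence.
    """
    combined = {}
    queue = [package]
    seen = set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        for name, version in recommends.get(current, {}).items():
            combined.setdefault(name, version)
            queue.append(name)
    return combined


cms_versions = combined_recommendations("mycms", RECOMMENDS)
```

The CMS only states its own list, including one override of zope.proxy, and the rest is filled in from Grok's and zope.component's recommendations.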

Jim Fulton, creator of zc.buildout, told me that I'd need to convince the distutils-sig of the need for an install_recommends section first, and wished me luck. So I hope I can get some of the distutils people to read this blog entry. :)

Update: I've just made an install_recommends proposal to the distutils-sig.

Well-kept secrets of Zope

Zope is a web framework that comes equipped with powerful, apparently secret, features. Some of the things Zope has had for literally years other web frameworks are only evolving today. And in other cases, Zope comes equipped with features that other web framework communities are currently only dreaming about.

If Zope has such powerful features, why are they a secret? I think it's a combination of an unfortunate (but partially well-deserved) reputation Zope has in large parts of the Python community, and Zope isolating itself in its own community (it managed to build a large community many years ago).

Before I start off yet another discussion with people burned by Zope 2 in the past: Zope 3 is not Zope 2. It's not crufty. It is hard to approach, but we're fixing this with Grok, which cuts down on the complexity hard.

Now let's go into some features that Zope has and that we seem to be keeping a secret very carefully from the rest of the world.

Zope 3 has unicode everywhere

In July 2007, the Django project started to support unicode. I'm happy to say that the Zope 3 project has supported unicode throughout since 2002. Consistently. Without backwards compatibility cruft. When the Django project gained this support, I, who don't really follow Django developments, learned about it from various blogs. We in the Zope community don't tell anyone as that'd spoil the surprise. Or something.

Zope 3 has a built-in form generation and validation system

Last year, I heard a lot about ToscaWidgets. This year I hear a lot about Django newforms, replacing Django's older form generation system.

Development of Zope 3's declarative schema system and form generation system started, guess when, in 2002, based on ideas we had been exploring for a few years by then in the Zope 2 context. We've had 5 years of experience with it since then. This led to the evolution of a new form generation system (on top of the existing, solid, declarative schema system) in 2004. This year we've seen a further evolution of this system with z3c.form (a fourth new forms iteration evolving the work that had gone before).

So, Zope has a headstart of years of experience. We keep this hidden within our own community, because otherwise it'd be like, telling!

An object database

CouchDB has been gaining some attention recently. While not an object database, it promises to store documents, not relational database tables. Recently we've also witnessed some ORM Wars in various Python blogs.

Meanwhile, some people in the Rails world have been thinking... Wouldn't it be nice if instead of all this impedance mismatch between objects and relational databases we could use a true object database? Wouldn't that be cool?

It's time for me to yawn and say "been there, done that". People might somehow have missed it, but Zope 3 is equipped with an ACID-compliant, clusterable object database, the ZODB, that has been under development since 1997. The Zope people know the benefits of document-oriented, object databases for web applications. We've worked in this impedance-mismatch-free world for so long that we know the drawbacks too, and thus have built Zope 3 extensions to work with SQLAlchemy as well.

Buildout

We have been putting together complicated web applications from many different bits and pieces for a while now in the Zope world. Setting up a development environment or rolling out a deployment is quite an involved job often involving endless INSTALL.txt files. Setuptools and eggs offer a lot of help here, and the Zope project has been embracing them in a major way. They still leave a lot of stuff to do by hand however, especially if non-library components are involved such as web servers.

Enter zc.buildout. zc.buildout is an extensible Python system for assembling Python applications from multiple parts. It will work for any Python project: I've used it with TurboGears, and to set up a game development environment, among others. Zope itself is of course a major user of it, and it was forged in the fires of long experience and requirements of the Zope community. It's a lot to chew on to learn, but I believe any significant Python development project can benefit from using it. I believe buildout is another example of where the Zope project is tackling problems years ahead of many others within the Python community. And now we've told you about it.

To conclude

These features shouldn't be a secret. We should shout it from the rooftops: the Zope community in many ways is still pushing the frontiers of Python web framework development.

Zope 3 has powerful features, and now also has an easy entry point. If you want an easy way to learn about the future of web development in the Python world, please try it out in the form of Grok, which promises to make Zope 3 safe for the cave man or woman in all of us.

Communicating with core developers on the Python 3 transition

It has been made abundantly clear to me that some core developers did not appreciate my previous communications concerning my worries surrounding transition to Python 3.

I thought matters were resolved and wanted to let the issue rest for a while, but today I became aware that they are not. Therefore this attempt to address this as part of the public record.

Some basics concerning my intended message: doom is not at hand. The sky is not falling. Do not be alarmed. Instead, be prepared. Transitioning your code to Python 3 (through Python 2.6) will in many cases, especially for the larger, more widely used, or less well-tested codebases, not be a cakewalk. The community will be in transition for a period of years and we need to prepare ourselves so we can best deal with this. During this transition, there will be more difficulties in code reuse between projects on different sides of the 2 to 3 transition.

Some apologies are in order. I apologize if I implied that the core developers had not thought their transition plans through. I hope the core developers will believe (by careful reading of my previous communications in my own blog and others) that this was not my intent. My intent was:

  • express my own worries concerning the impact on the community.
  • help prepare the community for the upcoming transition phase: it's not going to be a cakewalk.
  • express how I would like core developers to communicate the transition. Core developers will naturally want to attract developers to the Python 3 platform. My fear is that the core developers, by communicating their enthusiasm, will sometimes inadvertently cause Python users to underestimate the costs and length of time of the transition. This might lead to an insufficiently prepared community. (in my own experience with transitions one needs to be almost overabundantly clear on transition issues, or people may misunderstand - I understand this is not everybody's experience)

My last point was unfortunately easily misunderstood and in the heat of the discussion I used words that were clearly too strong. For instance, I should not have used the word "fork" when describing my worries - this is clearly a loaded word. I apologize for my use of this word.

I should also have made it far more clear that I was responding to core developers who addressed me, and not have overgeneralized to other core developers who did not participate in this conversation.

Let this be clear: I do believe that the core developers have thought through the transition plans and the implications thereof. I believe that the transition weighs heavily on their hearts. I assumed too readily that it was a given in this debate that the core developers are not careless or stupid about the transition. I should have made it more explicit, as several people understood me to make this implication. I apologize for this inadvertent implication.

(Note that this doesn't mean I'm not worried anymore, or that I can't disagree with any core developer on this. It's just that I trust they're smart, experienced and responsible people. I also hope that this trust is reciprocal)

I now also see how discussing communication strategies with core developers can in itself be interpreted as being insulting to their communication skills. My apologies for this as well. I did not wish to make that implication. It was a combination of the heat of the debate, and my inexpertly trying to help with this, given some of my experiences and thoughts in this matter with the Zope 2 to Zope 3 transition (which is different from the Python one in many, many ways). In the future I will try to avoid this problem by instead doing my own communication surrounding this topic, and not addressing the core developers about their communication.

I should be wiser, but as someone stung heavily on his foray in Python 3 politics I cannot resist a sting in the tail. A note of warning to those who might wish to follow me in discussing in public their worries surrounding possible negative aspects concerning the Python 3 transition. Be very clear in your communication. If you are not clear enough, prepare to be flamed and perhaps even distrusted. It's not a comfortable position to be in.

the purpose to my "whinging" about the transition to Python 3

Collin Winter writes:

"I’m getting pretty sick of seeing blog posts and mailing lists threads endlessly bemoaning that, “the core developers…are causing a huge risk to the Python community by splitting it asunder for a period of years“. Gloom, doom, pox and peril, blah blah blah."

Collin's quoting me, so that means he's getting pretty sick of me, among others. I don't think I've been exactly endless about this though. I've been talking about this for a few days now, so Collin in getting sick already seems to be displaying his rather fragile constitution. :)

I also see the following in the comments to Collin's blog by a certain Matt:

"The complainant you reference obviously won’t be happy until Guido drops the Py3K idea altogether, or at least delivers a personal apology. The rest of us will move forward with our clear migration plan, and our optimism for the next version of Python."

Clearly he hasn't read what I said very well. I'll quote my previous blog entry:

"There is a significant cost in doing this. I think the Python community can bear this cost. I am not sure whether the cost is worth the gain, but Guido thinks it's a good idea, and he does have enough credit with me to trust his judgement. I understand the attraction of cleaning up the language in a backwards incompatible fashion."

Again, I understand that the core developers feel it necessary to break backwards compatibility. I can see where this is coming from. I sympathize with the goals. (besides that, radical cleanups are fun and exciting to language developers too. that's fine) I remember talking to Guido years ago about the kind of unicode reformation that Python 3 is going to introduce. I also know that Python 3 is not dropping out of the sky, and I have been fully aware that backwards compatibility is going to be broken for some time now.

My comments are a note of caution amongst enthusiasm. I'm not saying "stop in your tracks", I'm just trying to inform the community that Python 3 is not only a good thing, but also carries significant risks. I think it is good to have a bit wider awareness of that.

I also think that Collin better fortify himself with some vitamins, because he will likely remember this as the gentle beginnings. :) A bit of community push-back to the core language developers about this doesn't seem to be an entirely bad thing. After all, we're users of this great language and you're going to ask a lot from the community with this change.

Now Collin writes that he has worked on technology to mitigate the risk. I applaud his efforts. It also shows to me that Collin is taking the difficulty seriously. Unfortunately the tone of his blog entry is rather dismissive, which is less good, though I understand why. He's been working on getting this done, and now he gets random people like me who complain before they've even tried things.

Still, I'd prefer a slightly different tone expressing what I trust is the true attitude of the language developers, that is: "we hear you, we agree that this is a very serious issue, we realize what we're going to put the community through, and we're working on it."

Now to a few points where I think we could learn from each other's perspective.

Collin writes: "I have absolutely no pity for anyone trying to migrate to Python 3 without a test suite; you’re doing something fundamentally stupid and we will not bend over backwards to save your dumb ass."

I see a lot of Python code out there in the real world being used in production that doesn't have very good test coverage. I think a little bit of pity for the dumb asses who decided to commit to the Python language without writing proper unit test suites would do the core developers good.

Some notes on test driven development: widespread adoption of test-driven development in the Python world goes back to 2001 or thereabouts - the unittest module made it to the library about then, I believe. Python is about a decade older than that, and developing without a test suite was a normal enough way to write Python code throughout the 90s.

Lots of actual production code is not very well tested. You can say people who wrote this code are stupid, but I don't think that's a very productive position to take. Sometimes people writing the code have reasonable excuses for less than 100% test coverage. For instance, it is difficult to properly test user interface code, which can take up a significant portion of a codebase. A good example of this is game code. There is quite a lively subcommunity of game developers in the Python world, and I don't think many games have good automated test coverage.

Sometimes people act out of expediency, or are in a hurry, or don't know any better even though they should, or simply disagree about testing altogether - do we want to say those people shouldn't be part of our community?

I must also note that Python has for a long time been actively promoted to beginners who are learning how to program. Beginners are unlikely to be very test-happy yet. They may also have produced code bases they consider to be valuable. To alienate this audience would be against the spirit of Python.

(By the way, I wonder what the plan is for Python C extensions. Are they going to break? Are people to rewrite their C code? Is SWIG going to work? I know that Pyrex will need a major update, for instance, so that would mean difficulty in porting lxml.)

Instead of calling lots of people dumb and having no pity for them, which sounds like the language developers would like to ignore their problems altogether, it might be better to treat these people with a bit of understanding. Hopefully there are at least minor things that can be done to help these people migrate towards Python 3. I'd like to ask the language developers to keep their eyes open for these possibilities. Just showing some patience and understanding will get you a long way even if you can't do anything to help them concretely, by the way.

Now on to preparedness. The porting difficulties and risks lead me to believe that we are going to have to deal with a situation of two Pythons for quite a few years. Codebases written for one Python will be inaccessible to the other Python without additional effort to port things back and forth across the barrier. Let us, as a community, go into this situation with open eyes. Let us be aware of the risks and the costs. Let the language developers display patience and understanding, because they realize they are asking a lot from the communities that use this language. A community prepared will be able to handle the transition difficulties a lot better than a community going into it with blind optimism.

Collin writes: "If you think you can do better, show us the code. Talk is cheap."

I'm not saying I can do better. I'm trying to inform the community that we're in for a ride, that this ride is going to take a while, and that the ride won't all be fun, even if we believe the destination is worth it.

I can show Collin a lot of code, by the way, but I don't need to: just study the cheeseshop. Talk is cheap, indeed. All that code is not.

Python 3 worries: feedback

I've received a lot of feedback to my previous blog entry. I stated there that I'm worried about the costs of breaking backwards compatibility in Python 3, and its cost to the Python community. I'm glad I received this feedback, because the topic bears a bit of attention.

There was a whole series of comments saying the transition cost wouldn't be as high as I estimated. More in the range of switching from Python 2.4 to Python 2.5, perhaps, someone said. I'm not sure what to say to that, just that I am surprised by this statement. Since Python 2.5 aims to be backwards compatible, and indeed most Python software previously written runs fine on Python 2.5, and this is explicitly not so with Python 3, it's like comparing apples and oranges to me.

I've also seen people say that we have been there in the past, during the transition of Python 1.x to Python 2.x, and that this turned out fine. I am not sure whether the people saying that were there at the time, but I was. Python 2.x did not break backwards compatibility with Python 1.x, and the transition between 1.5 and 2.0 was smaller than some we have seen afterwards (new style classes in, if I have it correctly, 2.2).

My reference to the Perl 6 transition drew some voices from the Perl community. I was assuming Perl 6 was going to break backwards compatibility with Perl 5 in a bigger way than Python 3 is going to break backwards compatibility with Python 2, since their language is changing so much. That's only partially true: while the language changes, the plan has apparently always been to produce a runtime that runs both Perl 5 and Perl 6 code and provide module-level interoperability. I do not know how far this plan is along. It does make the Perl 6 transition less risky in that respect compared to Python 3.

I still figure the Perl 6 transition is very risky simply because it's been taking so very long, which is likely driving people to look for other languages in the mean time. Python 3, being less ambitious, hopefully is finalized more quickly than that.

I wonder whether a dual-runtime model is something that was considered for Python 3. You could then upgrade one module at a time. The maintenance burden to the language developers would be increased for a longer period. Since the language developers are increasing the maintenance burden of all developers using Python with Python 3, perhaps I am not too concerned about making the maintenance burden of the core developers a bit higher. :)

Now to respond to some points made by Brett Cannon in response to my worries:

"In the end it all doesn't matter. Python 2.x is not going anywhere, so even if Py3K turns out to be a flop Python will live on. But if Py3K does do well (and I expect it will in the end), Python will be better for it."

I am not worried about Python 3 turning out to be a flop. I'm worried about the disruption and costs it will cause to the community, flop or not.

I disagree strongly with the statement that "in the end it all doesn't matter". It does matter that all the Python code in the world is going to be broken on Python 3. It does matter that all Python developers are going to have to invest time and effort in transition. The risk that we end up with two python communities for a significant period, with all the confusion surrounding it, does matter.

There is a significant cost in doing this. I think the Python community can bear this cost. I am not sure whether the cost is worth the gain, but Guido thinks it's a good idea, and he does have enough credit with me to trust his judgement. I understand the attraction of cleaning up the language in a backwards incompatible fashion.

You, the core developers, are causing a huge risk to the Python community by splitting it asunder for a period of years, and are increasing the code maintenance costs of all Python developers significantly due to this transition. What I don't want to hear is "in the end it all doesn't matter". What I want to hear is that you are aware of the trouble you're putting us all through and that it does matter. I want to hear that getting the transition plan right weighs heavily on your hearts.

The core developers should be fully aware of the very heavy cost of their plans. I hope they're going to do their utmost best to reduce this cost to a minimum. An expression of understanding of the gravity of the situation will put me far more at ease than saying the transition plan is "pretty damn good" and that you can simply continue to use Python 2.x for as long as you like anyway.

Brief Python 3000 thoughts

Briefly, some of my Python 3000 thoughts. I see quite a bit of enthusiasm in some other blogs. I'm not very enthusiastic. While I understand the need to be able to break backwards compatibility, I am worried about Python forking into two parallel versions (2.x and 3.x) for many, many years. Such a fork can't be good for the Python community.

Why do I worry about Python forking? Because porting code to Python 3 will be hard. The porting instructions discuss the need for a good unit test suite before getting started. This is very wise. In practice of course, while I write unit tests for code all the time, lots of the code I work with doesn't have great test coverage. There's lots and lots and lots of valuable Python code like that out there. What is going to happen to it? How is it going to be ported?

Even with test coverage, porting over code is going to be a massive amount of work for larger code bases. Considering that Zope 3, a clean, modern, extremely well-tested code base (one of the best test coverages in the Python world) is still on Python 2.4 as it takes time and effort to port to Python 2.5, what is going to happen with Python 3? What does this say about codebases with far fewer tests?

The result in many cases is that people will put off porting code, as it's too costly. It won't be easy to motivate a customer to pay for porting activity that will bring no new features to them whatsoever. People will therefore continue to run this code on Python 2.x. Since Python 2.x code doesn't work on Python 3.x, it won't be accessible to people who made the jump. Since Python 3.x code doesn't work on Python 2.x, it won't be accessible to those with existing code bases who can't make the jump any time soon. As a result, we will have two Python communities for a period of what I expect to be 5 to 10 years.

I realize there are no answers to these worries beyond those that have already been given. Breaking backwards compatibility may be necessary. It just doesn't make me have a "birthday feeling", as I saw described in another blog. Python 3 is a serious risk to the Python community. Not by far as much as, say, Perl 6, is to the Perl community, but a significant risk nonetheless.

(I also am worried that Abstract Base Classes are throwing away some key benefits of zope.interface, but that is not really a blocker - at least zope.interface can be made to work on Python 3).

Debugging strategy: easy stuff first

I've been writing software for quite a while now, but the software I write still has bugs. One uses various strategies to avoid bugs, but bugs still creep in. Bugs happen all the time. I sometimes believe only programmers are truly aware how flawed human reasoning really is and how many mistakes a person can make without even noticing. We programmers are confronted with our own mistakes every day. The real world tends to be more forgiving of slight mistakes than the virtual world of software.

So, your software has a bug.

When you have a bug, you look for its cause. Often you have a pretty good idea about what the problem is, and you know you can fix it quickly. You go in and fix the code, and, perhaps, hopefully, are able to add an automatic test so the code can't break again in the same way in the future. Done.

Often though you have no clue what the problem is. This can happen after you've looked at a likely cause and found out it wasn't that after all. "That's bizarre! That shouldn't happen!" You don't know where to start debugging. You have some ideas about likely causes, perhaps, but you know you are going to have to sit down for this one. What to do then?

One thing that has helped me in this case is the following simple strategy: check the things that are easy to check first. Not because these things are likely to be the cause of the problem -- they may in fact be quite unlikely -- but simply because they're easy to check.

Check stupid things. Perhaps you didn't save the file. Perhaps you didn't restart your program before testing. Check whether you're really working on the right files. Perhaps some symlinks got crossed. Are the versions of the libraries you're using really correct? Check whether the web application that has the bug is really actually being run by the server you thought it was. Perhaps there's another server hanging around serving the buggy app and all your efforts have no effect. Check whether the API you know so very well really works in the way you thought it did. It might be just a one line program to check. Yeah, it's unlikely, but you won't waste a lot of time, so just do it.
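A few of these checks can be done in a handful of lines. What follows is only a sketch of the idea, not a recipe: the helper name quick_sanity_report is made up for this illustration, and it uses "json" merely as a stand-in for whatever library you suspect. It reports which interpreter is running, which file a module was really imported from, and which version is installed (if the module is an installed distribution at all).

```python
import importlib
import importlib.metadata
import sys


def quick_sanity_report(module_name):
    """Collect a few cheap facts that rule out 'stupid' causes.

    Hypothetical helper for illustration only.
    """
    module = importlib.import_module(module_name)
    try:
        # Only works if the module name matches an installed distribution.
        version = importlib.metadata.version(module_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,  # is this the Python you think it is?
        "module_file": getattr(module, "__file__", None),  # the right files?
        "version": version,  # the library version you expected?
    }


if __name__ == "__main__":
    for key, value in quick_sanity_report("json").items():
        print(f"{key}: {value}")
```

If the path or version printed is not the one you expected, you have probably found your problem in under a minute.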

Why should we be checking unlikely causes for our bugs? Even if the chance that something is the cause is very small, the chance still exists. And since it's quick and easy to check, just make sure. If it was the cause of the bug, you're lucky and might've saved yourself hours of head scratching only to slap your forehead in the end. If it wasn't the cause of the bug, at least you've not wasted a lot of time to exclude it, and you can move on.

Don't check unlikely causes that are hard to check first. You may end up checking those anyway, as it could be them after all. But not initially. First check the easy stuff.

Also don't check the likely causes first that are hard to verify. You think it's probably a threading issue? Oh no, that's difficult to debug! What if that wasn't the real cause for the bug after all? You've just spent hours testing for it. What if it turned out the bug was really caused by something you thought unlikely to be it, and you could've excluded it with just a minute of work? You would've wished you had spent that minute straight away. It wouldn't have been a big loss if not, and you might've hit the jackpot.
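To make the ordering concrete, here is a toy sketch. The candidate causes, the minutes and the likelihoods are all invented guesses for the sake of illustration, and the names Candidate and order_checks are hypothetical. The point is only that the list is sorted by how cheap a check is, with likelihood used as nothing more than a tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    minutes_to_check: float  # rough guess at the effort needed
    likelihood: float        # rough guess, 0.0 to 1.0


def order_checks(candidates):
    """Cheapest checks first; among equally cheap ones, likelier first."""
    return sorted(candidates, key=lambda c: (c.minutes_to_check, -c.likelihood))


# Invented numbers, purely for illustration.
candidates = [
    Candidate("threading issue", 180, 0.5),
    Candidate("file not saved", 1, 0.02),
    Candidate("wrong server running the app", 2, 0.05),
    Candidate("stale library version", 5, 0.05),
]

for c in order_checks(candidates):
    print(f"{c.minutes_to_check:>5} min  {c.name}")
```

The threading issue ends up last even though it is the likeliest suspect: the three cheap checks together cost a few minutes, and any one of them might spare you the three hours.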

So check the things that are easy to check first.

Of course the best way to deal with bugs is not to create them in the first place. If you find yourself dealing with the same type of bug over and over again, consider whether you can change your way of working to avoid them altogether. But as we all know, there will always be bugs...