Benji York is doing good work

Benji York, a relatively recent Zope Corporation employee, first appeared on my "does interesting stuff" radar due to his work on Zope 3 testing: zope.testbrowser. It promises to bring to Zope 3 testing something I played with before in the context of Silva but never really made very usable: functional testing of the web frontend of server-side code (utilizing mechanize). Benji's innovation is to make this really easy by integrating it with doctests.

Zope 3 is an extremely well tested framework already and, through the efforts of Jim Fulton, Tim "doctest" Peters, Benji York and others, is continuing to push the boundaries in this area. By using doctest in ever expanding ways, they are also pushing forward software documentation at the same time. I haven't seen any other Python project with a stronger emphasis on testing, except possibly PyPy. Zope 3's rock solid, and we know it, as we keep banging on it with more tests.
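To make the "tests as documentation" point concrete, here is a minimal sketch using only the standard library's doctest module. It does not use zope.testbrowser; the slugify function and the run_examples helper are hypothetical names made up for this illustration.

```python
import doctest


def slugify(title):
    """Turn a title into a lowercase, dash-separated slug.

    The examples below are documentation and tests at the same time:

    >>> slugify("Benji York is doing good work")
    'benji-york-is-doing-good-work'
    >>> slugify("  Zope 3  ")
    'zope-3'
    """
    return "-".join(title.lower().split())


def run_examples():
    """Run the docstring examples above; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring, giving it the names it needs.
    test = parser.get_doctest(
        slugify.__doc__, {"slugify": slugify}, "slugify", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return (results.failed, results.attempted)
```

Calling run_examples() executes each ">>>" line and compares the result with the text underneath it, so the prose in the docstring cannot silently drift away from the code.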

This is not the only thing Benji has been up to though; today I discovered his new weblog, where he says good stuff on a topic close to my heart: Zope 3 marketing.

Moreover he is actually doing something about it. He's written the beginnings of a Zope 3 quickstart guide that should help people on their way when learning Zope 3. Perhaps it can eventually lead to a "20 minute screencast" on Zope 3. Such screencasts seem to be an essential framework marketing tool in this day and age...

Credit where credit's due

Thank you Phillip Eby for giving credit where credit's due in your comment on this article.

Zope's ancestor Bobo could do something almost a decade ago that's getting popular now: object publishing. Jim Fulton deserves a lot of credit for that. This also makes me wonder whether there are things in Zope 3 that will become popular about a decade from now... Perhaps some people would like to join us to help find out?
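For readers who haven't met the idea, object publishing roughly means that a URL path is resolved by walking attributes of ordinary Python objects, and whatever is found at the end is called to produce the response. The following is only a toy sketch of that idea, not Bobo's or Zope's actual implementation; publish, Root and Greeter are hypothetical names.

```python
class Greeter(object):
    def hello(self, name="world"):
        return "Hello, %s!" % name


class Root(object):
    greeter = Greeter()


def publish(root, path, **form):
    """Resolve a URL path by attribute traversal, then call the result."""
    obj = root
    for segment in path.strip("/").split("/"):
        if not segment:
            continue  # tolerate an empty path or doubled slashes
        if segment.startswith("_"):
            # Names with a leading underscore are treated as private.
            raise LookupError("refusing to publish private name: %r" % segment)
        obj = getattr(obj, segment)
    # Form values become keyword arguments of the published callable.
    return obj(**form) if callable(obj) else obj
```

With this sketch, publish(Root(), "/greeter/hello", name="Bobo") walks from the root to the hello method and calls it with the form data.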

I also think Jim and Paul deserve credit for helping to turn Phillip Eby into a Python programmer.

hurry library in the Zope 3 base

Since various people were curious to see especially the little query language we wrote on top of the Zope 3 catalog, I've just put the generic libraries we developed for the documentlibrary project online, at least in svn in the Zope 3 base at codespeak, here:

http://codespeak.net/svn/z3/hurry/trunk

I've done this in the "tradition" of Zope Corporation's placing of zope.formlib and zc.catalog in the Zope svn repository. That is, we at Infrae feel the libraries are useful and want to share them to obtain feedback, but more communication and work is necessary before these could be accepted into common codebases like the Zope 3 core or the Z3ECM project. And if pieces don't end up there that's fine too; hurry works just fine as an independent library right now.

The hurry library ("written because we were in a hurry to use these features") contains:

hurry.query:

higher level query system built on top of the Zope 3 catalog. Some inspiration came from Dieter Maurer's AdvancedQuery for Zope 2. See src/hurry/query/query.txt for documentation. It is undoubtedly still much less powerful than Dieter's system for Zope 2, and isn't very optimized, but it does make using the catalog quite a bit more pleasant.

hurry.workflow:

a simple but quite nifty workflow system. See src/hurry/workflow/workflow.txt for documentation.

hurry.file:

advanced file widget which tries its best to behave like other widgets. See the doctest in src/hurry/file/browser/file.txt for some documentation.

On the workflow system I need to make an extra comment. We are aware various powerful workflow systems are under development for Zope 3. At the time we started the documentlibrary project however none of them seemed fully baked. After some communication with some of the people working on the workflow systems and realizing none appeared ready, we decided to quickly roll our own.

I like its simplicity and it supports quite powerful features, such as multi-version workflow. It's entirely geared towards the featureset of the documentlibrary application we're working on but is generic enough to be also applicable in other contexts. It's however no match for the systems that are currently being brewed up for Zope 3.

Have fun with hurry! Let's try to channel discussion on this through the zope3-dev mailing list for the time being.

Practical experience with Zope 3

Jeff Shell has posted a very interesting blog entry on his experiences with Zope 3. Here at Infrae we've also been working with Zope 3 for a few months now and I thought this would be a good opportunity to share some of our experiences.

Maturity

I can affirm now from practical experience that Zope 3(.1) is mature enough to develop real applications, and a very nice environment to work with. It's a Pythonic system that doesn't get in the way of the Python programmer.

That said, some pieces weren't as fully baked as we would like. Some of these holes have been plugged by zope.formlib and zc.catalog, Zope 3 extensions both created by Zope Corporation (more particulars later). We've also been trying to plug some other holes (query engine, smart file upload widget) in some work we hope to release at some stage in the coming months. Hopefully Zope 3.2, due for the end of this year, will make the out of the box experience more complete.

Must-use extensions

zope.formlib: I would very, very strongly recommend that anyone developing with Zope 3 use zope.formlib instead of the built-in form system, as this vastly improves the form experience. Ignore browser:editform and browser:addform in zope.app.form. I recommend zope.formlib be put in the Zope 3 core as soon as possible, though that does require Zope 3 to run on Python 2.4, which zope.formlib requires.

zc.catalog: I would also recommend looking at zc.catalog as it has nice functionality equivalent to Zope 2's keyword index (called SetIndex). More goodies are also in there, but I've yet to explore them. This needs Python 2.4 as well.

Python 2.4

Zope Corporation turns out to be using Python 2.4 with Zope 3.1 for their own developments. This works fine and you can do the same. I hope we can do an 'official' shift to Python 2.4 with Zope 3.2, though that will require a security review of the to-be-released Zope 2.9, which will include 3.2... I think it would be odd if the main developers of Zope 3 were all using Python 2.4 and we had to support Python 2.3 only for Zope 2.9 compatibility.

Security system

The Zope 3 security system is the one place where I keep tripping up. This is a shame as the rest of the system is very nice and Pythonic.

On the one hand, I can't really blame Zope 3 too much for this, as whenever Zope 3 security complains there is indeed something wrong with my security declarations... So part of this reflects the fundamentally difficult nature of security more than something wrong with Zope 3. Zope 3 just forces you to think about it.

That said, I still think we have a problem here. I'm very afraid the current system turns off beginners too much, including Python programmers coming to Zope 3 for the first time. As a small example, right now a beginner would have no idea how to start debugging security errors -- I had to ask on the mailing lists myself before I understood that I needed to turn off suppression of unauthorized errors in the error logging, for instance. Sometimes the errors are rather obscure too; I've had a case where an annotation returned None, but only because an underlying attribute wasn't accessible, not because it wasn't there. This might be a bug in the annotation system -- I want an unauthorized error.

It's also too easy to forget declaring security when you add a new field or method, which makes for a frustrating "why do I need to be in 3 places to change this" programming experience. Since the rest of the Zope 3 experience is pretty nice and feels Pythonic, this is a shame.

The good news is that the Zope 3 security system is replaceable with something else. I don't think "just write your own security policy" is the answer to my complaints, as writing a security policy is hard and most people will rely on the default, but it does make experimenting a lot easier. I hope people will help me think about this issue.

Pluggability

Zope 3 excels at pluggability. I could plug without having to modify code whenever I wanted to. As I said above, even the security policy is pluggable.

While already great, it's not perfect -- implementing your own source of groups for instance requires you to fake implement some container API you don't actually use. That said, a fairly easy workaround was possible. The email message delivery system in the core also needs some work to be as flexible as I'd like. But again, a doable workaround was possible.

Reusable code extraction

Jeff Shell already mentioned that Zope 3 makes it easier to build an extensible framework while actually building something useful for a customer; Zope 3 gives a lot of flexibility and extensibility right out of the box without much effort for the application developer. This I think is great news for the long term maintainability and extensibility of Zope 3 applications.

In addition, I can say that extraction of reusable code from Zope 3 projects into reusable libraries is much, much nicer than doing it in Zope 2. That doesn't mean it's actually easy; writing reusable code is always hard, but it's now much more doable. This is one of the coolest things about Zope 3.

As I already mentioned above, we now have a little library with some improved file widgets, and a query language on top of the catalog. We also have a little workflow engine; we know various others are being worked on, but I was too much in a hurry to try adapting any of the 'big' ones.

Framework extraction from practical applications is often the best way to build truly useful reusable components, so Zope 3's vastly improved extractability of reusable components is great news. I hope that this improved extractability of reusable components will eventually result in a "cloud" of components that the many applications to be built on top of Zope 3 can start sharing. I envision a process where the truly popular and useful packages can, after a period of revision, become accepted into various larger "framework of frameworks" or "distributions", such as the Zope 3 core and Z3ECM.

XML, context and nuance

Originally this was buried in a comment to an article on Uche's weblog, but he suggested I post it on my own weblog, so here goes. The history of this long running discussion is here.

I think for PJE to speak about people detecting (or not) 'context and nuance' in what amounts to a rant is a bit confusing. I appreciated the rant quite a bit, but the exact thing that was missing from it was context and nuance; it wouldn't be a good rant otherwise. As he says in a comment to my article, his position is more nuanced than his tone was.

The one bit of context I got before he actually announced it was that he was probably looking at the Chandler source code...

I'd seen the XML bit of his "Python is not Java" post quoted by a few other Python programmers, and I was responding to people who picked up on that bit. The problem I have is more with the tone than the content. XML is certainly far from a panacea, but it cannot be ignored either. My problem with the rant, and the way the rant was being picked up, is that his tone is giving an excuse to other Python programmers to ignore XML with disdain. I don't believe that's an attitude that is useful.

Finally, to Uche: I hope the backlash to XML can be kept from being too vicious. To me, a vicious backlash has a connotation of uninformed lashing out, while what is needed is constructive, though strong, criticism.

What is Pythonic?

What the heck does "pythonic" mean?

This was a question asked a few months ago on, of all places, the EuroPython mailing list, which is mainly used to plan the EuroPython conference. It was an interesting question though. I realized I've seen the word being used a lot, but that I've hardly seen any attempts to explain what it means. In the thread that ensued, different people, including myself, gave their own answers. I rewrote my answer for my weblog, as it may benefit others.

"Pythonic" is a vague concept, but not necessarily that much more vague than concepts like "intelligence" or "life", which, when you try to actually define them, tend to be slippery. That they're hard to define doesn't mean that they're useless though; humans work well with messy definitions. "Pythonic" means something like "idiomatic Python", but now we'll need to describe what that actually means.

Over time, as the Python language evolved and the community grew, a lot of ideas arose about how to use Python the right way. The Python language actively encourages a large number of idioms to accomplish a number of tasks ("the one way to do it"). In turn, new idioms that evolved in the Python community have influenced the evolution of the language to support them better. The introduction of the dictionary .get() method, which combines in one operation what previously took a combination of has_key() and item access, can be considered an example of such an evolution.
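A small sketch of the .get() idiom, with the older two-step pattern shown in a comment. The count_words function is a hypothetical example; note that has_key() itself was removed in Python 3, where the "in" operator does the membership test.

```python
def count_words(words):
    """Count occurrences of each word with a single lookup per word."""
    counts = {}
    for word in words:
        # Older two-step idiom: test for the key first, then read it.
        #   if word in counts:
        #       n = counts[word]
        #   else:
        #       n = 0
        # .get() folds the test and the access into one operation.
        counts[word] = counts.get(word, 0) + 1
    return counts
```

The second argument to .get() is the value returned when the key is missing, which is what makes the explicit membership test unnecessary.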

Idioms are frequently not straightforwardly portable from another programming language. For example, the idiomatic way to perform an operation on all items in a list in C looks like this:

for (i=0; i < mylist_length; i++) {
   do_something(mylist[i]);
}

The direct equivalent in Python would be this:

i = 0
while i < mylist_length:
   do_something(mylist[i])
   i += 1

That, however, while it works, is not considered Pythonic. It's not an idiom the Python language encourages. We could improve it. A typical idiom in Python to generate all numbers in a list would be to use something like the built-in range() function:

for i in range(mylist_length):
   do_something(mylist[i])

This is however not Pythonic either. Here is the Pythonic way, encouraged by the language itself:

for element in mylist:
   do_something(element)

A frequent question on comp.lang.python involves how to pass or modify references directly, something that is not possible in Python; there's just assignment (and its close relatives the import, class and def statements). This is undoubtedly sometimes driven by the desire to write code that returns multiple values from a function. The idiomatic way to do this in C and a number of other languages is to pass to this function pointers or references:

void foo(int* a, float* b) {
    *a = 3;
    *b = 5.5;
}

...
int alpha;
float beta;
foo(&alpha, &beta);

It's possible in Python to hack up strategies to pass function results through arguments, such as this:

def foo(a, b):
    a[0] = 3
    b[0] = 5.5

alpha = [0]
beta = [0]
foo(alpha, beta)
alpha = alpha[0]
beta = beta[0]

This is however considered to be screamingly unpythonic, as the idiomatic way to return multiple values from a function is quite different and looks much nicer. It exploits tuples and tuple unpacking:

def foo():
    return 3, 5.5

alpha, beta = foo()

Code that is not Pythonic tends to look odd or cumbersome to an experienced Python programmer. It may also be overly verbose and harder to understand, as instead of using a common, recognizable, brief idiom, another, longer, sequence of code is used to accomplish the desired effect. Since the language tends to support the right idioms, non-idiomatic code frequently also executes more slowly.

To be Pythonic is to use the Python constructs and datastructures with clean, readable idioms. It is Pythonic to exploit dynamic typing, for instance, and it's definitely not Pythonic to introduce static-type style verbosity into the picture where not needed. To be Pythonic is to avoid surprising experienced Python programmers with unfamiliar ways to accomplish a task.

The word "Pythonic" can also be applied beyond low-level idioms. For a library or framework to be Pythonic is to make it as easy and natural as possible for a Python programmer to pick up how to perform a task. A library or framework, although written in Python, could be considered unpythonic if it necessitated programmers using it to write cumbersome or non-idiomatic Python code. Perhaps it's not using constructs Python offers, such as classes, even though they would make the library more convenient or easier to understand. Possibilities like being allowed to pass functions and methods as arguments to functions might be overlooked where they could be handy. A class defined in a library might be trying to do its best to enforce information hiding like you have in a language like Java, while Python more operates under the looser strategy of 'advisory locking', where attributes are typically available but the programmer is hinted about their privacy by a leading underscore.
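Two of the points above, passing functions as arguments and the "advisory" leading underscore, can be sketched in a few lines. Temperature and warmest are hypothetical names used only for this illustration.

```python
class Temperature(object):
    def __init__(self, celsius):
        # Leading underscore: a hint that this is internal, not a lock.
        self._celsius = celsius

    def celsius(self):
        return self._celsius


def warmest(readings, key=Temperature.celsius):
    """Return the warmest reading; key is any callable taking a reading."""
    # The caller can swap in a different function without subclassing.
    return max(readings, key=key)


readings = [Temperature(3), Temperature(21), Temperature(-5)]
hottest = warmest(readings)
coldest = warmest(readings, key=lambda reading: -reading.celsius())
```

Nothing stops a caller from reading _celsius directly; the underscore only signals that doing so is at the caller's own risk.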

Of course, when you get to such a larger scale, to libraries and frameworks, it gets more contentious whether something is Pythonic or not. There are still some guidelines though. One is that of lesser verbosity: APIs of Python libraries tend to be smaller and more lightweight than those of Java libraries doing the same thing. Python libraries which have a heavy-weight, overelaborate API are not considered to be very "Pythonic". The W3C XML DOM API, for instance, which has been implemented in Python quite a few times, is not considered to be Pythonic. Some people think it's "Java-esque", though from what I've heard it's in fact not considered very Java-like either by many Java programmers...

A Python-based framework can be considered Pythonic if it doesn't try to reinvent the wheel too much where there are already language idioms to accomplish the same thing. It should also follow common Python conventions concerning idioms.

Of course the problem is that frameworks, being frameworks, almost inevitably try to introduce patterns and ways of doing things that may not be familiar if you're used to smaller applications. That's how you exploit the power of a framework. Zope 2, a framework I'm intimately familiar with, is an example of a framework that definitely introduces a lot of particular ways of doing things that you don't run into so often elsewhere. Acquisition is an example. As a result, it's not considered very Pythonic by many experienced Python programmers.

It's difficult to create a Pythonic framework. It isn't helped by the fact that the notion of what is cool, idiomatic, good Python code has evolved quite significantly over the years. Features like generators, sets, unicode strings and datetimes are now considered Pythonic. Zope 2 is an example of a framework that definitely shows its age there, and in part it cannot be blamed for it, as it was first developed in 1997 or so. Considering that, it's holding up very well indeed, thank you.

An example of a new trend in Pythonicness that I witnessed myself in recent years is the movement towards standardizing idioms of package and module structure in Python. Newer codebases like Twisted, Zope 3, and PyPy all more or less follow this pattern:

  • package and module names are brief, lowercase, and singular

  • packages are frequently namespace packages only, i.e. have empty __init__.py files.

I've also tried to follow this convention in libraries I wrote, such as lxml.

Sometimes I think the condemnation of software as 'unpythonic' may be somewhat unfair and may obscure other positive aspects of the software. A less powerful framework that is easy to pick up for a Python programmer may be considered more Pythonic than a far more powerful system that takes more of a time investment to learn.

Finally, for another, complementary perspective on what is Pythonic design, try the following in a Python interpreter:

import this

the why of lxml

Today I read an article about libxslt on O'Reilly's xml.com. It demonstrates the power of libxslt; it's a cool library. It also demonstrates why I wrote lxml: writing Python code that correctly uses libxml2/libxslt's bindings directly is difficult.

The example in the article goes like this:

# xsltprocs.py: send an XML source document through a
# pipeline of multiple XSLT stylesheets.

import sys
import libxml2
import libxslt

args = len(sys.argv)

if args <  3:
    print "Pipeline an XML document through a series "
    print "of XSLT stylesheets. Usage:\n"
    print "  xsltprocs.py infile.xml stylesheet1.xsl   [stylesheet2.xsl...]"
    sys.exit(0)

sourceXMLFile = sys.argv[1]
sourceDoc = libxml2.parseFile(sourceXMLFile)

for xsl in range (2,args):
    # Read in stylesheet.
    styleDoc = libxml2.parseFile(sys.argv[xsl])
    style = libxslt.parseStylesheetDoc(styleDoc)
    # Apply stylesheet to sourceDoc, save in result.
    result = style.applyStylesheet(sourceDoc, None)
    # Result becomes new sourceDoc in case we send it
    sourceDoc = result   # through another stylesheet.

print result

style.freeStylesheet()
sourceDoc.freeDoc()

What it does is pipe a single XML document through multiple phases of XSLT transformation. It works, though with my version of libxml2 I think the last line should say:

print result.serialize()

as otherwise you don't get the proper XML output as expected. Better yet, it should be serialized through the last XSLT sheet's serialization functionality as it may have things to say about the serialization process.

It however has a memory bug. It doesn't matter in this context, as it's just a script, but it might start to matter quickly in a long-running process. What happens is that at the end of the script, the document and the XSLT sheet are cleaned up manually, but the intermediate results or stylesheets never are.

It's an easy mistake to make. Python programmers aren't supposed to have to worry about manual memory management. I rewrote the script to use lxml:

# xsltprocs.py: send an XML source document through a
# pipeline of multiple XSLT stylesheets.

import sys
from lxml import etree

args = len(sys.argv)

if args <  3:
    print "Pipeline an XML document through a series "
    print "of XSLT stylesheets. Usage:\n"
    print "  xsltprocs.py infile.xml stylesheet1.xsl [stylesheet2.xsl...]"
    sys.exit(0)

sourceXMLFile = sys.argv[1]
sourceDoc = etree.parse(sourceXMLFile)

for xsl in range (2,args):
    # Read in stylesheet.
    styleDoc = etree.parse(sys.argv[xsl])
    style = etree.XSLT(styleDoc)
    # Apply stylesheet to sourceDoc, save in result.
    result = style.apply(sourceDoc)
    # Result becomes new sourceDoc in case we send it
    sourceDoc = result   # through another stylesheet.

print style.tostring(result)

This doesn't look much simpler than the pure libxml2/libxslt example (more involved examples would), but as you see the memory management logic is gone, as lxml takes care of this automatically. Moreover, the memory management logic is correct, or that's a bug in lxml.

Changing the Python default encoding considered harmful

Ian Bicking complains about unicode in Python and wants to change the default encoding in his Python application, and wonders why Python makes it so hard to change it.

It's very tempting to change the default encoding, and I messed around with it too when I first explored Python unicode issues a few years ago. However, I now think that changing the default encoding in a Python application is not the right way to go. If you do so, you run the risk of writing applications or libraries that aren't going to work correctly on any other system.

For a slightly involved example take the case of Silva and PlacelessTranslationService.

Silva (a Zope-based CMS) went through a painful transition a few years ago to use unicode internally throughout. The ZPublisher can be configured to encode any unicode response to UTF-8. For input, we make sure everything is decoded into unicode.

This all worked pretty well, though of course we did find 'leaks' every once in a while due to oversights in not doing the right encoding bit. The leaks are aggravated by the fact that Zope 2 isn't very unicode pure as a framework.

Then we installed PlacelessTranslationService. This had been developed for Plone, which does not use unicode the Python way. Instead, as I understand it, it stores its content as UTF-8, and then the codebase has a number of hacks to make it deal with unicode strings too. Not by changing the default encoding, but by overriding an important StringIO that gets used by the Zope Page Template engine to do something very similar -- encode to UTF-8 any unicode that gets passed to the page template engine.

Suddenly we were again in a mire of unicode-related bugs. Our assumption that the output of a page template was a unicode string was broken by PlacelessTranslationService, and this caused things to break in subtle ways. Desperate hacking ensued... (Five 1.1's i18n support should eventually fix this)

Changing the default encoding is tempting, but you're really going to be in trouble if you're going to give code that does string concatenation to anyone else. Imagine you've written an XML processing library and you happily concatenate UTF-8 strings with unicode strings in its internals -- it'd almost certainly not work correctly as soon as I use it in my application, unless I change the default encoding as well.

The best way to deal with unicode is to make sure that everything that enters your application (from the filesystem, from the web, or a database) is decoded into unicode, and everything that leaves your application is encoded (preferably to UTF-8).
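A minimal sketch of that "decode on the way in, encode on the way out" rule. It is written with Python 3 names, where str is the unicode type and bytes is the encoded form; read_text, write_text and greet are hypothetical helpers, and UTF-8 is assumed as the wire encoding.

```python
def read_text(raw, encoding="utf-8"):
    """Decode bytes arriving from outside into a text (unicode) string."""
    return raw.decode(encoding)


def write_text(text, encoding="utf-8"):
    """Encode a text string to bytes just before it leaves the application."""
    return text.encode(encoding)


def greet(name):
    """Internal code only ever handles text, never encoded bytes."""
    return u"Hello, " + name + u"!"


# UTF-8 bytes as they might arrive from a web form (assumed input).
incoming = b"Ren\xc3\xa9e"
outgoing = write_text(greet(read_text(incoming)))
```

Keeping the conversions at the edges means the string concatenation inside greet never depends on whatever default encoding the surrounding system happens to use.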

Thinking it was easier before unicode came along is probably slightly deceptive -- you would've run into worse problems as soon as your system had to deal with more encodings than latin-1. String encoding issues just are hard.

That's not to say the situation with Python's unicode support isn't frustrating. I thought long and hard about this when I suffered through it, but I couldn't really think of a better solution than the route Python took. If Python didn't have to worry about backwards compatibility I'd suggest making all strings unicode, as Java did, and introducing a separate type for storing bytes, but that wasn't possible.

I do agree that life might've been easier if the default encoding of Python had been set to UTF-8 instead of to ascii. On the one hand, the ascii default does catch more errors. On the other hand, if you're willing to break the ease of backporting code to older Python versions, I believe that if, say, Python 2.5 shipped with a default encoding of UTF-8, it wouldn't actually break anything. But if I did it for my Python, I'd have problems as soon as I gave my code to someone else.

Zope community evaluating JSR-170

JSR-170, or the Java Content Repository API, appeared on my radar somewhere early in 2004. JSR-170 is interesting to creators of content management systems because it promises a common API for accessing and manipulating CMS content. Implementing this API in a CMS could bring a number of potential benefits to it, such as interoperability, learnability, not reinventing the wheel, and so on. I go deeper into possible reasons for adopting any specification in my previous article Criteria for evaluating specifications.

JSR-170: http://jcp.org/aboutJava/communityprocess/final/jsr170/index.html

Criteria for evaluating specifications: http://faassen.n--tree.net/blog/view/weblog/2005/03/01/0

JSR-170 has some counts against it. For starters, it's huge. When I first looked at it, its design was also somewhat shaky (and it was even huger), but in the recent versions I did notice large improvements. Finally, from the perspective of the Python world, not the Java world, a Java-based API has some counts against it. Implementing a Java API on top of a Python-based system is not easy if you want the interoperability benefits.

That said, I still think there are possible benefits, so I mentioned it a few times in the Zope world for evaluation. This has been happening recently:

  • Stefane Fermigier has been posting about it on his blog.

  • Tres Seaver has been playing with an implementation on top of Zope 3, and also posted about it. He also mentions the impedance mismatch between Python and Java in the design of APIs.

I think it is an excellent development that members of the community are looking outwards like this -- the Zope community has had a tendency to look inward in the past. Even if we choose not to adopt JCR or anything else we look at, at least our choice will have been considered and we will have learned something. This openness is also not unattractive to outsiders looking in at the Zope community.

extended catalog queries in Zope 3

Yesterday I managed to build something in just a few hours in Zope 3 that I wouldn't have been able to build so easily in Zope 2. What I've built is an extended query system for the Zope 3 catalogs.

A sample query looks like this:

q = zapi.getUtility(interfaces.IExtendedQuery)

t1 = ('catalog', 'fulltext')
a1 = ('catalog', 'a1')

r = q.searchResults(And(Text(t1, 'foo'), InSet(a1, [1, 2])))

which, providing the given catalog indexes exist, returns all objects that have attribute a1 set to either 1 or 2, and have 'foo' in their fulltext. All kinds of nice operators exist, such as Equals, NotEquals, InRange and InSet, and you can combine them nicely using And and Or. It also ought to work over multiple catalogs.

It's not the most efficient implementation, but performance should be decent enough, and all this just took a few hours to put together and is a few pages of code. In Zope 2 this would've caused me intense headscratching and I'd certainly not be done yet.

Why was this so easy in Zope 3? I think there are two reasons:

  • The clean implementation of the catalog makes its code fairly easy to read, and thus makes it easy to write similar code. The catalog implementation in Zope 2 is a horror to try to understand; in Zope 3 it really is surprisingly easy. This pattern holds for much of the Zope 3 code base.

  • Zope 3 gives every object an integer id using the IntIds utility, which helps make the catalog so clean. IntIds rock!
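To give a feel for why integer ids make combinable query operators cheap to write, here is a toy sketch in which every index maps a value to the integer ids of matching objects, and And/Or reduce to set intersection and union. This is not the code from the customer project and not the Zope 3 catalog API; the classes, the apply method and the sample indexes are all hypothetical.

```python
class Equals(object):
    def __init__(self, index, value):
        self.index, self.value = index, value

    def apply(self, indexes):
        # Ids of all objects whose indexed value equals the given one.
        return set(indexes[self.index].get(self.value, ()))


class InSet(object):
    def __init__(self, index, values):
        self.index, self.values = index, values

    def apply(self, indexes):
        found = set()
        for value in self.values:
            found |= set(indexes[self.index].get(value, ()))
        return found


class And(object):
    def __init__(self, *terms):
        self.terms = terms

    def apply(self, indexes):
        results = [term.apply(indexes) for term in self.terms]
        return set.intersection(*results) if results else set()


class Or(object):
    def __init__(self, *terms):
        self.terms = terms

    def apply(self, indexes):
        results = [term.apply(indexes) for term in self.terms]
        return set.union(*results) if results else set()


# Hypothetical data: index name -> value -> integer ids of matching objects.
indexes = {
    "a1": {1: [10, 11], 2: [12], 3: [13]},
    "color": {"red": [10, 12, 13], "blue": [11]},
}
query = And(InSet("a1", [1, 2]), Equals("color", "red"))
matching_ids = sorted(query.apply(indexes))
```

Because every term returns a plain set of integers, terms can be nested to any depth, and a final step (not shown) would map the resulting ids back to objects.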

The code is hidden in a customer project right now. It's easy to extract and I'll look into extracting this so other people can take a look at this in the coming weeks.