Secret Weblog Highlights

Martijn Faassen

2019-09-24 22:45

This is an old blog by now. I started it in 2005. But I'm not old! No way!

Over the years I wrote a lot of stuff. Sprinkled throughout are entries that I think are still relevant. So if you'd like, join me in my little journey through the history of my secret weblog. Warning: it's mostly about software development in one way or another.

In a few places I will brag about my uncanny ability to invent future web development trends just in time -- around the same time other people are more successfully inventing them.

What is Pythonic?

What is Pythonic? from 2005 is one of the earliest entries in my blog, and one of the most popular ones. It's because it answers a question many who learn Python will ask: what the heck does "Pythonic" mean?

Programming

Under-engineering, over-engineering, right-engineering talks about the careful balancing act we have to do as developers: what's the Goldilocks zone of engineering complexity for a particular problem?

Debugging Strategy: easy stuff first gives the following advice: even though the bug can't possibly be caused by a thing, if that thing is quick and easy to check, check anyway.

The Story of None is a series of posts about how to deal with None/null/undefined in software. It touches on guard clauses, validation and normalization.

Life at the Boundaries: Conversion and Validation goes into application layering and what we do at the boundaries. A lot of creative software development is establishing and guarding boundaries in applications and frameworks.

Punctuated Equilibrium in Software is a case study of the conceptual changes that happened over the course of a few years in one of the software libraries I wrote.

And just recently I wrote Refactoring to Multiple Exit Points.

Python 3 transition

Last year (2018), I read an article on LWN that said the following:

The switch from Python 2 to 3 is a huge job; one might guess that it is orders of magnitude larger than anyone had anticipated back in the heady days of Python 3000 (around 2007, say).

Anyone, you say? In 2007, I wrote Brief Python 3000 thoughts. The Greek myth of Cassandra resonates with me now. I am glad the end of this transition now appears to be in sight at last.

I think the Gravity of Python 2 (from 2014) is the most insightful article of the bunch I wrote on this topic - it talks about the invisible hands that were holding back Python 3 adoption. I think these forces apply in any widely used foundational software that breaks compatibility.

Modern client-side times

In an otherwise not very relevant article in 2009, I wrote this:

If I can count techniques I've been trying to pioneer myself: Template-driven development where the web browser renders the templates. This along with the notion of client-side views can lead to surprisingly clean rich client-side apps.

Client-side templates and views are now a huge thing on the web. But when I wrote the article, Backbone and Angular were almost a year away still.

In 2011, I wrote about my experiment in 2003, when I first tried to build a client-side template language:

I told other developers about it, and they all asked "why?". My answer was something like "I don't know man, it's just cool!"

(my apologies for the gendered language)

In the article I discuss how client-side frameworks affect the architecture of web applications.

In Modern Client-Side Times I explore what the client-side revolution means for the backend web framework.

I like writing retrospectives. If you like reading them, you can read my Seven Years: A Very Personal History of the Web, from 2017.

Open Source

In The importance of communication I tell the tale of how making lots of noise helped me in open source. I do make a lot of noise. I admit it can be a bit much. But it has also benefited me and it may benefit you.

I discuss how to handle ideas when they are offered to an open source project.

In a massive multi-part retrospective on the epic story of the rise and fall of Zope, I describe the evolution of that ancient web framework and my involvement in it. It's a story that's Dan Abramov Approved (tm)!

Names are important. I tell you how not to name software.

I also explain why I think open source projects shouldn't have "contrib" directories.

In 2015 I wrote about the history of reselect, a popular JS library I accidentally helped to call into existence.

Miscellaneous topics

On occasion I've also written about topics on my blog that aren't about software development.

In 2014's They say something I don't like so they must be lying! I observe how human behavior can make communities fight and how to compensate. The article ends like this:

Not everyone is well-intentioned. There are real liars, trolls, manipulators and psychopaths out there. There are those among us who want to try to fan the flames for their own amusement. I think being generous to others in our interpretations can reduce their power to do so. Maybe I'll talk a bit more about this in the future.

I haven't, yet. But it is, unfortunately, relevant.

Finally, in The Incredible Drifting Cyber I go into the wild evolution of the prefix cyber, and how it became unfun.

Conclusion

And this concludes the tour. I hope you enjoy reading some of my articles!

Refactoring to Multiple Exit Points

Martijn Faassen

2019-08-21 13:12

Introduction

Functions should have only a single entry point. We all agree on that. But some people also argue that functions should have a single exit that returns the value. More people don't seem to care enough about how their functions are organized. I think that makes functions a lot more complicated than they have to be. So let's talk about function organization and how multiple exit points can help.

I'm going to use Python in the examples, but these examples apply to many other languages such as JavaScript and Ruby as well, so do keep reading.

Starting point

Let's consider the following function:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            if item.match == "A":
                result = item.payload
            elif item.match == "B":
                continue
            else:
                if item.other == "C":
                    result = item.override
                else:
                    result = bar
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

It's a silly function, it's a hypothetical function, but there are plenty of functions with this kind of structure. They might not be born this way, but they've certainly grown into it. I find them difficult to follow. You can recognize them by one symptom already: quite a bit of indentation. You can also recognize them by trying to trace what happens in them; notice how your working memory fills up quickly.

Extract function from loop body

How would we go about refactoring it? The first step I would take is to extract the loop body into a separate function. You may say, why do so? Objections could be:

The loop body isn't reused in multiple places, so why should it be a function?
You have to manage function parameters whereas before all was conveniently available in the body of process_items.

That is all so, but let's do it anyway and see what happens, and then get back to this in the end:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

def process_item(item, bar):
    if item.match == "A":
        result = item.payload
    elif item.match == "B":
        result = None
    else:
        if item.other == "C":
            result = item.override
        else:
            result = bar
    return result

We've had to extract two parameters - item and bar. It turns out process_item doesn't care about default. We've had to convert the continue to a result = None to keep things working properly, as now we always run into the if result is not None check whereas before we did not.

Multiple exit points

We notice that result is only touched once in each code path in process_item. This means we can convert the function to use multiple exit points with the return statement, so let's do that:

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    elif item.match == "B":
        return None
    else:
        if item.other == "C":
            return item.override
        else:
            return bar

Convert to guard clauses

That's still more complicated than it should be. Since we have early exit points, we can get rid of the elif and else clauses:

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.match == "B":
        return None
    if item.other == "C":
        return item.override
    else:
        return bar

Some indentation is gone, which is a good sign. And we see another else we can get rid of now:

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.match == "B":
        return None
    if item.other == "C":
        return item.override
    return bar

Pay attention to None

I think the return None case is special, so let's move that up. That's safe as A and B for item.match are mutually exclusive and this function has no side effects:

def process_item(item, bar):
    if item.match == "B":
        return None
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar

This function is now a lot more regular. If you read it past return None you can forget about the case where item.match == "B", and then forget about the case where item.match == "A", and then forget about the case where item.other == "C". In the original version that was a lot harder to see.

Why pay attention to None?

This last reorganization of the guard clauses may seem like a useless action. But I pay special attention to None (or null or undefined or however your language may name the absence of value). If you organize the guard clauses that deal with None to come earlier, it makes your functions more regular and thus easier to read.

It also triggers you to consider whether perhaps item.match == "B" is something you can handle at the call site, which can lead to further refactorings. Later we'll consider that further in a bonus refactoring.

Languages that have an Option or Maybe type such as Haskell and Rust make this more obvious and have special ways to handle these cases -- the language forces you. TypeScript also tracks null/undefined in its type system. In many other languages, such as Python, we're on our own, but we certainly still have to pay attention to None.
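Python's optional type hints can make the "may be None" part visible too. Here is a minimal sketch, assuming a hypothetical Item dataclass (this article never defines what an item looks like, so the field types are guesses). The annotations are not enforced at runtime; they only help if you run a separate static checker such as mypy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical shape for the items used in this article.
    match: str
    other: str
    payload: Optional[str] = None
    override: Optional[str] = None


def process_item(item: Item, bar: str) -> Optional[str]:
    # The return annotation spells out that callers have to handle None.
    if item.match == "B":
        return None
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar
```

With annotations like these, a checker can complain when a caller treats the result as a plain string without checking for None first.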

Back to process_items

Now let's look at the process_items function again:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

Multiple exit points

Let's first transform this so we return early when we can:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    else:
        return "No bar"
    if result is None:
        return default
    return result

Flip condition to create a guard

We can see clearly that "No bar" is returned if bar is None, so let's flip that condition:

def process_items(items, bar, default):
    result = None
    if bar is None:
        return "No bar"
    else:
        for item in items:
            result = process_item(item, bar)
            if result is not None:
                break
    if result is None:
        return default
    return result

We can now see the else clause is not needed anymore, so let's unindent the for loop. We also move result = None below that guard clause for bar is None, as it's not needed until that point:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    result = None
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            break
    if result is None:
        return default
    return result

So it turns out in the rest of the function we can completely forget about bar being None. That's good. Maybe that guard can even be removed if we can somehow guarantee the non-None nature of bar at the call site. But we can't determine that in this limited example. Let's go on refactoring this function a bit more.

Turn loop break into early return

We take a look at the break. If result is not None, we break. Then after that we check if result is None. This can only happen if the loop never broke. If the loop did break we end up returning result.

So we can just as well do the return result immediately in the loop:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    result = None
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    if result is None:
        return default
    return result

Let's look at the bit of code past the end of the loop again. We know that result has to be None if execution reaches that point. It's initialized to None and the loop returns early if it's ever not None. So why do we even check whether result is None anymore? We can simply always return default:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    result = None
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default

We have no more business setting result to None before the loop starts. It's a local variable within the loop body now:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default

In review

Let's look at where we started and ended.

We started with this:

def process_items(items, bar, default):
    result = None
    if bar is not None:
        for item in items:
            if item.match == "A":
                result = item.payload
            elif item.match == "B":
                continue
            else:
                if item.other == "C":
                    result = item.override
                else:
                    result = bar
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result

And we ended with this:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default

def process_item(item, bar):
    if item.match == "B":
        return None
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar

The second version is much easier to follow, I think. (It's also a few lines shorter, but that's not that important.)
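If you want more than eyeballing to trust a chain of refactoring steps like this, you can run the old and new versions side by side. The following is only a sketch: Item is a hypothetical namedtuple standing in for the real items, the _before and _after suffixes are mine, and the inputs are a small brute-force set rather than a proof.

```python
from collections import namedtuple
from itertools import product

# Hypothetical stand-in for the items; the article never defines them.
Item = namedtuple("Item", "match other payload override")


def process_items_before(items, bar, default):
    # The function we started with.
    result = None
    if bar is not None:
        for item in items:
            if item.match == "A":
                result = item.payload
            elif item.match == "B":
                continue
            else:
                if item.other == "C":
                    result = item.override
                else:
                    result = bar
            if result is not None:
                break
    else:
        result = "No bar"
    if result is None:
        result = default
    return result


def process_item(item, bar):
    if item.match == "B":
        return None
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar


def process_items_after(items, bar, default):
    # The function we ended with.
    if bar is None:
        return "No bar"
    for item in items:
        result = process_item(item, bar)
        if result is not None:
            return result
    return default


def check_equivalence():
    # Every combination of field values, including None payloads and overrides.
    candidates = [
        Item(match, other, payload, override)
        for match, other, payload, override in product(
            "ABZ", "CZ", (None, "payload"), (None, "override")
        )
    ]
    checked = 0
    for bar in (None, "bar"):
        for length in range(3):
            for items in product(candidates, repeat=length):
                before = process_items_before(items, bar, "default")
                after = process_items_after(items, bar, "default")
                assert before == after, (items, bar)
                checked += 1
    return checked
```

Calling check_equivalence() runs both versions over every list of up to two items and raises an AssertionError on the first difference.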

In defense of single-use functions

So we created a process_item function even though we only use it in one place. Earlier I asked why you would do such a thing. What benefits does that have?

We could convert the function to use guard clauses, removing a level of nesting and letting us come up with followup refactoring steps that simplified our code.
It's clearer to see what actually really matters in the loop and what doesn't, as it's spelled out in the parameters of the function.
We gave what happens in the for loop a name. process_item doesn't say much in this case, but in a real-world code base your function name can help you read your code more easily.
Maybe we'll end up reusing it after all!

It also can lead to interesting future refactorings as it's easier to see patterns. If you do OOP for instance, you may end up with a group of functions that all share the same set of arguments and this would suggest creating a class with methods. But let's leave OOP be and consider None.

A possible followup refactoring

We know bar cannot be None when process_item is called -- see our guard clause. If we know (or find a way to guarantee) that item.payload and item.override can never be None either, we can do this:

def process_items(items, bar, default):
    if bar is None:
        return "No bar"
    for item in items:
        if item.match != "B":
            return process_item(item, bar)
    return default

def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar

Which then leads to the question whether we should filter items with item.match != "B" before they even reach process_items in the first place -- another potential refactoring.
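Here is one way that could look, as a sketch only. It keeps the assumption that item.payload and item.override are never None, it invents a call site named handle (hypothetical), and Item is again a made-up stand-in for the real items.

```python
from collections import namedtuple

# Hypothetical stand-in for the items; the article never defines them.
Item = namedtuple("Item", "match other payload override")


def process_item(item, bar):
    if item.match == "A":
        return item.payload
    if item.other == "C":
        return item.override
    return bar


def process_items(items, bar, default):
    # Assumes the caller already removed the items with match == "B".
    if bar is None:
        return "No bar"
    for item in items:
        # Every remaining item produces a value, so the first one decides.
        return process_item(item, bar)
    return default


def handle(all_items, bar, default):
    # Hypothetical call site that does the filtering up front.
    wanted = [item for item in all_items if item.match != "B"]
    return process_items(wanted, bar, default)
```

The loop now returns on its first iteration, which hints that process_items only ever needs the first remaining item. Whether that is a simplification worth making depends on the real code.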

All of these refactorings require knowledge of what's impossible in the code and the data -- its invariants. We don't know this in this contrived example. But in a real code base, you can find out. A static type system can help make these invariants explicit, but that doesn't mean that in a dynamically typed language we should forget about them.

Yes, I'm saying the same as what I said about None before -- whether something is nullable is an important example of an invariant.

Conclusion

It's sometimes claimed that not only should a function have a single entry point, but that it should also have a single exit. One could argue such from a sense of mathematical purity. But unless you work in a programming language that combines mathematical purity with convenience (compile-time checked match expressions help), that point seems moot to me. Many of us do not. (And no, we can't easily switch either.)

Another argument for single exit points comes from languages like C, where you have to free memory you allocated in the end before you exit a function, and you want to have a single place where you do the cleanup. But again that's irrelevant to many of us that use languages with automated garbage collection.
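Even when cleanup does matter, many languages let you attach it to a block instead of to a single exit. Here is a minimal Python sketch, assuming a hypothetical resource that just records when it is opened and closed. The try/finally inside the context manager runs on every exit path, early returns included.

```python
from contextlib import contextmanager

events = []  # records what happened, for illustration only


@contextmanager
def resource(name):
    # Hypothetical resource that needs explicit cleanup.
    events.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs however the with block is left.
        events.append(f"close {name}")


def find_first_match(values, wanted):
    with resource("lookup"):
        for value in values:
            if value == wanted:
                return value  # early exit; the cleanup still happens
        return None
```

Both the early return and the fall-through return leave the with block, so the resource is closed either way.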

I hope to have shown you that for many of us, in many languages, multiple exit points can make code a lot clearer. It helps to expose invariants and potential invariants, which can then lead to followup refactorings.

P.S. If you like this content, consider following @faassen on Twitter. That's me! Besides many other things, I sometimes talk about code there too.