Introduction
Functions should have only a single entry point. We all agree on that. But some
people also argue that functions should have a single exit that returns the
value. More people don't seem to care enough about how their functions are
organized. I think that makes functions a lot more complicated than they have
to be. So let's talk about function organization and how multiple exit points
can help.
I'm going to use Python in the examples, but these examples apply to many other
languages such as JavaScript and Ruby as well, so do keep reading.
Starting point
Let's consider the following function:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
if item.match == "A":
result = item.payload
elif item.match == "B":
continue
else:
if item.other == "C":
result = item.override
else:
result = bar
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
It's a silly function, it's a hypothetical function, but there are plenty of
functions with this kind of structure. They might not be born this way, but
they've certainly grown into it. I find them difficult to follow. You can
recognize them by one symptom already: quite a bit of indentation. You can also
recognize them by trying to trace what happens in them; notice how your working
memory fills up quickly.
Extract function from loop body
How would we go about refactoring it? The first step I would take is to extract
the loop body into a separate function. You may say, why do so? Objections
could be:
The loop body isn't reused in multiple places, so why should it be a
function?
You have to manage function parameters whereas before all was conveniently
available in the body of foo
.
That is all so, but let's do it anyway and see what happens, and then
get back to this in the end:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
result = process_item(item, bar)
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
def process_item(item, bar):
if item.match == "A":
result = item.payload
elif item.match == "B":
result = None
else:
if item.other == "C":
result = item.override
else:
result = bar
return result
We've had to extract two parameters - item
and bar
. It turns out
process_item
doesn't care about default. We've had to convert the
continue
to a result = None
to keep things working properly, as now we
always run into the if result is not None check whereas before we did not.
Multiple exit points
We notice that result is only touched once in each code path in
process_item
. This means we can convert the function to use multiple exit
points with the return
statement, so let's do that:
def process_item(item, bar):
if item.match == "A":
return item.payload
elif item.match == "B":
return None
else:
if item.other == "C":
return item.override
else:
return bar
Convert to guard clauses
That's still more complicated than it should be. Since we have early
exit points, we can get rid of the elif
and else
clauses:
def process_item(item, bar):
if item.match == "A":
return item.payload
if item.match == "B":
return None
if item.other == "C":
return item.override
else:
return bar
Some indentation is gone, which is a good sign. And we see another else
we
can get rid of now:
def process_item(item, bar):
if item.match == "A":
return item.payload
if item.match == "B":
return None
if item.other == "C":
return item.override
return bar
Pay attention to None
I think the return None
case is special, so let's move that up. That's safe
as A
and B
for item.match
are mutually exclusive and this function
has no side effects:
def process_item(item, bar):
if item.match == "B":
return None
if item.match == "A":
return item.payload
if item.other == "C":
return item.override
return bar
This function is now a lot more regular. If you read it past return None
you can forget about the case where item.match == "B"
, and then forget
about the case where item.match == "A"
, and then forget about the case
where item.other == "C"
. In the original version that was a lot harder to
see.
Why pay attention to None?
This last reorganization of the guard clauses may seem like a useless action.
But I pay special attention to None
(or null
or undefined
or
however your language may name the absence of value). If you organize the guard
clauses that deal with None
to come earlier, it makes your functions more
regular and thus more easy to read.
It also triggers you to consider whether perhaps item.match == "B"
is
something you can handle at the call site, which can lead to further
refactorings. Later we'll consider that further in a bonus refactoring.
Languages that have an Option
or Maybe
type such as Haskell and Rust
make this more obvious and have special ways to handle these cases -- the
language forces you. TypeScript also tracks tracks null/undefined in its type
system. But in many other languages, such as Python, we're on our own. But we
certainly still have to pay attention to None
.
See also my the Story of None.
Back to process_items
Now let's look at the process_items
function again:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
result = process_item(item, bar)
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
Multiple exit points
Let's first transform this so we return early when we can:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
result = process_item(item, bar)
if result is not None:
break
else:
return "No bar"
if result is None:
return default
return result
Flip condition to create a guard
We can see clearly that "No bar"
is returned if bar is None
, so let's
flip that condition:
def process_items(items, bar, default):
result = None
if bar is None:
return "No bar"
else:
for item in items:
result = process_item(item, bar)
if result is not None:
break
if result is None:
return default
return result
We can now see the else
clause is not needed anymore, so let's
unindent the for
loop. We also move result = None
below that
guard clause for bar is None
, as it's not needed until that point:
def process_items(items, bar, default):
if bar is None:
return "No bar"
result = None
for item in items:
result = process_item(item, bar)
if result is not None:
break
if result is None:
return default
return result
So it turns out in the rest of the function we can completely forget about
bar
being None
. That's good. Maybe that guard can even be removed if we
can somehow guarantee the non-None nature of bar
at the call site. But we
can't determine that in this limited example. Let's go on refactoring this
function a bit more.
Turn loop break into early return
We take a look at the break
. If result is not None
, we break. Then
after that we check if result is None
. This can only happen if the loop
never breaked. If the loop did break we end up returning result
.
So we can just as well do the return result
immediately in the loop:
def process_items(items, bar, default):
if bar is None:
return "No bar"
result = None
for item in items:
result = process_item(item, bar)
if result is not None:
return result
if result is None:
return default
return result
Let's look at the bit of code past the end of the loop again. We know that
result
has to be None
if it reaches there. It's initialized to None
and the loop returns early if it's ever not None. So why do we even check
whether result is None
anymore? We can simply always return default
:
def process_items(items, bar, default):
if bar is None:
return "No bar"
result = None
for item in items:
result = process_item(item, bar)
if result is not None:
return result
return default
We have no more business setting result
to None
before the loop starts.
It's a local variable within the loop body now:
def process_items(items, bar, default):
if bar is None:
return "No bar"
for item in items:
result = process_item(item, bar)
if result is not None:
return result
return default
In review
Let's look at where we started and ended.
We started with this:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
if item.match == "A":
result = item.payload
elif item.match == "B":
continue
else:
if item.other == "C":
result = item.override
else:
result = bar
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
And we ended with this:
def process_items(items, bar, default):
if bar is None:
return "No bar"
for item in items:
result = process_item(item, bar)
if result is not None:
return result
return default
def process_item(item, bar):
if item.match == "B":
return None
if item.match == "A":
return item.payload
if item.other == "C":
return item.override
return bar
The second version is much easier to follow, I think. (it's also a few lines
less code, but that's not that important.)
In defense of single-use functions
So we created a process_item
function even though we only use it in one
place. Earlier asked why you would do such a thing. What benefits does that
have?
We could convert the function to use guard clauses, removing
a level of nesting and letting us come up with followup refactoring steps
that simplified our code.
It's clearer to see what actually really matters in the loop and
what doesn't, as it's spelled out in the parameters of the function.
We gave what happens in the for
loop a name. process_item
doesn't
say much in this case, but in a real-world code base your function
name can help you read your code more easily.
Maybe we'll end up reusing it after all!
It also can lead to interesting future refactorings as it's easier to see
patterns. If you do OOP for instance, you may end up with a group of functions
that all share the same set of arguments and this would suggest creating a
class with methods. But let's leave OOP be and consider None
.
A possible followup refactoring
We know bar
cannot be None
when process_item
is called -- see our
guard clause. If we know (or find a way to guarantee) that item.payload
and
item.override
can never be None
either, we can do this:
def process_items(items, bar, default):
if bar is None:
return "No bar"
for item in items:
if item.match != "B":
return process_item(item, bar)
return default
def process_item(item, bar):
if item.match == "A":
return item.payload
if item.other == "C":
return item.override
return bar
Which then leads to the question whether we should filter items
with
item.match != "B"
before they even reach process_items in the first case
-- another potential refactoring.
All of these refactorings require knowledge of what's impossible in the code
and the data -- its invariants. We don't know this in this contrived example.
But in a real code base, you can find out. A static type system can help make
these invariants explicit, but that doesn't mean that in a dynamically typed
language we should forget about them.
Yes, I'm saying the same as what I said about None
before -- whether
something is nullable is an important example of an invariant.
Conclusion
It's sometimes claimed that not only should a function have a single entry
point, but that it should also have a single exit. One could argue such from
sense of mathematical purity. But unless you work in a programming language
that combines mathematical purity with convenience (compile-time checked match
expressions help), that point seems moot to me. Many of us do not. (and no, we
can't easily switch either.)
Another argument for single exit points comes from languages like C, where you
have to free memory you allocated in the end before you exit a function, and
you want to have a single place where you do the cleanup. But again that's
irrelevant to many of us that use languages with automated garbage collection.
I've hope to have shown to you that for many of us, in many languages, multiple
exit points can make code a lot more clear. It helps to expose invariants and
potential invariants, which can then lead to followup refactorings.
P.S. If you like this content, consider following @faassen on Twitter. That's
me! Besides many other things, I sometimes talk about code there too.