Refactoring to Multiple Exit Points
Introduction
Functions should have only a single entry point. We all agree on that. But some people also argue that functions should have a single exit that returns the value. More people don't seem to care enough about how their functions are organized. I think that makes functions a lot more complicated than they have to be. So let's talk about function organization and how multiple exit points can help.
I'm going to use Python in the examples, but these examples apply to many other languages such as JavaScript and Ruby as well, so do keep reading.
Starting point
Let's consider the following function:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
if item.match == "A":
result = item.payload
elif item.match == "B":
continue
else:
if item.other == "C":
result = item.override
else:
result = bar
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
It's a silly function, it's a hypothetical function, but there are plenty of functions with this kind of structure. They might not be born this way, but they've certainly grown into it. I find them difficult to follow. You can recognize them by one symptom already: quite a bit of indentation. You can also recognize them by trying to trace what happens in them; notice how your working memory fills up quickly.
Extract function from loop body
How would we go about refactoring it? The first step I would take is to extract the loop body into a separate function. You may say, why do so? Objections could be:
- The loop body isn't reused in multiple places, so why should it be a function?
- You have to manage function parameters whereas before all was
conveniently available in the body of
foo
.
That is all so, but let's do it anyway and see what happens, and then get back to this in the end:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
result = process_item(item, bar)
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
def process_item(item, bar):
if item.match == "A":
result = item.payload
elif item.match == "B":
result = None
else:
if item.other == "C":
result = item.override
else:
result = bar
return result
We've had to extract two parameters - item
and bar
. It turns out
process_item
doesn't care about
default. We've had to convert the
continue
to a result = None
to keep things working properly, as now
we always run into the if result is not
None check whereas before we did not.
Multiple exit points
We notice that result is only touched once in each code path in
process_item
. This means we can convert the function to use multiple
exit points with the return
statement, so let's do that:
def process_item(item, bar):
if item.match == "A":
return item.payload
elif item.match == "B":
return None
else:
if item.other == "C":
return item.override
else:
return bar
Convert to guard clauses
That's still more complicated than it should be. Since we have early
exit points, we can get rid of the elif
and else
clauses:
def process_item(item, bar):
if item.match == "A":
return item.payload
if item.match == "B":
return None
if item.other == "C":
return item.override
else:
return bar
Some indentation is gone, which is a good sign. And we see another
else
we can get rid of now:
def process_item(item, bar):
if item.match == "A":
return item.payload
if item.match == "B":
return None
if item.other == "C":
return item.override
return bar
Pay attention to None
I think the return None
case is special, so let's move that up. That's
safe as A
and B
for item.match
are mutually exclusive and this
function has no side effects:
def process_item(item, bar):
if item.match == "B":
return None
if item.match == "A":
return item.payload
if item.other == "C":
return item.override
return bar
This function is now a lot more regular. If you read it past
return None
you can forget about the case where item.match == "B"
,
and then forget about the case where item.match == "A"
, and then
forget about the case where item.other == "C"
. In the original version
that was a lot harder to see.
Why pay attention to None?
This last reorganization of the guard clauses may seem like a useless
action. But I pay special attention to None
(or null
or undefined
or however your language may name the absence of value). If you organize
the guard clauses that deal with None
to come earlier, it makes your
functions more regular and thus more easy to read.
It also triggers you to consider whether perhaps item.match == "B"
is
something you can handle at the call site, which can lead to further
refactorings. Later we'll consider that further in a bonus refactoring.
Languages that have an Option
or Maybe
type such as Haskell and Rust
make this more obvious and have special ways to handle these cases --
the language forces you. TypeScript also tracks tracks
null/undefined in its type system. But in
many other languages, such as Python, we're on our own. But we certainly
still have to pay attention to None
.
See also my the Story of None.
Back to process_items
Now let's look at the process_items
function again:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
result = process_item(item, bar)
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
Multiple exit points
Let's first transform this so we return early when we can:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
result = process_item(item, bar)
if result is not None:
break
else:
return "No bar"
if result is None:
return default
return result
Flip condition to create a guard
We can see clearly that "No bar"
is returned if bar is None
, so
let's flip that condition:
def process_items(items, bar, default):
result = None
if bar is None:
return "No bar"
else:
for item in items:
result = process_item(item, bar)
if result is not None:
break
if result is None:
return default
return result
We can now see the else
clause is not needed anymore, so let's
unindent the for
loop. We also move result = None
below that guard
clause for bar is None
, as it's not needed until that point:
def process_items(items, bar, default):
if bar is None:
return "No bar"
result = None
for item in items:
result = process_item(item, bar)
if result is not None:
break
if result is None:
return default
return result
So it turns out in the rest of the function we can completely forget
about bar
being None
. That's good. Maybe that guard can even be
removed if we can somehow guarantee the non-None nature of bar
at the
call site. But we can't determine that in this limited example. Let's go
on refactoring this function a bit more.
Turn loop break into early return
We take a look at the break
. If result is not None
, we break. Then
after that we check if result is None
. This can only happen if the
loop never breaked. If the loop did break we end up returning
result
.
So we can just as well do the return result
immediately in the loop:
def process_items(items, bar, default):
if bar is None:
return "No bar"
result = None
for item in items:
result = process_item(item, bar)
if result is not None:
return result
if result is None:
return default
return result
Let's look at the bit of code past the end of the loop again. We know
that result
has to be None
if it reaches there. It's initialized
to None and the loop returns early if
it's ever not None. So why do we even
check whether result is None
anymore? We can simply always return
default
:
def process_items(items, bar, default):
if bar is None:
return "No bar"
result = None
for item in items:
result = process_item(item, bar)
if result is not None:
return result
return default
We have no more business setting result
to None
before the loop
starts. It's a local variable within the loop body now:
def process_items(items, bar, default):
if bar is None:
return "No bar"
for item in items:
result = process_item(item, bar)
if result is not None:
return result
return default
In review
Let's look at where we started and ended.
We started with this:
def process_items(items, bar, default):
result = None
if bar is not None:
for item in items:
if item.match == "A":
result = item.payload
elif item.match == "B":
continue
else:
if item.other == "C":
result = item.override
else:
result = bar
if result is not None:
break
else:
result = "No bar"
if result is None:
result = default
return result
And we ended with this:
def process_items(items, bar, default):
if bar is None:
return "No bar"
for item in items:
result = process_item(item, bar)
if result is not None:
return result
return default
def process_item(item, bar):
if item.match == "B":
return None
if item.match == "A":
return item.payload
if item.other == "C":
return item.override
return bar
The second version is much easier to follow, I think. (it's also a few lines less code, but that's not that important.)
In defense of single-use functions
So we created a process_item
function even though we only use it in
one place. Earlier asked why you would do such a thing. What benefits
does that have?
- We could convert the function to use guard clauses, removing a level of nesting and letting us come up with followup refactoring steps that simplified our code.
- It's clearer to see what actually really matters in the loop and what doesn't, as it's spelled out in the parameters of the function.
- We gave what happens in the
for
loop a name.process_item
doesn't say much in this case, but in a real-world code base your function name can help you read your code more easily. - Maybe we'll end up reusing it after all!
It also can lead to interesting future refactorings as it's easier to
see patterns. If you do OOP for instance, you may end up with a group of
functions that all share the same set of arguments and this would
suggest creating a class with methods. But let's leave OOP be and
consider None
.
A possible followup refactoring
We know bar
cannot be None
when process_item
is called -- see our
guard clause. If we know (or find a way to guarantee) that
item.payload
and item.override
can never be None
either, we can do
this:
def process_items(items, bar, default):
if bar is None:
return "No bar"
for item in items:
if item.match != "B":
return process_item(item, bar)
return default
def process_item(item, bar):
if item.match == "A":
return item.payload
if item.other == "C":
return item.override
return bar
Which then leads to the question whether we should filter items
with
item.match != "B"
before they even reach
process_items in the first case --
another potential refactoring.
All of these refactorings require knowledge of what's impossible in the code and the data -- its invariants. We don't know this in this contrived example. But in a real code base, you can find out. A static type system can help make these invariants explicit, but that doesn't mean that in a dynamically typed language we should forget about them.
Yes, I'm saying the same as what I said about None
before -- whether
something is nullable is an important example of an invariant.
Conclusion
It's sometimes claimed that not only should a function have a single entry point, but that it should also have a single exit. One could argue such from sense of mathematical purity. But unless you work in a programming language that combines mathematical purity with convenience (compile-time checked match expressions help), that point seems moot to me. Many of us do not. (and no, we can't easily switch either.)
Another argument for single exit points comes from languages like C, where you have to free memory you allocated in the end before you exit a function, and you want to have a single place where you do the cleanup. But again that's irrelevant to many of us that use languages with automated garbage collection.
I've hope to have shown to you that for many of us, in many languages, multiple exit points can make code a lot more clear. It helps to expose invariants and potential invariants, which can then lead to followup refactorings.
P.S. If you like this content, consider following @faassen on Twitter. That's me! Besides many other things, I sometimes talk about code there too.