Python in Detail: Writing Fast Python Code
Welcome to the Writing Fast Python Code lesson!
In a head-to-head comparison, Python code will always be slower than C code. Python will also lose to Java and JavaScript in almost all cases, as well as to many other languages. But direct comparisons like this are the wrong way to compare performance, because they ignore the way that Python is written in real systems. To build fast Python systems, we need to write idiomatic Python code, taking advantage of its strengths.
To write efficient code in Python, we rely on efficient code in the standard library and in third-party libraries. Python code can be thought of as "glue" binding together efficient libraries written in C.
To see that in action, let's take a simple example: Python's built-in sets. How do sets impact the performance of our code?
We'll examine three possible ways to remove duplicates from a list:
unique1, unique2, and unique3. Later in the lesson, we'll benchmark the three solutions to see how fast they actually are.

Solution 1: sort the list. In a sorted list, two values that are equal will also be adjacent to each other. For example, in the sorted list [1, 2, 2, 3], the 2s are adjacent. We can loop through a sorted list and remove any value that's equal to its neighbor. This solution is easy in almost every programming language.

(Note that as a side effect of using the built-in sorted function on the list, our de-duplicating function also returns a sorted list. To keep these examples simple, we assume that the final list's order doesn't matter.)

    def unique1(values):
        # Sort the list first, so values that are equal to each other will also be
        # next to each other.
        values = list(sorted(values))
        result = []
        for index in range(len(values)):
            if index == 0 or values[index] != values[index - 1]:
                result.append(values[index])
        return result

    unique1([2, 1, 1, 3, 4, 2, 5, 4])

Result:

    [1, 2, 3, 4, 5]
That works, but it's quite slow. The problem is that sorting is an expensive operation: sorting a list of n elements takes O(n log n) time. We'll measure the actual runtime cost once we've seen the other solutions.
Solution 2: use a dictionary as a stand-in for a set. Because dictionaries can't have duplicate keys, we can build a dictionary where the keys are the values that we want to de-duplicate. The dictionary's values don't matter to us, so we can use None as a placeholder value. This was common in Python code before sets were added in Python 2.3 in 2003.

Python's dicts preserve key insertion order (guaranteed by the language since Python 3.7). Unlike the first solution, the de-duplicated list preserves the order from the original list.

    def unique2(values):
        # Make a dictionary where each value in `values` is a key, and all keys map
        # to `None`.
        value_dict = {}
        for value in values:
            value_dict[value] = None
        return list(value_dict.keys())

    unique2([2, 1, 1, 3, 4, 2, 5, 4])

Result:

    [2, 1, 3, 4, 5]
That's faster than the sorted list approach. The code is also shorter, and makes better use of Python's very efficient dict implementation. In the past, Python code was full of clever dict uses like this. These tricks are less common now, but are still useful in some situations.
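As an aside, the same dict trick can be written more compactly with the built-in dict.fromkeys, which builds the keys-to-None mapping in one call. (This compact form isn't one of the three solutions benchmarked in this lesson; it's shown here only as a variant of the dict idiom.)

```python
def unique2_compact(values):
    # dict.fromkeys builds {value: None, ...} in one call, keeping the
    # first-seen order of the keys, just like the loop in unique2.
    return list(dict.fromkeys(values))

print(unique2_compact([2, 1, 1, 3, 4, 2, 5, 4]))  # prints [2, 1, 3, 4, 5]
```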
Solution 3: just use a set! Note that the next example returns a set, not a list.

    def unique3(values):
        return set(values)

    unique3([2, 1, 1, 3, 4, 2, 5, 4])

Result:

    {2, 1, 3, 4, 5}

The set version takes almost no code! In fact, it's not even worth the effort to write the unique3 function in real-world systems. All it does is call the built-in set(...), which we can call directly.

Let's benchmark the three solutions by comparing their runtime on a list of a million elements. First, here are the three implementations in one place.
    def unique1(values):
        # Sort the list first, so values that are equal to each other will also be
        # next to each other.
        values = list(sorted(values))
        result = []
        for index in range(len(values)):
            if index == 0 or values[index] != values[index - 1]:
                result.append(values[index])
        return result

    def unique2(values):
        # Make a dictionary where each value in `values` is a key, and all keys map
        # to `None`.
        value_dict = {}
        for value in values:
            value_dict[value] = None
        return list(value_dict.keys())

    def unique3(values):
        return set(values)

To benchmark each function, we capture the current time (in seconds), then call the function, then capture the current time again.

- Note: this code example reuses elements (variables, etc.) defined in earlier examples.

    from time import time

    def benchmark(func):
        start = time()
        func()
        return time() - start

    # Make a big initial list so the functions take enough time to measure.
    values = list(range(1000000))

    runtimes = (
        benchmark(lambda: unique1(values)),
        benchmark(lambda: unique2(values)),
        benchmark(lambda: unique3(values)),
    )
    runtimes

Result:
(Like other benchmarks in this course, we hard-coded the results above for consistency. Those are the times that we got, but every computer will perform differently.)
Solution 3, which uses a set, is both shorter and significantly faster than the other two! It takes about 40% as much time as the dict solution, and about 12% as much time as the sorted list solution.
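For quick experiments like this, the standard library also provides the timeit module, which runs a callable many times and handles the clock bookkeeping for us. A minimal sketch (the list size and iteration count here are arbitrary choices, not the lesson's benchmark):

```python
from timeit import timeit

values = list(range(100_000))

# timeit calls the lambda `number` times and returns the total elapsed
# seconds, so smaller is faster.
set_time = timeit(lambda: set(values), number=10)
sort_time = timeit(lambda: sorted(values), number=10)

print(f"set: {set_time:.4f}s, sorted: {sort_time:.4f}s")
```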
These results show us the value of Python's rich set of built-ins and large standard library. Different problems require different tools, and Python comes with a huge number of tools.
This also shows us how Python approaches performance in general. Directly implementing algorithms like unique1 may be slower in Python than in C, Java, or JavaScript. But in most Python programs, we delegate the slow work to built-ins, the standard library, or third-party libraries. Those can be highly optimized, often by writing them in C. If set(...) is the limiting factor in our program, then we're lucky, because set(...) is very fast!

Let's see another example, again using a set. Many algorithms require us to remove values from a collection, one at a time. In the example below, we benchmark two solutions in the same way that we did above. The first solution uses a list and the second uses a set.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    from time import time

    # Benchmark list `.remove`
    values1 = list(range(50000))
    start1 = time()
    for i in range(50000):
        values1.remove(i)
    runtime1 = time() - start1

    # Benchmark set `.remove`
    values2 = set(range(50000))
    start2 = time()
    for i in range(50000):
        values2.remove(i)
    runtime2 = time() - start2

    (runtime1, runtime2)

Result:
This time, sets are 42 times faster than lists!
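The gap comes from how the two containers find elements: list.remove scans the list from the front until it finds a matching value (linear time), while set.remove hashes the value and jumps straight to its bucket (constant time on average). A small membership-test sketch shows the same difference in miniature:

```python
big_list = list(range(100_000))
big_set = set(big_list)

# Both expressions are True, but the list version compares against up to
# 100,000 elements, while the set version does one hash lookup on average.
in_list = 99_999 in big_list
in_set = 99_999 in big_set
print(in_list, in_set)  # prints True True
```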
This pattern repeats over and over again across the Python ecosystem. We could emulate dict-like structures with lists, but Python dicts are faster, so we use them instead. We could emulate set-like structures with lists or dicts, but Python sets are faster, so we use them instead.
The same reasoning applies to the standard library. We could manually write code to pack and unpack binary data, but the standard library's struct module is fast, so we use that. We could write a heap queue data structure from scratch, but the standard library's heapq module is fast, so we use that.

And again, the same reasoning applies to third-party modules. We could build our own data frame class for doing data analysis, but the pandas package is fast, so we use that.
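As a quick taste of those two standard library modules (the specific values below are arbitrary examples, not from this lesson's benchmarks):

```python
import heapq
import struct

# heapq turns a plain list into a min-heap in place.
nums = [5, 1, 4, 2]
heapq.heapify(nums)
smallest = heapq.heappop(nums)  # -> 1

# struct packs Python values into raw bytes and back again.
packed = struct.pack("<HH", 1, 258)  # two little-endian unsigned 16-bit ints
unpacked = struct.unpack("<HH", packed)  # -> (1, 258)

print(smallest, unpacked)
```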
Here are two takeaways from this lesson. First, always look for the appropriate built-in or standard library module. The standard library is rich, and generally implemented very well. It can save us development time and result in faster systems.
Second, when writing algorithmic code, look for a third-party module that can do the job. The Python package ecosystem is large and rich, so it's likely that someone else has already solved your algorithmic problem. If it's a common problem, there's probably a highly-optimized package whose core is written in C or another low-level language.
All of the tools mentioned in the paragraphs above are written entirely in C, or in Python with C mixed in for the performance-sensitive parts. By using third-party tools for expensive algorithmic parts of our work, we avoid the cost of doing that work directly in Python.