Python in Detail: Itertools
Welcome to the Itertools lesson!
Iterators are common in real-world Python code, so we often find ourselves combining them, splitting them, and filtering them. Fortunately, Python ships with itertools, a set of utility functions to work with iterators. We'll explore a small sampling of itertools functions in this lesson.
Sometimes we need an iterator that counts up from a given starting value. We could write it as a generator.
>
def count(initial):
    i = initial
    while True:
        yield i
        i += 1

counter = count(22)
(next(counter), next(counter), next(counter))
Result:
(22, 23, 24)
However, we don't have to write that code because itertools.count already does the same thing. It starts from 0 by default, or we can provide a starting value.
>
import itertools
counter = itertools.count(5)
(next(counter), next(counter), next(counter))
Result:
(5, 6, 7)
>
import itertools
counter = itertools.count()
(next(counter), next(counter), next(counter))
Result:
(0, 1, 2)
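itertools.count also accepts an optional second argument, step, which sets the gap between values. Here's a minimal sketch; the variable names are ours, chosen only for illustration.

```python
import itertools

# count(start, step): begin at 10 and move up by 5 each time.
by_fives = itertools.count(10, 5)
first_three = (next(by_fives), next(by_fives), next(by_fives))

# A negative step counts down instead.
countdown = itertools.count(3, -1)
down_three = (next(countdown), next(countdown), next(countdown))

print(first_three)  # (10, 15, 20)
print(down_three)   # (3, 2, 1)
```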
Let's see itertools.count in practice. In an earlier lesson, we used defaultdict to generate IDs for users. The IDs themselves are created in a get_next_id function, which modifies a global variable. Here's the code we wrote in that lesson:
>
from collections import defaultdict

next_id = 1
def get_next_id():
    global next_id
    id = next_id
    next_id += 1
    return id

user_ids = defaultdict(get_next_id)
first_amir_id = user_ids["Amir"]
betty_id = user_ids["Betty"]
second_amir_id = user_ids["Amir"]
(first_amir_id, betty_id, second_amir_id)
Result:
(1, 2, 1)
To generate IDs, we had to import defaultdict, define a global variable, and write a function to increment that variable. The next example solves the same problem, but uses itertools.count instead of our get_next_id function. That lets us remove 5 lines of code.
>
import collections, itertools
next_id_iterator = itertools.count(1)
user_ids = collections.defaultdict(lambda: next(next_id_iterator))
first_amir_id = user_ids["Amir"]
betty_id = user_ids["Betty"]
second_amir_id = user_ids["Amir"]
(first_amir_id, betty_id, second_amir_id)
Result:
(1, 2, 1)
We made the code shorter, but that didn't require anything clever. We just made better use of the tools that Python gives us.
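As a possible variation on the example above, we can skip the lambda and pass the iterator's bound __next__ method as the default factory, since defaultdict only needs a callable that takes no arguments. This is a sketch of an alternative, not the approach from the earlier lesson.

```python
import collections
import itertools

# Calling the bound __next__ method is equivalent to calling
# next(iterator), so it works as a zero-argument default factory.
user_ids = collections.defaultdict(itertools.count(1).__next__)

ids = (user_ids["Amir"], user_ids["Betty"], user_ids["Amir"])
print(ids)  # (1, 2, 1)
```

Whether this reads better than the lambda is a matter of taste; the lambda version makes the call to next more visible.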
For convenience, the next few examples call list on iterators to see their contents. But remember that itertools functions return iterators, not lists!
The itertools.count iterator is infinitely long, so we can't convert it into a list. Trying to build that infinite list would eventually consume all available memory, then crash.
However, we can slice the iterator to get only the section of it that we want. We've already seen list slicing: some_list[start:end]. For iterators, we use the itertools.islice function to do the same thing.
>
import itertools
list(itertools.islice(itertools.count(), 2, 5))
Result:
[2, 3, 4]
>
import itertools

def letters():
    for char in "Ms. Fluff":
        yield char

list(itertools.islice(letters(), 4, 9))
Result:
['F', 'l', 'u', 'f', 'f']
Strictly speaking, we don't need that letters function. The string itself is an iterable.
>
import itertools
list(itertools.islice("Ms. Fluff", 4, 9))
Result:
['F', 'l', 'u', 'f', 'f']
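islice accepts a few argument shapes that loosely mirror slice syntax: a lone stop value, a start and a stop, or a start, stop, and step. Unlike list slicing, it doesn't accept negative values. A short sketch, with variable names of our own choosing:

```python
import itertools

# Only a stop value: like some_list[:3].
first_three = list(itertools.islice(itertools.count(), 3))

# Start, stop, and step: like some_list[0:10:4].
every_fourth = list(itertools.islice(itertools.count(), 0, 10, 4))

# None as the stop means "until the source iterator ends".
tail = list(itertools.islice("Ms. Fluff", 4, None))

print(first_three)   # [0, 1, 2]
print(every_fourth)  # [0, 4, 8]
print(tail)          # ['F', 'l', 'u', 'f', 'f']
```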
The repeat function repeats a single value a certain number of times.
>
import itertools
list(itertools.repeat("a", 3))
Result:
['a', 'a', 'a']
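One place where repeat can be handy is supplying a constant argument to map. The sketch below squares a few numbers by pairing each one with a repeated exponent. The pairing with pow is our own illustration, not something from the lesson.

```python
import itertools

# map calls pow(0, 2), pow(1, 2), pow(2, 2), pow(3, 2).
# The second argument always comes from the repeat iterator.
squares = list(map(pow, range(4), itertools.repeat(2, 4)))

print(squares)  # [0, 1, 4, 9]
```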
If we don't provide a number, the iterator returns the same value forever.
>
import itertools
always_a = itertools.repeat("a")
next(always_a)
Result:
'a'
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
(next(always_a), next(always_a))
Result:
('a', 'a')
The cycle function creates an infinite iterator that endlessly cycles through another iterator's values. When it gets to the end of the source iterator, it starts back at the beginning again.
>
list(range(3))
Result:
[0, 1, 2]
>
import itertools
cycling = itertools.cycle(range(3))
list(itertools.islice(cycling, 0, 22))
Result:
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
list(itertools.islice(cycling, 9184, 9188))
Result:
[2, 0, 1, 2]
In that example, we only got four list elements. But to get to those elements, islice first had to consume 9,184 other iterator elements.
The takewhile function uses another function to filter an iterator. It calls the function on each iterated value, and returns an iterator that gives us values for as long as the function returns True. The first time the function returns False, the iterator immediately ends.
>
import itertools
list(itertools.takewhile(lambda n: n < 5, itertools.count()))
Result:
[0, 1, 2, 3, 4]
>
import itertools
# Remember: takewhile stops as soon as the function returns False!
list(itertools.takewhile(lambda n: n % 2 == 0, range(0, 10)))
Result:
[0]
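To make the difference from ordinary filtering concrete, here's a sketch comparing takewhile with Python's built-in filter on the same input. filter keeps testing every value, while takewhile gives up at the first False. The is_even helper is a name we made up for this comparison.

```python
import itertools

def is_even(n):
    return n % 2 == 0

# filter checks all ten values and keeps every match.
filtered = list(filter(is_even, range(0, 10)))

# takewhile stops for good when it reaches 1, the first odd value.
taken = list(itertools.takewhile(is_even, range(0, 10)))

print(filtered)  # [0, 2, 4, 6, 8]
print(taken)     # [0]
```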
One important thing about takewhile: to find the first False value, it needs to actually consume that value from the underlying iterator. (In the example above, that final value was 1.) The passed function returns False for that value, so it doesn't appear in the final iterator. But it also isn't left in the original iterator. It's simply lost.
>
import itertools
my_iter = itertools.count()
list(itertools.takewhile(lambda n: n <= 4, my_iter))
first_unconsumed_value = next(my_iter)
first_unconsumed_value
Result:
6
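The lost value is easier to spot with a short, finite iterator. In this sketch (the list contents are made up for illustration), 10 is the value that fails the test. It shows up in neither the takewhile output nor the leftover values.

```python
import itertools

source = iter([1, 2, 3, 10, 4, 5])

# takewhile consumes 1, 2, 3, and then 10, which fails the test.
taken = list(itertools.takewhile(lambda n: n < 5, source))

# Whatever is left in the source iterator starts after the 10.
remaining = list(source)

print(taken)      # [1, 2, 3]
print(remaining)  # [4, 5]
```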
Sometimes we want the opposite: we only want the values starting from the first point where the function returns False. For that, we can use dropwhile. It consumes (drops) all of the iterated values until the function returns False. Then it iterates over all remaining values, regardless of what the function returns.
>
import itertools
my_iter = itertools.dropwhile(lambda n: n < 5, itertools.count())
(next(my_iter), next(my_iter))
Result:
(5, 6)
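Here's a small sketch of that last point, using a made-up list. Once dropwhile reaches a value where the function returns False, it stops calling the function and passes every later value through, even values that the function would have dropped earlier.

```python
import itertools

numbers = [1, 2, 6, 3, 7, 0]

# 1 and 2 are dropped. 6 is the first value that isn't < 5, so
# iteration starts there and the test is never applied again.
kept = list(itertools.dropwhile(lambda n: n < 5, numbers))

print(kept)  # [6, 3, 7, 0]
```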
The tee function duplicates an iterator into two iterators. Each of the resulting iterators iterates over all of the values from the original iterator. In other words, consuming values from one iterator won't affect the other iterator.
This is called "tee" by analogy to a "tee joint" in plumbing: a T-shaped section of pipe that splits one pipe into two pipes. It's an imperfect analogy: a given water molecule can either go left or right in a pipe, but not both, whereas tee sends each value to both iterators.
>
import itertools
numbers = itertools.count()
(my_iter_1, my_iter_2) = itertools.tee(numbers)
(next(my_iter_1), next(my_iter_2))
Result:
(0, 0)
We can provide an integer as an optional second argument to tee to split the iterator into even more iterators.
>
import itertools
numbers = itertools.count()
(my_iter_1, my_iter_2, my_iter_3, my_iter_4) = itertools.tee(numbers, 4)
(next(my_iter_1), next(my_iter_2), next(my_iter_3), next(my_iter_4))
Result:
(0, 0, 0, 0)
Normally, iterators don't store all of their values in memory. That's why they're so useful for representing infinite sequences of data. However, tee introduces a new complication. Here's a demonstration that we'll analyze after seeing it.
>
import itertools
numbers = itertools.count()
(my_iter_1, my_iter_2) = itertools.tee(numbers)

# Consume 1,000 numbers from my_iter_1.
for _ in range(1000):
    next(my_iter_1)

# my_iter_2 is still waiting at the first number.
next(my_iter_2)
Result:
0
If we continue to call next(my_iter_2), we'll get 1, 2, etc. But where are those numbers coming from? We know that the original itertools.count iterator was already iterated all the way to 1,000, so the numbers aren't coming from that original iterator.
The answer is that tee stores all of those values in memory. But note that the two iterators returned by tee are still independent. Both of them will iterate over each value from the original iterator. Iterating one of the tee iterators won't affect the other tee iterator.
Sometimes this can cause serious memory usage problems. Imagine that the code above continues to iterate over my_iter_1, but never iterates my_iter_2. Python has to keep all of the iterated values in memory forever, just in case we start iterating my_iter_2 in the future. Eventually, Python will run out of memory and crash. That's a problem, but it's also expected: when working with large data sets, memory management is always a concern, even in a language like Python.
We've covered a few of the many itertools functions here. There are many more, so memorizing them all up front isn't a reasonable goal. Instead, we recommend checking the docs whenever you find yourself writing a function that transforms an iterator into another iterator. It's possible that the function is already written for you!
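As a footnote to the tee memory discussion above: the stored values only pile up when one iterator runs far ahead of the other. When the two advance together, tee only has to hold the values that the slower iterator hasn't reached yet. The sketch below pairs each number with its successor. The pairs helper is a hypothetical name of ours, modeled on a well-known recipe.

```python
import itertools

def pairs(iterable):
    """Return an iterator of (value, next_value) tuples."""
    first, second = itertools.tee(iterable)
    # Put the second iterator one step ahead of the first.
    next(second, None)
    # zip advances both together, so they stay one value apart.
    return zip(first, second)

result = list(itertools.islice(pairs(itertools.count()), 3))
print(result)  # [(0, 1), (1, 2), (2, 3)]
```

In keeping with the advice above, it's worth checking the docs before writing a helper like this: Python 3.10 and later include itertools.pairwise, which produces the same pairs.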