Python in Detail: Custom Iterators
Welcome to the Custom Iterators lesson!
In an earlier lesson, we created iterators by calling iter(some_iterable). Then we consumed the iterators by calling next(...) until it raised a StopIteration exception. But so far, we've only used iterators with built-in iterable types like lists and dictionaries.
We can also create custom iterable data types using Python's favorite extension mechanism: dunder methods. When we call iter(our_iterable), it calls our_iterable.__iter__(). Likewise, next(our_iterator) calls our_iterator.__next__().
>
numbers = [10, 20, 30]
next(iter(numbers))
Result:
10
>
numbers = [10, 20, 30]
numbers.__iter__().__next__()
Result:
10
>
letters = ['a', 'b', 'c']
letters_iterator = letters.__iter__()
letters_iterator.__next__()
letters_iterator.__next__()
Result:
'b'
We call these two methods "the iterator protocol".
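As a rough sketch of what the protocol makes possible, a for loop can be spelled out by hand using only iter and next. The helper name manual_for is hypothetical and exists only for illustration; the real for statement is built into the interpreter and is not implemented as a function like this.

```python
def manual_for(iterable, body):
    """Call body(item) for each item, using only the iterator protocol."""
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # the iterator has no more values
        body(item)


collected = []
manual_for([10, 20, 30], collected.append)
print(collected)  # [10, 20, 30]
```

Anything that supports these two calls can be passed to manual_for, which is the same reason custom iterables work in ordinary for loops.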
We'll write two classes to build our custom iterable data type. First is the iterable itself, which is the class that actually holds the data. This is analogous to other data structures that we can loop over, like lists, dictionaries, and ranges.
Second, we'll write a separate class for the iterator, which loops through (or "iterates over") data. We've already written code that relied on iterator classes, though we didn't call it out at the time. For example, when we call iter(some_list), we get an instance of the list_iterator class.
>
iterator = iter([1, 2, 3])
iterator.__class__.__name__
Result:
'list_iterator'
When we call iter(some_dict), we get an instance of the dict_keyiterator class that iterates over the keys.
>
iterator = iter({"name": "Amir",})
iterator.__class__.__name__
Result:
'dict_keyiterator'
To summarize: the iterable (like a list) stores the data. The iterator (like a list_iterator) remembers where we are in the iteration process.
We'll write a simple iterable class Repeater, which repeats a value a certain number of times. For example, Repeater(5, 3) should repeat the value 5 three times, similar to [5, 5, 5].
>
class Repeater:
    def __init__(self, value, count):
        self.value = value
        self.count = count

    def __iter__(self):
        # not implemented yet!
        pass
Next, we need an iterator that keeps track of where we are in the repetition process. We can do that with a simple counter attribute.
>
class Repeater:
    def __init__(self, value, count):
        self.value = value
        self.count = count

    def __iter__(self):
        return RepeaterIterator(self.value, self.count)

class RepeaterIterator:
    def __init__(self, value, remaining):
        self.value = value
        self.remaining = remaining

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.value
Calling iter on a Repeater gives us a RepeaterIterator, which implements .__next__.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
many_fives = Repeater(5, 3)
my_iter = iter(many_fives)
one_five = next(my_iter)
(one_five, my_iter.remaining)
Result:
(5, 2)
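To see the whole life of one iterator, the sketch below calls next until the values run out and then catches the StopIteration. The two classes are copied from the lesson so that the block runs on its own; the variable names values and exhausted are just illustrative.

```python
class Repeater:
    def __init__(self, value, count):
        self.value = value
        self.count = count

    def __iter__(self):
        return RepeaterIterator(self.value, self.count)


class RepeaterIterator:
    def __init__(self, value, remaining):
        self.value = value
        self.remaining = remaining

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.value


my_iter = iter(Repeater(5, 3))
values = [next(my_iter), next(my_iter), next(my_iter)]

# A fourth call has nothing left to return, so it raises StopIteration.
try:
    next(my_iter)
    exhausted = False
except StopIteration:
    exhausted = True

print(values, my_iter.remaining, exhausted)  # [5, 5, 5] 0 True
```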
By implementing the iterator protocol, we gain access to a lot of functionality. For example, list(...) works on all iterables, including Repeater. Internally, list calls iter to get an iterator, then calls next repeatedly to collect all of the values in a new list.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
my_iter = Repeater(5, 3)
list(my_iter)
Result:
[5, 5, 5]
Our Repeater class works anywhere that other iterables work. We can loop over it with for loops, or in a comprehension.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
results = []
for value in Repeater(3, 2):
    results.append(value)
results
Result:
[3, 3]
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
[value * 2 for value in Repeater(10, 1)]
Result:
[20]
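Loops and comprehensions aren't the only consumers. The sketch below tries a few other built-in features that accept any iterable. The Repeater classes are repeated from the lesson so the block is self-contained, and the particular consumers shown (sum, unpacking, in, and zip) are just a sample chosen for illustration.

```python
class Repeater:
    def __init__(self, value, count):
        self.value = value
        self.count = count

    def __iter__(self):
        return RepeaterIterator(self.value, self.count)


class RepeaterIterator:
    def __init__(self, value, remaining):
        self.value = value
        self.remaining = remaining

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.value


total = sum(Repeater(5, 3))              # adds up 5, 5, 5
first, second = Repeater(7, 2)           # unpacking asks for exactly two values
contains = 5 in Repeater(5, 3)           # "in" falls back to iterating
pairs = list(zip("ab", Repeater(0, 2)))  # pairs each letter with a 0

print(total, first, second, contains, pairs)
# 15 7 7 True [('a', 0), ('b', 0)]
```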
Two more notes about RepeaterIterator. First, why does it define .__iter__? That seems a bit strange: why would an iterator need a method that means "turn me into an iterator"? It's already an iterator!
The answer is that Python code often expects an iterable (an object with a .__iter__ method). If the iterator itself has an .__iter__, then we can pass it to functions that expect iterators as well as functions that expect iterables. This blurs the line between "iterable" and "iterator", which can be a bit confusing, but it's also very convenient. For example, it means that list(...) usually works with both iterables and iterators.
Second, take special note of the .__next__ method. The iterator stores its own progress, so every call to next(...) changes the iterator. For example, we can call next(my_iter) once to consume one value. When we later call list(my_iter), only two values are left.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
repeater = Repeater(5, 3)
my_iter = iter(repeater)
next(my_iter)
list(my_iter)
Result:
[5, 5]
This shows us the iterator's main job: it holds the iteration state, remembering where we are in the iteration process.
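Both notes come together in a common pattern: consume one value with next, then hand the same iterator to a for loop for the rest. The sketch below uses a built-in list iterator, and the comma-separated lines are made-up sample data, not something from the lesson.

```python
# Hypothetical sample data: a header line followed by two rows.
lines = ["name,age", "Amir,36", "Betty,41"]
line_iter = iter(lines)

# An iterator's __iter__ returns the iterator itself.
same_object = iter(line_iter) is line_iter

# Consume the header by hand. The iterator remembers that it moved on.
header = next(line_iter)

# The for loop calls iter(line_iter), gets the same object back,
# and continues from the second line.
rows = []
for line in line_iter:
    rows.append(line.split(","))

print(same_object, header, rows)
# True name,age [['Amir', '36'], ['Betty', '41']]
```

The for loop only works here because the iterator has an .__iter__ method of its own.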
If you look closely, our .__next__ method mirrors the way that next(...) works from the outside. When we call next(...) on a non-empty iterator, we get the next value. Our .__next__ method returns the next value, and it also decreases .remaining to prepare for future .__next__ calls.
When we call next(...) on an empty iterator, it raises StopIteration. Our .__next__ method raises StopIteration when self.remaining == 0.
Here's a code problem:
Create a simple range iterable, SimpleRange. It should take one argument, end, and iterate over the values from 0 to end, excluding end.
Implement the iterable as two separate classes: SimpleRange and SimpleRangeIterator.
class SimpleRange:
    def __init__(self, end):
        self.end = end

    def __iter__(self):
        return SimpleRangeIterator(self.end)

class SimpleRangeIterator:
    def __init__(self, end):
        self.current = 0
        self.end = end

    def __next__(self):
        if self.current == self.end:
            raise StopIteration
        else:
            value = self.current
            self.current += 1
            return value

my_range = SimpleRange(5)
collected_values = list(my_range)
iterator_is_same_object_as_iterable = my_range is iter(my_range)
[collected_values, iterator_is_same_object_as_iterable]
- Goal:
[[0, 1, 2, 3, 4], False]
Two more notes on the iterator protocol. First, you might wonder whether we actually need separate Repeater and RepeaterIterator classes. It is possible to write them as a single class, but it's a bad idea. The code will break in confusing ways.
The problem is that we might create many iterators from a single iterable. For example, we might write two nested for loops looping over the same list.
>
numbers = [1, 2]
result = []
for n1 in numbers:
    for n2 in numbers:
        result.append([n1, n2])
result
Result:
[[1, 1], [1, 2], [2, 1], [2, 2]]
Each loop keeps track of which element it's looping over, so there must be two distinct iterators. If we implement the iterable and iterator as a single object, the two loops won't be able to track their progress independently, so we'll get the wrong result.
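Here is a minimal sketch of that failure. BrokenRepeater is a hypothetical class written only for this illustration: it merges the iterable and the iterator into one object, so both nested loops share a single counter.

```python
class BrokenRepeater:
    """Iterable and iterator merged into one object (not recommended)."""

    def __init__(self, value, count):
        self.value = value
        self.remaining = count

    def __iter__(self):
        # Every loop gets this same object, and so the same counter.
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.value


broken = BrokenRepeater("x", 2)
pairs = []
for a in broken:
    for b in broken:
        pairs.append((a, b))

# Two values nested inside two values should give four pairs.
# The inner loop used up the shared counter, so we only get one.
print(pairs)  # [('x', 'x')]
```

With separate Repeater and RepeaterIterator classes, each loop would get its own counter and the result would contain four pairs.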
Our second and final note about the iterator protocol: a given iterator can only be iterated once. There's no generalized way to rewind it to an earlier point, or to reset it to the beginning.
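A short sketch of the one-pass rule, using a built-in list iterator. The variable names are only illustrative.

```python
numbers = [1, 2, 3]
iterator = iter(numbers)

first_pass = list(iterator)   # consumes every value
second_pass = list(iterator)  # the same iterator has nothing left

# Starting over means asking the iterable for a brand new iterator.
fresh_pass = list(iter(numbers))

print(first_pass, second_pass, fresh_pass)
# [1, 2, 3] [] [1, 2, 3]
```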
Couldn't we just add something like a .__prev__ method or .__reset__ method? For simple iterators, like iterators over lists, that might work. But what if we're iterating over a huge amount of data that's streamed to us over a network?
The iteration process might run for days or even months, with a total data volume far larger than a single computer can store at once. That works because we process the data as it comes in, only considering a small amount at any given time. But to support rewinding, we'd have to store the entire history of all data that the iterator ever saw, which won't fit on the computer!
This is why the iterator protocol only lets us ask "what's the next element?" That's the simplest interface that still allows constructs like for and list comprehensions to work.
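To make the streaming argument concrete, here is a small sketch of processing values one at a time without ever storing them all. SquaresIterator and running_total are hypothetical names, and the stream of square numbers merely stands in for data arriving over a network.

```python
class SquaresIterator:
    """Hypothetical stand-in for a stream: produces 0, 1, 4, 9, ... on demand."""

    def __init__(self, count):
        self.current = 0
        self.count = count

    def __iter__(self):
        return self

    def __next__(self):
        if self.current == self.count:
            raise StopIteration
        value = self.current ** 2
        self.current += 1
        return value


def running_total(stream):
    """Sum a stream while holding only one value in memory at a time."""
    total = 0
    seen = 0
    for value in stream:
        total += value
        seen += 1
    return total, seen


n = 100_000
total, seen = running_total(SquaresIterator(n))
print(total, seen)
```

The consumer only ever asks for the next element, so its memory use stays the same whether the stream holds a hundred values or a hundred billion.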