Execute Program

Python in Detail: Custom Iterators

Welcome to the Custom Iterators lesson!

  • In an earlier lesson, we created iterators by calling iter(some_iterable). Then we consumed each iterator by calling next(...) until it raised a StopIteration exception. But so far, we've only used iterators with built-in iterable types like lists and dictionaries.

  • We can also create custom iterable data types using Python's favorite extension mechanism: dunder methods. When we call iter(our_iterable), it calls our_iterable.__iter__(). Likewise, next(our_iterator) calls our_iterator.__next__().

  • >
    numbers = [10, 20, 30]
    next(iter(numbers))
    Result:
    10
  • >
    numbers = [10, 20, 30]
    numbers.__iter__().__next__()
    Result:
    10
  • >
    letters = ['a', 'b', 'c']
    letters_iterator = letters.__iter__()
    letters_iterator.__next__()
    letters_iterator.__next__()
    Result:
    'b'
  • We call these two methods "the iterator protocol".
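  • To see the two methods working together, here is a minimal sketch of roughly what a for loop does with the protocol. The function name manual_for_loop is hypothetical and used only for illustration.

```python
def manual_for_loop(iterable):
    """Collect every value by driving the iterator protocol by hand."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            value = next(iterator)     # calls iterator.__next__()
        except StopIteration:
            break                      # the iterator has no more values
        collected.append(value)
    return collected


letters = ['a', 'b', 'c']
manual_result = manual_for_loop(letters)
```

    Here manual_result holds the same values as list(letters).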

  • We'll write two classes to build our custom iterable data type. First is the iterable itself, which is the class that actually holds the data. This is analogous to other data structures that we can loop over, like lists, dictionaries, and ranges.

  • Second, we'll write a separate class for the iterator, which loops through (or "iterates over") data. We've already written code that relied on iterator classes, though we didn't call it out at the time. For example, when we call iter(some_list), we get an instance of the list_iterator class.

  • >
    iterator = iter([1, 2, 3])
    iterator.__class__.__name__
    Result:
    'list_iterator'
  • When we call iter(some_dict), we get an instance of the dict_keyiterator class that iterates over the keys.

  • >
    iterator = iter({
        "name": "Amir",
    })
    iterator.__class__.__name__
    Result:
    'dict_keyiterator'
  • To summarize: the iterable (like a list) stores the data. The iterator (like a list_iterator) remembers where we are in the iteration process.
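  • One way to check this division of labor is with the standard library's abstract base classes in collections.abc. This is a small sketch; the variable names are illustrative only.

```python
from collections.abc import Iterable, Iterator

numbers = [10, 20, 30]

# The list stores the data: it is iterable, but it is not an iterator.
list_is_iterable = isinstance(numbers, Iterable)
list_is_iterator = isinstance(numbers, Iterator)

# Each call to iter() creates a separate iterator with its own position.
first = iter(numbers)
second = iter(numbers)
first_is_iterator = isinstance(first, Iterator)

next(first)
next(first)
positions = (next(first), next(second))
```

    The first iterator has advanced to its third value while the second is still at the start, so positions is (30, 10).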

  • We'll write a simple iterable class Repeater, which repeats a value a certain number of times. For example, Repeater(5, 3) should repeat the value 5 three times, similar to [5, 5, 5].

  • >
    class Repeater:
        def __init__(self, value, count):
            self.value = value
            self.count = count

        def __iter__(self):
            # not implemented yet!
            pass
  • Next, we need an iterator that keeps track of where we are in the repetition process. We can do that with a simple counter attribute.

  • >
    class Repeater:
        def __init__(self, value, count):
            self.value = value
            self.count = count

        def __iter__(self):
            return RepeaterIterator(self.value, self.count)

    class RepeaterIterator:
        def __init__(self, value, remaining):
            self.value = value
            self.remaining = remaining

        def __iter__(self):
            return self

        def __next__(self):
            if self.remaining == 0:
                raise StopIteration
            self.remaining -= 1
            return self.value
  • Calling iter on a Repeater gives us a RepeaterIterator, which implements .__next__.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    many_fives = Repeater(5, 3)
    my_iter = iter(many_fives)
    one_five = next(my_iter)
    (one_five, my_iter.remaining)
    Result:
    (5, 2)
  • By implementing the iterator protocol, we gain access to a lot of functionality. For example, list(...) works on all iterables, including Repeater. Internally, list calls iter to get an iterator, then calls next repeatedly to collect all of the values in a new list.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    many_fives = Repeater(5, 3)
    list(many_fives)
    Result:
    [5, 5, 5]
  • Our Repeater class works anywhere that other iterables work. We can loop over it with for loops, or in a comprehension.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    results = []
    for value in Repeater(3, 2):
        results.append(value)
    results
    Result:
    [3, 3]
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    [value * 2 for value in Repeater(10, 1)]
    Result:
    [20]
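  • The same holds for other built-ins that consume iterables. The sketch below restates the lesson's Repeater and RepeaterIterator classes so that it runs on its own. The calls to sum, zip, tuple unpacking, and the in operator are standard Python, chosen here only as illustrations.

```python
class Repeater:
    def __init__(self, value, count):
        self.value = value
        self.count = count

    def __iter__(self):
        return RepeaterIterator(self.value, self.count)


class RepeaterIterator:
    def __init__(self, value, remaining):
        self.value = value
        self.remaining = remaining

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.value


total = sum(Repeater(5, 3))                # adds up 5, 5, 5
first, second = Repeater("hi", 2)          # unpacking iterates under the hood
contains_five = 5 in Repeater(5, 3)        # "in" falls back to iteration
paired = list(zip("ab", Repeater(0, 2)))   # zip accepts any iterable
```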
  • Two more notes about RepeaterIterator. First, why does it define .__iter__? That seems a bit strange: why would an iterator need a method that means "turn me into an iterator"? It's already an iterator!

  • The answer is that Python code often expects an iterable (an object with a .__iter__ method). If the iterator itself has an .__iter__, then we can pass it to functions that expect iterators as well as functions that expect iterables. This blurs the line between "iterable" and "iterator", which can be a bit confusing, but it's also very convenient. For example, it means that list(...) usually works with both iterables and iterators.
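  • As a rough illustration, compare an iterator that lacks .__iter__ with one that has it. Both classes below are hypothetical and exist only for this sketch.

```python
class BareIterator:
    """Hypothetical iterator that defines __next__ but not __iter__."""

    def __init__(self, remaining):
        self.remaining = remaining

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining


class FullIterator(BareIterator):
    """Same iterator, plus an __iter__ that returns itself."""

    def __iter__(self):
        return self


bare = BareIterator(3)
first_value = next(bare)          # next() works: __next__ exists

try:
    list(bare)                    # list() needs an iterable
    bare_error = None
except TypeError as error:
    bare_error = str(error)

full = FullIterator(3)
iter_returns_self = iter(full) is full
full_values = list(full)
```

    Calling next(...) directly works on both classes, but only the version with .__iter__ can be handed to list(...) or a for loop.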

  • Second, take special note of the .__next__ method. The iterator stores its own progress, so every call to next(...) changes the iterator. For example, we can call next(my_iter) once to consume one value. When we later call list(my_iter), only two values are left.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    repeater = Repeater(5, 3)
    my_iter = iter(repeater)
    next(my_iter)
    list(my_iter)
    Result:
    [5, 5]
  • This shows us the iterator's main job: it holds the iteration state, remembering where we are in the iteration process.

  • If you look closely, our .__next__ method mirrors the way that next(...) works from the outside. When we call next(...) on a non-empty iterator, we get the next value. Our .__next__ method returns the next value, and it also decreases .remaining to prepare for future .__next__ calls.

  • When we call next(...) on an empty iterator, it raises StopIteration. Our .__next__ method raises StopIteration when self.remaining == 0.
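  • The same behavior can be observed from the outside with a built-in list iterator. This is a small sketch; the optional second argument to next(...) is a standard feature that returns a default value instead of raising.

```python
single_item_iterator = iter([5])

first_value = next(single_item_iterator)    # consumes the only value

try:
    next(single_item_iterator)              # nothing left to return
    raised_stop_iteration = False
except StopIteration:
    raised_stop_iteration = True

# With a default, next() returns it instead of raising StopIteration.
fallback = next(single_item_iterator, "done")
```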

  • Here's a code problem:

    Create a simple range iterable, SimpleRange. It should take one argument, end, and iterate over the values from 0 to end, excluding end.

    Implement the iterable as two separate classes: SimpleRange and SimpleRangeIterator.

    class SimpleRange:
        def __init__(self, end):
            self.end = end

        def __iter__(self):
            return SimpleRangeIterator(self.end)

    class SimpleRangeIterator:
        def __init__(self, end):
            self.current = 0
            self.end = end

        def __next__(self):
            if self.current == self.end:
                raise StopIteration
            else:
                value = self.current
                self.current += 1
                return value

    my_range = SimpleRange(5)
    collected_values = list(my_range)
    iterator_is_same_object_as_iterable = my_range is iter(my_range)
    [collected_values, iterator_is_same_object_as_iterable]
    Goal:
    [[0, 1, 2, 3, 4], False]
  • Two more notes on the iterator protocol. First, you might wonder whether we actually need separate Repeater and RepeaterIterator classes. It is possible to write them as a single class, but it's a bad idea: the object would have only one iteration state, so code that iterates over it more than once will break in confusing ways.

  • The problem is that we might create many iterators from a single iterable. For example, we might write two nested for loops looping over the same list.

  • >
    numbers = [1, 2]
    result = []

    for n1 in numbers:
        for n2 in numbers:
            result.append([n1, n2])

    result
    Result:
    [[1, 1], [1, 2], [2, 1], [2, 2]]
  • Each loop keeps track of which element it's looping over, so there must be two distinct iterators. If we implement the iterable and iterator as a single object, the two loops won't be able to track their progress independently, so we'll get the wrong result.
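  • To make the failure concrete, here is a sketch of a single-class version. BrokenRepeater is a hypothetical name for this illustration; it acts as its own iterator, so both loops share one counter.

```python
class BrokenRepeater:
    """Hypothetical iterable that is also its own iterator."""

    def __init__(self, value, count):
        self.value = value
        self.remaining = count

    def __iter__(self):
        return self                # every loop gets the same object

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        self.remaining -= 1
        return self.value


repeater = BrokenRepeater(5, 2)
pairs = []
for n1 in repeater:
    for n2 in repeater:            # drains the shared counter
        pairs.append([n1, n2])

leftover = list(repeater)          # nothing left for later callers
```

    With independent iterators we would expect four pairs. Here the inner loop uses up the remaining value, so pairs contains only [5, 5] and leftover is empty.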

  • Our second and final note about the iterator protocol: a given iterator can only be iterated over once. There's no generalized way to rewind it to an earlier point, or to reset it to the beginning.
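  • A short sketch of this one-pass behavior, using a built-in list iterator. The variable names are illustrative only.

```python
numbers = [1, 2, 3]
numbers_iterator = iter(numbers)

first_pass = list(numbers_iterator)     # consumes every value
second_pass = list(numbers_iterator)    # the iterator is already exhausted

# The iterable still holds the data, so a fresh iterator starts over.
fresh_pass = list(iter(numbers))
```

    The second pass is empty. To go through the values again, we ask the iterable for a new iterator.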

  • Couldn't we just add something like a .__prev__ method or .__reset__ method? For simple iterators, like iterators over lists, that might work. But what if we're iterating over a huge amount of data that's streamed to us over a network?

  • The iteration process might run for days or even months, with a total data volume far larger than a single computer can store at once. That works because we process the data as it comes in, only considering a small amount at any given time. But to support rewinding, we'd have to store the entire history of all data that the iterator ever saw, which won't fit on the computer!

  • This is why the iterator protocol only lets us ask "what's the next element?" That's the simplest interface that still allows constructs like for and list comprehensions to work.
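  • As a rough sketch of the streaming case, the hypothetical ReadingStream class below stands in for data arriving over a network. It never ends, and it stores only a counter. The consumer keeps a running total instead of keeping the readings, and itertools.islice from the standard library limits how many values are taken.

```python
import itertools


class ReadingStream:
    """Hypothetical endless stream; a stand-in for network data."""

    def __init__(self):
        self.sequence_number = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Produce the next reading on demand; nothing earlier is kept.
        self.sequence_number += 1
        return self.sequence_number * 10


stream = ReadingStream()

running_total = 0
for reading in itertools.islice(stream, 1000):
    running_total += reading       # process each value, then discard it

# The stream only moves forward: the next value continues from where
# the loop stopped, and the earlier readings cannot be replayed.
next_reading = next(stream)
```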