Execute Program

Python in Detail: Customizing Indexing

Welcome to the Customizing Indexing lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • Sometimes we want our own classes to work with the indexing operators, like some_obj[some_key] = some_value. Python lets us customize indexing behavior via four dunder methods:

    • When we access some_object[key], Python calls .__getitem__(key).
    • When we set some_object[key] = value, Python calls .__setitem__(key, value).
    • When we ask key in some_object, Python calls .__contains__(key).
    • When we del some_object[key], Python calls .__delitem__(key).
  • In this lesson, we'll implement these four methods for our own dict-like data structure, SlowDict. To query or update a SlowDict, we'll need to scan through the entire list. That's very inefficient, which is why it's called SlowDict!

  • >
    class SlowDict:
    def __init__(self):
    self.items = []

    def __getitem__(self, key):
    for (existing_key, value) in self.items:
    if key == existing_key:
    return value

    def __setitem__(self, key, value):
    self.items.append((key, value))

    def __contains__(self, key):
    keys = [key for (key, value) in self.items]
    return key in keys

    def __delitem__(self, key):
    if key not in self:
    raise KeyError(key)
    self.items = [(k, v) for (k, v) in self.items if k != key]
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_ages = SlowDict()
    cat_ages["Keanu"] = 2
    cat_ages["Keanu"]
    Result:
    2Pass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_ages = SlowDict()
    cat_ages["Ms. Fluff"] = 4
    "Ms. Fluff" in cat_ages
    Result:
    TruePass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_ages = SlowDict()
    cat_ages["Ms. Fluff"] = 4
    del cat_ages["Ms. Fluff"]
    "Ms. Fluff" in cat_ages
    Result:
    FalsePass Icon
  • SlowDict works in those simple cases. However, the current version contains two subtle bugs. See if you can spot them before we continue.

  • The first bug is that .__setitem__ always appends an entry to self.items, even if key was already present earlier in the list. That can result in multiple entries for the same key.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_ages = SlowDict()
    cat_ages["Keanu"] = 2
    cat_ages["Keanu"] = 3
    cat_ages.items
    Result:
  • When we call .__getitem__, we get the first matching entry in self.items. That gives us the oldest value for that key, but we want the most recently set value!

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_ages = SlowDict()
    cat_ages["Keanu"] = 2
    cat_ages["Keanu"] = 3
    cat_ages["Keanu"]
    Result:
    2Pass Icon
  • The second bug is even more subtle. When a key doesn't exist, a regular dictionary raises a KeyError, whereas SlowDict.__getitem__ returns None. That happens because the loop iterates over every key-value pair in self.items. If no key matches, the function ends, implicitly returning None.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_ages = SlowDict()
    cat_ages["Keanu"] = 2
    # This line returns a value, but we'd rather have it error.
    cat_ages["Ms. Fluff"]
    Result:
    NonePass Icon
  • In some languages, like Ruby and JavaScript, this wouldn't be considered a bug. But in Python, there's a strong convention that we should always raise a KeyError when a key doesn't exist. It's even specified in the official documentation: "For mapping types, if key is missing (not in the container), KeyError should be raised."

  • ("Containers" are data types that hold other values. Lists, tuples, and dicts are all containers. "Mapping types" are containers that map keys to values, like dict and SlowDict.)

  • This convention may seem like a minor detail at first, but it's important and can lead to difficult bugs! Here's a demonstration.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    cat_db = SlowDict()
    cat_db["Keanu"] = 2
  • The SlowDict only contains Keanu. When we ask whether it contains Ms. Fluff, .__contains__ gives the right answer. But .__getitem__ disagrees and gives the wrong answer.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    "Ms. Fluff" in cat_db
    Result:
    FalsePass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    contains_ms_fluff = None
    try:
    # This uses `.__getitem__`. When implemented correctly, it should raise
    # when a key doesn't exist.
    cat_db["Ms. Fluff"]
    contains_ms_fluff = True
    except KeyError:
    contains_ms_fluff = False

    contains_ms_fluff
    Result:
    TruePass Icon
  • The bug is in SlowDict, not cat_db.__getitem__. SlowDict broke the KeyError convention. When a custom data type breaks conventions, it only works with code that's aware of its specific quirks. We have to remember all of those quirks to use data type correctly.

  • If we adhere to the Python conventions, like ".__getitem__ should raise a KeyError when the key doesn't exist", then our classes will work with almost any Python code. Even more importantly, other Python programmers will be able to guess how our code works. If we fix this bug, then we can pass SlowDicts to code written by people who have never heard of SlowDict.

  • Here's a code problem:

    Fix the two bugs in this SlowDict class.

    Bug 1: Keys remain "stuck" at the first value they're given, even if we try to change their values later. This is a bug in the .__setitem__ method. To fix the bug, you can use some_key in self (which calls .__contains__) to decide whether the key exists, then del self[key] (which calls .__delitem__) to delete it. You could call the dunder methods directly, but we recommend using the standard in and del syntax.

    Bug 2: When we request a key that doesn't exist, we get None back. This is a bug in the .__getitem__ method. To match Python convention, we should raise a KeyError when the key doesn't exist. Pay close attention to exactly where you raise the KeyError!

    class SlowDict:
    def __init__(self):
    self.items = []

    def __getitem__(self, key):
    for (existing_key, value) in self.items:
    if key == existing_key:
    return value

    raise KeyError(key)

    def __setitem__(self, key, value):
    # Remove this key if it exists, then set the key to the new value.
    if key in self:
    del self[key]
    self.items.append((key, value))

    def __contains__(self, key):
    keys = [key for (key, value) in self.items]
    return key in keys

    def __delitem__(self, key):
    if key not in self:
    raise KeyError(key)
    self.items = [(k, v) for (k, v) in self.items if k != key]
    cat_ages = SlowDict()
    cat_ages["Keanu"] = 2
    cat_ages["Ms. Fluff"] = 4
    assert cat_ages["Keanu"] == 2

    cat_ages["Keanu"] = 3
    assert cat_ages["Keanu"] == 3

    del cat_ages["Keanu"]
    assert_raises(KeyError, lambda: cat_ages["Keanu"])
    assert cat_ages["Ms. Fluff"] == 4
    Goal:
    No errors.
    Yours:
    No errors.Pass Icon
  • Finally, you might wonder just how slow SlowDict is. Let's run a benchmark on it to find out.

  • (Unlike most code examples in Execute Program, we won't run these examples in your browser. Instead, they're hard-coded to show the results that we got while writing this lesson. You'll see different results if you run the code on your computer, but the performance difference will be similar.)

  • Here's the benchmark of SlowDict.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    import time

    d = SlowDict()
    for number in range(10000):
    d[number] = "a value"

    # The `time.time` function gives us the current time, expressed as the number
    # of seconds since the Unix epoch: 1970-01-01 at 00:00:00 UTC.
    start = time.time()
    for number in range(10000):
    d[number]
    end = time.time()

    end - start
    Result:
  • Here's the same benchmark for Python's built-in dict. The code is identical, except that we build a dict instead of a SlowDict.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    import time

    d = dict()
    for number in range(10000):
    d[number] = "a value"

    # The `time.time` function gives us the current time, expressed as the number
    # of seconds since the Unix epoch: 1970-01-01 at 00:00:00 UTC.
    start = time.time()
    for number in range(10000):
    d[number]
    end = time.time()

    end - start
    Result:
  • That's a big difference: Python's built-in dict is 1,267 times faster than SlowDict. It's the difference between 1 second and 21 minutes!

  • This lesson was primarily about the four dunder methods that enable indexing: .__getitem__, .__setitem__, .__contains__, and .__delitem__. But there are two other important takeaways.

  • First, introducing indexing bugs is easy, even in a simple class like SlowDict. It's easy to miss edge cases like "what happens when a key doesn't exist?" As with any dunder method, extra testing is always a good idea.

  • The second takeaway is that it's very easy to write collections that perform poorly. It's especially important to consider performance and write benchmarks when customizing indexing.