Python in Detail: Customizing Indexing
Welcome to the Customizing Indexing lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
Sometimes we want our own classes to work with the indexing operators, like
some_obj[some_key] = some_value. Python lets us customize indexing behavior via four dunder methods:- When we access
some_object[key], Python calls.__getitem__(key). - When we set
some_object[key] = value, Python calls.__setitem__(key, value). - When we ask
key in some_object, Python calls.__contains__(key). - When we
del some_object[key], Python calls.__delitem__(key).
- When we access
In this lesson, we'll implement these four methods for our own dict-like data structure,
SlowDict. To query or update aSlowDict, we'll need to scan through the entire list. That's very inefficient, which is why it's calledSlowDict!>
class SlowDict:def __init__(self):self.items = []def __getitem__(self, key):for (existing_key, value) in self.items:if key == existing_key:return valuedef __setitem__(self, key, value):self.items.append((key, value))def __contains__(self, key):keys = [key for (key, value) in self.items]return key in keysdef __delitem__(self, key):if key not in self:raise KeyError(key)self.items = [(k, v) for (k, v) in self.items if k != key]- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_ages = SlowDict()cat_ages["Keanu"] = 2cat_ages["Keanu"]Result:
2
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_ages = SlowDict()cat_ages["Ms. Fluff"] = 4"Ms. Fluff" in cat_agesResult:
True
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_ages = SlowDict()cat_ages["Ms. Fluff"] = 4del cat_ages["Ms. Fluff"]"Ms. Fluff" in cat_agesResult:
False
SlowDictworks in those simple cases. However, the current version contains two subtle bugs. See if you can spot them before we continue.The first bug is that
.__setitem__always appends an entry toself.items, even ifkeywas already present earlier in the list. That can result in multiple entries for the same key.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_ages = SlowDict()cat_ages["Keanu"] = 2cat_ages["Keanu"] = 3cat_ages.itemsResult:
When we call
.__getitem__, we get the first matching entry inself.items. That gives us the oldest value for that key, but we want the most recently set value!- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_ages = SlowDict()cat_ages["Keanu"] = 2cat_ages["Keanu"] = 3cat_ages["Keanu"]Result:
2
The second bug is even more subtle. When a key doesn't exist, a regular dictionary raises a
KeyError, whereasSlowDict.__getitem__returnsNone. That happens because the loop iterates over every key-value pair inself.items. If no key matches, the function ends, implicitly returningNone.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_ages = SlowDict()cat_ages["Keanu"] = 2# This line returns a value, but we'd rather have it error.cat_ages["Ms. Fluff"]Result:
None
In some languages, like Ruby and JavaScript, this wouldn't be considered a bug. But in Python, there's a strong convention that we should always raise a
KeyErrorwhen a key doesn't exist. It's even specified in the official documentation: "For mapping types, ifkeyis missing (not in the container),KeyErrorshould be raised."("Containers" are data types that hold other values. Lists, tuples, and dicts are all containers. "Mapping types" are containers that map keys to values, like
dictandSlowDict.)This convention may seem like a minor detail at first, but it's important and can lead to difficult bugs! Here's a demonstration.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
cat_db = SlowDict()cat_db["Keanu"] = 2 The
SlowDictonly contains Keanu. When we ask whether it contains Ms. Fluff,.__contains__gives the right answer. But.__getitem__disagrees and gives the wrong answer.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
"Ms. Fluff" in cat_dbResult:
False
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
contains_ms_fluff = Nonetry:# This uses `.__getitem__`. When implemented correctly, it should raise# when a key doesn't exist.cat_db["Ms. Fluff"]contains_ms_fluff = Trueexcept KeyError:contains_ms_fluff = Falsecontains_ms_fluffResult:
True
The bug is in
SlowDict, notcat_db.__getitem__.SlowDictbroke theKeyErrorconvention. When a custom data type breaks conventions, it only works with code that's aware of its specific quirks. We have to remember all of those quirks to use data type correctly.If we adhere to the Python conventions, like "
.__getitem__should raise aKeyErrorwhen the key doesn't exist", then our classes will work with almost any Python code. Even more importantly, other Python programmers will be able to guess how our code works. If we fix this bug, then we can passSlowDicts to code written by people who have never heard ofSlowDict.Here's a code problem:
Fix the two bugs in this
SlowDictclass.Bug 1: Keys remain "stuck" at the first value they're given, even if we try to change their values later. This is a bug in the
.__setitem__method. To fix the bug, you can usesome_key in self(which calls.__contains__) to decide whether the key exists, thendel self[key](which calls.__delitem__) to delete it. You could call the dunder methods directly, but we recommend using the standardinanddelsyntax.Bug 2: When we request a key that doesn't exist, we get
Noneback. This is a bug in the.__getitem__method. To match Python convention, we should raise aKeyErrorwhen the key doesn't exist. Pay close attention to exactly where you raise theKeyError!class SlowDict:def __init__(self):self.items = []def __getitem__(self, key):for (existing_key, value) in self.items:if key == existing_key:return valueraise KeyError(key)def __setitem__(self, key, value):# Remove this key if it exists, then set the key to the new value.if key in self:del self[key]self.items.append((key, value))def __contains__(self, key):keys = [key for (key, value) in self.items]return key in keysdef __delitem__(self, key):if key not in self:raise KeyError(key)self.items = [(k, v) for (k, v) in self.items if k != key]cat_ages = SlowDict()cat_ages["Keanu"] = 2cat_ages["Ms. Fluff"] = 4assert cat_ages["Keanu"] == 2cat_ages["Keanu"] = 3assert cat_ages["Keanu"] == 3del cat_ages["Keanu"]assert_raises(KeyError, lambda: cat_ages["Keanu"])assert cat_ages["Ms. Fluff"] == 4- Goal:
No errors.
- Yours:
No errors.
Finally, you might wonder just how slow
SlowDictis. Let's run a benchmark on it to find out.(Unlike most code examples in Execute Program, we won't run these examples in your browser. Instead, they're hard-coded to show the results that we got while writing this lesson. You'll see different results if you run the code on your computer, but the performance difference will be similar.)
Here's the benchmark of
SlowDict.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
import timed = SlowDict()for number in range(10000):d[number] = "a value"# The `time.time` function gives us the current time, expressed as the number# of seconds since the Unix epoch: 1970-01-01 at 00:00:00 UTC.start = time.time()for number in range(10000):d[number]end = time.time()end - startResult:
Here's the same benchmark for Python's built-in
dict. The code is identical, except that we build adictinstead of aSlowDict.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
import timed = dict()for number in range(10000):d[number] = "a value"# The `time.time` function gives us the current time, expressed as the number# of seconds since the Unix epoch: 1970-01-01 at 00:00:00 UTC.start = time.time()for number in range(10000):d[number]end = time.time()end - startResult:
That's a big difference: Python's built-in dict is 1,267 times faster than
SlowDict. It's the difference between 1 second and 21 minutes!This lesson was primarily about the four dunder methods that enable indexing:
.__getitem__,.__setitem__,.__contains__, and.__delitem__. But there are two other important takeaways.First, introducing indexing bugs is easy, even in a simple class like
SlowDict. It's easy to miss edge cases like "what happens when a key doesn't exist?" As with any dunder method, extra testing is always a good idea.The second takeaway is that it's very easy to write collections that perform poorly. It's especially important to consider performance and write benchmarks when customizing indexing.