Python in Detail: Hashing and Equality
Welcome to the Hashing and Equality lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
Like many other built-in functions, Python's
hash(x)calls a dunder method:x.__hash__(). That lets us customize the.__hash__method. But why would we need to customize.__hash__in the first place?To see why, let's compare the behavior of two similar classes. The regular
Userclass stores a name attribute, whileComparableUseradds an.__eq__method.>
class User:def __init__(self, name):self.name = nameclass ComparableUser:def __init__(self, name):self.name = namedef __eq__(self, other):return self.name == other.nameOur code didn't mention hashing at all. But
Userinstances are hashable, whileComparableUserinstances aren't!- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
hash(User("Amir"))Result:
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
hash(ComparableUser("Amir"))Result:
This has many implications. The most important implication is:
Userinstances can be dict keys, butComparableUserinstances can't.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
amir = User("Amir")ages = {amir: 36}ages[amir]Result:
36
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
amir = ComparableUser("Amir")ages = {amir: 36}ages[amir]Result:
TypeError: unhashable type: 'ComparableUser'
All of this seems strange at first. Why would defining the
.__eq__method make instances unhashable, and why would that disallow them as dict keys? We'll tackle those questions in order.First, why does defining
.__eq__make instances unhashable? This happens because of a rule we saw in an earlier lesson: if two objects are equal, then their hashes must also be equal. Ifx == y, thenhash(x) == hash(y).In
ComparableUser, we replaced the default.__eq__implementation, but we didn't replace the default.__hash__implementation to match it. Now, two values might have different hashes, but still be considered equal due to our custom.__eq__! Python knows that we might have just violated the hash rules, so it makes the object unhashable as a safety precaution.We can see that by looking at the
.__hash__methods directly:Userhas one, butComparableUserdoesn't.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
User("Amir").__hash__ is NoneResult:
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
ComparableUser("Amir").__hash__ is NoneResult:
Now the second question: why do dicts require hashable keys?
Dicts make extensive use of hashes as an internal performance optimization. When we write to a dict key, the dict starts by hashing the key. It uses the hash to quickly find a short list of existing keys that might be equal to the new key. Then it does a full
==comparison on that short list to be sure that it has exactly the right key. This two-step process is much faster than using==to compare the new key against every existing key.In technical terms: dicts are backed by hash tables. They put each key in a hash bucket based on that key's hash. It's OK if you're not familiar with hash tables and buckets; this course doesn't require any knowledge of them! If you'd like to learn more, hash tables are covered in any introductory data structures course.
This might feel a bit abstract, so let's see an example.
Below, we define a
.__hash__method that's wrong: it returns a random integer. This is a clear violation of one of the hash rules: "If two objects are equal, then their hashes must also be equal.">
import randomclass IncorrectUser:def __init__(self, name):self.name = namedef __eq__(self, other):return self.name == other.namedef __hash__(self):return random.randint(1, 10**9)# We define this method to get a nicer error message below.def __repr__(self):return f"<User {repr(self.name)}>"ages = {}amir = IncorrectUser("Amir")ages[amir] = 36amir in agesResult:
It looks like the user object isn't in the dict, even though we just used that exact user object as a key. That happens because our user doesn't have a consistent hash.
The practical takeaway from all of this is: if we customize
.__eq__, and we want to use our instances as dict keys, then we should also customize.__hash__. Both methods must work together to follow the hash rules: if two objects are equal, they have the same hash.Finally, why does Python force keys to be hashable at all? Other languages avoid this complexity. For example, JavaScript's
Maptype is similar to Python'sdict: it maps keys to values, and it allows any data type as a key. But JavaScript maps use identity, not equality, to compare keys.This part of JavaScript's design can lead to some unexpected behavior. Here's some JavaScript code showing the problem with
Map. Each line is translated into equivalent Python code.(The JavaScript code returns
undefined, which is similar to Python'sNone.)>
// Python: amir = {"name": "Amir"}amir = {'name': 'Amir'}// Python: ages = {amir: 36}ages = new Map([[amir, 36]])// Python: ages[{'name': 'Amir'}]ages.get({'name': 'Amir'})Result:
The object
{'name': 'Amir'}seems like it should be in theagesmap, but it's not. This happens because we created two different{'name': 'Amir'}objects, and they have different identities, so in JavaScript they're different keys.JavaScript's behavior isn't "wrong". The two languages have different designs, and they make different trade-offs. JavaScript's behavior is easier to explain, and it doesn't require all of Python's complexity regarding hashes. But the trade-off is that JavaScript's maps (and objects) are much less flexible than Python's dicts.
In Python, we can use any hashable value as a dict key with little risk, even in complex cases. If we use an unhashable value, Python will immediately raise an exception, so we'll find out about our mistake. If we use a hashable value like a tuple as a key, it will work in a very intuitive way: all equal dict keys map to the same value.
We've now seen what hashes are, how they interact with equality, and how they interact with dicts. In a future lesson, we'll define a custom
.__hash__method. That will let us use our custom classes as dict keys while also customizing their equality behavior.