Python in Detail: Customizing Hashing
Welcome to the Customizing Hashing lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
An earlier lesson showed that the default
.__hash__implementation disappears when we define.__eq__. That ensures that our class doesn't violate one of the hash rules: if two values are equal, their hashes must also be equal. Unfortunately it also makes instances of our class unhashable, so they can't be used as dict keys.We can fix that problem by defining
.__hash__ourselves. It should return an integer, but how do we compute that integer?The most common solution is for
.__hash__to create a tuple containing the instance's attributes, then hash the tuple. Tuples already know how to hash themselves, so this is much easier than writing code to compute a hash from scratch. For a 2DPointclass, this might look likehash((self.x, self.y)). The resulting hash has the properties that we want:- If two instances have equal attributes, then the resulting tuples will be equal, so the tuples' hashes will be equal.
- If two instances have different attributes, then the resulting tuples won't be equal, so their hashes will usually be different. (But in rare cases they may have the same hash purely by chance. This is OK as long as it's infrequent.)
This
Pointclass uses the hashing solution described above:>
class Point:def __init__(self, x, y):self.x = xself.y = ydef __eq__(self, other):if not isinstance(other, Point):return NotImplementedreturn (self.x == other.x) and (self.y == other.y)def __hash__(self):return hash((self.x, self.y))Now two points with the same coordinates also have the same hash, even if the objects aren't identical according to
is.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
p1 = Point(12, 7)hash(p1)Result:
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
p1 = Point(12, 7)p2 = Point(12, 7)hash(p1) == hash(p2)Result:
True
That worked because both
Pointinstances hashed tuples that were equal to each other, and equal tuples always have the same hash.>
t1 = (12, 7)t2 = (12, 7)hash(t1) == hash(t2)Result:
True
Internally, the tuples hash themselves in a similar way: they hash each of their elements separately, then combine those hashes into a single integer. The individual tuple elements are integers, which follow the hash rules, so equal integers always have the same hash.
>
hash(12) == hash(12)Result:
True
Both tuples above combine their elements' hashes in a deterministic way, so both tuples arrive at the same hash value independently. Both
Point.__hash__methods build and hash identical tuples, so both points also arrive at the same hash value independently.There's a hierarchy here:
Point's hash depends ontuple's hash, which depends on the individual integers' hashes. If our class had string attributes instead of integers, then we'd build and hash a tuple of strings. In some cases, we might build and hash a value other than a tuple. But the pattern is the same in almost all Python.__hash__methods: we build some other hashable data type, then hash it.With this
.__hash__implementation, we can finally get the behavior that we want. We can use a point as a dictionary key, then retrieve it using a different point that's equal to the first, but not identical to it.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
treasure_locations = {Point(4, 5): "bronze",Point(12, 7): "silver",Point(8, 23): "gold",} - Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
Point(12, 7) in treasure_locationsResult:
True
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
treasure_locations.get(Point(12, 7))Result:
'silver'
This seems like a small achievement, but it's very important! In a language like JavaScript, we have to do a lot of extra work to achieve this. For example, we might give each
Pointa unique.idvalue, and look it up by its.id.But in Python, the
Pointinstance itself can act as a dict key. If the.__hash__method is correct, the dict will behave exactly as we expect, always giving us the same value for two equal keys.One final point before we move on. We've seen that
.__eq__and.__hash__work closely together, and need to be customized together. That's true of some other dunder methods as well.When we customize
.__add__, we should also write a corresponding.__radd__to make sure thata + bandb + aboth work correctly. Likewise for other mathematical operators, like.__mul__and.__rmul__.When we customize indexing,
.__getitem__,.__setitem__,.__contains__, and.__delitem__all need to agree on whether a given key exists or not. If they don't agree, we can get very strange results. For example,k in some_dictmight returnTrue, butsome_dict[k]might raise aKeyError.It's our responsibility to make sure that dunder methods work together correctly. Python can't do it for us. When classes have a lot of data or complex behavior, this can be very tricky work. It's always a good idea to slow down and be extra careful when defining dunder methods. This is a good place to write extra tests!