Execute Program

Python in Detail: Customizing Hashing

Welcome to the Customizing Hashing lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • An earlier lesson showed that the default .__hash__ implementation disappears when we define .__eq__. That ensures that our class doesn't violate one of the hash rules: if two values are equal, their hashes must also be equal. Unfortunately it also makes instances of our class unhashable, so they can't be used as dict keys.

  • We can fix that problem by defining .__hash__ ourselves. It should return an integer, but how do we compute that integer?

  • The most common solution is for .__hash__ to create a tuple containing the instance's attributes, then hash the tuple. Tuples already know how to hash themselves, so this is much easier than writing code to compute a hash from scratch. For a 2D Point class, this might look like hash((self.x, self.y)). The resulting hash has the properties that we want:

    • If two instances have equal attributes, then the resulting tuples will be equal, so the tuples' hashes will be equal.
    • If two instances have different attributes, then the resulting tuples won't be equal, so their hashes will usually be different. (But in rare cases they may have the same hash purely by chance. This is OK as long as it's infrequent.)
  • This Point class uses the hashing solution described above:

  • >
    class Point:
    def __init__(self, x, y):
    self.x = x
    self.y = y

    def __eq__(self, other):
    if not isinstance(other, Point):
    return NotImplemented
    return (self.x == other.x) and (self.y == other.y)

    def __hash__(self):
    return hash((self.x, self.y))
  • Now two points with the same coordinates also have the same hash, even if the objects aren't identical according to is.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    p1 = Point(12, 7)
    hash(p1)
    Result:
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    p1 = Point(12, 7)
    p2 = Point(12, 7)
    hash(p1) == hash(p2)
    Result:
    TruePass Icon
  • That worked because both Point instances hashed tuples that were equal to each other, and equal tuples always have the same hash.

  • >
    t1 = (12, 7)
    t2 = (12, 7)
    hash(t1) == hash(t2)
    Result:
    TruePass Icon
  • Internally, the tuples hash themselves in a similar way: they hash each of their elements separately, then combine those hashes into a single integer. The individual tuple elements are integers, which follow the hash rules, so equal integers always have the same hash.

  • >
    hash(12) == hash(12)
    Result:
    TruePass Icon
  • Both tuples above combine their elements' hashes in a deterministic way, so both tuples arrive at the same hash value independently. Both Point.__hash__ methods build and hash identical tuples, so both points also arrive at the same hash value independently.

  • There's a hierarchy here: Point's hash depends on tuple's hash, which depends on the individual integers' hashes. If our class had string attributes instead of integers, then we'd build and hash a tuple of strings. In some cases, we might build and hash a value other than a tuple. But the pattern is the same in almost all Python .__hash__ methods: we build some other hashable data type, then hash it.

  • With this .__hash__ implementation, we can finally get the behavior that we want. We can use a point as a dictionary key, then retrieve it using a different point that's equal to the first, but not identical to it.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    treasure_locations = {
    Point(4, 5): "bronze",
    Point(12, 7): "silver",
    Point(8, 23): "gold",
    }
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    Point(12, 7) in treasure_locations
    Result:
    TruePass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    treasure_locations.get(Point(12, 7))
    Result:
    'silver'Pass Icon
  • This seems like a small achievement, but it's very important! In a language like JavaScript, we have to do a lot of extra work to achieve this. For example, we might give each Point a unique .id value, and look it up by its .id.

  • But in Python, the Point instance itself can act as a dict key. If the .__hash__ method is correct, the dict will behave exactly as we expect, always giving us the same value for two equal keys.

  • One final point before we move on. We've seen that .__eq__ and .__hash__ work closely together, and need to be customized together. That's true of some other dunder methods as well.

  • When we customize .__add__, we should also write a corresponding .__radd__ to make sure that a + b and b + a both work correctly. Likewise for other mathematical operators, like .__mul__ and .__rmul__.

  • When we customize indexing, .__getitem__, .__setitem__, .__contains__, and .__delitem__ all need to agree on whether a given key exists or not. If they don't agree, we can get very strange results. For example, k in some_dict might return True, but some_dict[k] might raise a KeyError.

  • It's our responsibility to make sure that dunder methods work together correctly. Python can't do it for us. When classes have a lot of data or complex behavior, this can be very tricky work. It's always a good idea to slow down and be extra careful when defining dunder methods. This is a good place to write extra tests!