Execute Program

Python in Detail: Hashing and Equality

Welcome to the Hashing and Equality lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • Like many other built-in functions, Python's hash(x) calls a dunder method: x.__hash__(). That lets us customize the .__hash__ method. But why would we need to customize .__hash__ in the first place?

  • To see why, let's compare the behavior of two similar classes. The regular User class stores a name attribute, while ComparableUser adds an .__eq__ method.

  • >
    class User:
    def __init__(self, name):
    self.name = name

    class ComparableUser:
    def __init__(self, name):
    self.name = name

    def __eq__(self, other):
    return self.name == other.name
  • Our code didn't mention hashing at all. But User instances are hashable, while ComparableUser instances aren't!

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    hash(User("Amir"))
    Result:
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    hash(ComparableUser("Amir"))
    Result:
  • This has many implications. The most important implication is: User instances can be dict keys, but ComparableUser instances can't.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    amir = User("Amir")
    ages = {
    amir: 36
    }
    ages[amir]
    Result:
    36Pass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    amir = ComparableUser("Amir")
    ages = {
    amir: 36
    }
    ages[amir]
    Result:
    TypeError: unhashable type: 'ComparableUser'Pass Icon
  • All of this seems strange at first. Why would defining the .__eq__ method make instances unhashable, and why would that disallow them as dict keys? We'll tackle those questions in order.

  • First, why does defining .__eq__ make instances unhashable? This happens because of a rule we saw in an earlier lesson: if two objects are equal, then their hashes must also be equal. If x == y, then hash(x) == hash(y).

  • In ComparableUser, we replaced the default .__eq__ implementation, but we didn't replace the default .__hash__ implementation to match it. Now, two values might have different hashes, but still be considered equal due to our custom .__eq__! Python knows that we might have just violated the hash rules, so it makes the object unhashable as a safety precaution.

  • We can see that by looking at the .__hash__ methods directly: User has one, but ComparableUser doesn't.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    User("Amir").__hash__ is None
    Result:
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    ComparableUser("Amir").__hash__ is None
    Result:
  • Now the second question: why do dicts require hashable keys?

  • Dicts make extensive use of hashes as an internal performance optimization. When we write to a dict key, the dict starts by hashing the key. It uses the hash to quickly find a short list of existing keys that might be equal to the new key. Then it does a full == comparison on that short list to be sure that it has exactly the right key. This two-step process is much faster than using == to compare the new key against every existing key.

  • In technical terms: dicts are backed by hash tables. They put each key in a hash bucket based on that key's hash. It's OK if you're not familiar with hash tables and buckets; this course doesn't require any knowledge of them! If you'd like to learn more, hash tables are covered in any introductory data structures course.

  • This might feel a bit abstract, so let's see an example.

  • Below, we define a .__hash__ method that's wrong: it returns a random integer. This is a clear violation of one of the hash rules: "If two objects are equal, then their hashes must also be equal."

  • >
    import random

    class IncorrectUser:
    def __init__(self, name):
    self.name = name

    def __eq__(self, other):
    return self.name == other.name

    def __hash__(self):
    return random.randint(1, 10**9)

    # We define this method to get a nicer error message below.
    def __repr__(self):
    return f"<User {repr(self.name)}>"

    ages = {}
    amir = IncorrectUser("Amir")
    ages[amir] = 36
    amir in ages
    Result:
  • It looks like the user object isn't in the dict, even though we just used that exact user object as a key. That happens because our user doesn't have a consistent hash.

  • The practical takeaway from all of this is: if we customize .__eq__, and we want to use our instances as dict keys, then we should also customize .__hash__. Both methods must work together to follow the hash rules: if two objects are equal, they have the same hash.

  • Finally, why does Python force keys to be hashable at all? Other languages avoid this complexity. For example, JavaScript's Map type is similar to Python's dict: it maps keys to values, and it allows any data type as a key. But JavaScript maps use identity, not equality, to compare keys.

  • This part of JavaScript's design can lead to some unexpected behavior. Here's some JavaScript code showing the problem with Map. Each line is translated into equivalent Python code.

  • (The JavaScript code returns undefined, which is similar to Python's None.)

  • >
    // Python: amir = {"name": "Amir"}
    amir = {'name': 'Amir'}

    // Python: ages = {amir: 36}
    ages = new Map([[amir, 36]])

    // Python: ages[{'name': 'Amir'}]
    ages.get({'name': 'Amir'})
    Result:
  • The object {'name': 'Amir'} seems like it should be in the ages map, but it's not. This happens because we created two different {'name': 'Amir'} objects, and they have different identities, so in JavaScript they're different keys.

  • JavaScript's behavior isn't "wrong". The two languages have different designs, and they make different trade-offs. JavaScript's behavior is easier to explain, and it doesn't require all of Python's complexity regarding hashes. But the trade-off is that JavaScript's maps (and objects) are much less flexible than Python's dicts.

  • In Python, we can use any hashable value as a dict key with little risk, even in complex cases. If we use an unhashable value, Python will immediately raise an exception, so we'll find out about our mistake. If we use a hashable value like a tuple as a key, it will work in a very intuitive way: all equal dict keys map to the same value.

  • We've now seen what hashes are, how they interact with equality, and how they interact with dicts. In a future lesson, we'll define a custom .__hash__ method. That will let us use our custom classes as dict keys while also customizing their equality behavior.