Execute Program

Python in Detail: Sets in Practice

Welcome to the Sets in Practice lesson!

  • In earlier lessons, we saw sets and the four common operations: union, intersection, difference, and symmetric difference. Sets and their operations were first introduced by Georg Cantor in the late 1800s, and are widely used in mathematics.

  • Sets are also common in software, especially in algorithmic code. For example, the A* search algorithm and Kahn's topological sorting algorithm both use sets. Execute Program's source code also uses sets to manage its course and lesson data.

  • When a structure exists in both pure mathematics and in programming, there's usually a significant disconnect between the two. Mathematicians don't have to worry about CPU performance, running out of memory, or what happens when we mutate a value that's inside of a set. For programmers, these concerns are important, and Python's sets are no exception. In this lesson, we'll look at three practical concerns with sets, all of which can lead to problems in real code.

  • The first concern: in Python, set elements must be hashable. We already saw this constraint for dictionary keys.

  • >
    some_list = [1]
    some_set = {some_list}
    some_set
    Result:
    TypeError: unhashable type: 'list'
  • Sets themselves aren't hashable, and they can only contain hashable values. That means that sets can't contain other sets.

  • >
    some_set = {1}
    set_of_sets = {some_set}
    set_of_sets
    Result:
    TypeError: unhashable type: 'set'
  • (Fortunately, we'll see a workaround for this in another lesson.)

  • Integers, floats, and strings are hashable because they're immutable (they can't be changed once we create them). Tuples are immutable too, and they're hashable as long as every value inside them is hashable. All of these values are allowed in sets.

  • >
    some_set = {1, 2.5, "a", (10, 20)}
    some_set
    Result:
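  • A quick way to check whether a value can go into a set is to call the built-in hash function on it, which raises a TypeError for unhashable values. The sketch below wraps that check in a helper. The name is_hashable is hypothetical and used only for illustration; it isn't part of the standard library. Note the last case: a tuple is only hashable when everything inside it is hashable too.

```python
def is_hashable(value):
    """Return True if value could be a set element (hypothetical helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(1))           # True
print(is_hashable(2.5))         # True
print(is_hashable("a"))         # True
print(is_hashable((10, 20)))    # True
print(is_hashable([1]))         # False: lists are mutable
print(is_hashable({1}))         # False: sets are mutable
print(is_hashable((10, [20])))  # False: the tuple contains a list
```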
  • The second concern: sometimes sets interact surprisingly with iterables, especially strings. Strings are hashable, so it's fine for a set to contain strings. But strings are also iterable, and set(some_iterable) makes a set out of the iterable's contents. This leads to a common mistake: if we pass a string directly to set, we get a set containing the individual characters from the string.

  • >
    some_set = set("Keanu")
    len(some_set)
    Result:
    5
  • >
    some_set = set("Keanu")
    some_set
    Result:
    {'K', 'e', 'a', 'n', 'u'}
  • We can avoid this issue by passing a list: set(["Keanu"]). But when possible, the set literal syntax is generally preferable.

  • >
    some_set = {"Keanu"}
    len(some_set)
    Result:
    1
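  • The same mistake can hide inside functions that accept either one string or a collection of strings. As a minimal sketch, the hypothetical helper below (to_name_set is our own name, not a built-in) wraps a lone string in a set literal, and only falls back to set(...) for other iterables. The second name in the example is made up for illustration.

```python
def to_name_set(names):
    """Build a set of names from one string or an iterable of strings."""
    if isinstance(names, str):
        # A bare string is a single name, not an iterable of characters.
        return {names}
    return set(names)


print(to_name_set("Keanu"))                     # {'Keanu'}
print(len(to_name_set(["Keanu", "Laurence"])))  # 2
print(len(set("Keanu")))                        # 5: one element per character
```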
  • The third concern: sometimes sets look like they have an order, even though they don't. In CPython (the most popular implementation of Python, and the one we use in this course), sets containing only small integers tend to end up in numerical order. That is, sets of small integers appear to be sorted.

  • >
    list({5, 3, 1})
    Result:
    [1, 3, 5]
  • This is an implementation detail of the CPython runtime. It only applies to small integers, and it can change at any time in the future. If we use larger integers, they no longer appear to be sorted.

  • >
    list({5000, 3000, 1000})
    Result:
  • In practice, you should never rely on the order of sets, even if it looks like the order is consistent! Assume that some_set.pop() returns a random element, and that for value in some_set iterates in random order. That will prevent some very difficult bugs. When you need ordering, you can always use a list instead.
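
  • When code needs a predictable order from a set's contents, one option is to build a list with the built-in sorted function, which doesn't depend on how the set happens to be laid out internally. A minimal sketch, using the same integers as the earlier example:

```python
some_set = {5000, 3000, 1000}

# sorted() returns a new list in ascending order,
# no matter what order the set iterates in.
ordered = sorted(some_set)
print(ordered)  # [1000, 3000, 5000]

# Set equality ignores order entirely.
print({5000, 3000, 1000} == {1000, 3000, 5000})  # True
```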