Execute Program

Python in Detail: Sets

Welcome to the Sets lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • Python has built-in support for the set data structure. Sets are somewhat similar to lists, in that they're both collections of values. But there are some important differences.

  • We can build a set with set(some_iterable), similar to the list(some_iterable) call we've seen before. All values inside the iterable become elements of the set. We can use the in operator to check for whether a value is in a set.

  • >
    numbers = set([1, 2, 3])
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    1 in numbers
    Result:
    TruePass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    4 in numbers
    Result:
    FalsePass Icon
  • Set elements don't have an order. Trying to index into the set is an error, because there is no "0th" or "55th" element.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    numbers[1]
    Result:
    TypeError: 'set' object is not subscriptablePass Icon
  • Python has specialized syntax for literal sets: {a, b, c} is equivalent to set([a, b, c]). This is similar to dictionary syntax, but without the : characters that separate dict keys from their values.

  • >
    people = {"Amir", "Betty", "Cindy"}
    "Amir" in people
    Result:
    TruePass Icon
  • >
    set([1, 2, 3]) == {1, 2, 3}
    Result:
    TruePass Icon
  • This curly brace syntax also works when answering Execute Program's code examples.

  • >
    {"Amir", "Betty"}
    Result:
    {'Amir', 'Betty'}Pass Icon
  • Like lists, sets are mutable: we can change them after they're built. For example, we can add elements with .add.

  • >
    cats = {"Keanu"}
    cats.add("Ms. Fluff")
    cats
    Result:
    {'Keanu', 'Ms. Fluff'}Pass Icon
  • len(some_set) returns the number of elements in the set, like it does for lists and dicts.

  • >
    cats = {"Keanu", "Ms. Fluff"}
    len(cats)
    Result:
    2Pass Icon
  • Unlike lists, sets only store unique values. If we add a value that already exists in the set, nothing happens. The set doesn't change, but there's also no error. The .add call is ignored.

  • >
    cats = {"Keanu", "Ms. Fluff"}
    cats.add("Keanu")
    len(cats)
    Result:
    2Pass Icon
  • The same is true when we use the set literal syntax. We can write set syntax with duplicate values, but those duplicates won't exist in the final set. Again, there's no error.

  • >
    some_set = {1, 2, 2, 2, 2, 2, 2, 2, 2, 2}
    some_set
    Result:
    {1, 2}Pass Icon
  • (However, some linters will reject the code above, and for good reason. The duplicate values have no effect, so they're almost certainly mistakes. These unused values may trick other programmers who read the code later. For example, they make the set look bigger than it really is.)

  • Two sets are equal if their values are equal, regardless of how those values got there. Insertion order doesn't matter.

  • >
    {1} == {1}
    Result:
    TruePass Icon
  • >
    set_a = {1, 2, 3}
    set_a.remove(2)
    set_b = {1}
    set_b.add(3)
    set_a == set_b
    Result:
    TruePass Icon
  • >
    {1, 2} == {2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2}
    Result:
    TruePass Icon
  • Deciding whether a value is in a set is much faster than deciding whether it's in a list. In fact, that's the most common reason for using sets! Let's do a benchmark to see how much faster it really is.

  • >
    from time import time

    the_list = list(range(100000))
    the_set = set(range(100000))

    # Do the list lookup many times so we get a reliable timing.
    start = time()
    for _ in range(2000):
    50000 in the_list
    list_runtime = time() - start

    # Do the set lookup the same number of times.
    start = time()
    for _ in range(2000):
    50000 in the_set
    set_runtime = time() - start

    (list_runtime, set_runtime)
    Result:
  • (Similar to other benchmarks in this course, we hard-coded the runtimes that we got while writing this lesson. You'll see different numbers if you run this code.)

  • In this case, the list took about 1,662 times longer than the set. This is a massive performance difference!

  • The longer the list is, the bigger the performance gap gets. For lists, in slows down linearly with list size: when we double the length of the list, each in check takes twice as long. But for sets, in performance is approximately constant: it takes about the same amount of time no matter how large the set is.

  • Here's a demonstration of that: we benchmark a set with 1 element and a set with 1,000,000 elements.

  • >
    from time import time

    small_set = set(range(1))
    large_set = set(range(1000000))

    # Do the list lookup many times on the small set.
    start = time()
    for _ in range(2000):
    50000000 in small_set
    small_set_runtime = time() - start

    # Do the lookup again on the large set.
    start = time()
    for _ in range(2000):
    50000000 in large_set
    large_set_runtime = time() - start

    (small_set_runtime, large_set_runtime)
    Result:
  • Once again, these are the hard-coded results we got while writing this lesson. But note that the 1,000,000-element list happened to be slightly faster by chance!

  • Some problems can be solved quickly with sets, but are prohibitively slow with lists. They're a very powerful tool!

  • These are the basics of sets. Sets are relatively simple on the surface, but there are also some subtle details. We'll explore these subtleties in future lessons.