Python in Detail: Sets
Welcome to the Sets lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
Python has built-in support for the set data structure. Sets are somewhat similar to lists, in that they're both collections of values. But there are some important differences.
We can build a set with
set(some_iterable), similar to thelist(some_iterable)call we've seen before. All values inside the iterable become elements of the set. We can use theinoperator to check for whether a value is in a set.>
numbers = set([1, 2, 3])- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
1 in numbersResult:
True
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
4 in numbersResult:
False
Set elements don't have an order. Trying to index into the set is an error, because there is no "0th" or "55th" element.
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
numbers[1]Result:
TypeError: 'set' object is not subscriptable
Python has specialized syntax for literal sets:
{a, b, c}is equivalent toset([a, b, c]). This is similar to dictionary syntax, but without the:characters that separate dict keys from their values.>
people = {"Amir", "Betty", "Cindy"}"Amir" in peopleResult:
True
>
set([1, 2, 3]) == {1, 2, 3}Result:
True
This curly brace syntax also works when answering Execute Program's code examples.
>
{"Amir", "Betty"}Result:
{'Amir', 'Betty'}Like lists, sets are mutable: we can change them after they're built. For example, we can add elements with
.add.>
cats = {"Keanu"}cats.add("Ms. Fluff")catsResult:
{'Keanu', 'Ms. Fluff'}len(some_set)returns the number of elements in the set, like it does for lists and dicts.>
cats = {"Keanu", "Ms. Fluff"}len(cats)Result:
2
Unlike lists, sets only store unique values. If we add a value that already exists in the set, nothing happens. The set doesn't change, but there's also no error. The
.addcall is ignored.>
cats = {"Keanu", "Ms. Fluff"}cats.add("Keanu")len(cats)Result:
2
The same is true when we use the set literal syntax. We can write set syntax with duplicate values, but those duplicates won't exist in the final set. Again, there's no error.
>
some_set = {1, 2, 2, 2, 2, 2, 2, 2, 2, 2}some_setResult:
{1, 2}(However, some linters will reject the code above, and for good reason. The duplicate values have no effect, so they're almost certainly mistakes. These unused values may trick other programmers who read the code later. For example, they make the set look bigger than it really is.)
Two sets are equal if their values are equal, regardless of how those values got there. Insertion order doesn't matter.
>
{1} == {1}Result:
True
>
set_a = {1, 2, 3}set_a.remove(2)set_b = {1}set_b.add(3)set_a == set_bResult:
True
>
{1, 2} == {2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2}Result:
True
Deciding whether a value is
ina set is much faster than deciding whether it'sina list. In fact, that's the most common reason for using sets! Let's do a benchmark to see how much faster it really is.>
from time import timethe_list = list(range(100000))the_set = set(range(100000))# Do the list lookup many times so we get a reliable timing.start = time()for _ in range(2000):50000 in the_listlist_runtime = time() - start# Do the set lookup the same number of times.start = time()for _ in range(2000):50000 in the_setset_runtime = time() - start(list_runtime, set_runtime)Result:
(Similar to other benchmarks in this course, we hard-coded the runtimes that we got while writing this lesson. You'll see different numbers if you run this code.)
In this case, the list took about 1,662 times longer than the set. This is a massive performance difference!
The longer the list is, the bigger the performance gap gets. For lists,
inslows down linearly with list size: when we double the length of the list, eachincheck takes twice as long. But for sets,inperformance is approximately constant: it takes about the same amount of time no matter how large the set is.Here's a demonstration of that: we benchmark a set with 1 element and a set with 1,000,000 elements.
>
from time import timesmall_set = set(range(1))large_set = set(range(1000000))# Do the list lookup many times on the small set.start = time()for _ in range(2000):50000000 in small_setsmall_set_runtime = time() - start# Do the lookup again on the large set.start = time()for _ in range(2000):50000000 in large_setlarge_set_runtime = time() - start(small_set_runtime, large_set_runtime)Result:
Once again, these are the hard-coded results we got while writing this lesson. But note that the 1,000,000-element list happened to be slightly faster by chance!
Some problems can be solved quickly with sets, but are prohibitively slow with lists. They're a very powerful tool!
These are the basics of sets. Sets are relatively simple on the surface, but there are also some subtle details. We'll explore these subtleties in future lessons.