Modern JavaScript: Sets
Welcome to the Sets lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
Sometimes, we want to know whether a value is in a list of other values. We can use an array to find out:
>
const names = ['Amir', 'Betty', 'Cindy'];names.includes('Betty');Result:
true
This works, but there's a potential performance problem hiding here. The
.includesmethod loops through the entire array, checking each element. If there are 10,000 elements, it will loop up to 10,000 times.(The technical way to say that is:
.includesis O(n). If there are n elements, then.includeshas to do about n comparisons while checking each one. In some cases it will find the element early in the array and stop. But in the worst case, the element we're looking for is at the end of the array. Then,.includeswill have to check every other element first.)JavaScript's
Setdata type is a better solution to this problem. A JavaScript set is a collection, it's ordered, and it contains only unique values. We'll look at each of those facts in order.First, sets are collections: they contain a bunch of JavaScript values. We can provide some initial values to
Set's constructor, then.addmore values later. We can also ask a set whether it.hasa given value.>
const names = new Set(['Amir', 'Betty', 'Cindy']);names.has('Amir');Result:
true
>
const names = new Set(['Amir', 'Betty', 'Cindy']);names.has('Dalili');Result:
false
>
const names = new Set(['Amir', 'Betty', 'Cindy']);names.add('Dalili');names.has('Dalili');Result:
true
Second, JavaScript sets are ordered. If we examine the elements in the set, they'll always come back in the order that they were inserted.
To get the elements out of the set, we can use the
.valuesmethod. It returns an iterator. Iterators are covered in a different lesson, so we can ignore this for now by converting the iterator into an array withArray.from(someSet).>
const names = new Set(['Amir', 'Betty']);Array.from(names.values());Result:
>
const names = new Set(['Betty', 'Amir']);Array.from(names.values());Result:
['Betty', 'Amir']
Sets are mathematical objects studied in set theory, the modern form of which dates to the 1870s. Most programming languages, including JavaScript, provide a set type with operations that would've been familiar to the set theorist Georg Cantor (1845 - 1918).
Unlike mathematical sets, and unlike other programming languages' sets, JavaScript's sets are ordered, as we saw above. However, this example of JavaScript's "weirdness" is beneficial to us, rather than confusing. Preserving order is almost always a good thing; it's one less uncertainty to track in our heads and in our code.
It's important to keep this difference in mind if you already know other languages, or if you learn other languages in the future. In JavaScript, you can count on a set to remember the elements' insertion order. In other languages, you can't count on that, so elements can come back in any order.
The third property of sets is: they contain only unique values. If we
.adda value that already exists in the set, nothing happens; the set is unchanged.>
const names = new Set(['Amir']);names.add('Betty');names.add('Betty');Array.from(names.values());Result:
['Amir', 'Betty']
Elements can be deleted from a set with the
.deletemethod. And the entire set can be cleared with.clear.>
const names = new Set(['Amir', 'Betty']);names.delete('Betty');Array.from(names.values());Result:
['Amir']
>
const names = new Set(['Amir', 'Betty']);names.clear();Array.from(names.values());Result:
[]
Sets have a
.sizeproperty that returns the number of items in the set.>
const names = new Set(['Amir', 'Betty']);names.size;Result:
2
A set's size reflects the number of unique values that it holds. Duplicates passed to the constructor or added with
.adddon't contribute to the size.>
const names = new Set(['Amir', 'Betty', 'Amir']);names.size;Result:
2
>
const names = new Set(['Amir', 'Betty']);names.add('Betty');names.size;Result:
2
That's everything that a set can do! But the examples so far have been a bit deceptive: we keep converting the set into an array. The improvements we've seen over regular arrays so far are nice. But the real power of sets is that calling
.hasis very fast.An array's
.includesmethod slows down as the array gets larger. Likewise for many other array methods. But sets don't have that problem! A set's.hasmethod is a constant O(1): it always takes the same amount of time regardless of how many elements there are.(As is often the case, that's true in practice, but it's false in the most literal sense. There are situations where
.hascould theoretically perform as badly as an array does. But they're vanishingly rare, so few of us will ever encounter them in our entire careers.)Let's run some benchmarks to see the performance difference in reality:
- We'll use a
benchmarkfunction that calls another function a million times, then returns how many milliseconds those calls took. (We call the function a million times to make sure that it takes a measurable amount of time.) - We'll benchmark an array and a set, each with a million elements.
- We won't benchmark the cost of building the array and the set; we only care about the cost of calling the
.includesand.hasmethods.
- We'll use a
>
function benchmark(f) {const start = new Date().getTime();for (let i=0; i<1000000; i++) {f();}const end = new Date().getTime();return end - start;}/* This is a JavaScript trick that means "make an array of a million random* numbers". These array details are covered in our "JavaScript Arrays"* course. */const largeArray = new Array(1000000).fill(undefined).map(() => Math.random());const largeSet = new Set(largeArray);// Benchmark the array's .includes method vs. the set's .has method.console.log(benchmark(() => largeArray.includes(5)));console.log(benchmark(() => largeSet.has(5)));Result:
Unlike the other examples in this course, we didn't actually run the one above in your browser. Computer speed varies, so we can't know exactly what times will come out on your computer.
But those numbers, 12358 and 12, are the actual benchmarked times, in milliseconds, that we got on our machine. The
.includesmethod took 1,030 times as long as.has. In a large computation, 1,030x is the difference between waiting 1 second vs. 17 minutes!Now let's show that a set's
.hasmethod is always fast, regardless of the number of elements. The next example reuses thebenchmarkfunction definition above, as well as thelargeSetdata. We compare the performance of.hascalled on a 1-elementlonelySetto the performance on the million-elementlargeSet.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
const lonelySet = new Set([1]);[benchmark(() => lonelySet.has(5)),benchmark(() => largeSet.has(5)),];Result:
Calling
.hastook practically the same amount of time for a single-element set that it took for a million-element set!Internally, Execute Program uses sets in a few different places. All of them could be replaced with arrays for a moderate performance penalty, but why take the penalty when sets have such a simple interface?
Here's one use case: Execute Program has thousands of code examples. Every example has an ID, and all example IDs must be unique. To check for that uniqueness, we loop over the example IDs, adding them to a set. If we ever encounter an ID that's already in the set, then we must be seeing it for a second time, so it's a duplicate. We throw an exception to signal that error.
Here's a code problem:
Write a
hasDuplicatesfunction. It should use a set to decide whether an array of numbers has any duplicates. It should returntrueif there are duplicates; otherwise it should returnfalse.There are multiple ways to solve this problem with sets. Here's one possible solution:
- Begin with an empty set.
- Loop over the array. Check whether each number is in the set.
- If it's in the set, then we must have seen it earlier in the array, which means it's a duplicate, so we can return
trueimmediately. - If we get to the end of the array, then we never saw any number twice, and can return
false.
function hasDuplicates(numbers) {const set = new Set();for (const n of numbers) {if (set.has(n)) {return true;}set.add(n);}return false;}[hasDuplicates([]),hasDuplicates([1, 2, 3]),hasDuplicates([10, 20, 20]),hasDuplicates([100, 200, 300, 100]),];- Goal:
[false, false, true, true]
- Yours:
[false, false, true, true]
To recap: sets are a data type introduced to JavaScript in 2015. They store collections of values. Unlike arrays, checking for whether a set contains a value is always very fast, and duplicate values are silently removed.