Execute Program

Modern JavaScript: Sets

Welcome to the Sets lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • Sometimes, we want to know whether a value is in a list of other values. We can use an array to find out:

  • >
    const names = ['Amir', 'Betty', 'Cindy'];
    names.includes('Betty');
    Result:
    truePass Icon
  • This works, but there's a potential performance problem hiding here. The .includes method loops through the entire array, checking each element. If there are 10,000 elements, it will loop up to 10,000 times.

  • (The technical way to say that is: .includes is O(n). If there are n elements, then .includes has to do about n comparisons while checking each one. In some cases it will find the element early in the array and stop. But in the worst case, the element we're looking for is at the end of the array. Then, .includes will have to check every other element first.)

  • JavaScript's Set data type is a better solution to this problem. A JavaScript set is a collection, it's ordered, and it contains only unique values. We'll look at each of those facts in order.

  • First, sets are collections: they contain a bunch of JavaScript values. We can provide some initial values to Set's constructor, then .add more values later. We can also ask a set whether it .has a given value.

  • >
    const names = new Set(['Amir', 'Betty', 'Cindy']);
    names.has('Amir');
    Result:
    truePass Icon
  • >
    const names = new Set(['Amir', 'Betty', 'Cindy']);
    names.has('Dalili');
    Result:
    falsePass Icon
  • >
    const names = new Set(['Amir', 'Betty', 'Cindy']);
    names.add('Dalili');
    names.has('Dalili');
    Result:
    truePass Icon
  • Second, JavaScript sets are ordered. If we examine the elements in the set, they'll always come back in the order that they were inserted.

  • To get the elements out of the set, we can use the .values method. It returns an iterator. Iterators are covered in a different lesson, so we can ignore this for now by converting the iterator into an array with Array.from(someSet).

  • >
    const names = new Set(['Amir', 'Betty']);
    Array.from(names.values());
    Result:
  • >
    const names = new Set(['Betty', 'Amir']);
    Array.from(names.values());
    Result:
    ['Betty', 'Amir']Pass Icon
  • Sets are mathematical objects studied in set theory, the modern form of which dates to the 1870s. Most programming languages, including JavaScript, provide a set type with operations that would've been familiar to the set theorist Georg Cantor (1845 - 1918).

  • Unlike mathematical sets, and unlike other programming languages' sets, JavaScript's sets are ordered, as we saw above. However, this example of JavaScript's "weirdness" is beneficial to us, rather than confusing. Preserving order is almost always a good thing; it's one less uncertainty to track in our heads and in our code.

  • It's important to keep this difference in mind if you already know other languages, or if you learn other languages in the future. In JavaScript, you can count on a set to remember the elements' insertion order. In other languages, you can't count on that, so elements can come back in any order.

  • The third property of sets is: they contain only unique values. If we .add a value that already exists in the set, nothing happens; the set is unchanged.

  • >
    const names = new Set(['Amir']);
    names.add('Betty');
    names.add('Betty');
    Array.from(names.values());
    Result:
    ['Amir', 'Betty']Pass Icon
  • Elements can be deleted from a set with the .delete method. And the entire set can be cleared with .clear.

  • >
    const names = new Set(['Amir', 'Betty']);
    names.delete('Betty');
    Array.from(names.values());
    Result:
    ['Amir']Pass Icon
  • >
    const names = new Set(['Amir', 'Betty']);
    names.clear();
    Array.from(names.values());
    Result:
    []Pass Icon
  • Sets have a .size property that returns the number of items in the set.

  • >
    const names = new Set(['Amir', 'Betty']);
    names.size;
    Result:
    2Pass Icon
  • A set's size reflects the number of unique values that it holds. Duplicates passed to the constructor or added with .add don't contribute to the size.

  • >
    const names = new Set(['Amir', 'Betty', 'Amir']);
    names.size;
    Result:
    2Pass Icon
  • >
    const names = new Set(['Amir', 'Betty']);
    names.add('Betty');
    names.size;
    Result:
    2Pass Icon
  • That's everything that a set can do! But the examples so far have been a bit deceptive: we keep converting the set into an array. The improvements we've seen over regular arrays so far are nice. But the real power of sets is that calling .has is very fast.

  • An array's .includes method slows down as the array gets larger. Likewise for many other array methods. But sets don't have that problem! A set's .has method is a constant O(1): it always takes the same amount of time regardless of how many elements there are.

  • (As is often the case, that's true in practice, but it's false in the most literal sense. There are situations where .has could theoretically perform as badly as an array does. But they're vanishingly rare, so few of us will ever encounter them in our entire careers.)

  • Let's run some benchmarks to see the performance difference in reality:

    • We'll use a benchmark function that calls another function a million times, then returns how many milliseconds those calls took. (We call the function a million times to make sure that it takes a measurable amount of time.)
    • We'll benchmark an array and a set, each with a million elements.
    • We won't benchmark the cost of building the array and the set; we only care about the cost of calling the .includes and .has methods.
  • >
    function benchmark(f) {
    const start = new Date().getTime();
    for (let i=0; i<1000000; i++) {
    f();
    }
    const end = new Date().getTime();
    return end - start;
    }

    /* This is a JavaScript trick that means "make an array of a million random
    * numbers". These array details are covered in our "JavaScript Arrays"
    * course. */
    const largeArray = new Array(1000000).fill(undefined).map(() => Math.random());
    const largeSet = new Set(largeArray);

    // Benchmark the array's .includes method vs. the set's .has method.
    console.log(benchmark(() => largeArray.includes(5)));
    console.log(benchmark(() => largeSet.has(5)));
    Result:
  • Unlike the other examples in this course, we didn't actually run the one above in your browser. Computer speed varies, so we can't know exactly what times will come out on your computer.

  • But those numbers, 12358 and 12, are the actual benchmarked times, in milliseconds, that we got on our machine. The .includes method took 1,030 times as long as .has. In a large computation, 1,030x is the difference between waiting 1 second vs. 17 minutes!

  • Now let's show that a set's .has method is always fast, regardless of the number of elements. The next example reuses the benchmark function definition above, as well as the largeSet data. We compare the performance of .has called on a 1-element lonelySet to the performance on the million-element largeSet.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    const lonelySet = new Set([1]);
    [
    benchmark(() => lonelySet.has(5)),
    benchmark(() => largeSet.has(5)),
    ];
    Result:
  • Calling .has took practically the same amount of time for a single-element set that it took for a million-element set!

  • Internally, Execute Program uses sets in a few different places. All of them could be replaced with arrays for a moderate performance penalty, but why take the penalty when sets have such a simple interface?

  • Here's one use case: Execute Program has thousands of code examples. Every example has an ID, and all example IDs must be unique. To check for that uniqueness, we loop over the example IDs, adding them to a set. If we ever encounter an ID that's already in the set, then we must be seeing it for a second time, so it's a duplicate. We throw an exception to signal that error.

  • Here's a code problem:

    Write a hasDuplicates function. It should use a set to decide whether an array of numbers has any duplicates. It should return true if there are duplicates; otherwise it should return false.

    There are multiple ways to solve this problem with sets. Here's one possible solution:

    1. Begin with an empty set.
    2. Loop over the array. Check whether each number is in the set.
    3. If it's in the set, then we must have seen it earlier in the array, which means it's a duplicate, so we can return true immediately.
    4. If we get to the end of the array, then we never saw any number twice, and can return false.
    function hasDuplicates(numbers) {
    const set = new Set();
    for (const n of numbers) {
    if (set.has(n)) {
    return true;
    }
    set.add(n);
    }
    return false;
    }
    [
    hasDuplicates([]),
    hasDuplicates([1, 2, 3]),
    hasDuplicates([10, 20, 20]),
    hasDuplicates([100, 200, 300, 100]),
    ];
    Goal:
    [false, false, true, true]
    Yours:
    [false, false, true, true]Pass Icon
  • To recap: sets are a data type introduced to JavaScript in 2015. They store collections of values. Unlike arrays, checking for whether a set contains a value is always very fast, and duplicate values are silently removed.