Execute Program

SQL: ON vs. WHERE

Welcome to the ON vs. WHERE lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • For this lesson, our goal is: find the pairs of cats and owners where the person's name is the same as the cat's name.

  • (All of the examples in this lesson will use this database, but it will always be empty at the beginning of each example.)

  • >
    exec(`
    CREATE TABLE people (
    id INTEGER PRIMARY KEY NOT NULL,
    name TEXT NOT NULL
    );
    CREATE TABLE cats (
    id INTEGER PRIMARY KEY NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES people(id),
    name TEXT NOT NULL
    );
    -- Wilford has a cat named Wilford.
    INSERT INTO people (id, name) VALUES (300, 'Wilford');
    INSERT INTO cats (owner_id, name) VALUES (300, 'Wilford');
    -- Cindy has no cats.
    INSERT INTO people (id, name) VALUES (200, 'Cindy');
    `);
  • First, we'll do the query in a straightforward way, using both an ON and a WHERE. We use the ON to match people and cats using the owner_id foreign key. We use the WHERE to select only the person-cat pairs where the owner's name is the same as the cat's name. (Note the AS column aliases.)

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    SELECT
    people.name AS person_name,
    cats.name AS cat_name
    FROM people
    INNER JOIN cats
    ON people.id = cats.owner_id
    WHERE people.name = cats.name
    `);
    Result:
    [{person_name: 'Wilford', cat_name: 'Wilford'}]Pass Icon
  • If we like, we can move the people.name = cats.name check into the ON, eliminating the WHERE.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    SELECT
    people.name AS person_name,
    cats.name AS cat_name
    FROM people
    INNER JOIN cats
    ON people.id = cats.owner_id
    AND people.name = cats.name
    `);
    Result:
    [{person_name: 'Wilford', cat_name: 'Wilford'}]Pass Icon
  • The cat's name isn't a foreign key to the person's name, but that's OK; we can still compare those columns in the ON. Databases don't limit what we put in an ON, just like they don't limit what we put in a WHERE. If the columns exist, the database will let us join on them.

  • We can also convert the entire ON into a WHERE.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    SELECT
    people.name AS person_name,
    cats.name AS cat_name
    FROM people
    INNER JOIN cats
    WHERE people.id = cats.owner_id
    AND people.name = cats.name
    `);
    Result:
    [{person_name: 'Wilford', cat_name: 'Wilford'}]Pass Icon
  • Conceptually, this version is computing every combination of person and cat, then filtering that giant list:

    • For each person:
      • For each cat:
        • Produce a new temporary row that contains this person's columns with this cat's columns.
    • For each of those combinations of every person with every cat:
      • If the cat's owner_id matches the person's id:
        • And the cat's name matches the person's name:
          • Include this combination of person and cat in the final results.
  • As always, the mental model above is correct, but the database will intelligently optimize the query to make it faster. (If we have 10,000 people and 10,000 cats, the database will NOT produce a temporary list of 100,000,000 rows.)

  • All three of the queries above are truly equivalent: an ON mixed with a WHERE; an ON only; and a WHERE only. So why bother with ON at all? Why not always use WHERE? Let's analyze that question along three dimensions: performance, correctness, and clarity for human readers.

  • First: performance. Modern databases are very good at optimizing ON and WHERE. You're unlikely to encounter a performance problem that's fixed by converting an ON to a WHERE or vice-versa. So performance isn't a reason to choose one or the other.

  • Second: correctness. For inner joins (the common type of "simple" joins that we're using here), ON and WHERE are interchangeable. Your query will never be wrong if you convert one to the other. However, ON and WHERE are NOT interchangeable for any other type of join. In LEFT joins, RIGHT joins, and other more rare types that we don't cover in this course, like OUTER, converting an ON to a WHERE can change the results.

  • Third: clarity for human readers. An ON should specify which rows from the left table go with which rows from the right table. Here are some examples of good ONs. They concern the relationship between the two tables being joined.

    • Matching the left table's foreign key against the column that it references in the right table.
    • Matching email addresses in an invitations_sent table against email addresses in our users table to find out what percentage of invited users eventually register (assuming that there's no proper foreign key between the tables).
  • Here are some examples of conditions that are better in a WHERE than an ON. They're not about the two tables' relationships.

    • Selecting a range of rows (like WHERE created_at > /* some date here */).
    • Selecting a specific row (like WHERE user_id = ? or WHERE cats.name = 'Ms. Fluff').
  • Separating ON conditions from WHERE conditions helps with both correctness and clarity. It makes mistakes with rare types of joins less likely, where converting an ON to a WHERE will give different results. It also gives readers a hint about whether the condition concerns combinations of left and right rows.

  • Here's a code problem:

    Find the pairs of cats and people where the person's name is the same as the cat's name. You can use an ON or a WHERE; either works.

    A quick note about our expected output. Wilford owns a cat named Wilford, so they should be included in the results. There's also a person named Cindy, and a cat named Cindy, but Cindy-the-person doesn't own Cindy-the-cat. For that reason, Cindy and Cindy should not appear in the output.

    exec(`
    CREATE TABLE people (
    id INTEGER PRIMARY KEY NOT NULL,
    name TEXT NOT NULL
    );
    CREATE TABLE cats (
    id INTEGER PRIMARY KEY NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES people(id),
    name TEXT NOT NULL
    );

    -- Wilford has a cat named Wilford.
    INSERT INTO people (id, name) VALUES (100, 'Wilford');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Wilford');
    -- Cindy has no cats.
    INSERT INTO people (id, name) VALUES (200, 'Cindy');
    -- Dalili has a cat named Cindy. But Cindy shouldn't appear in the
    -- results because Cindy-the-cat isn't owned by Cindy-the-person.
    INSERT INTO people (id, name) VALUES (300, 'Dalili');
    INSERT INTO cats (owner_id, name) VALUES (300, 'Cindy');
    SELECT
    people.name AS person_name,
    cats.name AS cat_name
    FROM people
    INNER JOIN cats
    ON people.name = cats.name
    AND people.id = cats.owner_id
    `);
    Goal:
    [{person_name: 'Wilford', cat_name: 'Wilford'}]
    Yours:
    [{person_name: 'Wilford', cat_name: 'Wilford'}]Pass Icon
  • There's no hard-and-fast rule about when to use ON vs. WHERE. The example that we started with lives in a gray area where either choice makes sense. Putting the people.name = cats.name condition in the ON is probably slightly better, but neither is "wrong".

  • You'll do fine as long as you ask yourself "is this condition about how the left table relates to the right table, or is it about something else?"