SQL: ON vs. WHERE
Welcome to the ON vs. WHERE lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
For this lesson, our goal is: find the pairs of cats and owners where the person's name is the same as the cat's name.
(All of the examples in this lesson will use this database, but it will always be empty at the beginning of each example.)
>
exec(`CREATE TABLE people (id INTEGER PRIMARY KEY NOT NULL,name TEXT NOT NULL);CREATE TABLE cats (id INTEGER PRIMARY KEY NOT NULL,owner_id INTEGER NOT NULL REFERENCES people(id),name TEXT NOT NULL);-- Wilford has a cat named Wilford.INSERT INTO people (id, name) VALUES (300, 'Wilford');INSERT INTO cats (owner_id, name) VALUES (300, 'Wilford');-- Cindy has no cats.INSERT INTO people (id, name) VALUES (200, 'Cindy');`);First, we'll do the query in a straightforward way, using both an
ONand aWHERE. We use theONto match people and cats using theowner_idforeign key. We use theWHEREto select only the person-cat pairs where the owner's name is the same as the cat's name. (Note theAScolumn aliases.)- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECTpeople.name AS person_name,cats.name AS cat_nameFROM peopleINNER JOIN catsON people.id = cats.owner_idWHERE people.name = cats.name`);Result:
[{person_name: 'Wilford', cat_name: 'Wilford'}] If we like, we can move the
people.name = cats.namecheck into theON, eliminating theWHERE.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECTpeople.name AS person_name,cats.name AS cat_nameFROM peopleINNER JOIN catsON people.id = cats.owner_idAND people.name = cats.name`);Result:
[{person_name: 'Wilford', cat_name: 'Wilford'}] The cat's name isn't a foreign key to the person's name, but that's OK; we can still compare those columns in the
ON. Databases don't limit what we put in anON, just like they don't limit what we put in aWHERE. If the columns exist, the database will let us join on them.We can also convert the entire
ONinto aWHERE.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECTpeople.name AS person_name,cats.name AS cat_nameFROM peopleINNER JOIN catsWHERE people.id = cats.owner_idAND people.name = cats.name`);Result:
[{person_name: 'Wilford', cat_name: 'Wilford'}] Conceptually, this version is computing every combination of person and cat, then filtering that giant list:
- For each person:
- For each cat:
- Produce a new temporary row that contains this person's columns with this cat's columns.
- For each cat:
- For each of those combinations of every person with every cat:
- If the cat's
owner_idmatches the person'sid:- And the cat's name matches the person's name:
- Include this combination of person and cat in the final results.
- And the cat's name matches the person's name:
- If the cat's
- For each person:
As always, the mental model above is correct, but the database will intelligently optimize the query to make it faster. (If we have 10,000 people and 10,000 cats, the database will NOT produce a temporary list of 100,000,000 rows.)
All three of the queries above are truly equivalent: an
ONmixed with aWHERE; anONonly; and aWHEREonly. So why bother withONat all? Why not always useWHERE? Let's analyze that question along three dimensions: performance, correctness, and clarity for human readers.First: performance. Modern databases are very good at optimizing
ONandWHERE. You're unlikely to encounter a performance problem that's fixed by converting anONto aWHEREor vice-versa. So performance isn't a reason to choose one or the other.Second: correctness. For inner joins (the common type of "simple" joins that we're using here),
ONandWHEREare interchangeable. Your query will never be wrong if you convert one to the other. However,ONandWHEREare NOT interchangeable for any other type of join. InLEFTjoins,RIGHTjoins, and other more rare types that we don't cover in this course, likeOUTER, converting anONto aWHEREcan change the results.Third: clarity for human readers. An
ONshould specify which rows from the left table go with which rows from the right table. Here are some examples of goodONs. They concern the relationship between the two tables being joined.- Matching the left table's foreign key against the column that it references in the right table.
- Matching email addresses in an
invitations_senttable against email addresses in ouruserstable to find out what percentage of invited users eventually register (assuming that there's no proper foreign key between the tables).
Here are some examples of conditions that are better in a
WHEREthan anON. They're not about the two tables' relationships.- Selecting a range of rows (like
WHERE created_at > /* some date here */). - Selecting a specific row (like
WHERE user_id = ?orWHERE cats.name = 'Ms. Fluff').
- Selecting a range of rows (like
Separating
ONconditions fromWHEREconditions helps with both correctness and clarity. It makes mistakes with rare types of joins less likely, where converting anONto aWHEREwill give different results. It also gives readers a hint about whether the condition concerns combinations of left and right rows.Here's a code problem:
Find the pairs of cats and people where the person's name is the same as the cat's name. You can use an
ONor aWHERE; either works.A quick note about our expected output. Wilford owns a cat named Wilford, so they should be included in the results. There's also a person named Cindy, and a cat named Cindy, but Cindy-the-person doesn't own Cindy-the-cat. For that reason, Cindy and Cindy should not appear in the output.
exec(`CREATE TABLE people (id INTEGER PRIMARY KEY NOT NULL,name TEXT NOT NULL);CREATE TABLE cats (id INTEGER PRIMARY KEY NOT NULL,owner_id INTEGER NOT NULL REFERENCES people(id),name TEXT NOT NULL);-- Wilford has a cat named Wilford.INSERT INTO people (id, name) VALUES (100, 'Wilford');INSERT INTO cats (owner_id, name) VALUES (100, 'Wilford');-- Cindy has no cats.INSERT INTO people (id, name) VALUES (200, 'Cindy');-- Dalili has a cat named Cindy. But Cindy shouldn't appear in the-- results because Cindy-the-cat isn't owned by Cindy-the-person.INSERT INTO people (id, name) VALUES (300, 'Dalili');INSERT INTO cats (owner_id, name) VALUES (300, 'Cindy');SELECTpeople.name AS person_name,cats.name AS cat_nameFROM peopleINNER JOIN catsON people.name = cats.nameAND people.id = cats.owner_id`);- Goal:
[{person_name: 'Wilford', cat_name: 'Wilford'}]- Yours:
[{person_name: 'Wilford', cat_name: 'Wilford'}]
There's no hard-and-fast rule about when to use
ONvs.WHERE. The example that we started with lives in a gray area where either choice makes sense. Putting thepeople.name = cats.namecondition in theONis probably slightly better, but neither is "wrong".You'll do fine as long as you ask yourself "is this condition about how the left table relates to the right table, or is it about something else?"