SQL: Simple Joins
Welcome to the Simple Joins lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
Foreign keys let us guarantee that rows in one table correctly reference rows in another table. With that guarantee in place, we can ask the database system to combine the tables' data for us.
In this lesson, our goal is to produce a list of every cat's name, along with their owner's name. We could write that code in JavaScript, using a nested loop that executes one cat query per person. But that would be inefficient. SQL can do it in one query!
Here's our cats-and-owners database. (We'll use this database throughout this lesson, but it will reset before each code example. That is, each code example will start with exactly the tables and rows defined here.)
>
exec(`CREATE TABLE people (id INTEGER PRIMARY KEY NOT NULL,first_name TEXT NOT NULL);CREATE TABLE cats (id INTEGER PRIMARY KEY NOT NULL,owner_id INTEGER NOT NULL REFERENCES people(id),name TEXT NOT NULL);-- Amir has a cat named Ms. Fluff.INSERT INTO people (id, first_name) VALUES (100, 'Amir');INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');-- Betty has a cat named Keanu.INSERT INTO people (id, first_name) VALUES (200, 'Betty');INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');`);Result:
We can build our cats-and-owners list by
JOINing the two tables together into one. First, we'll do the simplest possible join:people JOIN cats, giving us combinations of every person with every cat. (It will look strange, so we don't recommend dwelling on it; we'll improve it soon.)- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT * FROM people JOIN cats`);Result:
The join gave us every combination of a person and a cat, whether or not they're related via
owner_id. There are 2 people and 2 cats, so the join gave us 4 rows (2 * 2):- Amir and Ms. Fluff
- Amir and Keanu
- Betty and Ms. Fluff
- Betty and Keanu
If we had 100 people and 100 cats then this join would give us 10,000 rows: each of the 100 people would be independently combined with each of the 100 cats. You can think of it like this:
- For each person in the people table:
- For each cat in the cats table:
- Return a row with the columns from that person and this cat.
- For each cat in the cats table:
- For each person in the people table:
The join has all of the columns from people (
idandfirst_name) and all of the columns from cats (id,name, andowner_id). Both tables have anidcolumn, which is awkward because we can't have two columns with the same name. The cats'ids are "winning" here, so we see them rather than the peoples'ids. We'll address the issue of conflicting column names in more detail later.To make our join more useful, we can add an
ONclause.ONis likeWHERE, but it applies specifically to joins. We "select where", but we "join on". For our cats-and-owners list, we want all pairs of people and cats wherepeople.idequalscats.owner_id:- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT * FROM people JOIN cats ON people.id = cats.owner_id`);Result:
This is much better! Our condition was
people.id = cats.owner_id, so people and cats get matched up. For example, the cat withowner_id100 gets joined with person 100: Amir gets matched with Ms. Fluff. Because ourcatstable has aFOREIGN KEYthat associatescats.owner_idtopeople.id, we know our cats will always be matched with their correct owner!There's a simple rule for thinking about what a join will do. It's always correct to think about basic
JOINs as two nested loops with a filter, like this:- For each person in the people table:
- For each cat in the cats table:
- If the
ONcondition is true for this person and this cat:- Return a row with the columns from this person and this cat.
- If the
- For each cat in the cats table:
- For each person in the people table:
Here's a code problem:
Write a
JOINthat joins people and cats oncats.owner_id = people.id. Make sure that you join people to cats, in that order.exec(`CREATE TABLE people (id INTEGER PRIMARY KEY NOT NULL,first_name TEXT NOT NULL);CREATE TABLE cats (id INTEGER PRIMARY KEY NOT NULL,owner_id INTEGER NOT NULL REFERENCES people(id),name TEXT NOT NULL);-- Amir owns Ms. FluffINSERT INTO people (id, first_name) VALUES (100, 'Amir');INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');-- Betty owns KeanuINSERT INTO people (id, first_name) VALUES (200, 'Betty');INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');`);exec(`SELECT * FROM people JOIN cats ON cats.owner_id = people.id`);- Goal:
[{id: 1, first_name: 'Amir', owner_id: 100, name: 'Ms. Fluff'}, {id: 2, first_name: 'Betty', owner_id: 200, name: 'Keanu'}]- Yours:
[{id: 1, first_name: 'Amir', owner_id: 100, name: 'Ms. Fluff'}, {id: 2, first_name: 'Betty', owner_id: 200, name: 'Keanu'}]
With our cats and owners matched, we can pick the columns that we want with
AS, like we've done for more basicSELECTs. (For example,SELECT name AS person, age AS oldness FROM some_table.) In aJOIN, data is identified by the table it comes from, likepeople.first_name AS person.Here's a code problem:
Write a
JOINthat joins people and cats oncats.owner_id = people.id. UsingAS, return data in the form of[{person: 'Amir', cat: 'Ms. Fluff'}]exec(`CREATE TABLE people (id INTEGER PRIMARY KEY NOT NULL,first_name TEXT NOT NULL);CREATE TABLE cats (id INTEGER PRIMARY KEY NOT NULL,owner_id INTEGER NOT NULL REFERENCES people(id),name TEXT NOT NULL);INSERT INTO people (id, first_name) VALUES (100, 'Amir');INSERT INTO cats (id, owner_id, name) VALUES (1, 100, 'Ms. Fluff');INSERT INTO people (id, first_name) VALUES (200, 'Betty');INSERT INTO cats (id, owner_id, name) VALUES (2, 200, 'Keanu');`);exec(`SELECT people.first_name AS person, cats.name AS catFROM peopleJOIN catsON people.id = cats.owner_id`);- Goal:
[{person: 'Amir', cat: 'Ms. Fluff'}, {person: 'Betty', cat: 'Keanu'}]- Yours:
[{person: 'Amir', cat: 'Ms. Fluff'}, {person: 'Betty', cat: 'Keanu'}]
Usually, the order of tables in a join doesn't matter:
cats JOIN peopleis the same aspeople JOIN cats. There are two cases where join order may matter, though, depending on your database system.The first case is query optimization. In rare cases, your database engine may optimize a complex join badly, causing it to be slower than it could be. Reordering the join may help.
The second case is when there are duplicate column names. When the joined tables have duplicate column names, the last table in the
JOINwins. Here's an example from earlier in the lesson: we select frompeople JOIN catswith noASto alias the column names.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT * FROM people JOIN cats ON people.id = cats.owner_id`);Result:
Notice that the
idcolumns returned are the cats' IDs:1and2. If we reverse the order and selectcats JOIN people, theidcolumn will be the people's IDs,100and200.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT * FROM cats JOIN people ON people.id = cats.owner_id`);Result:
It's common for tables to have the same column names, especially for
idcolumns. This can get very confusing, so it's best to useASto explicitly alias every joined column, rather than blindly selecting*. That will make your SQL code more clear to other programmers. We recommend writing those aliases on their own lines, indented inside theSELECT.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECTcats.id AS cat_id,people.id AS person_idFROM people JOIN cats ON people.id = cats.owner_id`);Result:
We've been treating JOIN as if it returns a table. The proper term is "relation", which means: "it has rows and columns and supports operations like
WHERE,AS,JOIN, etc."(If a join combines two relations into a new relation, does that mean that we can join on the result of a join? Yes! We'll see that in a later lesson.)
Relations are the core concept in SQL databases. We've called them "SQL databases" so far, but "relational databases" is a more correct term. SQL is just the most popular relational query language.
There are many kinds of relations. A table is a relation stored on the disk. A join gives us a relation that the database constructs on the fly based on our join condition.
Even
SELECT 1gives us a relation: it has one row with one column named1. That's why executing that statement gives us an array of objects, as if we were selecting from a table:[{'1': 1}]. We could join a table againstSELECT 1if we wanted to!When querying relations, we can use all of the query operations that we've already seen on tables:
AS,WHERE, etc. In the example above, we already did that withAS.When we use
WHEREwith joins, it filters the rows, just like it does with a normal table. TheWHEREcan see columns from both of the joined tables.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT people.first_name AS person, cats.name AS catFROM peopleJOIN catsON people.id = cats.owner_idWHERE people.first_name = 'Amir'`);Result:
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT people.first_name AS person, cats.name AS catFROM peopleJOIN catsON people.id = cats.owner_idWHERE people.first_name = 'Betty'`);Result:
[{person: 'Betty', cat: 'Keanu'}] What if we join people and cats, but some people have no cats? They won't show up in the results because
ON people.id = cats.owner_idis never true for a person with no cats.For example: Betty's
idis 200. If no cat'sowner_idis 200, then no cat can be paired with Betty in ourJOIN. Betty won't show up at all in the results. Amir and Ms. Fluff will still show up because they're still related.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`-- Delete Keanu, so Betty now has no cats.DELETE FROM cats WHERE owner_id = 200;SELECT people.first_name AS person, cats.name AS catFROM peopleJOIN catsON people.id = cats.owner_id`);Result:
[{person: 'Amir', cat: 'Ms. Fluff'}] When a person has multiple cats, the
JOINwill find them all.One important note, though. Suppose that Betty owns two cats, Keanu and Kim. The query will NOT combine the cats' names into an array like
[{person: 'Betty', cat: ['Keanu', 'Kim']}]. Instead, the join returns two separate rows:{person: 'Betty', cat: 'Keanu'}{person: 'Betty', cat: 'Kim'}
(In this example, remember that our test database resets for each code example. Specifically, the
DELETEin the example above has no effect here.)- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`-- Betty also has a cat named KimINSERT INTO cats (owner_id, name) VALUES (200, 'Kim');SELECTpeople.first_name AS person,cats.name AS catFROM peopleJOIN catsON people.id = cats.owner_idWHERE people.first_name = 'Betty'`);Result:
[{person: 'Betty', cat: 'Keanu'}, {person: 'Betty', cat: 'Kim'}] Here's a code problem:
Use a join to find cats and their toys. The join should return two columns aliased with
AS:cat(the cat's name) andtoy(the toy's name).exec(`CREATE TABLE cats (id INTEGER PRIMARY KEY NOT NULL,name TEXT NOT NULL);CREATE TABLE toys (cat_id INTEGER REFERENCES cats(id) NOT NULL,name TEXT NOT NULL);INSERT INTO cats (id, name) VALUES (1, 'Ms. Fluff');INSERT INTO cats (id, name) VALUES (2, 'Keanu');INSERT INTO toys (cat_id, name) VALUES (1, 'Birdo');INSERT INTO toys (cat_id, name) VALUES (2, 'Mouser');INSERT INTO toys (cat_id, name) VALUES (2, 'Shy Guy');`);exec(`SELECTcats.name AS cat,toys.name AS toyFROM catsJOIN toysON cats.id = toys.cat_id`);- Goal:
[{cat: 'Ms. Fluff', toy: 'Birdo'}, {cat: 'Keanu', toy: 'Mouser'}, {cat: 'Keanu', toy: 'Shy Guy'}]- Yours:
[{cat: 'Ms. Fluff', toy: 'Birdo'}, {cat: 'Keanu', toy: 'Mouser'}, {cat: 'Keanu', toy: 'Shy Guy'}]
You'll sometimes see the "simple" joins from this lesson called "inner joins", which is the full technical term for them. We can even write
cats INNER JOIN toysif we like. However, that does exactly the same thing, so it's not necessary.The existence of "inner" joins implies that there are also "outer" joins, and it's true! They're much less common, so we won't look at them in detail. We will see one type of outer join in a later lesson, though.