Execute Program

SQL: Simple Joins

Welcome to the Simple Joins lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • Foreign keys let us guarantee that rows in one table correctly reference rows in another table. With that guarantee in place, we can ask the database system to combine the tables' data for us.

  • In this lesson, our goal is to produce a list of every cat's name, along with their owner's name. We could write that code in JavaScript, using a nested loop that executes one cat query per person. But that would be inefficient. SQL can do it in one query!

  • Here's our cats-and-owners database. (We'll use this database throughout this lesson, but it will reset before each code example. That is, each code example will start with exactly the tables and rows defined here.)

  • >
    exec(`
    CREATE TABLE people (
    id INTEGER PRIMARY KEY NOT NULL,
    first_name TEXT NOT NULL
    );
    CREATE TABLE cats (
    id INTEGER PRIMARY KEY NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES people(id),
    name TEXT NOT NULL
    );
    -- Amir has a cat named Ms. Fluff.
    INSERT INTO people (id, first_name) VALUES (100, 'Amir');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');
    -- Betty has a cat named Keanu.
    INSERT INTO people (id, first_name) VALUES (200, 'Betty');
    INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');
    `);
    Result:
  • We can build our cats-and-owners list by JOINing the two tables together into one. First, we'll do the simplest possible join: people JOIN cats, giving us combinations of every person with every cat. (It will look strange, so we don't recommend dwelling on it; we'll improve it soon.)

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT * FROM people JOIN cats`);
    Result:
  • The join gave us every combination of a person and a cat, whether or not they're related via owner_id. There are 2 people and 2 cats, so the join gave us 4 rows (2 * 2):

    • Amir and Ms. Fluff
    • Amir and Keanu
    • Betty and Ms. Fluff
    • Betty and Keanu
  • If we had 100 people and 100 cats then this join would give us 10,000 rows: each of the 100 people would be independently combined with each of the 100 cats. You can think of it like this:

    • For each person in the people table:
      • For each cat in the cats table:
        • Return a row with the columns from that person and this cat.
  • The join has all of the columns from people (id and first_name) and all of the columns from cats (id, name, and owner_id). Both tables have an id column, which is awkward because we can't have two columns with the same name. The cats' ids are "winning" here, so we see them rather than the peoples' ids. We'll address the issue of conflicting column names in more detail later.

  • To make our join more useful, we can add an ON clause. ON is like WHERE, but it applies specifically to joins. We "select where", but we "join on". For our cats-and-owners list, we want all pairs of people and cats where people.id equals cats.owner_id:

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT * FROM people JOIN cats ON people.id = cats.owner_id`);
    Result:
  • This is much better! Our condition was people.id = cats.owner_id, so people and cats get matched up. For example, the cat with owner_id 100 gets joined with person 100: Amir gets matched with Ms. Fluff. Because our cats table has a FOREIGN KEY that associates cats.owner_id to people.id, we know our cats will always be matched with their correct owner!

  • There's a simple rule for thinking about what a join will do. It's always correct to think about basic JOINs as two nested loops with a filter, like this:

    • For each person in the people table:
      • For each cat in the cats table:
        • If the ON condition is true for this person and this cat:
          • Return a row with the columns from this person and this cat.
  • Here's a code problem:

    Write a JOIN that joins people and cats on cats.owner_id = people.id. Make sure that you join people to cats, in that order.

    exec(`
    CREATE TABLE people (
    id INTEGER PRIMARY KEY NOT NULL,
    first_name TEXT NOT NULL
    );
    CREATE TABLE cats (
    id INTEGER PRIMARY KEY NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES people(id),
    name TEXT NOT NULL
    );

    -- Amir owns Ms. Fluff
    INSERT INTO people (id, first_name) VALUES (100, 'Amir');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');

    -- Betty owns Keanu
    INSERT INTO people (id, first_name) VALUES (200, 'Betty');
    INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');
    `);
    exec(`
    SELECT * FROM people JOIN cats ON cats.owner_id = people.id
    `);
    Goal:
    [{id: 1, first_name: 'Amir', owner_id: 100, name: 'Ms. Fluff'}, {id: 2, first_name: 'Betty', owner_id: 200, name: 'Keanu'}]
    Yours:
    [{id: 1, first_name: 'Amir', owner_id: 100, name: 'Ms. Fluff'}, {id: 2, first_name: 'Betty', owner_id: 200, name: 'Keanu'}]Pass Icon
  • With our cats and owners matched, we can pick the columns that we want with AS, like we've done for more basic SELECTs. (For example, SELECT name AS person, age AS oldness FROM some_table.) In a JOIN, data is identified by the table it comes from, like people.first_name AS person.

  • Here's a code problem:

    Write a JOIN that joins people and cats on cats.owner_id = people.id. Using AS, return data in the form of [{person: 'Amir', cat: 'Ms. Fluff'}]

    exec(`
    CREATE TABLE people (
    id INTEGER PRIMARY KEY NOT NULL,
    first_name TEXT NOT NULL
    );
    CREATE TABLE cats (
    id INTEGER PRIMARY KEY NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES people(id),
    name TEXT NOT NULL
    );

    INSERT INTO people (id, first_name) VALUES (100, 'Amir');
    INSERT INTO cats (id, owner_id, name) VALUES (1, 100, 'Ms. Fluff');
    INSERT INTO people (id, first_name) VALUES (200, 'Betty');
    INSERT INTO cats (id, owner_id, name) VALUES (2, 200, 'Keanu');
    `);
    exec(`
    SELECT people.first_name AS person, cats.name AS cat
    FROM people
    JOIN cats
    ON people.id = cats.owner_id
    `);
    Goal:
    [{person: 'Amir', cat: 'Ms. Fluff'}, {person: 'Betty', cat: 'Keanu'}]
    Yours:
    [{person: 'Amir', cat: 'Ms. Fluff'}, {person: 'Betty', cat: 'Keanu'}]Pass Icon
  • Usually, the order of tables in a join doesn't matter: cats JOIN people is the same as people JOIN cats. There are two cases where join order may matter, though, depending on your database system.

  • The first case is query optimization. In rare cases, your database engine may optimize a complex join badly, causing it to be slower than it could be. Reordering the join may help.

  • The second case is when there are duplicate column names. When the joined tables have duplicate column names, the last table in the JOIN wins. Here's an example from earlier in the lesson: we select from people JOIN cats with no AS to alias the column names.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT * FROM people JOIN cats ON people.id = cats.owner_id`);
    Result:
  • Notice that the id columns returned are the cats' IDs: 1 and 2. If we reverse the order and select cats JOIN people, the id column will be the people's IDs, 100 and 200.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT * FROM cats JOIN people ON people.id = cats.owner_id`);
    Result:
  • It's common for tables to have the same column names, especially for id columns. This can get very confusing, so it's best to use AS to explicitly alias every joined column, rather than blindly selecting *. That will make your SQL code more clear to other programmers. We recommend writing those aliases on their own lines, indented inside the SELECT.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    SELECT
    cats.id AS cat_id,
    people.id AS person_id
    FROM people JOIN cats ON people.id = cats.owner_id
    `);
    Result:
  • We've been treating JOIN as if it returns a table. The proper term is "relation", which means: "it has rows and columns and supports operations like WHERE, AS, JOIN, etc."

  • (If a join combines two relations into a new relation, does that mean that we can join on the result of a join? Yes! We'll see that in a later lesson.)

  • Relations are the core concept in SQL databases. We've called them "SQL databases" so far, but "relational databases" is a more correct term. SQL is just the most popular relational query language.

  • There are many kinds of relations. A table is a relation stored on the disk. A join gives us a relation that the database constructs on the fly based on our join condition.

  • Even SELECT 1 gives us a relation: it has one row with one column named 1. That's why executing that statement gives us an array of objects, as if we were selecting from a table: [{'1': 1}]. We could join a table against SELECT 1 if we wanted to!

  • When querying relations, we can use all of the query operations that we've already seen on tables: AS, WHERE, etc. In the example above, we already did that with AS.

  • When we use WHERE with joins, it filters the rows, just like it does with a normal table. The WHERE can see columns from both of the joined tables.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    SELECT people.first_name AS person, cats.name AS cat
    FROM people
    JOIN cats
    ON people.id = cats.owner_id
    WHERE people.first_name = 'Amir'
    `);
    Result:
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    SELECT people.first_name AS person, cats.name AS cat
    FROM people
    JOIN cats
    ON people.id = cats.owner_id
    WHERE people.first_name = 'Betty'
    `);
    Result:
    [{person: 'Betty', cat: 'Keanu'}]Pass Icon
  • What if we join people and cats, but some people have no cats? They won't show up in the results because ON people.id = cats.owner_id is never true for a person with no cats.

  • For example: Betty's id is 200. If no cat's owner_id is 200, then no cat can be paired with Betty in our JOIN. Betty won't show up at all in the results. Amir and Ms. Fluff will still show up because they're still related.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    -- Delete Keanu, so Betty now has no cats.
    DELETE FROM cats WHERE owner_id = 200;
    SELECT people.first_name AS person, cats.name AS cat
    FROM people
    JOIN cats
    ON people.id = cats.owner_id
    `);
    Result:
    [{person: 'Amir', cat: 'Ms. Fluff'}]Pass Icon
  • When a person has multiple cats, the JOIN will find them all.

  • One important note, though. Suppose that Betty owns two cats, Keanu and Kim. The query will NOT combine the cats' names into an array like [{person: 'Betty', cat: ['Keanu', 'Kim']}]. Instead, the join returns two separate rows:

    • {person: 'Betty', cat: 'Keanu'}
    • {person: 'Betty', cat: 'Kim'}
  • (In this example, remember that our test database resets for each code example. Specifically, the DELETE in the example above has no effect here.)

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    -- Betty also has a cat named Kim
    INSERT INTO cats (owner_id, name) VALUES (200, 'Kim');
    SELECT
    people.first_name AS person,
    cats.name AS cat
    FROM people
    JOIN cats
    ON people.id = cats.owner_id
    WHERE people.first_name = 'Betty'
    `);
    Result:
    [{person: 'Betty', cat: 'Keanu'}, {person: 'Betty', cat: 'Kim'}]Pass Icon
  • Here's a code problem:

    Use a join to find cats and their toys. The join should return two columns aliased with AS: cat (the cat's name) and toy (the toy's name).

    exec(`
    CREATE TABLE cats (
    id INTEGER PRIMARY KEY NOT NULL,
    name TEXT NOT NULL
    );
    CREATE TABLE toys (
    cat_id INTEGER REFERENCES cats(id) NOT NULL,
    name TEXT NOT NULL
    );

    INSERT INTO cats (id, name) VALUES (1, 'Ms. Fluff');
    INSERT INTO cats (id, name) VALUES (2, 'Keanu');

    INSERT INTO toys (cat_id, name) VALUES (1, 'Birdo');
    INSERT INTO toys (cat_id, name) VALUES (2, 'Mouser');
    INSERT INTO toys (cat_id, name) VALUES (2, 'Shy Guy');
    `);
    exec(`
    SELECT
    cats.name AS cat,
    toys.name AS toy
    FROM cats
    JOIN toys
    ON cats.id = toys.cat_id
    `);
    Goal:
    [{cat: 'Ms. Fluff', toy: 'Birdo'}, {cat: 'Keanu', toy: 'Mouser'}, {cat: 'Keanu', toy: 'Shy Guy'}]
    Yours:
    [{cat: 'Ms. Fluff', toy: 'Birdo'}, {cat: 'Keanu', toy: 'Mouser'}, {cat: 'Keanu', toy: 'Shy Guy'}]Pass Icon
  • You'll sometimes see the "simple" joins from this lesson called "inner joins", which is the full technical term for them. We can even write cats INNER JOIN toys if we like. However, that does exactly the same thing, so it's not necessary.

  • The existence of "inner" joins implies that there are also "outer" joins, and it's true! They're much less common, so we won't look at them in detail. We will see one type of outer join in a later lesson, though.