Execute Program

SQL: Foreign Keys

Welcome to the Foreign Keys lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • One big selling point of SQL databases is correctness: the database lets us set up constraints that should never be violated. Then it makes sure that we don't accidentally violate them.

    • We can't insert into columns that don't exist.
    • We can't insert or update nulls into NOT NULL columns.
    • We can't insert or update duplicate values in UNIQUE columns.
    • In most databases, we can't insert the wrong data type into columns. (SQLite is an anomaly here; it allows us to make this mistake.)
  • We've seen tables referencing other tables, like a cat's owner_id referencing a person's id. What if we make a mistake with those IDs? What if we insert a cat whose owner_id doesn't reference any actual person? So far, we haven't seen a way to prevent that.

  • (Remember that statements like CREATE and INSERT return [].)

  • >
    exec(`
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Remember that INTEGER PRIMARY KEYs start at 1.
    INSERT INTO people (name) VALUES ('Amir');
    CREATE TABLE cats (owner_id INTEGER NOT NULL, name TEXT NOT NULL);
    INSERT INTO cats (owner_id, name) VALUES (2170, 'Ms. Fluff');
    `);
    Result:
    []Pass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT id FROM people`);
    Result:
    [{id: 1}]Pass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT owner_id FROM cats WHERE name = 'Ms. Fluff'`);
    Result:
    [{owner_id: 2170}]Pass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`SELECT name FROM people WHERE id = 2170`);
    Result:
    []Pass Icon
  • This is a problem! The cat has an owner_id, but there's no person with that ID.

  • Suppose that this database is being used by a cat groomer. Ms. Fluff is dropped off, gets her grooming, looks great... and the database says that her owner is person 2170, who doesn't exist. Who do we call to pick her up? The database is wrong; cats are supposed to have owners who actually exist!

  • We can prevent this mistake with a foreign key: a column whose value must equal a value in another table. In our case, we want the database to ensure that every cat's owner_id equals some person's id.

  • Note the new REFERENCES syntax in the cats table below.

  • >
    exec(`
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE cats (
    owner_id INTEGER REFERENCES people(id),
    name TEXT NOT NULL
    );
    `);
    Result:
    []Pass Icon
  • The database won't let us violate this foreign key in any way.

    • We can't insert a cat whose owner_id references a person ID that doesn't exist.
    • We can't change an existing cat's owner_id to one that doesn't match some person.
    • If some cats reference a person, then we can't change the person's ID. That would leave the cats referencing a person ID that doesn't exist.
    • For the same reason, we can't delete a person referenced by some cats.
  • (All of the following examples use the two tables above. Each example is separate, and each starts with the tables empty. Remember that you can type error to indicate an error.)

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    INSERT INTO people (id, name) VALUES (100, 'Amir');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');
    `);
    Result:
    []Pass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu')`);
    Result:
    Error: FOREIGN KEY constraint failedPass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    INSERT INTO people (id, name) VALUES (100, 'Amir');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');
    UPDATE people SET id = 101 WHERE name = 'Amir';
    `);
    Result:
    Error: FOREIGN KEY constraint failedPass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    INSERT INTO people (id, name) VALUES (100, 'Amir');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');
    UPDATE cats SET owner_id = 101 WHERE name = 'Ms. Fluff';
    `);
    Result:
    Error: FOREIGN KEY constraint failedPass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    INSERT INTO people (id, name) VALUES (200, 'Betty');
    INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');
    DELETE FROM people WHERE name = 'Betty';
    `);
    Result:
    Error: FOREIGN KEY constraint failedPass Icon
  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    INSERT INTO people (id, name) VALUES (200, 'Betty');
    INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');
    DELETE FROM cats WHERE name = 'Keanu';
    DELETE FROM people WHERE name = 'Betty';
    `);
    Result:
    []Pass Icon
  • All of this safety comes from that one little REFERENCES in our table definition. This shows us the power of SQL's declarative constraints. We don't have to write separate code to say "a person can't be deleted if they still have cats", "a cat can't be updated to have an owner ID who doesn't exist", etc. That one REFERENCES clause prevents many kinds of mistakes.

  • We have a nice phrase that describes preventing all of those mistakes: referential integrity. Referential as in "concerning references from one table to another"; and integrity as in "the references are guaranteed to be correct".

  • Here's a code problem:

    Create a cats table with a TEXT name and an owner_id INTEGER foreign key that references the id column of the people table.

    exec(`CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);`);
    exec(`
    CREATE TABLE cats (
    owner_id INTEGER REFERENCES people(id),
    name TEXT NOT NULL
    );
    `);
    exec(`INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu')`);
    Goal:
    Error: FOREIGN KEY constraint failed
    Yours:
    Error: FOREIGN KEY constraint failedPass Icon
  • We're still allowed to UPDATE and DELETE rows in any way that satisfies the foreign key constraint. For example, we can update Keanu's owner_id to reference Amir instead of Betty. Then Betty can be deleted because no cats reference her any more.

  • Note: this code example reuses elements (variables, etc.) defined in earlier examples.
    >
    exec(`
    -- Ms. Fluff belongs to Amir (user 100)
    INSERT INTO people (id, name) VALUES (100, 'Amir');
    INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');
    -- Keanu belongs to Betty (user 200)
    INSERT INTO people (id, name) VALUES (200, 'Betty');
    INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');
    -- Keanu belongs to Amir instead of Betty
    UPDATE cats SET owner_id = 100 WHERE name = 'Keanu';
    -- Delete Betty, who is no longer referenced by any cats
    DELETE FROM people WHERE name = 'Betty';
    SELECT name, owner_id FROM cats;
    `);
    Result:
    [{name: 'Ms. Fluff', owner_id: 100}, {name: 'Keanu', owner_id: 100}]Pass Icon
  • The foreign keys were respected at every step of that change:

    • Initially, both cats' owner_ids reference different people.
    • We update Keanu's owner_id to be 100, which is Amir's ID. Now both cats are owned by Amir.
    • Then we delete Betty, which is OK because no cats reference her any more.
  • Setting up these constraints takes effort. It's more work than throwing all of our data into a schemaless database with no foreign keys. However, we do this work because it saves us from other work that's more annoying, more difficult, and more error-prone.

  • Carefully creating a schema now means that we're less likely to end up with invalid data in our production database. Our foreign key means that we'll never find "orphaned" cats whose owner_id references an owner who doesn't exist. If PostgreSQL ever allows us to orphan a cat, that's a bug in PostgreSQL!

  • You may also hear people talk about "leaked" records, by analogy to memory leaks. That means the same thing as "orphaned": a row that's referencing another row that no longer exists, because we didn't add a foreign key constraint.

  • A couple of final notes. By default, SQLite doesn't enforce foreign keys at all. However, in this lesson we saw some "FOREIGN KEY constraint failed" errors, which shows that SQLite was enforcing foreign keys. That's because Execute Program enables the relevant SQLite option: PRAGMA foreign_keys = ON;. Postgres, MySQL, and most other SQL databases enforce foreign keys by default, so you don't need to manually turn them on in most cases.

  • Second, a note about terminology. Why are these called foreign keys? A key is a column that uniquely identifies a row. We've seen several tables with a numeric key called id. But we could have other types of keys. For example, users tables often have an email column that's a key: no two users ever have the same email.

  • A cat's owner_id references the person's id key. From the cat's perspective, owner_id is referencing something "foreign": the id column of the people table. That's why it's a "foreign key".