SQL: Foreign Keys
Welcome to the Foreign Keys lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
One big selling point of SQL databases is correctness: the database lets us set up constraints that should never be violated. Then it makes sure that we don't accidentally violate them.
- We can't insert into columns that don't exist.
- We can't insert or update nulls into
NOT NULLcolumns. - We can't insert or update duplicate values in
UNIQUEcolumns. - In most databases, we can't insert the wrong data type into columns. (SQLite is an anomaly here; it allows us to make this mistake.)
We've seen tables referencing other tables, like a cat's
owner_idreferencing a person'sid. What if we make a mistake with those IDs? What if we insert a cat whoseowner_iddoesn't reference any actual person? So far, we haven't seen a way to prevent that.(Remember that statements like
CREATEandINSERTreturn[].)>
exec(`CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);-- Remember that INTEGER PRIMARY KEYs start at 1.INSERT INTO people (name) VALUES ('Amir');CREATE TABLE cats (owner_id INTEGER NOT NULL, name TEXT NOT NULL);INSERT INTO cats (owner_id, name) VALUES (2170, 'Ms. Fluff');`);Result:
[]
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT id FROM people`);Result:
[{id: 1}] - Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT owner_id FROM cats WHERE name = 'Ms. Fluff'`);Result:
[{owner_id: 2170}] - Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`SELECT name FROM people WHERE id = 2170`);Result:
[]
This is a problem! The cat has an
owner_id, but there's no person with that ID.Suppose that this database is being used by a cat groomer. Ms. Fluff is dropped off, gets her grooming, looks great... and the database says that her owner is person 2170, who doesn't exist. Who do we call to pick her up? The database is wrong; cats are supposed to have owners who actually exist!
We can prevent this mistake with a foreign key: a column whose value must equal a value in another table. In our case, we want the database to ensure that every cat's
owner_idequals some person'sid.Note the new
REFERENCESsyntax in the cats table below.>
exec(`CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);CREATE TABLE cats (owner_id INTEGER REFERENCES people(id),name TEXT NOT NULL);`);Result:
[]
The database won't let us violate this foreign key in any way.
- We can't insert a cat whose
owner_idreferences a person ID that doesn't exist. - We can't change an existing cat's
owner_idto one that doesn't match some person. - If some cats reference a person, then we can't change the person's ID. That would leave the cats referencing a person ID that doesn't exist.
- For the same reason, we can't delete a person referenced by some cats.
- We can't insert a cat whose
(All of the following examples use the two tables above. Each example is separate, and each starts with the tables empty. Remember that you can type
errorto indicate an error.)- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`INSERT INTO people (id, name) VALUES (100, 'Amir');INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');`);Result:
[]
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu')`);Result:
Error: FOREIGN KEY constraint failed
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`INSERT INTO people (id, name) VALUES (100, 'Amir');INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');UPDATE people SET id = 101 WHERE name = 'Amir';`);Result:
Error: FOREIGN KEY constraint failed
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`INSERT INTO people (id, name) VALUES (100, 'Amir');INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');UPDATE cats SET owner_id = 101 WHERE name = 'Ms. Fluff';`);Result:
Error: FOREIGN KEY constraint failed
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`INSERT INTO people (id, name) VALUES (200, 'Betty');INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');DELETE FROM people WHERE name = 'Betty';`);Result:
Error: FOREIGN KEY constraint failed
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`INSERT INTO people (id, name) VALUES (200, 'Betty');INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');DELETE FROM cats WHERE name = 'Keanu';DELETE FROM people WHERE name = 'Betty';`);Result:
[]
All of this safety comes from that one little
REFERENCESin our table definition. This shows us the power of SQL's declarative constraints. We don't have to write separate code to say "a person can't be deleted if they still have cats", "a cat can't be updated to have an owner ID who doesn't exist", etc. That oneREFERENCESclause prevents many kinds of mistakes.We have a nice phrase that describes preventing all of those mistakes: referential integrity. Referential as in "concerning references from one table to another"; and integrity as in "the references are guaranteed to be correct".
Here's a code problem:
Create a
catstable with aTEXTname and anowner_idINTEGERforeign key that references theidcolumn of thepeopletable.exec(`CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);`);exec(`CREATE TABLE cats (owner_id INTEGER REFERENCES people(id),name TEXT NOT NULL);`);exec(`INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu')`);- Goal:
Error: FOREIGN KEY constraint failed
- Yours:
Error: FOREIGN KEY constraint failed
We're still allowed to
UPDATEandDELETErows in any way that satisfies the foreign key constraint. For example, we can update Keanu'sowner_idto reference Amir instead of Betty. Then Betty can be deleted because no cats reference her any more.- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
exec(`-- Ms. Fluff belongs to Amir (user 100)INSERT INTO people (id, name) VALUES (100, 'Amir');INSERT INTO cats (owner_id, name) VALUES (100, 'Ms. Fluff');-- Keanu belongs to Betty (user 200)INSERT INTO people (id, name) VALUES (200, 'Betty');INSERT INTO cats (owner_id, name) VALUES (200, 'Keanu');-- Keanu belongs to Amir instead of BettyUPDATE cats SET owner_id = 100 WHERE name = 'Keanu';-- Delete Betty, who is no longer referenced by any catsDELETE FROM people WHERE name = 'Betty';SELECT name, owner_id FROM cats;`);Result:
[{name: 'Ms. Fluff', owner_id: 100}, {name: 'Keanu', owner_id: 100}] The foreign keys were respected at every step of that change:
- Initially, both cats'
owner_ids reference different people. - We update Keanu's
owner_idto be 100, which is Amir's ID. Now both cats are owned by Amir. - Then we delete Betty, which is OK because no cats reference her any more.
- Initially, both cats'
Setting up these constraints takes effort. It's more work than throwing all of our data into a schemaless database with no foreign keys. However, we do this work because it saves us from other work that's more annoying, more difficult, and more error-prone.
Carefully creating a schema now means that we're less likely to end up with invalid data in our production database. Our foreign key means that we'll never find "orphaned" cats whose
owner_idreferences an owner who doesn't exist. If PostgreSQL ever allows us to orphan a cat, that's a bug in PostgreSQL!You may also hear people talk about "leaked" records, by analogy to memory leaks. That means the same thing as "orphaned": a row that's referencing another row that no longer exists, because we didn't add a foreign key constraint.
A couple of final notes. By default, SQLite doesn't enforce foreign keys at all. However, in this lesson we saw some "FOREIGN KEY constraint failed" errors, which shows that SQLite was enforcing foreign keys. That's because Execute Program enables the relevant SQLite option:
PRAGMA foreign_keys = ON;. Postgres, MySQL, and most other SQL databases enforce foreign keys by default, so you don't need to manually turn them on in most cases.Second, a note about terminology. Why are these called foreign keys? A key is a column that uniquely identifies a row. We've seen several tables with a numeric key called
id. But we could have other types of keys. For example,userstables often have anemailcolumn that's a key: no two users ever have the same email.A cat's
owner_idreferences the person'sidkey. From the cat's perspective,owner_idis referencing something "foreign": theidcolumn of the people table. That's why it's a "foreign key".