SQL: On Conflict Update
Welcome to the On Conflict Update lesson!
This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!
Imagine that we want to track how many times someone has visited our website. If a person has never visited, no row exists for them. When a person visits, a row is created or updated to reflect how many times they've visited.
The SQL features that we've learned so far aren't sufficient to directly express this kind of "create-or-update" logic. We'll have to do it in JavaScript instead. First, we query the table to see whether any
visitsrow exists for this person's email address. If a row exists then we update it. Otherwise, we create a new row.In this example, our first call to the
visit()will create thevisitsrow. Then the second call will notice that existing row and update it.>
exec(`CREATE TABLE visits (id INTEGER PRIMARY KEY,-- email is unique because we only want one row per personemail TEXT UNIQUE NOT NULL,count INTEGER NOT NULL)`);function visit(email) {const visits = exec(`SELECT * FROM visits WHERE email = ?`,[email]);if (visits.length === 0) {exec(`INSERT INTO visits (email, count) VALUES (?, ?)`,[email, 1]);} else {const visit = visits[0];exec(`UPDATE visits SET count = ? WHERE id = ?`,[visit.count + 1, visit.id]);}}visit('amir@example.com');visit('amir@example.com');exec(`SELECT email, count FROM visits`);Result:
[{email: 'amir@example.com', count: 2}]That worked, but there's a subtle problem that will bite us in practice. We need to make two observations to see it:
First: imagine that a user hits the refresh button while a page is loading. That causes two requests to our servers with very little time between them. If our luck is bad, the two requests will end up running at the same time on two different physical servers.
Second: our
SELECTandINSERTqueries are executed separately, so there's always a small delay between them. It's possible for server 2 to execute itsSELECTafter server 1'sSELECT, but before server 1'sINSERT.Here's a step-by-step view of the problem:
- Server 1 does its
SELECTand sees nothing. - Server 2 does its
SELECTand sees nothing. - Server 1 does its
INSERT. - Server 2 does its
INSERT. But there's already a record for this email address, so this insert violates theUNIQUEconstraint onemail. An exception is thrown!
- Server 1 does its
These bugs are called "race conditions": the two servers are "racing" to complete their write, and the slower server "loses" the race. This may seem like a nitpick: how likely is it that this will actually happen?
It's relatively unlikely for any given request. However, unlikely problems become likely at scale. If our site gets 1,000,000 requests per month, then one-in-a-million situations will happen once per month. This happens in practice: recently, two Execute Program users saw error pages caused by a
UNIQUEconstraint violation, which in turn was caused by exactly this kind of race condition!Fortunately, SQL gives us a way to shorten this code while simultaneously fixing the bug. There's already a
UNIQUEconstraint onvisits'emailcolumn, because we only want to maintain one row per visitor. If we try to insert two rows for the same person, it will be an error. This is the same constraint violation error that showed up in our race condition bug.(When you think that code in this course will cause an error, you can type
erroras its output.)>
exec(`CREATE TABLE visits (id INTEGER PRIMARY KEY,-- email is unique because we only want one row per personemail TEXT UNIQUE NOT NULL,count INTEGER NOT NULL);INSERT INTO visits (email, count) VALUES ('amir@example.com', 1);INSERT INTO visits (email, count) VALUES ('amir@example.com', 1);`);Result:
Error: UNIQUE constraint failed: visits.email
That error is good: it shows that the
UNIQUEconstraint is working! We can use this error to our advantage with SQL'sON CONFLICTclause.We've already seen
ON CONFLICT (column_name) DO NOTHING, which would leave our record unchanged. But there's alsoON CONFLICT (column_name) DO UPDATE [...]. We'll tell the database "if the insert conflicts with an existing record'semail, increment that row's count instead of erroring".Here's an isolated example of that. Look closely at the
ON CONFLICT (email) DO UPDATE SET count = count + 1. It means:- If our insert violates the
emailcolumn's uniqueness constraint, - Update that existing row by incrementing its count by 1.
- If our insert violates the
>
exec(`CREATE TABLE visits (id INTEGER PRIMARY KEY,-- email is unique because we only want one row per personemail TEXT UNIQUE NOT NULL,count INTEGER NOT NULL);INSERT INTO visits (email, count) VALUES ('amir@example.com', 1);INSERT INTO visits (email, count) VALUES ('amir@example.com', 1)ON CONFLICT (email) DO UPDATE SET count = count + 1;SELECT email, count FROM visits`);Result:
[{email: 'amir@example.com', count: 2}]Now we can write a better version of our
visitfunction. Instead of retrieving the row and usingifto decide what to do, we'll use anON CONFLICT. That way the database already knows what to do if the row exists!Here's a code problem:
Write a
visitfunction that:INSERTs a row for the givenemailwhen one doesn't exist.UPDATEs the row if the givenemailalready exists.- Uses an
ON CONFLICT (...) DO UPDATE ...to do both of the above at once.
exec(`CREATE TABLE visits (id INTEGER PRIMARY KEY,-- email is unique because we only want one row per personemail TEXT UNIQUE NOT NULL,count INTEGER NOT NULL)`);function visit(email) {exec(`INSERT INTO visits (email, count) VALUES (?, 1)ON CONFLICT (email) DO UPDATE SET count = count + 1`,[email]);}visit('amir@example.com');visit('betty@example.com');visit('betty@example.com');exec(`SELECT email, count FROM visits`);- Goal:
[{email: 'amir@example.com', count: 1}, {email: 'betty@example.com', count: 2}]- Yours:
[{email: 'amir@example.com', count: 1}, {email: 'betty@example.com', count: 2}]
The new version of
visitfixes the race condition bug while being simpler than the original. There's no conditional in the JavaScript, and we've replaced a trio ofSELECT/INSERT/UPDATEstatements with a singleINSERT ... ON CONFLICTstatement.The new version is also more efficient. Our original version had to retrieve the visits count from the database, then make a decision, then go back to the database again to create or update the
visitsrow. The new version is a single statement that hits the database only once.