Execute Program

SQL: On Conflict Update

Welcome to the On Conflict Update lesson!

This lesson is shown as static text below. However, it's designed to be used interactively. Click the button below to start!

  • Imagine that we want to track how many times someone has visited our website. If a person has never visited, no row exists for them. When a person visits, a row is created or updated to reflect how many times they've visited.

  • The SQL features that we've learned so far aren't sufficient to directly express this kind of "create-or-update" logic. We'll have to do it in JavaScript instead. First, we query the table to see whether any visits row exists for this person's email address. If a row exists then we update it. Otherwise, we create a new row.

  • In this example, our first call to the visit() will create the visits row. Then the second call will notice that existing row and update it.

  • >
    exec(`
    CREATE TABLE visits (
    id INTEGER PRIMARY KEY,
    -- email is unique because we only want one row per person
    email TEXT UNIQUE NOT NULL,
    count INTEGER NOT NULL
    )
    `);

    function visit(email) {
    const visits = exec(
    `SELECT * FROM visits WHERE email = ?`,
    [email]
    );

    if (visits.length === 0) {
    exec(
    `INSERT INTO visits (email, count) VALUES (?, ?)`,
    [email, 1]
    );
    } else {
    const visit = visits[0];
    exec(
    `UPDATE visits SET count = ? WHERE id = ?`,
    [visit.count + 1, visit.id]
    );
    }
    }

    visit('amir@example.com');
    visit('amir@example.com');
    exec(`SELECT email, count FROM visits`);
    Result:
    [{email: 'amir@example.com', count: 2}]Pass Icon
  • That worked, but there's a subtle problem that will bite us in practice. We need to make two observations to see it:

  • First: imagine that a user hits the refresh button while a page is loading. That causes two requests to our servers with very little time between them. If our luck is bad, the two requests will end up running at the same time on two different physical servers.

  • Second: our SELECT and INSERT queries are executed separately, so there's always a small delay between them. It's possible for server 2 to execute its SELECT after server 1's SELECT, but before server 1's INSERT.

  • Here's a step-by-step view of the problem:

    • Server 1 does its SELECT and sees nothing.
    • Server 2 does its SELECT and sees nothing.
    • Server 1 does its INSERT.
    • Server 2 does its INSERT. But there's already a record for this email address, so this insert violates the UNIQUE constraint on email. An exception is thrown!
  • These bugs are called "race conditions": the two servers are "racing" to complete their write, and the slower server "loses" the race. This may seem like a nitpick: how likely is it that this will actually happen?

  • It's relatively unlikely for any given request. However, unlikely problems become likely at scale. If our site gets 1,000,000 requests per month, then one-in-a-million situations will happen once per month. This happens in practice: recently, two Execute Program users saw error pages caused by a UNIQUE constraint violation, which in turn was caused by exactly this kind of race condition!

  • Fortunately, SQL gives us a way to shorten this code while simultaneously fixing the bug. There's already a UNIQUE constraint on visits' email column, because we only want to maintain one row per visitor. If we try to insert two rows for the same person, it will be an error. This is the same constraint violation error that showed up in our race condition bug.

  • (When you think that code in this course will cause an error, you can type error as its output.)

  • >
    exec(`
    CREATE TABLE visits (
    id INTEGER PRIMARY KEY,
    -- email is unique because we only want one row per person
    email TEXT UNIQUE NOT NULL,
    count INTEGER NOT NULL
    );
    INSERT INTO visits (email, count) VALUES ('amir@example.com', 1);
    INSERT INTO visits (email, count) VALUES ('amir@example.com', 1);
    `);
    Result:
    Error: UNIQUE constraint failed: visits.emailPass Icon
  • That error is good: it shows that the UNIQUE constraint is working! We can use this error to our advantage with SQL's ON CONFLICT clause.

  • We've already seen ON CONFLICT (column_name) DO NOTHING, which would leave our record unchanged. But there's also ON CONFLICT (column_name) DO UPDATE [...]. We'll tell the database "if the insert conflicts with an existing record's email, increment that row's count instead of erroring".

  • Here's an isolated example of that. Look closely at the ON CONFLICT (email) DO UPDATE SET count = count + 1. It means:

    • If our insert violates the email column's uniqueness constraint,
    • Update that existing row by incrementing its count by 1.
  • >
    exec(`
    CREATE TABLE visits (
    id INTEGER PRIMARY KEY,
    -- email is unique because we only want one row per person
    email TEXT UNIQUE NOT NULL,
    count INTEGER NOT NULL
    );
    INSERT INTO visits (email, count) VALUES ('amir@example.com', 1);
    INSERT INTO visits (email, count) VALUES ('amir@example.com', 1)
    ON CONFLICT (email) DO UPDATE SET count = count + 1;
    SELECT email, count FROM visits
    `);
    Result:
    [{email: 'amir@example.com', count: 2}]Pass Icon
  • Now we can write a better version of our visit function. Instead of retrieving the row and using if to decide what to do, we'll use an ON CONFLICT. That way the database already knows what to do if the row exists!

  • Here's a code problem:

    Write a visit function that:

    • INSERTs a row for the given email when one doesn't exist.
    • UPDATEs the row if the given email already exists.
    • Uses an ON CONFLICT (...) DO UPDATE ... to do both of the above at once.
    exec(`
    CREATE TABLE visits (
    id INTEGER PRIMARY KEY,
    -- email is unique because we only want one row per person
    email TEXT UNIQUE NOT NULL,
    count INTEGER NOT NULL
    )
    `);

    function visit(email) {
    exec(
    `
    INSERT INTO visits (email, count) VALUES (?, 1)
    ON CONFLICT (email) DO UPDATE SET count = count + 1
    `,
    [email]
    );
    }

    visit('amir@example.com');
    visit('betty@example.com');
    visit('betty@example.com');
    exec(`SELECT email, count FROM visits`);
    Goal:
    [{email: 'amir@example.com', count: 1}, {email: 'betty@example.com', count: 2}]
    Yours:
    [{email: 'amir@example.com', count: 1}, {email: 'betty@example.com', count: 2}]Pass Icon
  • The new version of visit fixes the race condition bug while being simpler than the original. There's no conditional in the JavaScript, and we've replaced a trio of SELECT/INSERT/UPDATE statements with a single INSERT ... ON CONFLICT statement.

  • The new version is also more efficient. Our original version had to retrieve the visits count from the database, then make a decision, then go back to the database again to create or update the visits row. The new version is a single statement that hits the database only once.