How to add upsert support to your API

An upsert is a common technique that combines the automatic creation of a new record with an update to an existing record if it already exists using a single statement. While not all APIs may need this kind of behavior, some find that it can optimise API interactions for those specific cases when a client needs to check for the existance of a resource before a create or update request. In this article, we will offer three approaches for how to add upserts into your REST-based API when this kind of pattern would benefit your consumers.

What is an upsert?

Upserts are common with databases such as MongoDB, PostgreSQL, MSSQL, and others. Whenever an INSERT statement is used, an additional clause is specified to indicate the properties to update if a row already exists with the same key(s).

To demonstrate how it works, let’s consider the following relational table structure:

CREATE TABLE customers (
 id serial PRIMARY KEY,
 email VARCHAR UNIQUE,
 name VARCHAR NOT NULL
 );

As you may have noticed, the email column is declared with a UNIQUE constraints. Any time we try to insert the same row with the same email, we will receive a constraint violation error.

Next, let’s insert a row into the table:

INSERT INTO customers (email, name) VALUES ('[email protected]', 'Lisa');

The new row will insert without error, as the email is unique at this point. If we try to insert the row again, however, we would receive an error regarding the unique constraint violation because a row already exists with the same email.

What if we want to support the bulk importing data from a CSV file, while avoiding a SELECT statement to see if the data already exists to determine if we should insert a new record or update an existing record? We can accomplish this using an upsert statement. For databases such as PostgreSQL, we perform an upsert by using the ON CONFLICT clause in our INSERT statement:

INSERT INTO customers (email, name)
VALUES ('[email protected]', 'Lisa Simpson')
ON CONFLICT (email) 
DO
 UPDATE
   SET name = EXCLUDED.name;

Now, anytime we insert data, we will automatically update the name to the new value if a row already exists with the same email. No need to check if the row exists to determine if we should add a new row or update an existing row. Convienant and much more efficient as we reduce the number of SQL statements we must execute by half.

So how do you design an upsert behavior in a REST-based API to achieve the same results? Let’s look at three approaches and determine which one may be the best fit.

Approach 1: Use a PUT with a consumer-supplied identifier

As you may recall, the HTTP PUT verb is often used to support the replacement of an existing resource instance with a new representation. What you may not recall is that PUT may also act as a create action as well. For example:

PUT /customers/lisa%40thesimpsons.show

{
  "email": "[email protected]",
  "name": "Lisa Simpson"
}

In this example, the consumer is supplying the identifier, 12345. The API will check for an existing resource and, if found, update the resource representation to what has been provided and return a 200 OK response code. If the resource is not found, then it is created using the identifier provided and return a 201 CREATED response code.

Elasticsearch supports this style of upsert, allowing identifiers that exist in one system to be used as the identifier for an indexed document within Elasticsearch. In the event that an indexed document already exists with the provided identifier, the document is replaced with the new version.

Note that using this approach requires that API consumers generate their own unique, predictable identifier that will be included into the URL. If consumers generate random or sequential identifiers, this technique will not work. If this is the case for your API, you will need to select one of the two remaining approaches detailed below.

Approach 2: Use a POST with unique property detection

In this approach, the consumer does not provide the identifier. Instead, the consumer issues a POST to the resource collection with the resource details as typical of a create resource request. It becomes the API server’s responsibility to perform a uniqueness check and determine if a new resource should be created or an existing resource replaced.

For example:

POST /customers

{
  "email": "[email protected]",
  "name": "Lisa Simpson"
}

In this example, the consumer submits the representation of the resource desired. The API will check for an existing resource using a specific property, such as the email address. If found, the resource representation is updated to what has been provided and a 200 OK response code is returned. If the resource is not found, then it is created using the representation provided and a 201 CREATED response code is returned.

It is important to note this behaviour in the API documentation, to ensure that developers are aware of this upsert behaviour and anticipate the result based on the response code.

Approach 3: Use a POST with unique composite property detection

This approach uses more than one resource property as part of the uniqueness check. In relational database terms, this is referred to as a composite key. This technique may be familiar to those dealing with personal identifiable information (PII), where a combination of name, email, and other properties are used to uniquely identify a customer.

Below is the same example as we showed in the previous approach:

POST /customers

{
  "email": "[email protected]",
  "name": "Lisa Simpson"
}

However, this time the server may use a combination of name and email to determine uniqueness. This may be necessary to support multiple accounts that share the same email address, as is often the case for a shared household email address.

In this case, the API will check for an existing resource using the combination of name and email properties. If found, the resource representation is updated to what has been provided and a 200 OK response code is returned. If the resource is not found, then it is created using the representation provided and a 201 CREATED response code is returned.

Again, it is important to note this behavior in the API documentation, to ensure that developers are aware of this upsert behavior and the specific combinations of properties that make up the uniqueness check.

Wrap-Up

While not common, upserts offer a consumer optimisation by reducing the number of HTTP requests required to check for the existance of a resource and either create or update it. This is especially important to reduce network chattiness, as well as limit concurrency issues over HTTP. Combined with techniques such as conditional headers using eTags or last modified dates, HTTP consumers are able to build safe and efficient applications that create or replace resources easily and effectively.

If you’re interested in other multi-approach solutions be sure to check out Tips for building an effective enterprise API programme. This article offers aproaches that can benefit anyone just starting out to those looking for some additional insights for their established API programme.