API design guidance: bulk vs batch import

Some APIs need to accept large amounts of data at once, so that clients aren’t forced to submit hundreds or thousands of individual POST requests (a slow and tedious process that is a common side effect of a CRUD-based API design). There are two common use cases for bulk/batch processing:

  • Importing many resources via a single transaction

  • Performing a background import of a large data set in an efficient manner

Let’s look at this design pattern and explore some variants that may help you with your next API design.

This is part of a multi-part series on API design guidance, in which we take a look at tricks and hidden troubles in API design and how to avoid them. While these articles may not be exhaustive, they serve to identify common patterns and anti-patterns in API design.

Definition of API design guidance: bulk and batch import

API design guidance focuses on providing advice that will result in the best experience for API users and, therefore, maximise your API’s chances of success. When it comes to bulk and batch importing, which approach you use will depend on a range of factors, including the API’s capabilities, performance considerations and your particular use case.

There are two common terms used in this pattern: ‘bulk’ and ‘batch’-based processing. Let’s explore the differences between batch vs bulk API design considerations.

Bulk import explained

Bulk API import operations involve a single request made to the API to import a large dataset in a single operation. This is a fast and efficient way to import a significant number of records, or another large dataset, while minimising the number of requests made to the API.

Bulk import can be particularly useful for migrating data between systems or for initial data seeding for a new application or system. It delivers efficiency in importing large volumes of data without overwhelming the API with requests.

That said, importing an entire dataset in a single request may put a heavier load on the API than is ideal. This could impact performance, potentially compromising other API functionality.

Batch import explained

Batch API import operations deal with a large dataset by breaking it down into smaller, more manageable batches, with a request sent to the API for each batch to import separately.

Batch import can be useful when API payload sizes are limited or when there are other factors that restrict the ability to import all data in a single bulk operation. This can help to avoid overwhelming the API.

Bulk and batch import examples

Let’s take a look at a couple of bulk vs batch API import examples that highlight the differences between the two operations. The differences are clearest in how each approach processes records and handles successes and failures.

Bulk operations process each submitted record independently, allowing failures to be caught but the remainder of the import to succeed. This means that out of 2,000 records, perhaps 76 of them failed while the remaining 1,924 records imported successfully.

In the batch API example, batch operations process all submitted records within a single pass-or-fail transaction, i.e. it is the batch that ends up passing or failing. This means that if 76 records fail in a batch of 2,000, none of the records would be stored. The client would need to correct (or remove) the 76 failed records and resubmit the revised batch.

Bulk import design example

The request below provides individual JSON documents for each resource to bulk import:

POST /accounts/bulk-import
Content-Type: application/json-seq

{ "id":"12", "name":"...", ... }
{ "id":"13", "name":"...", ... }
{ "id":"14", "name":"...", ... }
{ "id":"15", "name":null, ... }

The response provides the success or failure status of each record, so the consuming app can process the results and correct any failures:

HTTP/1.1 207 Multi-Status
Content-Type: application/json

{
    "items": [
        { "id": "12", "status": 201, errors: [] },
        { "id": "13", "status": 201, errors: [] },
        { "id": "14", "status": 201, errors: [] },
        { "id": "15", "status": 400, errors: [ ... ] }
    ]
}

In this design, we use the 207 Multi-Status response to return the result of each record individually. This allows clients to retry the failed records automatically or surface the failures to a human.
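As a rough illustration of how a client might consume this design, here is a minimal Python sketch. The endpoint URL, the record fields and the use of the requests library are assumptions for the sake of the example, not part of the design itself. The client submits line-delimited records in one request, then splits the 207 Multi-Status results into successes and failures so the failures can be retried or surfaced:

import json

import requests  # assumed HTTP client; any client that can send a raw body works


def bulk_import(accounts, url="https://api.example.com/accounts/bulk-import"):
    """Submit all records in a single request and split the per-record results."""
    # Serialise each record as its own JSON document, one per line.
    body = "\n".join(json.dumps(account) for account in accounts)

    response = requests.post(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json-seq"},
    )
    response.raise_for_status()  # raises only if the whole request was rejected; a 207 passes through

    items = response.json()["items"]
    succeeded = [item for item in items if item["status"] == 201]
    failed = [item for item in items if item["status"] != 201]
    return succeeded, failed


# Example usage: retry or surface whatever failed.
if __name__ == "__main__":
    succeeded, failed = bulk_import([
        {"id": "12", "name": "Acme Ltd"},
        {"id": "15", "name": None},
    ])
    for item in failed:
        print(f"Record {item['id']} failed: {item['errors']}")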

Batch import design example

Batch import operations may be required for some use cases. They are similar to bulk imports, but either succeed or fail as a complete request. For example:

POST /accounts/batch-import
Content-Type: application/json-seq

{ "id":"12", "name":"...", ... }
{ "id":"13", "name":"...", ... }
{ "id":"14", "name":"...", ... }
{ "id":"15", "name":null, ... }
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
    "items": [
        { "id": "12", "status": 200, errors: [] },
        { "id": "13", "status": 200, errors: [] },
        { "id": "14", "status": 200, errors: [] },
        { "id": "15", "status": 400, errors: [ ... ] }
    ]
}

Note that since this is a batch import, the outcome is all-or-nothing. However, we still need to tell the client what couldn’t be imported, so we can use the same response format as the bulk import. The primary difference is that records that passed validation don’t return a 201 Created status, but rather a 200 OK status to indicate that no errors were encountered (nothing was created, because the batch as a whole failed).
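To make the all-or-nothing behaviour more concrete, here is a framework-agnostic Python sketch. The validate function and the rule it applies are hypothetical placeholders; the point is simply that every record is checked first and nothing is stored unless the whole batch passes:

import json


def validate(record):
    """Hypothetical per-record validation, returning a list of error messages."""
    errors = []
    if not record.get("name"):
        errors.append("name must not be null or empty")
    return errors


def handle_batch_import(raw_body):
    """Return (http_status, response_body) for a batch import request body."""
    records = [json.loads(line) for line in raw_body.splitlines() if line.strip()]

    items = []
    for record in records:
        errors = validate(record)
        items.append({
            "id": record.get("id"),
            "status": 400 if errors else 200,
            "errors": errors,
        })

    if any(item["status"] != 200 for item in items):
        # At least one record failed, so nothing is stored; the per-record
        # results tell the client what to correct before resubmitting.
        return 400, {"items": items}

    # Every record passed, so the whole set can be persisted in one
    # transaction (persistence layer omitted from this sketch).
    return 200, {"items": items}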

Bulk vs batch processing

The different approaches to error handling shown above each have their advantages. The one you choose will depend on your particular use case.

With bulk API processing, you can kick off the import and then deal with the errors afterwards, with the majority of your records already successfully imported.

Batch API processing, on the other hand, makes it easier for you to deal with errors during the import operation. If a batch fails, you can fix the errors that led to the failure while other batches continue to import.
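For example, a client might split a large dataset into fixed-size batches and keep going when an individual batch is rejected. The sketch below assumes the batch-import endpoint shown earlier, an arbitrary batch size and the Python requests library:

import json

import requests  # assumed HTTP client

BATCH_URL = "https://api.example.com/accounts/batch-import"  # illustrative endpoint


def import_in_batches(records, batch_size=500):
    """Submit records in fixed-size batches and collect any rejected batches."""
    rejected = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        body = "\n".join(json.dumps(record) for record in batch)
        response = requests.post(
            BATCH_URL,
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json-seq"},
        )
        if response.status_code != 200:
            # This batch was rolled back as a whole; keep it (and the
            # per-record results) aside for correction and carry on.
            rejected.append({"batch": batch, "results": response.json()["items"]})
    return rejected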

A word of caution on very large JSON import requests

Requiring clients to construct a single, very large JSON document can demand large amounts of memory and CPU, both for the client to produce it and for the server to parse it.

To avoid this, each resource representation to be imported can be isolated and complete, using a record separator approach such as a linefeed (LF). For example:

{ "id":"12", "name":"...", ... }
{ "id":"13", "name":"...", ... }
{ "id":"14", "name":"...", ... }
{ "id":"15", "name":"...", ... }

Each record in the request may be parsed and processed individually, as if it were an incoming stream. This has three advantages:

  1. Avoiding the server waiting for all the content to be received before parsing.
  2. Avoiding the increased memory the consuming application would need in order to construct a single, properly formed JSON structure. Instead, it can construct individual records and flush them to the server as they are ready, allowing the server to treat the incoming request as a stream and process records as they become available.
  3. Avoiding malformed JSON causing the entire request to fail.

The content type should be application/json-seq, as defined by RFC 7464, to differentiate between a single JSON document containing multiple records and record-separated JSON documents that may be parsed individually. (Strictly speaking, RFC 7464 also prefixes each record with an ASCII record separator (RS) character, in addition to the linefeed shown above.)

Note: The OpenAPI Specification doesn’t currently support specifying this content type properly, since it isn’t an array. Therefore, use caution when choosing this approach as code generators, validators, and other automation tools may not be able to handle this content type properly.
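As a sketch of what stream processing could look like on the server, the following Python example parses each line as it arrives rather than buffering the whole body. It assumes the line-delimited format shown above and simply tolerates the RS prefix that a strict RFC 7464 producer would add; the process_record function is a hypothetical placeholder:

import json


def process_record(record):
    """Hypothetical placeholder for whatever the import does with one record."""
    print("importing account", record.get("id"))


def stream_import(lines):
    """Parse a record-separated body one line at a time, as a stream."""
    items = []
    for raw_line in lines:                       # no need to buffer the full body
        line = raw_line.lstrip("\x1e").strip()   # tolerate RFC 7464's RS prefix
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # A malformed record fails on its own; the rest still import.
            items.append({"status": 400, "errors": [str(exc)]})
            continue
        process_record(record)
        items.append({"id": record.get("id"), "status": 201, "errors": []})
    return items


# Usage with any iterable of text lines, e.g. an open file or a decoded
# request body stream:
# items = stream_import(open("accounts.json-seq", encoding="utf-8"))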

Adaptive sync vs async processing design

For some situations, the amount of data provided, and the backend I/O required to support the operation, may be small enough to return the results immediately. In this case, return a 207 Multi-Status response with the details immediately.

For bulk imports that will take longer than a few seconds, the API should validate the request, queue the work to be done, and then immediately respond with a 202 Accepted response, along with a Location header pointing to the resource instance that represents the bulk import process details. Clients may then follow up to see whether the import has completed and, if so desired, obtain the results of the bulk import process. This supports the requirement of offloading bulk/batch import operations to background workers.
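From the client’s perspective, the adaptive flow could look like the Python sketch below. The polling interval, the shape of the import-status resource and its state field are assumptions for illustration; your API may expose the process details differently. A 207 is handled immediately, while a 202 is followed up via the Location header:

import time

import requests  # assumed HTTP client


def submit_import(url, body):
    """Handle both the immediate (207) and background (202) import flows."""
    response = requests.post(
        url,
        data=body,
        headers={"Content-Type": "application/json-seq"},
    )

    if response.status_code == 207:
        # Small import: per-record results are available straight away.
        return response.json()["items"]

    if response.status_code == 202:
        # Large import: poll the process resource named in the Location header.
        status_url = response.headers["Location"]
        while True:
            status = requests.get(status_url).json()
            if status.get("state") in ("completed", "failed"):  # assumed field and values
                return status.get("items", [])
            time.sleep(5)  # wait between polls; a production client would back off

    response.raise_for_status()
    return []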

Ensure effective implementation with Tyk’s tools and technologies

Tyk is the ideal solution for meeting your bulk and batch import needs (along with a whole bunch of other needs!). You can use our API experience platform to create, publish and maintain APIs at any scale, anywhere in the world. 

Using Tyk makes it easy to secure your APIs and build performant products that serve a wide range of use cases. Bulk and batch importing don’t need to be a headache when you have the right tools and technologies to hand. 

Wrap-up

Adding bulk or batch import support to your API can help you and your clients optimise their interactions. By applying the application/json-seq content type, each record can be parsed and processed independently while conserving memory and CPU by reducing the parsing effort required for very large documents. 

Finally, by using the different response codes in an appropriate manner, your API design can adapt to a variety of circumstances, including high-workload periods where resources are limited, or when large datasets are submitted.

Ready for more? Then why not check out our article on file upload APIs, as part of our API design guidance series?