James Higginbotham

Your data model is not an API

Just about every mobile and web application is backed by at least one database. These applications often require a web-based API to expose the database(s) for storing, querying, and retrieving necessary application data. But is exposing our data model directly to other developers the best approach to API design?

This article discusses the difference between data model design and API design, why we often use data models for our APIs, and whether this is the right approach to API design. We will also cover some techniques for ensuring that your API design hides your internal implementation details to ensure greater API longevity.

Data models are an implementation detail

Data models are a constant with web development. When new team members join, one of the first questions is, “Can I see the data model for the app?” A data model defines how information is stored and retrieved to support a solution. It reveals exactly what the app can (and cannot) offer to the end user.

Your data model is optimized for the internal details of our application – including the decisions we have made about our programming language, framework, and database vendor. If we design our API to directly map to our database, we risk exposing these internal implementation details to our API consumers.

Also, your API consumers don’t get the luxury of seeing the details your data model and code. They didn’t sit in on the endless meetings that resulted in the multitude of decisions that drove your data model design. They don’t have the context of why your data model was designed in a certain way. Yet, many teams have made the decision to expose an implementation decision – their data model – as an external API. Why?

Why do we model our APIs from the data model?

Teams often choose to model their API from data models for several reasons:

  1. It’s familiar – Developers building the API often modelled the data, so a bottom-up approach is often the most comfortable for developers
  2. It’s faster – Directly mapping a data model to an API is the fastest route for the API provider, when time is limited (and when isn’t our time limited?)
  3. It’s flexible – We often think that offering finer-grained access via your API allows consumers the greatest flexibility; however, this comes at the expense of slower application performance and more effort at integration time (as we will explore below)

Before we make the leap directly from our data model into an API, we need to reconsider what problems data models solve:

  • Transactional queries from end-users, via web, mobile, and API clients
  • Optimizing for read-based and/or write-based throughput, through data denormalization, physical storage optimizations, and caching strategies
  • Reporting and/or adhoc query support through star schemas, snowflake schemas, and other data warehousing strategies

A data model doesn’t concern itself with solving the desired outcomes of our API consumers. Instead, it addresses the architectural and operational needs of the solution. It’s up to us to build APIs and applications on top of the data model to provide the capabilities and desired outcomes that our API consumers need. Simply mirroring the data model as an API typically isn’t enough to solve the problems that API consumers encounter.

What happens when we expose our data model as an API?

In the effort to deliver APIs that are familiar, fast, and flexible, we push much of the burden of the API on our consumers:

Ever-changing APIs: Database schema changes will result in a constantly-changing API, as the API is forced to keep in lock-step with the underlying database. This change to the data model will force consumers to rewrite their API integration code every time the underlying data model changes, resulting in a change to the API definition.

Network chattiness: Exposing link tables as separate API endpoints will cause API “chattiness”, similar to how an n+1 query problem degrades database performance. However, while an n+1 problem can be a performance bottleneck for databases, API chattiness will have a devastating impact on API consumption performance due to the many separate HTTP calls necessary to render a single UI screen.

Confusing API details: Columns optimized for query performance, such as a CHAR(1) column that uses character codes to indicate status, become meaningless to your consumers without additional clarification.

Exposing sensitive data: Tools that build APIs directly from databases often expose all columns with a table using SELECT * FROM [table name]. However, this can expose data that API consumers should never see, such as personally identifiable information (PII). It may also expose data that helps hackers compromise systems through a better understanding of the internal details of the API.

Exposing implementation details: Our applications are designed and built on top of many decisions, constraints, and other internal details. Directly exposing our database pushes the results of these decisions directly to API consumers, without the benefit of the reasons why they exist.

Flexibility for the consumer comes at a price. Unless the vast majority of your API consumers want this level of control, it is best to focus on an API design that is separate from your data model, by hiding the implementation details of our app.

Your API design should hide implementation details

The technique of hiding internal implementation details and exposing an external API has been rooted in software design for decades. Terms such as modularization, loose coupling, and high cohesion are associated with great software design. Let’s review some of these concepts:

Modules are the smallest atomic unit within a software program and may be composed of one or more source files. Modules have a public-facing API to expose the functionality that they offer. Modules are sometimes known as components.

High cohesion is a term used when the code within a module is all closely related to the same functionality. A highly-cohesive module results in less “spaghetti code”, as method calls aren’t jumping all over the codebase. Instead, they are confined into one area, typically a code package or namespace.

Loose coupling hides a module’s internal details away from other modules, restricting the knowledge between modules to a public interface, or API that other areas of the code can invoke.

Your data model is not an API

When we apply these concepts into larger software solutions, we hide implementation details within a module from other modules. This results in some basic rules for modular software development that will benefit our API design:

  • A module’s internal implementation details should be hidden (e.g. its data model, classes/objects/functions, and the events it emits to process internal logic)
  • A module’s API is a contract with other modules that should not change when internal implementation details change (e.g. data model changes or switching to a new framework)
  • A module may emit messages and/or business events related to its capabilities, but they should not expose the internal implementation details (e.g. moving from RabbitMQ to Kafka to achieve greater scale shouldn’t impact API consumers)

By extending these rules to web-based APIs, we create modular APIs that focus on providing a set of capabilities to API consumers. These capabilities may be driven by a request/response HTTP API call or by subscribing to messages and events from the API through webhooks or websockets. As long as the capabilities that the API offers continues to adhere to the contract defined by the API, our API consumers can continue to use the API without fear of underlying implementation changes that could cause their integration code to stop functioning.

Conclusion: Be an API consumer advocate

As mentioned earlier, teams often default to directly exposing their data model as it is familiar, faster, and more flexible. Your API consumers care about these same attributes – but from their point of view, not yours:

  • Your API consumers want an API that is familiar to them, not to your implementation team
  • Your API consumers want an API that is fast to integrate for them, not for your internal team to implement
  • Your API consumers want an API that is flexible by offering capabilities to get things done, without creating lots of HTTP calls and stitching data together to achieve their desired outcomes

Finally, it is important to remember that 5 hours saved by your team may cost 10s to 100s of hours by your API consumers. This is because every consumer has to deal with the same API design flaws because of your exposed implementation details. A great API design makes it easy for API consumers by hiding internal implementation details, leading to faster integration and happy developers.