API Management 101: Rate Limiting

API rate limiting is one of the fundamental aspects of managing traffic to your APIs. It matters for quality of service, efficiency and security, and it is also one of the simplest and most effective controls you can put in place.

What is API rate limiting and how does it work? 

An API rate limit caps the number of calls the client (API consumer) can make within a given period of time. Rate limits are often expressed in requests per second (RPS), though other windows, such as per minute or per hour, are also common. 

Let’s say you only want a client to call an API a maximum of 10 times per minute. You can apply a rate limit expressed as “10 requests per 60 seconds”. The client will be able to call the API successfully up to 10 times within any 60-second interval. If they call the API any more within that timeframe, they’ll get an error stating they have exceeded their rate limit. 
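
To make that concrete, here is a minimal sketch of how such a check could work, assuming a single process and an in-memory list of timestamps (a real gateway would hold these counts in shared storage, such as Redis, so every node enforces the same limit):

    import time

    # Hypothetical "10 requests per 60 seconds" check for a single client.
    LIMIT = 10           # maximum calls allowed...
    WINDOW_SECONDS = 60  # ...within any rolling 60-second interval

    request_times = []   # timestamps of this client's recent requests

    def allow_request():
        now = time.monotonic()
        # Discard timestamps that have fallen outside the 60-second window.
        while request_times and now - request_times[0] > WINDOW_SECONDS:
            request_times.pop(0)
        if len(request_times) < LIMIT:
            request_times.append(now)
            return True
        return False  # the client would receive a rate limit error here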

Benefits of rate limiting 

API rate limiting can:

  • Protect against API overuse caused by accidental issues within client code, which can result in the API being slammed with requests. 
  • Prevent a denial-of-service (DoS) attack meant to overwhelm the API resources, which could easily be executed without rate limits in place. 
  • Protect your API from other events that would impact its availability. 
  • Ensure that everyone who calls your API receives an efficient, quality service. 
  • Support various API monetisation models. 

What are the different types of rate limiting?

There are different ways that you can approach API rate limiting. 

Key-level rate limiting is focused on controlling API traffic from individual sources and making sure that users are staying within their prescribed limits. You could limit the rate of calls the user of a key can make to all available APIs (i.e. a global limit) or to specific, individual APIs (a key-level-per-API limit). 
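
As a rough illustration, key-level limits can be pictured as counters scoped either to the key alone or to the key and API together. The key names, API names and numbers below are invented for the example:

    import time
    from collections import defaultdict

    # Invented limits for illustration: requests per minute.
    KEY_GLOBAL_LIMITS = {"key-abc": 100}                  # key-level global limit (all APIs)
    KEY_PER_API_LIMITS = {("key-abc", "orders-api"): 20}  # key-level-per-API limit

    WINDOW_SECONDS = 60
    counters = defaultdict(list)  # bucket -> list of request timestamps

    def allowed(api_key, api_name):
        now = time.monotonic()
        checks = [
            ((api_key,), KEY_GLOBAL_LIMITS.get(api_key)),
            ((api_key, api_name), KEY_PER_API_LIMITS.get((api_key, api_name))),
        ]
        # Reject the call if any applicable limit is already used up.
        for bucket, limit in checks:
            if limit is None:
                continue
            times = counters[bucket]
            while times and now - times[0] > WINDOW_SECONDS:
                times.pop(0)
            if len(times) >= limit:
                return False
        # Otherwise record the call against every applicable bucket.
        for bucket, limit in checks:
            if limit is not None:
                counters[bucket].append(now)
        return True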

API-level rate limiting assesses all traffic coming into an API from all sources and ensures that the overall rate limit is not exceeded. This limit could be calculated by something as simple as having a good idea of the maximum number of requests you could expect from users of your API. It could also be something more scientific and precise, such as the number of requests your system can handle while still performing at a high level. You can quickly establish this threshold with performance testing. 
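
For example, one simple way to turn performance-test results into an API-level limit is to take the peak throughput your system handled comfortably and keep some headroom; the figures below are placeholders:

    # Placeholder numbers: derive an API-level limit from load-test results.
    measured_max_rps = 500   # peak requests/second the backend handled in testing
    safety_margin = 0.8      # keep headroom for spikes and background work

    api_level_limit_rps = int(measured_max_rps * safety_margin)
    print("API-level rate limit:", api_level_limit_rps, "requests per second")  # 400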

Which type of API rate limiting should you use?

These two approaches have different use cases. They can also be used in unison to power an overall API rate limiting strategy.

The simplest way to figure out which type of rate limit to apply is to ask a few questions (a combined configuration sketch follows this list):

  • Do you want to protect against denial of service attacks or overwhelming amounts of traffic from all users of the API? Then, go for an API-level global rate limit!
  • Do you want to limit the number of API requests a specific user can make to all APIs they have access to? Then choose a key-level global rate limit!
  • Do you want to limit the number of requests a specific user can make to specific APIs they have access to? Then it’s time for a key-level per-API rate limit.
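
Putting those answers together, a combined rate limiting set-up might look something like the configuration below. It is a hypothetical sketch, not tied to any particular gateway, and the names and numbers are purely illustrative:

    # Hypothetical configuration combining the three types of rate limit.
    rate_limit_config = {
        "api_level_global": {"limit": 1000, "per_seconds": 1},   # protects the API from all traffic
        "key_level_global": {"limit": 100, "per_seconds": 60},   # caps one consumer across all APIs
        "key_level_per_api": {
            "orders-api": {"limit": 20, "per_seconds": 60},      # caps one consumer on one API
        },
    }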

How to implement rate limiting in API environments

If you want to implement API rate limiting, various strategies are available to you, including several algorithm-based approaches:

  • Leaky bucket – a first come, first served approach that queues items and processes them at a regular rate. 
  • Fixed window – a fixed number of requests are permitted in a fixed period of time (per second, hour, day and so on); see the sketch after this list. 
  • Moving/sliding window – similar to a fixed window but with a sliding timescale, to avoid bursts of intense demand each time the window opens again. 
  • Sliding log – each request is logged with a timestamp, the total within the current window is calculated, and a limit is set on that total. 
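
To give a flavour of how these work in practice, here is a minimal sketch of the fixed window approach; it is illustrative only, and a production limiter would also expire old windows and share state across nodes:

    import time
    from collections import defaultdict

    LIMIT = 100          # requests allowed per window
    WINDOW_SECONDS = 60  # window length

    # (client, window number) -> request count; old windows are never cleaned
    # up here, which a real implementation would need to handle.
    windows = defaultdict(int)

    def fixed_window_allow(client_id):
        window_number = int(time.time() // WINDOW_SECONDS)  # same bucket for the whole minute
        key = (client_id, window_number)
        if windows[key] >= LIMIT:
            return False  # this window's allowance is used up
        windows[key] += 1
        return True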

How to test API rate limiting

It’s important to test that your API rate limit is working as it should. It’s not the kind of thing you want untested when you’re facing a DoS attack! There are companies that will undertake API pen testing to assess how robust your API security is, including how well your rate limiting works. 
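
A basic smoke test can be as simple as firing one more request than the limit allows and checking that the final call is rejected. The URL, key and limit below are placeholders, and the sketch assumes the Python requests library:

    import requests  # third-party HTTP client: pip install requests

    BASE_URL = "https://api.example.com/orders"  # placeholder endpoint
    LIMIT = 10                                   # placeholder limit per window

    # Assumes all calls complete within a single rate limit window.
    responses = [
        requests.get(BASE_URL, headers={"Authorization": "Bearer test-key"})
        for _ in range(LIMIT + 1)
    ]

    assert all(r.status_code == 200 for r in responses[:LIMIT]), "calls within the limit should succeed"
    assert responses[-1].status_code == 429, "the call over the limit should be rejected"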

You’ll also need to check your API rate limits are still appropriate as your business grows. An API management tool with a handy dashboard should make it easy for you to see which limits you have in place. 

How long does the rate limit last?

There is no fixed answer to how long an API rate limit lasts. It is common to apply a dynamic rate limit based on the number of requests per second, but you could also think in terms of minutes, hours or whatever timeframe best suits your business model. 

What is API throttling vs rate limiting?

There are two ways that requests can be handled once they exceed the prescribed limit. One is by returning an error (via API rate limiting); the other is by queueing the request (through throttling) to be executed later.

You can implement throttling at key or policy level, depending on your requirements. It’s a versatile approach that can work well if you prefer not to throw an error back when a rate limit is exceeded. By using throttling, you can instead queue the request to auto-retry. 

Throttling means that you can protect your API while still enabling people to use it. However, it can slow down the service that the user receives considerably, so throttling API requests needs careful thought in terms of maintaining service quality and availability. 
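
To picture the difference from the caller’s side, the sketch below retries automatically when it receives a 429 instead of surfacing the error; in a gateway-based throttling set-up, the queueing would happen server-side instead. The URL is a placeholder:

    import time
    import requests  # pip install requests

    def call_with_retry(url, max_retries=5):
        for attempt in range(max_retries):
            response = requests.get(url)
            if response.status_code != 429:
                return response
            # Honour Retry-After if the API sends it, otherwise back off gradually.
            wait_seconds = int(response.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait_seconds)
        return response  # still rate limited after all retries

    result = call_with_retry("https://api.example.com/orders")  # placeholder URL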

What does “API rate limit exceeded” mean?

“API rate limit exceeded” means precisely what it says – that the client trying to call an API has exceeded its rate limit. This will result in the service producing a 429 (Too Many Requests) error response. You can modify that response to include relevant details about why it has been triggered. 
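
The response body and headers are worth customising so that well-behaved clients know what to do next. The sketch below shows one hypothetical shape for such a response; Retry-After is a standard HTTP header, while the X-RateLimit-* names are a common convention rather than a formal standard:

    import json

    def rate_limit_exceeded_response(limit, window_seconds, retry_after):
        body = {
            "error": "rate_limit_exceeded",
            "message": "Limit of {} requests per {} seconds exceeded.".format(limit, window_seconds),
        }
        headers = {
            "Content-Type": "application/json",
            "Retry-After": str(retry_after),   # tells the client when to try again
            "X-RateLimit-Limit": str(limit),   # the client's allowance for the window
            "X-RateLimit-Remaining": "0",      # nothing left in this window
        }
        return 429, headers, json.dumps(body)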

How to bypass an API rate limit

While API rate limiting can go a long way towards protecting the availability of your APIs and downstream services, it is not without its flaws. Some individuals have worked out how to bypass an API rate limit. In fact, they’ve worked out several ways to do so. 

If you use an IP-based rate-limiter, rather than key-level rate limiting, people could bypass your limits using proxy servers. They can multiply their usual quota by the number of proxies they can use. 

Key-based API rate limiting can also be bypassed by people who create multiple accounts to obtain numerous keys. 

There are other techniques out there, such as using client-side JavaScript to bypass rate limits, so be aware that knowing how to rate limit API products doesn’t make them impervious to being bypassed! 

We mentioned pen testing above. While you’re thinking about API functionality, performance and testing, why not check out this article on API testing tools?