API rate limiting explained: From basics to best practices

API rate limiting is a crucial security and performance technique that you can use to control how many times an API can be called within a specific timeframe. For API providers, it can ensure your API services remain stable, secure and available for all users. Whether you’re building your first API or managing existing ones, understanding rate limiting is essential for maintaining service quality.

What is API rate limiting and how does it work? 

An API rate limit refers to the number of calls the client (API consumer) can make within a given timeframe. Rate limits are often expressed in requests per second (RPS), though other windows – per minute, per hour, per day – are common too.

Let’s say you only want a client to call an API a maximum of 10 times per minute. You can apply a rate limit expressed as “10 requests per 60 seconds”. The client will be able to call the API successfully up to 10 times within any 60-second interval. If they call the API any more than their request quota allows within that timeframe, they’ll get an error stating they have exceeded their rate limit.
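The “10 requests per 60 seconds” rule above can be sketched in a few lines of Python. This is a minimal, illustrative fixed-window check – the function and variable names are our own, not part of any particular framework:

```python
import time

WINDOW_SECONDS = 60
MAX_REQUESTS = 10
_windows = {}  # client_id -> (window_start, request_count)

def allow_request(client_id, now=None):
    """Return True if the client is within its quota, False otherwise."""
    now = now if now is not None else time.time()
    start, count = _windows.get(client_id, (now, 0))
    if now - start >= WINDOW_SECONDS:
        # The window has rolled over; start a fresh one.
        start, count = now, 0
    if count >= MAX_REQUESTS:
        return False  # quota exhausted: the caller should return an error
    _windows[client_id] = (start, count + 1)
    return True
```

With this sketch, the first 10 calls in a window succeed and the 11th is rejected until the window rolls over.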

Benefits of rate limiting 

The right approach to API rate limiting can deliver stability, security and availability benefits, supporting the overall performance of your API. Let’s break down some of the benefits.

Prevention of overuse

Rate limiting can prevent overuse of your API – along with the knock-on impact on performance. Overuse can occur for a range of reasons, such as accidental issues within client code slamming the API with requests. Rate limiting will protect your API and services from being overwhelmed by all those requests.

Protection against API abuse

Attempted overuse of your API is not always accidental. Without rate limiting in place, it would be easy for an attacker to carry out a denial of service (DoS) attack designed to overwhelm your API resources. Rate limiting can prevent such API abuse from succeeding.

High availability

It’s always crucial to provide your API consumers with a stable, high availability API. Rate limiting supports this goal by protecting your API from events that could impact its availability.

Service quality maintenance

Everyone who calls your API deserves an efficient, quality service. Rate limiting supports fair usage and stability, ensuring you can deliver the reliability your consumers need.

Monetization

Rate limiting supports a range of API monetization models. We dive into the detail of API monetization strategies in this helpful article.

Resource optimization

Rate limiting can help manage and optimize server resources. This supports fair use of your API and reliability by preventing spikes in traffic and ensuring no single user monopolizes all resources.

Cost control

No business has money to burn. Rate limiting can help you keep your costs in check by preventing unanticipated and unnecessary spikes in resource consumption. It can also help you avoid the cost of having to sort out overwhelmed services – along with the associated stress.

Real-world rate limiting impact examples

Rate limiting isn’t just a nice theory. It’s delivering a tangible impact for enterprises around the globe every day. Some examples of real-world rate limiting in action include:

  • Financial services businesses deterring fraud by preventing excessive login attempts or sensitive transactions within a specified time period. 
  • Ecommerce retailers preventing excessive price scraping (which can impact website performance) by limiting the number of price checks that can be carried out within a set time.
  • Social media and review sites maintaining content quality by limiting the number of posts to prevent spam. 

The Google Maps API provides a real-world example of rate limiting. Google limits the number of geocoding requests per user. It does this to prevent excessive usage and maintain stability of its mapping service.  

What are the different types of rate limiting?

There are different ways that you can approach API rate limiting. 

Key-level rate limiting

Key-level rate limiting is focused on controlling API traffic from individual sources and making sure that users are staying within their prescribed limits. You could limit the rate of calls the user of a key can make to all available APIs (i.e. a global limit) or to specific, individual APIs (a key-level-per-API limit). 

API-level rate limiting

API-level rate limiting assesses all traffic coming into an API from all sources and ensures that the overall rate limit is not exceeded. This limit could be calculated by something as simple as having a good idea of the maximum number of requests you could expect from users of your API. It could also be something more scientific and precise, such as the number of requests your system can handle while still performing at a high level. You can quickly establish this threshold with performance testing.

User-based limiting 

User-based rate limiting focuses on applying rate limits based on specific users. Doing so ensures that individual users can’t overload the API, irrespective of which API key they use. You can set different thresholds for each user so that you can manage the load across your API. If you have users with varying levels of access, or your monetization strategy requires you to implement request throttling based on account tiers, this is an effective approach.  
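Tier-based thresholds like those described above often boil down to a simple lookup. A sketch, with hypothetical plan names and per-minute limits standing in for your own:

```python
# Hypothetical account tiers mapped to requests-per-minute limits.
TIER_LIMITS = {"free": 60, "pro": 600, "enterprise": 6000}

def limit_for(user):
    """Look up a user's rate limit, falling back to the free tier."""
    return TIER_LIMITS.get(user.get("tier", "free"), TIER_LIMITS["free"])
```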

IP-based limiting

IP-based rate limiting is all about controlling the rate of requests that come from specific IP addresses, supporting you to protect the performance of your API. This approach is helpful for defending against DoS and distributed denial of service (DDoS) attacks, which aim to overwhelm your API with excessive concurrent requests. Such attacks can quickly bring a service to its knees if an API is left undefended. IP-based rate limits can also help you identify and block malicious traffic, further supporting the reliable and secure continued operation of your API. 

 

| Approach | Description | Key characteristics | Use cases |
|---|---|---|---|
| Key-level rate limiting | Controls API traffic based on the unique API key of the user. Can be global (for all APIs) or per API. | Rate limits based on individual API keys; can set global or per-API limits; ensures fairness per user | Managing traffic per user or API key; freemium model APIs; balancing API usage across multiple services |
| API-level rate limiting | Sets a rate limit based on the total traffic entering an API from all sources. | Limits traffic across all users accessing a specific API; focused on overall API health; simple to implement | Preventing overload on a specific API; ensuring API scalability; handling unexpected traffic spikes |
| User-based rate limiting | Limits requests based on specific users, ensuring individual user traffic is controlled. | Different thresholds for different users; can be tied to account tiers or access levels; prevents abuse by individual users | Subscription-based models (tiered access); personalized service limits for users; user-specific rate control |
| IP-based rate limiting | Limits the rate of requests from specific IP addresses. | Focuses on controlling traffic per IP; ideal for preventing DDoS attacks; can block or throttle malicious IPs | Protecting against DDoS attacks; identifying and blocking malicious traffic; throttling traffic from untrusted sources |
 

Which type of API rate limiting should you use?

The different approaches to rate limiting have distinct use cases, as shown in the table above. You can also use them in unison to power an overall API rate limiting strategy.

The simplest way to figure out which type of rate limit you should apply is to ask a few questions:

  • Do you want to protect against DoS attacks or overwhelming amounts of traffic from all users of the API? Then, go for an API-level global rate limit.
  • Would you prefer to limit the number of API requests a specific user can make to all APIs they have access to? Then choose a key-level global rate limit.
  • Do you want to limit the number of requests a specific user can make to specific APIs they have access to? Then it’s time for a key-level-per-API rate limit.
  • Do you want to monetize your API via a range of subscription tiers? Then user-based limiting is the way forward. 
  • Is your goal to identify and block malicious traffic from specific sources? Then it’s IP-based rate limiting that you need.

Rate Limiting Best Practices

If you’re ready to embrace the benefits of rate limiting, be sure to do it right from the outset. Adhering to these rate limiting best practices will help.

Setting appropriate limits

Rate limits need to be both effective and realistic, based on your specific use case (hence us discussing those in detail above). Set limits that are too lenient and you leave your API open to abuse. Set limits that are too strict and you irritate your users. 

You can avoid these pitfalls by testing your API to understand its capacity. Testing can tell you how many requests your API can handle without a negative impact on performance. This provides you with a baseline when you start thinking about your rate limit policies. 

You’ll also need to factor in your users’ needs and their specific usage patterns. Your limits will need to align carefully with these in order to balance protecting your API’s performance with not frustrating your users. 

Error handling strategies

When you implement rate limiting policies, you need to think about the status codes that your API returns to users who have exceeded their limit. It’s standard practice to return a 429 HTTP status code, sure. But you have the power to help your users by making it clearer why they’ve received the code. A little bit of thought and a clear, user-friendly message can make a big difference to the user experience. Why not tell your user when they can make another request, for example, by using Retry-After rate limit headers? Or perhaps tell them how they can avoid hitting the limit again.  
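A clear 429 response with a Retry-After header might look like the following sketch. It assumes a framework that accepts a (status, headers, body) tuple – adapt the shape to your own stack:

```python
import json

def rate_limit_exceeded_response(retry_after_seconds):
    """Build a user-friendly 429 response with a Retry-After hint."""
    headers = {
        "Content-Type": "application/json",
        # Tells the client how many seconds to wait before retrying.
        "Retry-After": str(retry_after_seconds),
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": (
            "You have exceeded your request quota. "
            f"Please retry after {retry_after_seconds} seconds."
        ),
    })
    return 429, headers, body
```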

User communication guidelines

As with any aspect of your API, you need to communicate your rate limiting policies clearly to your users, so they know what to expect. You can do so as part of your API documentation, within your service agreement and during onboarding, ensuring everybody is clear from the outset. Doing so supports superior user satisfaction levels, empowering your consumers to interact with your API in a well-considered manner that meets their needs and expectations. 

You can further delight your users by giving them access to details of their rate limit status via rate limit headers. For example, you can show a user how many requests they have left within the current window using an X-RateLimit-Remaining header. You can also let them know their overall limit by using an X-RateLimit-Limit header. Using rate-limit headers in this way serves as a request counter for the user, so they can remain in control and avoid hitting their limits unexpectedly (which is bound to result in frustration). 
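The informational headers described above are straightforward to compute. A sketch – exact header names vary between providers, and the X-RateLimit-Reset header shown here is a common companion convention rather than something mandated by any standard:

```python
def rate_limit_headers(limit, used, window_resets_at):
    """Build informational rate limit headers for a response."""
    return {
        "X-RateLimit-Limit": str(limit),                     # total allowed per window
        "X-RateLimit-Remaining": str(max(limit - used, 0)),  # requests left in the window
        "X-RateLimit-Reset": str(window_resets_at),          # when the window resets (Unix time)
    }
```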

Monitoring and adjusting limits

A “set it and forget it” approach won’t cut it for your rate limiting strategy. You’ll need to monitor your API traffic as usage grows and evolves, adjusting limits in line with those changes in usage. Doing so will ensure your rate limiting approach continues to achieve all the benefits we’ve outlined above. 

The adjustments you need to make will depend on what your monitoring uncovers. Are particular endpoints subject to abuse? Does peak traffic warrant some tweaks? Your monitoring data should guide you here. 

It’s worth a quick mention of dynamic rate limiting at this point. This is where you use an API management solution to adjust your rate limits automatically in response to factors such as the API’s load or your users’ behavior. It means your API can maintain reliable performance even as your traffic flows change. 

Documentation requirements

There are all sorts of reasons why your API documentation should be clear, comprehensive and user-friendly. The quality of your documentation can have a major influence over everything from user satisfaction levels to customer churn.

When it comes to rate limits, your documentation should let users know precisely what to expect when they interact with your API. That means clearly stating the rate limits for each endpoint and the consequences of exceeding those limits. If you have different rate limits for different tiers or plans, make this clear in your documentation.

You’ll also need to include information on rate limit errors, wait times before retries, whether requests can be throttled, if retry logic is supported, and anything else relevant to your API rate limiting. 

How to implement rate limiting in API environments

If you want to implement API rate limiting, you have various strategies available to you, including several algorithm-based approaches. These rate limit algorithms include:

  • Leaky bucket – a first come, first served approach that queues items and processes them at a regular rate.
  • Fixed window – a fixed number of requests are permitted in a fixed period of time (per second, hour, day and so on).
  • Moving/sliding window – similar to a fixed window but with a sliding timescale, to avoid bursts of intense demand each time the window opens again.
  • Sliding log – user requests are time-stamped and the total calculated, with a limit set on the total rate.
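To make the last of these concrete, here is a minimal sliding log limiter: each request is time-stamped, entries that have aged out of the window are discarded, and the total within the window is compared against the limit. The class and method names are illustrative:

```python
import time
from collections import deque

class SlidingLogLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.log = deque()  # timestamps of accepted requests

    def allow(self, now=None):
        """Accept the request if the window total is under the limit."""
        now = now if now is not None else time.time()
        # Drop timestamps that have aged out of the window.
        while self.log and now - self.log[0] >= self.window_seconds:
            self.log.popleft()
        if len(self.log) >= self.max_requests:
            return False
        self.log.append(now)
        return True
```

Unlike a fixed window, this approach never admits a double burst at a window boundary, at the cost of storing one timestamp per accepted request.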

Monitoring and Maintenance

We talked a bit above about monitoring your rate limiting. Here’s how. 

Key metrics to track

The rate limiting metrics you track can give you insights into the effectiveness of your approach and whether you need to make any changes. Key metrics to track are the number of requests per second (or per minute, or per hour), the percentage of them that hit the rate limit and the number of 429 errors. These will give you the big picture.

You can also drill down into the details by looking at the distribution of requests across your API users and considering response times and error rates, along with their impact on the user experience. This can help you identify a range of issues, from users who may be on the wrong tier, to traffic problems caused by your rate limiting approach, to malicious activity. 
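The big-picture metrics above can be derived from your request logs. A sketch, using a list of (timestamp, status code) records as a stand-in for whatever your real logging pipeline produces:

```python
def rate_limit_metrics(records, duration_seconds):
    """Compute request rate and the share of requests hitting the limit."""
    total = len(records)
    rejected = sum(1 for _, status in records if status == 429)
    return {
        "requests_per_second": total / duration_seconds if duration_seconds else 0.0,
        "rate_limited_count": rejected,
        "rate_limited_pct": (100.0 * rejected / total) if total else 0.0,
    }
```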

Setting up alerts

You can link alerts to your monitoring, so that certain patterns or behaviours are flagged up. If your requests per second exceed a certain threshold, for example, you can have your system alert you to that fact. Doing so is a proactive way of keeping on top of your key rate limiting metrics. It means you can identify and rectify issues fast. 
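A threshold alert of this kind can be as simple as the following sketch, assuming a metrics dictionary like the one your monitoring exposes and a hypothetical notify() hook (a Slack webhook, a pager, an email sender):

```python
RPS_ALERT_THRESHOLD = 500.0  # illustrative threshold; tune to your API

def check_alerts(metrics, notify):
    """Fire a notification if requests per second exceed the threshold."""
    rps = metrics.get("requests_per_second", 0.0)
    if rps > RPS_ALERT_THRESHOLD:
        notify(f"RPS {rps:.0f} exceeded threshold {RPS_ALERT_THRESHOLD:.0f}")
        return True
    return False
```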

Adjusting limits based on usage patterns

The results of your monitoring will give you the information you need to assess the impact of your rate limiting strategy. It’s important to review this regularly and to make adjustments in response to the findings. This is a normal part of the evolution of your API and how it is used. 

Performance impact analysis

It’s important to understand how your rate limiting policies impact the performance of your API and its stability. This means keeping an eye on system capacity, traffic patterns, the impact of malicious traffic and how your users typically behave. In addition to monitoring the metrics we mentioned above, you can keep a close watch of your server load using load testing tools. Check out this performance testing walkthrough for further details.

How to test API rate limiting

It’s important to test that your API rate limit is working as it should. It’s not the kind of thing you want untested when you’re facing a DoS attack! There are companies that will undertake API pen testing to test how robust your API security is, including how well your rate limiting works. 

You’ll also need to check your API rate limits are still appropriate as your business grows. An API management solution with a handy dashboard should make it easy for you to see which limits you have in place. 

How long does the rate limit last?

There is no fixed answer to how long an API rate limit lasts. It is common to apply a dynamic rate limit based on the number of requests per second, but you could also think in terms of minutes, hours or whatever timeframe best suits your business model. 

What is API throttling vs rate limiting?

There are two ways that requests can be handled once they exceed the prescribed limit. One is by returning an error (via API rate limiting); the other is by queueing the request (through throttling) to be executed later.

You can implement throttling at key or policy level, depending on your requirements. It’s a versatile approach that can work well if you prefer not to throw an error back when a rate limit is exceeded. By using throttling, you can instead queue the request to auto-retry. 

Throttling means that you can protect your API while still enabling people to use it. However, it can slow down the service that the user receives considerably, so how to throttle API requests needs careful thought in terms of maintaining service quality and availability. 
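From the client’s side, the queue-and-retry behaviour described above often looks like honouring the server’s Retry-After hint. A sketch, where call_api is a hypothetical function returning a (status, headers, body) tuple:

```python
import time

def call_with_retry(call_api, max_attempts=3):
    """Retry a rate-limited call, waiting for the Retry-After period."""
    for _ in range(max_attempts):
        status, headers, body = call_api()
        if status != 429:
            return status, headers, body
        # Honour the server's Retry-After hint, defaulting to 1 second.
        time.sleep(float(headers.get("Retry-After", "1")))
    return status, headers, body  # still rate limited after all attempts
```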

What does “API rate limit exceeded” mean?

“API rate limit exceeded” means precisely what it says – that the client trying to call an API has exceeded its rate limit. This will result in the service producing a 429 error status response. You can modify that response to include relevant details about why the response has been triggered. 

How to bypass an API rate limit

While API rate limiting can go a long way towards protecting the availability of your APIs and downstream services, it is not without its flaws. Some individuals have worked out how to bypass an API rate limit. In fact, they’ve worked out several ways to do so.

If you use an IP-based rate limiter, rather than key-level rate limiting, people could bypass your limits using proxy servers. They can multiply their usual quota by the number of proxies they can use.

Key-based API rate limiting can also be bypassed, by people creating multiple accounts and getting numerous keys. 

There are other techniques out there, such as using client-side JavaScript to bypass rate limits, so be aware that knowing how to rate limit API products doesn’t make them impervious to being bypassed! 

Conclusion

The right approach to rate limiting can enhance your APIs’ service quality, efficiency and security, supporting you to manage and control traffic in a way that delivers a superior UX. 

Next steps

We mentioned pen testing above. While you’re thinking about API functionality, performance and testing, why not check out this article on API testing tools?

Finally, remember that the Tyk team is always on hand to help. Whether you want to chat about rate limiting middleware, rate limit algorithms, distributed rate limiting or anything else, reach out to our team.