API rate limiting explained: From basics to best practices

Implement API rate limiting to control traffic to your APIs and prevent an overwhelming number of requests.

API rate limiting is a crucial security and performance technique that controls how many times an API can be called within a specific timeframe. For API providers, it can ensure your API services remain stable, secure and available for all users.

Understanding rate limiting is essential for maintaining service quality.

What is API rate limiting and how does it work?

An API rate limit is the number of calls the client (API consumer) can make within a given time window. Rate limits are often expressed in requests per second (RPS), though the window can just as easily be a minute or an hour.

Example: You want a client to call an API a maximum of 10 times per minute. You apply a rate limit expressed as “10 requests per 60 seconds”. The client will be able to call the API successfully up to 10 times within any 60-second interval. If they call more than their request quota allows within that timeframe, they’ll get an error stating they have exceeded their rate limit.
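The "10 requests per 60 seconds" example can be sketched in a few lines of Python. This is a minimal illustration, not a production limiter: the class name SlidingWindowLimiter is hypothetical, and the caller passes in the current time so the behaviour is easy to follow.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any `window` seconds (illustrative)."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now):
        # Forget calls that have dropped out of the window ending at `now`.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False  # quota used up: the API would return an error here


limiter = SlidingWindowLimiter(limit=10, window=60.0)
# Twelve calls, one second apart: the first ten pass, the last two do not.
results = [limiter.allow(now=float(t)) for t in range(12)]
```

Once the oldest call is 60 seconds old it no longer counts, so the client can call successfully again.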

Benefits of rate limiting

The right approach to API rate limiting can deliver stability, security and availability benefits, supporting the overall performance of your API. Let’s break down some of the benefits.

Prevention of overuse

Rate limiting can prevent overuse of your API – along with the knock-on impact on performance. Overuse can occur for a range of reasons, such as accidental issues within client code slamming the API with requests. Rate limiting protects your API and services from being overwhelmed by all those requests.

Protection against API abuse

Attempted overuse of your API is not always accidental. Without rate limiting in place, it would be easy for an attacker to carry out a denial of service (DoS) attack designed to overwhelm your API resources. Rate limiting can prevent API abuse like this from succeeding.

High availability

It’s crucial to provide your API consumers with a stable, high availability API. Rate limiting supports this goal by protecting your API from events (such as traffic spikes or misbehaving clients) that could impact its availability.

Service quality maintenance

Everyone who calls your API deserves an efficient, quality service. Rate limiting supports fair usage and stability, ensuring you can deliver the reliability your consumers need.

Monetization

Rate limiting supports a range of API monetization models. We dive into the details of API monetization strategies in this helpful article.

Resource optimization

Rate limiting can help manage and optimize server resources. This supports fair use of your API and reliability by preventing traffic spikes and ensuring no single user monopolizes all resources.

Cost control

No business has money to burn. Rate limiting can help you keep your costs in check by preventing unanticipated and unnecessary spikes in resource consumption. It can also help you avoid the cost of having to sort out overwhelmed services – along with the associated stress.

Real-world rate limiting impact examples

Rate limiting isn’t just a theoretical concept; it’s actively shaping how businesses operate worldwide. The impact of API rate limiting is felt daily across various industries. For instance:

Financial services businesses deter fraud by preventing excessive login attempts or sensitive transactions within a specified time period.
Ecommerce retailers prevent excessive price scraping (which can impact website performance) by limiting the number of price checks that can be carried out within a set time.
Social media and review sites maintain content quality by limiting the number of posts to prevent spam.

The Google Maps API provides a real-world example of rate limiting. Google limits the number of geocoding requests per user. It does this to prevent excessive usage and maintain stability of its mapping service.

What are the different types of rate limiting?

There are different ways that you can approach API rate limiting.

Key-level rate limiting

Key-level rate limiting focuses on controlling API traffic from individual sources and making sure users stay within their prescribed limits. You could limit the rate of calls the user of a key can make to all available APIs (i.e. a global limit) or to specific, individual APIs (a key-level-per-API limit).

API-level rate limiting

API-level rate limiting assesses all traffic coming into an API from all sources and ensures that the overall rate limit is not exceeded. This limit could be calculated by something as simple as having a good idea of the maximum number of requests you could expect from users of your API. It could also be something more scientific and precise, such as the number of requests your system can handle while still performing at a high level. For API providers, performance testing is a quick and easy way to establish this threshold.

User-based limiting

User-based rate limiting focuses on applying rate limits based on specific users. Doing so ensures that individual users can’t overload the API, irrespective of which API key they use. You can set different thresholds for each user to manage the load across your API. For API consumers with varying levels of access, or where your monetization strategy requires you to implement request throttling based on account tiers, this is an effective approach.

IP-based limiting

IP-based rate limiting is all about controlling the rate of requests that come from specific IP addresses, supporting you to protect the performance of your API. For security teams, this approach is helpful for defending against DoS and distributed denial of service (DDoS) attacks, which aim to overwhelm your API with excessive concurrent requests. Such attacks can quickly bring a service to its knees if an API is left undefended. IP-based rate limits can also help you identify and block malicious traffic, further supporting the reliable and secure continued operation of your API.

Approach	Description	Key characteristics	Use cases
Key-level rate limiting	Controls API traffic based on the unique API key of the user. Can be global (for all APIs) or per API.	– Rate limits based on individual API keys – Can set global or per-API limits – Ensures fairness per user	– Managing traffic per user or API key – Freemium model APIs – Balancing API usage across multiple services
API-level rate limiting	Sets a rate limit based on the total traffic entering an API from all sources.	– Limits traffic across all users accessing a specific API – Focused on overall API health – Simple to implement	– Preventing overload on a specific API – Ensuring API scalability – Handling unexpected traffic spikes
User-based rate limiting	Limits requests based on specific users, ensuring individual user traffic is controlled.	– Different thresholds for different users – Can be tied to account tiers or access levels – Prevents abuse by individual users	– Subscription-based models (tiered access) – Personalized service limits for users – User-specific rate control
IP-based rate limiting	Limits the rate of requests from specific IP addresses.	– Focuses on controlling traffic per IP – Ideal for preventing DDoS attacks – Can block or throttle malicious IPs	– Protecting against DDoS attacks – Identifying and blocking malicious traffic – Throttling traffic from untrusted sources
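One way to picture the difference between these approaches is to look at which counter each request is charged against. The sketch below assumes a hypothetical Request record and scope names of our own choosing; a real gateway will model this differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    api: str      # which API was called
    api_key: str  # credential presented by the client
    user_id: str  # account that owns the key
    ip: str       # source address


def bucket(scope, req):
    """Return the counter a request is charged against for a given scope."""
    if scope == "key-global":
        return ("key", req.api_key)
    if scope == "key-per-api":
        return ("key", req.api_key, req.api)
    if scope == "api":
        return ("api", req.api)
    if scope == "user":
        return ("user", req.user_id)
    if scope == "ip":
        return ("ip", req.ip)
    raise ValueError(f"unknown scope: {scope}")


requests = [
    Request("orders", "key-1", "alice", "203.0.113.5"),
    Request("orders", "key-2", "alice", "203.0.113.5"),
    Request("billing", "key-1", "alice", "198.51.100.7"),
]

# The same traffic, counted three different ways.
by_user = Counter(bucket("user", r) for r in requests)
by_key = Counter(bucket("key-global", r) for r in requests)
by_key_and_api = Counter(bucket("key-per-api", r) for r in requests)
```

Here the user-based count sees three requests from one user, while the key-level counts split the same traffic across keys and APIs.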

Which type of API rate limiting should you use?

The different approaches to rate limiting have distinct use cases, as shown in the table above. You can also use them in unison to power an overall API rate limiting strategy.

The simplest way to figure out which type of rate limit to apply is to ask a few questions:

Do you want to protect against DoS attacks or overwhelming amounts of traffic from all users of the API? Then go for an API-level global rate limit.
Would you prefer to limit the number of API requests a specific user can make to all APIs they have access to? Then choose a key-level global rate limit.
Do you want to limit the number of requests a specific user can make to specific APIs? Then it’s time for a key-level-per-API rate limit.
Do you want to monetize your API via a range of subscription tiers? Then user-based limiting is the way forward.
Is your goal to identify and block malicious traffic from specific sources? Then it’s IP-based rate limiting that you need.

Rate limiting best practices

If you’re ready to embrace the benefits of rate limiting, be sure to do it right from the outset. Adhering to these rate limiting best practices will help.

What is a good rate limit for an API?

The ideal rate limit depends on your specific use case, so there is no fixed “good” limit. You may, for example, choose to link rate limiting with tiered plans as part of your monetization strategy. As an example for a public API, you could set limits such as:

Basic tier: 100-10,000 requests per hour
Premium tier: 10,000-100,000 requests per hour
Enterprise tier: Unlimited or very high limit

These are just examples of rate limits – what “good” looks like can vary widely based on a number of factors, including:

Your server capacity and performance under load
User needs and typical usage patterns
Type of operations (read vs. write operations)
Resource intensity of each request
Business model and monetization strategy

Best practice is to start with conservative limits, monitor usage patterns, and adjust based on real-world data.
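If you do link limits to tiers, the lookup itself can be very small. The numbers below are illustrative values picked from within the example ranges above, and the names are hypothetical.

```python
from typing import Optional

# Illustrative hourly limits only; None stands for "no limit" in this sketch.
TIER_LIMITS_PER_HOUR: dict[str, Optional[int]] = {
    "basic": 1_000,
    "premium": 50_000,
    "enterprise": None,
}


def is_within_limit(tier: str, requests_this_hour: int) -> bool:
    """True if one more request is allowed for this tier in the current hour."""
    limit = TIER_LIMITS_PER_HOUR[tier]
    return limit is None or requests_this_hour < limit
```

Keeping the limits in one table like this makes them easy to adjust as your monitoring data comes in.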

Setting appropriate limits

Rate limits need to be both effective and realistic, based on your specific use case. Set limits that are too lenient and you leave your API open to abuse. Set limits that are too strict and you irritate your users.

You can avoid these pitfalls by testing your API to understand its capacity. Testing can tell you how many requests your API can handle without a negative impact on performance. This provides you with a baseline when you start thinking about your rate limit policies.

You’ll also need to factor in your users’ needs and their specific usage patterns. Your limits will need to align carefully with these in order to balance protecting your API’s performance with not frustrating your users.

How to handle API rate limit errors

Handling rate limiting errors should be part of your overall approach to managing and troubleshooting errors and ensuring the health of your APIs. Let’s look at a couple of ways you can handle API rate limit errors.

Error handling strategies

When you implement rate limiting policies, think about the status codes that your API returns to users who have exceeded their limit. It’s standard practice to return a 429 HTTP status code, sure. But you have the power to help your users by making it clearer why they’ve received that code. A little bit of thought and a clear, user-friendly message can make a big difference to the user experience. Why not tell your user when they can make another request, for example, by using Retry-After rate limit headers? Or perhaps tell them how they can avoid hitting the limit again.
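A friendlier 429 response might look something like the sketch below. The function name, error code string and message wording are our own assumptions; only the 429 status and the Retry-After header come from HTTP itself.

```python
import json
import math
from http import HTTPStatus


def too_many_requests(seconds_until_reset, limit, window_seconds):
    """Build (status, headers, body) for a client that has exceeded its limit."""
    retry_after = max(1, math.ceil(seconds_until_reset))
    status = HTTPStatus.TOO_MANY_REQUESTS  # 429
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),  # delay in whole seconds
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": (
            f"Limit of {limit} requests per {window_seconds} seconds exceeded. "
            f"Try again in {retry_after} seconds."
        ),
    })
    return status, headers, body


status, headers, body = too_many_requests(12.3, limit=10, window_seconds=60)
```

The message tells the user what the limit is and when to come back, which is more helpful than a bare status code.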

User communication guidelines

As with any aspect of your API, you need to communicate your rate limiting policies clearly to your users, so they know what to expect. You can do so as part of your API documentation, within your service agreement and during onboarding, ensuring everybody is clear from the outset. Doing so supports superior user satisfaction levels, empowering your consumers to interact with your API in a well-considered manner that meets their needs and expectations.

You can further delight your users by giving them access to details of their rate limit status via rate limit headers. For example:

Use an X-RateLimit-Remaining header to show the user how many requests they have left within the current window
Use an X-RateLimit-Limit header to let them know their overall limit

Using rate-limit headers in this way serves as a request counter for the user, so they remain in control and avoid hitting their limits unexpectedly.
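As a rough sketch, building those two headers could be as simple as this (the function name is hypothetical):

```python
def rate_limit_headers(limit, used):
    """Headers telling the client its overall limit and what is left of it."""
    remaining = max(0, limit - used)  # never report a negative remainder
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
    }


headers = rate_limit_headers(limit=100, used=37)
```

You would attach these to every response, not just the 429s, so clients can slow down before they hit the limit.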

Monitoring and adjusting limits

A “set it and forget it” approach won’t cut it for your rate limiting strategy. You’ll need to monitor your API traffic as usage grows and evolves, adjusting limits in line with those changes. Doing so will ensure your rate limiting approach continues to achieve all the benefits we’ve outlined above.

The adjustments you need to make will result from whatever your monitoring uncovers. Are particular endpoints subject to abuse? Does peak traffic warrant some tweaks? Your monitoring data should guide you here.

It’s worth a quick mention of dynamic rate limiting at this point. This is where you use an API management solution to automate your rate limiting in response to factors such as the API’s load or user behavior. It means your API can maintain reliable performance even as your traffic flows change.
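To make the idea concrete, here is one possible rule, purely as an assumption for illustration: keep the normal limit until load passes 70%, then shrink it linearly as load approaches 100%. Real API management solutions may use quite different signals and formulas.

```python
def dynamic_limit(base_limit, load, floor=1):
    """Scale `base_limit` down as `load` (0.0 to 1.0) rises above 70%.

    The 70% threshold and the linear shape are assumptions for this sketch.
    """
    threshold = 0.7
    if load <= threshold:
        return base_limit
    # Fraction of headroom left between the threshold and full load.
    headroom = max(0.0, 1.0 - load) / (1.0 - threshold)
    return max(floor, round(base_limit * headroom))
```

With a base limit of 100, this returns 100 at 50% load, 50 at 85% load and the floor of 1 at full load.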

Documentation requirements

Your API documentation must be clear, comprehensive, and user-friendly. The quality of your documentation can have a major influence over everything from user satisfaction levels to customer churn. Remember that, these days, it also needs to be presented in an AI-friendly format, as well as a human-readable one. AI agents are your consumers now, too.

When it comes to rate limits, your documentation should let users know precisely what to expect when they interact with your API. That means clearly stating the rate limits for each endpoint and the consequences of exceeding those limits. If you have different rate limits for different tiers or plans, make this clear in your documentation.

You’ll also need to include information on rate limit errors, wait times before retries, whether requests can be throttled, if retry logic is supported, and anything else relevant to your API rate limiting.

How to implement rate limiting in API environments

If you want to implement API rate limiting, you have various strategies available to you, including several algorithm-based approaches. Common rate limiting algorithms include:

Fixed window: Simple, but prone to bursts
Sliding window: More accurate, smoother traffic control
Leaky bucket: Predictable output rate
Token bucket: Flexible burst handling
Sliding log: Highly precise request tracking, but more memory-intensive and less scalable at high volumes
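To show what "flexible burst handling" means in practice, here is a minimal token bucket sketch. The class and parameter names are our own, and the caller supplies the current time to keep the example deterministic.

```python
class TokenBucket:
    """Token bucket: bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated_at = now

    def allow(self, now, cost=1.0):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_per_second=1.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]  # five pass, the sixth does not
later = bucket.allow(now=2.0)  # two tokens have been refilled by now
```

The capacity sets how big a burst you tolerate, while the refill rate sets the sustained request rate.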

How to implement rate limiting in REST API

There are several ways you can approach implementing rate limiting in REST APIs. You could use JavaScript, for example, to implement it at the application level. For greater consistency across your API infrastructure, you could instead implement rate limiting at the API gateway level. You can configure rate limits in your gateway and set limits per API key, IP address, or user. Choose based on your architecture, scalability needs, and existing infrastructure.
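As an illustration of the application-level option, here is a sketch written in Python rather than JavaScript, using the standard WSGI interface. It applies a fixed window limit per API key. The X-API-Key header name, the class name and the in-memory counter are all assumptions; a real deployment would normally need shared storage and eviction of old counters.

```python
import math
import time
from collections import defaultdict


def hello_app(environ, start_response):
    """Stand-in for your real API (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


class RateLimitMiddleware:
    """WSGI middleware applying a fixed window limit per API key."""

    def __init__(self, app, limit=10, window=60, clock=time.time):
        self.app = app
        self.limit = limit
        self.window = window
        self.clock = clock
        # (api_key, window index) -> requests seen; old windows are never
        # evicted in this sketch.
        self.counts = defaultdict(int)

    def __call__(self, environ, start_response):
        api_key = environ.get("HTTP_X_API_KEY", "anonymous")
        now = self.clock()
        slot = (api_key, int(now // self.window))
        if self.counts[slot] >= self.limit:
            # Seconds until the current window ends, rounded up.
            retry_after = max(1, math.ceil(self.window - now % self.window))
            start_response(
                "429 Too Many Requests",
                [("Content-Type", "text/plain"), ("Retry-After", str(retry_after))],
            )
            return [b"Rate limit exceeded\n"]
        self.counts[slot] += 1
        return self.app(environ, start_response)


app = RateLimitMiddleware(hello_app, limit=10, window=60)
```

You could serve app with any WSGI server, for example make_server from Python's built-in wsgiref.simple_server module.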

Monitoring and maintenance

Here’s how to monitor and maintain your rate limiting.

Key metrics to track

The rate limiting metrics you track can confirm the effectiveness of your approach and whether you need to make any changes. Key metrics to track are:

Number of requests per second (or per minute, or per hour)
Percentage of requests that hit the rate limit
Number of 429 errors

These will give you the big picture. You can also drill down into the details. This involves looking at factors such as:

The distribution of requests across your API users
Response times and error rates, and their impact on the user experience

This can identify a range of issues, from users who may be on the wrong tier to traffic problems caused by your rate limiting approach to malicious activity.
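If your logs give you the status code of each response, the three key metrics can be worked out with a few lines of Python. This sketch assumes a plain list of status codes collected over a known period; the function and field names are hypothetical.

```python
def rate_limit_metrics(status_codes, period_seconds):
    """Summarise traffic and rate limit hits for one monitoring period."""
    total = len(status_codes)
    limited = sum(1 for code in status_codes if code == 429)
    return {
        "requests_per_second": total / period_seconds,
        "rate_limited_count": limited,
        "rate_limited_percent": 100.0 * limited / total if total else 0.0,
    }


# 100 responses seen in a 10-second period, five of them 429s.
metrics = rate_limit_metrics([200] * 95 + [429] * 5, period_seconds=10)
```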

Setting up alerts

You can link alerts to your monitoring, to flag certain patterns or behaviors.

Example: If your requests per second exceed a certain threshold, your system alerts you to it. This is a proactive way of keeping on top of your key rate limiting metrics so you can identify and rectify issues fast.
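A threshold check like that can be sketched as follows. The metric names and threshold values are made up for illustration, and in practice your monitoring tool would usually do this for you.

```python
def check_alerts(metrics, thresholds):
    """Return one message for each metric that exceeds its threshold."""
    alerts = []
    for name, threshold in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > threshold:
            alerts.append(f"{name}={value} exceeds threshold {threshold}")
    return alerts


alerts = check_alerts(
    metrics={"requests_per_second": 120, "rate_limited_percent": 2.0},
    thresholds={"requests_per_second": 100, "rate_limited_percent": 5.0},
)
```

Only the requests per second metric is over its threshold here, so only one alert is raised.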

Adjusting limits based on usage patterns

The results of your monitoring will give you the information you need to assess the impact of your rate limiting strategy. It’s important to review this regularly and to make adjustments in response to the findings. This is a normal part of the evolution of your API and how it is used.

Performance impact analysis

It’s important to understand how your rate limiting policies impact the performance of your API and its stability. This means keeping an eye on:

System capacity
Traffic patterns
The impact of malicious traffic
How users typically behave

In addition to monitoring the metrics we mentioned above, you can keep a close watch on your server load using load testing tools. Check out this performance testing walkthrough for further details.

How to test API rate limiting

It’s important to test if your API rate limit is working as it should. It’s not the kind of thing you want untested when you’re facing a DoS attack! There are companies that will undertake API pen testing to test how robust your API security is, including how well your rate limiting works.
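A basic check you can run yourself is to send more requests than the limit allows and confirm where the first 429 appears. In the sketch below, send stands for whatever function makes the real HTTP call and returns a status code; make_fake_api is a hypothetical stand-in so the example runs without a network.

```python
from typing import Callable, Optional


def first_rejected_call(send: Callable[[], int], attempts: int) -> Optional[int]:
    """Call `send` up to `attempts` times; return the 1-based index of the first 429."""
    for i in range(1, attempts + 1):
        if send() == 429:
            return i
    return None  # the limit was never hit


def make_fake_api(limit: int) -> Callable[[], int]:
    """Stand-in for a real API: returns 200 for `limit` calls, then 429."""
    calls = 0

    def send() -> int:
        nonlocal calls
        calls += 1
        return 200 if calls <= limit else 429

    return send


# With a limit of 10, the 11th call should be the first one rejected.
rejected_at = first_rejected_call(make_fake_api(limit=10), attempts=15)
```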

You’ll also need to check that API rate limits are still appropriate as your business grows. An API management solution with a handy dashboard should make it easy for you to see which limits you have in place.

How to increase an API rate limit

If you’re an API consumer and you need to increase a rate limit, you can:

Upgrade your plan: Move to a higher tier with increased limits
Contact the provider: Request limit increases for legitimate business needs
Optimize usage: Reduce unnecessary requests to stay within limits
Use multiple API keys: Some providers allow multiple keys per account
Implement caching: Store responses to reduce API calls
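As an example of the caching option, here is a small client-side sketch that reuses a response for a set time instead of calling the API again. The class name, the fetch callable and the 30-second lifetime are assumptions for illustration.

```python
import time


class CachedClient:
    """Wrap a fetch function and reuse each response for `ttl` seconds."""

    def __init__(self, fetch, ttl=30.0, clock=time.monotonic):
        self.fetch = fetch  # hypothetical function that calls the real API
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # url -> (expires_at, response)

    def get(self, url):
        now = self.clock()
        hit = self.cache.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no API call is made
        response = self.fetch(url)
        self.cache[url] = (now + self.ttl, response)
        return response
```

Every cache hit is one request that does not count against your limit.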

If you’re an API provider seeking to increase an API rate limit, you have several things to consider:

Performance testing: Ensure your infrastructure can handle higher loads
Gradual increases: Implement tiered limits based on user verification
Monitor impact: Track server performance when increasing limits
Premium tiers: Offer paid plans with higher rate limits
Dynamic scaling: Use auto-scaling infrastructure to handle traffic spikes

Always remember to balance user needs with system stability and cost considerations.

How long does the rate limit last?

There is no fixed answer to how long an API rate limit lasts. It is common to apply a rate limit based on the number of requests per second, but you could also think in terms of minutes, hours, or whatever timeframe best suits your business model.

How to avoid API rate limiting

There are several ways that API providers can try to avoid their consumers hitting rate limits and experiencing the resulting disruption. These include:

Setting realistic limits based on actual capacity
Implementing tiered pricing with an easy way to move up a tier and access higher limits when required
Using intelligent rate limiting that considers request complexity
Providing clear documentation about limits and best practices

What is API throttling vs rate limiting?

Rate limiting blocks requests once a limit is exceeded, while throttling delays requests to preserve service availability without returning errors.

They are two different approaches to handling requests once they exceed the prescribed limit. One is by returning an error (via API rate limiting); the other is by queueing the request (through throttling) to be executed later.

You can implement throttling at key or policy level, depending on your requirements. It’s a versatile approach that can work well if you prefer not to throw an error back when a rate limit is exceeded. By using throttling, you can instead queue the request to auto-retry.

Throttling means that you can protect your API while still enabling people to use it. However, it can slow down the service that the user receives considerably, so how to throttle API requests needs careful thought in terms of maintaining service quality and availability.
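The difference is easy to see in code. Where a rate limiter answers "yes" or "no", a throttle answers "how long must this request wait?". The sketch below is one simple way to do that, spacing requests evenly; the class name is hypothetical and the actual queueing is left out.

```python
class Throttle:
    """Space requests out at a steady rate instead of rejecting them."""

    def __init__(self, rate_per_second):
        self.interval = 1.0 / rate_per_second
        self.next_slot = 0.0  # earliest time the next request may run

    def delay_for(self, now):
        """Seconds the request arriving at `now` must wait before it runs."""
        start = max(now, self.next_slot)
        self.next_slot = start + self.interval
        return start - now


throttle = Throttle(rate_per_second=2)
# Three requests arrive at the same moment: each waits longer than the last.
delays = [throttle.delay_for(now=0.0) for _ in range(3)]
```

Nobody gets an error here, but the third caller waits a full second, which is the service quality trade-off described above.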

What does “API rate limit exceeded” mean?

“API rate limit exceeded” means precisely what it says – that the client trying to call an API has exceeded its rate limit. This will result in the service producing a 429 error status response. As an API provider, you can modify that response to include relevant details about why the response has been triggered.
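From the consumer's side, a well-behaved client can respond to a 429 by waiting and trying again. This is a minimal sketch: send is a hypothetical function returning a status code, headers and body, and only the "number of seconds" form of Retry-After is handled.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call `send` until it returns a non-429 status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts:
            # Wait as long as the server asked, or 1 second if it did not say.
            sleep(float(headers.get("Retry-After", 1)))
    return status, body
```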

How to bypass an API rate limit

While API rate limiting can go a long way towards protecting the availability of your APIs and downstream services, it is not without its flaws. Some individuals have worked out how to bypass an API rate limit. In fact, they’ve worked out several ways to do so.

If you use an IP-based rate limiter, rather than key-level rate limiting, people could bypass your limits using proxy servers. They can multiply their usual quota by the number of proxies they use.
Key-based API rate limiting can also be bypassed, by people creating multiple accounts and getting numerous keys.

There are other techniques out there, such as using client-side JavaScript to bypass rate limits, so be aware that knowing how to rate limit API products doesn’t make them impervious to being bypassed!

Conclusion

The right approach to rate limiting can enhance your APIs’ service quality, efficiency and security, supporting you to manage and control traffic in a way that delivers a superior UX.

Next steps

The Tyk team is always on hand to help. Whether you want to chat about rate limiting middleware, rate limit algorithms, distributed rate limiting, or anything else, reach out to our team.
