As a client-facing engineer at Tyk, I’m often asked “How fast is Tyk’s API Gateway?”. My answer is always: “damn fast – and feature packed too!”. But how do you get the optimum performance from your Tyk installation?
In this post we’ll be walking through step-by-step how you can get some insanely low latency and high throughput out of a tiny commodity server. We will get down to the nitty gritty and show you some actual numbers to help effectively tune and size your Tyk Gateway installation.
I will be showing you how to:
- Increase throughput from ~5000 requests per second to ~6400 requests per second
- Reduce 95th %ile gateway latency by ~54% (9.2ms to 4.2ms)
- Reduce CPU consumption by ~40%
In order to achieve all of the above, we will be tweaking just a single environment variable: GOGC.
We will be deploying Tyk Gateway, Redis Database, Upstream Server & Load Generator ALL to a single commodity 2-core DigitalOcean Droplet.
In client production deployments, we have seen Tyk Gateways performing TLS termination, Authentication, Rate-Limiting & Quota checking plus analytics recording alongside Statsd instrumentation – with the Tyk Gateway introducing just 1.5ms of latency at a sustained mission critical 60k transactions per second.
This performance blog is not supposed to be scientifically accurate – if you want to do that, feel free to re-create in a more sanitised environment and let me know your results over on the Tyk Community.
Does that sound like fun? Let’s get benching!
But first, how not to deploy Tyk
When you view showcase WordPress themes, they are almost always painfully slow: very large, with megabytes of assets for the browser to download. In real life you will never use every single asset the theme provides; you will likely minify, apply a few tweaks, and enable only what you need for the production site.
With this in mind, I’ve seen a couple of benchmarks of Tyk in the wild, and it is instantly clear that the deployment option chosen was a quick-start installation. While this is perfect for a few key use-cases (see next paragraph), it’s important to remember that, much like a WordPress theme, the easiest & fastest way to deploy Tyk is never going to be optimally configured.
The installation I refer to is the Tyk Pro Docker Demo, an all-in-one installation of the professional edition of Tyk. It is great at showing off the features and functionality of Tyk as an enterprise API Management Platform, and it can be deployed pretty much anywhere Docker is available, but the amount of resource contention involved when benchmarking against this single-host docker-compose micro-service architecture should not be surprising.
By design, Tyk is deployed as a collection of cloud-native micro-services, all working harmoniously together – possibly the only micro-services-architecture API management solution in the wild to manage your micro-services. As such, benchmarking an out-of-the-box docker-compose without taking the time to understand how each component interacts with the others will almost certainly produce sub-optimal and inconsistent results.
Sizing & Configuration
This short section will provide some insights on how to go about sizing Tyk & its dependencies.
Tyk’s API Gateway is possibly the lightest and most performant solution in the industry, its only dependency being a small Redis Server.
That’s right – there is no need for MongoDB, Cassandra, Postgres or any of these heavy, computationally and memory expensive databases for an Open Source installation of Tyk. In AWS you could simply use ElastiCache; in GCP, MemoryStore.
Redis
Redis requires enough RAM to hold your tokens and temporary analytics, plus fast networking.
A default Redis installation persists to disk in a non-optimal fashion and, with default configurations, is likely to choke the Gateway. For simplicity, we recommend disabling AOF to reduce disk writes. Note also that SAVE locks the database while snapshotting, whereas BGSAVE snapshots in a non-blocking fashion – a minimal sketch of these changes follows.
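These settings can be applied at runtime with `redis-cli`, or persisted in redis.conf. This is a sketch only: the snapshot policy shown is an assumption, so tune it to your own durability requirements.

```bash
# Sketch only: relax Redis persistence for a gateway workload.
redis-cli CONFIG SET appendonly no   # disable AOF persistence
redis-cli CONFIG SET save "900 1"    # RDB snapshot at most every 15 min if >=1 key changed
redis-cli BGSAVE                     # fork a child; snapshot without blocking clients
```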
In a real-world scenario, we recommend a Redis Cluster deployment, which allows linear performance improvements by sharding the data – especially useful once you enable authentication and handle rate limits and quotas. With Redis Cluster you also get High Availability baked in, with automatic failover (see the Redis documentation's introduction to Redis Cluster). Tyk can easily speak to either a Redis Cluster or a single Redis instance – a sketch of the configuration change follows.
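For reference, switching the Gateway to Redis Cluster is a small change to the `storage` section of `tyk.conf`. The sketch below uses `jq`; the `enable_cluster` and `addrs` field names are taken from the Tyk documentation and the node addresses are placeholders, so verify both against your Gateway version.

```bash
# Sketch only: point tyk.conf at a Redis Cluster (placeholder node addresses).
apt install -y jq
jq '.storage.enable_cluster = true |
    .storage.addrs = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]' \
  /opt/tyk-gateway/tyk.conf > /tmp/tyk.conf \
  && mv /tmp/tyk.conf /opt/tyk-gateway/tyk.conf
systemctl restart tyk-gateway
```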
For the purposes of this article, and to illustrate my point, we are simply deploying a single default configuration Redis master node on the same tiny Digital Ocean droplet as the gateway.
Tyk Open Source API Gateway
Tyk’s open source API gateway requires CPU, a small amount of RAM and fast networking. The Gateway is multi-threaded, and as such, will by default, under load, consume all the CPU made available to it by the host operating system.
We would always recommend compute optimised nodes if deploying to a cloud provider like DO, Linode, AWS, Azure or GCP. Cloud providers usually provide a good balance of RAM to CPU, and RAM should not really be a concern for Tyk to perform optimally, unless you intend to also run RAM heavy applications on the same node.
Once Tyk Gateways start consuming 60-70% CPU, we would recommend a scaling event: if a node fails or is taken out for maintenance, the remaining nodes in the cluster need enough spare CPU to absorb the failed node's traffic without significant performance degradation. For example, three nodes each at 66% CPU carry roughly two nodes' worth of work, so losing one would push the two survivors to around 99% – no headroom at all.
Installation
We will deploy a commodity DigitalOcean server with 4GB RAM and 2 vCPUs for this benchmark. I will use a one-click-app with Docker deployed on Ubuntu 18.04 for speed of deployment.
Going against everything I have said above, I will be deploying upstream, load server, Gateway and Redis all on the same box.
This post is about how easily we can tune Tyk Gateway performance, so the fact that all services are deployed on the same machine (resource contention) only proves that Tyk can go significantly faster if configured appropriately – for example, by moving the target upstream and load-generating servers onto their own boxes, or by setting appropriate resource limits if deploying inside Kubernetes (a sketch follows).
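As a concrete illustration of the Kubernetes point – the Deployment name here is hypothetical – reserving compute for the Gateway is a one-liner:

```bash
# Illustrative only: reserve CPU/RAM for a hypothetical "tyk-gateway" Deployment
# so that noisy neighbours cannot starve the Gateway of compute.
kubectl set resources deployment tyk-gateway \
  --requests=cpu=2,memory=512Mi \
  --limits=cpu=2,memory=1Gi
```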
Infrastructure / Application Setup
First, we install and start NginX, which will be our baseline upstream server – you could really use any performant upstream server you like.
```bash
# install nginx
apt install nginx
systemctl start nginx
```
```bash
# install Redis
apt install redis-server
systemctl start redis-server

# install Tyk Gateway
apt install curl gnupg apt-transport-https
curl -L https://packagecloud.io/tyk/tyk-gateway/gpgkey | sudo apt-key add -
cat > /etc/apt/sources.list.d/tyk_tyk-gateway.list <<- SOURCES
deb https://packagecloud.io/tyk/tyk-gateway/ubuntu/ trusty main
deb-src https://packagecloud.io/tyk/tyk-gateway/ubuntu/ trusty main
SOURCES
```
```bash
apt update
apt install tyk-gateway
systemctl start tyk-gateway

# check it's up
curl -i localhost:8080/hello

HTTP/1.1 200 OK
Date: Sun, 03 Feb 2019 22:25:21 GMT
Content-Length: 10
Content-Type: text/plain; charset=utf-8

Hello Tiki
```
Gateway Proxy Configuration
```bash
# set env var to get ip address of eth1 easily
export privateip=$(ifconfig eth1 | grep "inet " | awk '{print $2}')

# configure a keyless API to override the sample application
cat > /opt/tyk-gateway/apps/app_sample.json <<- EOF
{
  "name": "nginx",
  "api_id": "nginx",
  "org_id": "default",
  "definition": {
    "location": "",
    "key": ""
  },
  "use_keyless": true,
  "auth": {
    "auth_header_name": ""
  },
  "version_data": {
    "not_versioned": true,
    "versions": {
      "Default": {
        "name": "Default",
        "expires": "3000-01-02 15:04",
        "use_extended_paths": true,
        "extended_paths": {
          "ignored": [],
          "white_list": [],
          "black_list": []
        }
      }
    }
  },
  "proxy": {
    "listen_path": "/nginx/",
    "target_url": "http://$privateip/",
    "strip_listen_path": true
  },
  "do_not_track": true
}
EOF

# reload Gateway configurations via its control API
curl -H "x-tyk-authorization: 352d20ee67be67f6340b4c0605b044b7" \
  localhost:8080/tyk/reload/
# or just restart the service given that we are not in production
systemctl restart tyk-gateway

# test it
curl -i localhost:8080/nginx/

HTTP/1.1 200 OK
Content-Type: text/html
Date: Sun, 03 Feb 2019 22:38:53 GMT
Etag: W/"5c5747c8-264"
Last-Modified: Sun, 03 Feb 2019 19:58:00 GMT
Server: nginx/1.14.0 (Ubuntu)
X-Ratelimit-Limit: 0
X-Ratelimit-Remaining: 0
X-Ratelimit-Reset: 0
Content-Length: 612

<!DOCTYPE html>
<html>
--- SNIP ---
```
As you can see, this is an out-of-the-box Tyk installation.
Setting up benchmarking
In order to benchmark Tyk, we will use docker-hey (https://hub.docker.com/r/rcmorano/docker-hey).
```bash
# fix firewall rules to allow docker0 access to the host machine
iptables -A INPUT -i docker0 -j ACCEPT
docker pull rcmorano/docker-hey
```
Baseline NginX Upstream
```
# benchmark nginx directly to act as a baseline
docker run --rm -it rcmorano/docker-hey -z 30s http://$privateip/

Summary:
  Total:        30.0029 secs
  Slowest:      0.0305 secs
  Fastest:      0.0001 secs
  Average:      0.0029 secs
  Requests/sec: 17244.7455

Response time histogram:
  0.000 [1]      |
  0.003 [303947] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.006 [193680] |■■■■■■■■■■■■■■■■■■■■■■■■■
  0.009 [16511]  |■■
  0.012 [2364]   |
  0.015 [562]    |
  0.018 [202]    |
  0.021 [38]     |
  0.024 [66]     |
  0.027 [20]     |
  0.031 [1]      |

Latency distribution:
  10% in 0.0007 secs
  25% in 0.0016 secs
  50% in 0.0027 secs
  75% in 0.0040 secs
  90% in 0.0050 secs
  95% in 0.0058 secs
  99% in 0.0083 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0305 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0225 secs
  resp wait:  0.0026 secs, 0.0000 secs, 0.0241 secs
  resp read:  0.0002 secs, 0.0000 secs, 0.0220 secs

Status code distribution:
  [200] 517392 responses
```
So, out of the box, NginX performs as follows – even though docker-hey is also consuming CPU and eating system resources.
|         | Latency (ms) |
| ------- | ------------ |
| Slowest | 30.5         |
| Average | 2.9          |
| Median  | 2.7          |
| 95%ile  | 5.8          |
Baseline Tyk Reverse Proxy to NginX
So now that we have our baseline, we should see how Tyk out of the box performs as a transparent reverse proxy to NginX – even with some resource contention.
```
docker run --rm -it rcmorano/docker-hey -z 30s http://$privateip:8080/nginx/

Summary:
  Total:        30.0035 secs
  Slowest:      0.0768 secs
  Fastest:      0.0003 secs
  Average:      0.0099 secs
  Requests/sec: 5056.8813

Response time histogram:
  0.000 [1]     |
  0.008 [67948] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.016 [58333] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.023 [22119] |■■■■■■■■■■■■■
  0.031 [2519]  |■
  0.039 [544]   |
  0.046 [198]   |
  0.054 [33]    |
  0.062 [18]    |
  0.069 [6]     |
  0.077 [5]     |

Latency distribution:
  10% in 0.0034 secs
  25% in 0.0055 secs
  50% in 0.0087 secs
  75% in 0.0135 secs
  90% in 0.0177 secs
  95% in 0.0203 secs
  99% in 0.0268 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0000 secs, 0.0003 secs, 0.0768 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0022 secs
  resp wait:  0.0098 secs, 0.0002 secs, 0.0768 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0054 secs

Status code distribution:
  [200] 151724 responses
```
|         | Baseline Latency (ms) | Tyk Latency (ms) | Introduced Latency (ms) |
| ------- | --------------------- | ---------------- | ----------------------- |
| Slowest | 30.5                  | 76.8             | +46.3                   |
| Average | 2.9                   | 9.9              | +7.0                    |
| Median  | 2.7                   | 8.7              | +6.0                    |
| 95%ile  | 5.8                   | 20.3             | +14.5                   |
Not bad – Tyk is able to handle 5000 rps out of the box despite the resource contention – this tiny machine is working very hard and all the CPU is being consumed.
The problem we have here is that our test is extremely aggressive: the average/median latency is acceptable, but there are some outliers which cause the average latency to increase. The slowest request introduces 46ms of latency, which for me is unacceptable.
Resource Contention
Now in a real world scenario, you simply wouldn’t hammer the Gateway – the added latency for some of the requests is just not acceptable. Let’s throttle our rps a little.
By default, docker-hey uses 50 concurrent clients, and the Gateway can handle about 5,000 rps max in this setup. So by turning down the requests per second by just 5%, let's see how latency improves, ensuring that requests come back in a timely and consistent fashion. hey's -q flag rate-limits each client, so we set the rate limit per consumer to 95:

Rate limit per consumer = 5000 / 50 × 95% = 95
```
docker run --rm -it rcmorano/docker-hey -z 30s -q 95 http://$privateip:8080/nginx/

Summary:
  Total:        30.0138 secs
  Slowest:      0.0506 secs
  Fastest:      0.0002 secs
  Average:      0.0069 secs
  Requests/sec: 4713.9667

Response time histogram:
  0.000 [1]     |
  0.005 [58984] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.010 [57735] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.015 [18420] |■■■■■■■■■■■■
  0.020 [5468]  |■■■■
  0.025 [740]   |■
  0.030 [110]   |
  0.036 [22]    |
  0.041 [2]     |
  0.046 [1]     |
  0.051 [1]     |

Latency distribution:
  10% in 0.0027 secs
  25% in 0.0040 secs
  50% in 0.0060 secs
  75% in 0.0090 secs
  90% in 0.0127 secs
  95% in 0.0150 secs
  99% in 0.0192 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0000 secs, 0.0002 secs, 0.0506 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0032 secs
  resp wait:  0.0069 secs, 0.0002 secs, 0.0506 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0121 secs

Status code distribution:
  [200] 141484 responses
```
Whilst we lost just shy of 350 requests per second, look what we gained!
|         | Baseline Latency (ms) | Tyk Latency (ms) | Introduced Latency (ms) |
| ------- | --------------------- | ---------------- | ----------------------- |
| Slowest | 30.5                  | 50.6             | +20.1                   |
| Average | 2.9                   | 6.9              | +4.0                    |
| Median  | 2.7                   | 6.0              | +3.3                    |
| 95%ile  | 5.8                   | 15.0             | +9.2                    |
Latency is now at a much more acceptable level, simply by not overloading the CPU. Notice that we still haven't actually done any tuning, yet we have managed to reduce CPU consumption by more than 10%.
Tyk Gateway is written in Go
Unlike various other API Management solutions, Tyk is written in pure Go. Go is a modern systems programming language, actively developed by Google to solve Google-scale problems, and it provides both type safety and a garbage collector. Tyk does not rely on third-party software such as NginX to do the heavy lifting.
The Go programming language has an enviably performant concurrent, tri-colour, mark-and-sweep garbage collector, based on the approach first proposed by Edsger W. Dijkstra and colleagues in 1978 (if you want the details: https://dl.acm.org/citation.cfm?id=359655). At a very high level, stop-the-world GC pauses are now typically well below 10ms.
Despite the awesome GC, at such high throughput, compute is now being spent on scheduling the garbage collector, and collecting that garbage.
At the expense of a little extra RAM usage, we can tune the garbage collector's pacing so that it runs less often. The default value is 100%, and each time we double it, the collection frequency is roughly halved. I'm going to set the value to 3200%. This is achieved simply by setting the GOGC system environment variable.
It's worth playing around with this figure for your specific use-case – don't set it too high, or you will run out of RAM and crash the VM, and there comes a point where you see no noticeable gains. I doubled the GOGC value on each test run to find the optimum, which is how I arrived at the 3200% figure – a sketch of that sweep is below.
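For anyone who wants to reproduce the sweep, here is a sketch. It reuses the docker-hey command from the previous section plus a standard systemd drop-in override; the warm-up sleep and the grep filter are my own conveniences, not requirements:

```bash
# Sketch only: benchmark each GOGC value in turn via a systemd drop-in.
mkdir -p /etc/systemd/system/tyk-gateway.service.d
for gogc in 100 200 400 800 1600 3200 6400; do
  printf '[Service]\nEnvironment="GOGC=%s"\n' "$gogc" \
    > /etc/systemd/system/tyk-gateway.service.d/gogc.conf
  systemctl daemon-reload && systemctl restart tyk-gateway
  sleep 5   # let the Gateway settle before benching
  echo "== GOGC=$gogc =="
  docker run --rm rcmorano/docker-hey -z 30s -q 95 http://$privateip:8080/nginx/ \
    | grep -E 'Requests/sec|95% in'
done
```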
Tuning Tyk’s Garbage Collector
```bash
vi /lib/systemd/system/tyk-gateway.service

# add this line inside the [Service] section
Environment="GOGC=3200"

# save, quit, then reload the daemon and restart the service
systemctl daemon-reload
systemctl restart tyk-gateway
```
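If you'd like to watch the collector's behaviour change rather than take my word for it, the Go runtime can log every collection. `GODEBUG=gctrace=1` is a standard Go runtime flag, not a Tyk feature, and it is chatty, so remove it once you're done:

```bash
# Optional sanity check: log one line per GC cycle to the journal by adding
#   Environment="GODEBUG=gctrace=1"
# next to GOGC in the [Service] section, then:
systemctl daemon-reload && systemctl restart tyk-gateway
journalctl -u tyk-gateway -f | grep --line-buffered ' gc '
# with GOGC=100 the "gc N @..." lines stream past under load;
# at GOGC=3200 they should appear roughly 32x less often
```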
Let’s re-run the benchmark
```
docker run --rm -it rcmorano/docker-hey -z 30s -q 95 http://$privateip:8080/nginx/

Summary:
  Total:        30.0065 secs
  Slowest:      0.0425 secs
  Fastest:      0.0003 secs
  Average:      0.0048 secs
  Requests/sec: 4739.7088

Response time histogram:
  0.000 [1]     |
  0.004 [69149] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.009 [69266] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.013 [2489]  |■
  0.017 [841]   |
  0.021 [373]   |
  0.026 [67]    |
  0.030 [16]    |
  0.034 [4]     |
  0.038 [9]     |
  0.043 [7]     |

Latency distribution:
  10% in 0.0027 secs
  25% in 0.0035 secs
  50% in 0.0045 secs
  75% in 0.0056 secs
  90% in 0.0067 secs
  95% in 0.0077 secs
  99% in 0.0126 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0000 secs, 0.0003 secs, 0.0425 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0045 secs
  resp wait:  0.0047 secs, 0.0002 secs, 0.0425 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0052 secs

Status code distribution:
  [200] 142222 responses
```
Holy Cow – we just dropped from 82% CPU usage down to less than 60%! We are now using 26% less CPU than the previous test and about 40% less CPU than our first bench.
Tyk is still handling the same rps, but rather than wasting compute on collecting garbage, the Gateway can spend it proxying requests.
|         | Baseline Latency (ms) | Tyk Latency (ms) | Introduced Latency (ms) |
| ------- | --------------------- | ---------------- | ----------------------- |
| Slowest | 30.5                  | 42.5             | +12.0                   |
| Average | 2.9                   | 4.8              | +1.9                    |
| Median  | 2.7                   | 4.5              | +1.8                    |
| 95%ile  | 5.8                   | 7.7              | +1.9                    |
The slowest request is just 12ms slower than the slowest baseline request, and Tyk is now adding sub-2ms latency at the 95th %ile.
At 60-70% CPU usage we would recommend adding another Tyk node to the cluster. But this is a benchmarking blog – so let's up the rps and try full speed again, to see what Tyk can handle when maxing out.
```
docker run --rm -it rcmorano/docker-hey -z 30s http://$privateip:8080/nginx/

Summary:
  Total:        30.0050 secs
  Slowest:      0.0955 secs
  Fastest:      0.0003 secs
  Average:      0.0072 secs
  Requests/sec: 6897.5780

Response time histogram:
  0.000 [1]      |
  0.010 [165543] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.019 [38325]  |■■■■■■■■■
  0.029 [2665]   |■
  0.038 [332]    |
  0.048 [75]     |
  0.057 [6]      |
  0.067 [8]      |
  0.076 [2]      |
  0.086 [3]      |
  0.096 [2]      |

Latency distribution:
  10% in 0.0030 secs
  25% in 0.0046 secs
  50% in 0.0066 secs
  75% in 0.0091 secs
  90% in 0.0120 secs
  95% in 0.0144 secs
  99% in 0.0212 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0000 secs, 0.0003 secs, 0.0955 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0042 secs
  resp wait:  0.0072 secs, 0.0002 secs, 0.0955 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0059 secs

Status code distribution:
  [200] 206962 responses
```
So the Gateway is now handling 6,900 rps rather than the original 5,000 rps (a ~38% increase), and latency is still pretty good, adding just 8.6ms at the 95th %ile – a massive ~40% improvement in latency over the out-of-the-box installation of Tyk.
|         | Baseline Latency (ms) | Tyk Latency (ms) | Introduced Latency (ms) |
| ------- | --------------------- | ---------------- | ----------------------- |
| Slowest | 30.5                  | 95.5             | +65.0                   |
| Average | 2.9                   | 7.2              | +4.3                    |
| Median  | 2.7                   | 6.6              | +3.9                    |
| 95%ile  | 5.8                   | 14.4             | +8.6                    |
Let's dial the throughput back once again by about 5% so that we are not overloading the CPUs – we set q to 131 (6900 / 50 × 95% ≈ 131).
```
docker run --rm -it rcmorano/docker-hey -z 30s -q 131 http://$privateip:8080/nginx/

Summary:
  Total:        30.0058 secs
  Slowest:      0.0473 secs
  Fastest:      0.0003 secs
  Average:      0.0052 secs
  Requests/sec: 6412.0916

Response time histogram:
  0.000 [1]      |
  0.005 [105806] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.010 [76053]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.014 [7891]   |■■■
  0.019 [1834]   |■
  0.024 [563]    |
  0.028 [194]    |
  0.033 [52]     |
  0.038 [4]      |
  0.043 [1]      |
  0.047 [1]      |

Latency distribution:
  10% in 0.0027 secs
  25% in 0.0036 secs
  50% in 0.0047 secs
  75% in 0.0060 secs
  90% in 0.0079 secs
  95% in 0.0100 secs
  99% in 0.0158 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0000 secs, 0.0003 secs, 0.0473 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0026 secs
  resp wait:  0.0051 secs, 0.0002 secs, 0.0473 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0034 secs

Status code distribution:
  [200] 192400 responses
```
|         | Baseline Latency (ms) | Tyk Latency (ms) | Introduced Latency (ms) |
| ------- | --------------------- | ---------------- | ----------------------- |
| Slowest | 30.5                  | 47.3             | +16.8                   |
| Average | 2.9                   | 5.2              | +2.3                    |
| Median  | 2.7                   | 4.7              | +2.0                    |
| 95%ile  | 5.8                   | 10.0             | +4.2                    |
CPU is at a much healthier 80%, we are still handling 6,400 requests per second (1,400 rps more than the original) and we are introducing just 2.3ms of average latency, or 4.2ms at the 95th %ile – a 54% improvement on the default Tyk installation (down from 9.2ms).
Summary
It is important to remember that everything has been deployed on a 2-virtual-core commodity server with just 4GB of RAM.
Because we still have significant resource contention from the upstream server, load server, Redis and the Gateway all deployed on the same box, there is no way of knowing the true performance of the gateway without splitting out the components onto their own machines, or dedicating compute to each component.
Despite all of this, by tweaking a single environment variable, Tyk has been able to handle 6,400 requests per second, with 95% of requests coming back within 10ms. Subtracting NginX's baseline latency, Tyk introduced just 4.2ms of 95th %ile latency to the request.
To top that, all the services combined are consuming only 80% CPU and less than 600MB of RAM. Further optimisations can be made to the Tyk configuration to tune it for your specific use-case.
Want to give it a go? We’d love to hear your results! Drop us a line via the Tyk Community to share your own benchmarks, and help us continue to improve Tyk performance.
Happy benching!