Tyk – A GraphQL API Gateway at the speed of light

API gateways for REST APIs are a well-known topic. Solving the same problems for GraphQL APIs is a different story. However, we didn’t just make GraphQL APIs as secure as REST APIs. Instead, we took it to the next level and let you combine REST & GraphQL into one unified GraphQL API. In this post, we’ll look into how Tyk’s Universal Data Graph works and what makes it so scalable. Read on to discover:

  • How GraphQL servers traditionally behave
  • What GraphQL gateways need to do and why that’s a barrier to great performance
  • How Tyk did things differently

Let me walk you through it… 

Handling GraphQL requests at the API gateway layer

API gateways are responsible for a set of difficult but very important tasks. They take care of authorisation and authentication, make sure requests are valid, do content negotiation, mediation, caching and more.

There’s plenty of info out there about doing all these tasks for REST APIs but not so much when it comes to GraphQL. Handling GraphQL requests at the API gateway layer can be a seriously expensive and resource-intensive operation, especially if your upstream is not just a single GraphQL server. 

And so Tyk decided to step in. In its current version, Tyk’s Universal Data Graph (UDG) supports combining multiple GraphQL and REST APIs into a single unified Data Graph. With our next release, you will be able to use Apollo Federation and GraphQL Schema Stitching at the same time, while also being able to add REST APIs into the mix.

The anatomy of a GraphQL request

In order to understand what makes UDG so fast, we first need to understand how GraphQL servers traditionally behave. If you want to dive deep into this topic, check out this post by Craig Taub. For brevity, we’ll keep it short and simple here. GraphQL servers handle:

  1. Lexing of the query (turning text into tokens)
  2. Parsing the GraphQL operation/query (building an Abstract Syntax Tree or AST from the tokens)
  3. Validating the operation (analysing the AST to see if the operation is valid)
  4. Executing the operation (traversing all field nodes in the AST recursively until exhausted)

Those are some fairly complex tasks, and depending on the size of the query, the AST can grow into a gigantic tree structure. That tree has to be built for each request and traversed multiple times for validation and execution. 
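
To make that last step more concrete, here’s a toy sketch in Go (not any real GraphQL library) of an executor walking a tree of field nodes recursively until every leaf has been resolved:

```go
package main

import (
	"fmt"
	"strings"
)

// Toy illustration only: once a query such as `{ hero { name friends { name } } }`
// has been lexed, parsed and validated, the executor walks the resulting tree
// of field nodes recursively until every leaf has been resolved.

// Field is one node of a heavily simplified AST.
type Field struct {
	Name     string
	Children []Field // the field's selection set; empty for leaf fields
}

// execute resolves a field, then recurses into its selection set.
func execute(f Field, depth int) {
	fmt.Printf("%sresolving %s\n", strings.Repeat("  ", depth), f.Name)
	for _, child := range f.Children {
		execute(child, depth+1)
	}
}

func main() {
	// Hand-built AST for: { hero { name friends { name } } }
	query := Field{Name: "hero", Children: []Field{
		{Name: "name"},
		{Name: "friends", Children: []Field{{Name: "name"}}},
	}}
	execute(query, 0)
}
```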

What do GraphQL gateways need to do?

A GraphQL API Gateway needs to handle:

  1. Lexing of the query
  2. Parsing
  3. Normalisation (removing whitespace, duplicate fields, etc.)
  4. Validation
  5. Enforcing field level authorisation
  6. Calculating the complexity of the query
  7. Enforcing rate limits and quotas
  8. Printing the query (because we modified and cleaned it)
  9. Sending the request to the upstream
  10. Validating that the response conforms to the GraphQL schema
  11. Returning the response to the client
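
To give you a feel for how much work that is, here’s a minimal sketch of that pipeline. Every type and function in it (Document, lex, fetchUpstream and friends) is a hypothetical placeholder rather than Tyk’s actual API; the point is simply that all of these steps sit on the hot path of every single request:

```go
package main

import (
	"errors"
	"fmt"
)

// All types and functions below are hypothetical placeholders, not Tyk's API.

type Document struct{} // the parsed AST

func lex(query string) ([]string, error)              { return []string{query}, nil } // 1. lexing
func parse(tokens []string) (*Document, error)        { return &Document{}, nil }     // 2. parsing
func normalise(doc *Document)                         {}                              // 3. normalisation
func validate(doc *Document) error                    { return nil }                  // 4. validation
func authoriseFields(doc *Document, user string) error { return nil }                 // 5. field-level auth
func complexity(doc *Document) int                    { return 1 }                    // 6. complexity
func withinQuota(user string, cost int) bool          { return true }                 // 7. rate limits & quotas
func printQuery(doc *Document) string                 { return "{hero{name}}" }       // 8. printing
func fetchUpstream(query string) (string, error)      { return `{"data":{}}`, nil }   // 9. upstream request
func validateResponse(body string) error              { return nil }                  // 10. response validation

func handle(user, query string) (string, error) {
	tokens, err := lex(query)
	if err != nil {
		return "", err
	}
	doc, err := parse(tokens)
	if err != nil {
		return "", err
	}
	normalise(doc)
	if err := validate(doc); err != nil {
		return "", err
	}
	if err := authoriseFields(doc, user); err != nil {
		return "", err
	}
	if !withinQuota(user, complexity(doc)) {
		return "", errors.New("quota exceeded")
	}
	body, err := fetchUpstream(printQuery(doc))
	if err != nil {
		return "", err
	}
	if err := validateResponse(body); err != nil {
		return "", err
	}
	return body, nil // 11. return the response to the client
}

func main() {
	resp, _ := handle("alice", "{ hero { name } }")
	fmt.Println(resp)
}
```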

Normalisation, calculating the complexity of the GraphQL operation and printing the outbound query all mean walking the AST (and potentially modifying it). Printing additionally means serialising the AST back into a human-readable string: the sanitised GraphQL query document.
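
As an illustration (the exact normalisation rules vary by implementation), a query with redundant whitespace, a duplicated field and a fragment might be rewritten like this before being printed and sent upstream:

```go
package main

import "fmt"

// Illustrative only: a query with redundant whitespace, a duplicated field
// and a fragment...
const before = `
query {
  hero {
    name
  }
  hero {
    name
    ...friendFields
  }
}

fragment friendFields on Character {
  friends { name }
}
`

// ...is typically rewritten into a single, compact, fragment-free selection
// set before being printed and sent upstream.
const after = `{hero{name friends{name}}}`

func main() {
	fmt.Println(before, "=>", after)
}
```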

And the fun doesn’t stop there. Once the response comes back, we have to read the whole JSON document and compare every field against the requested GraphQL query and schema to see if any unexpected errors occurred. If the server responded with null for a non-nullable field, for example, we have to bubble the error up until we reach the nearest nullable parent field and add an entry to the errors array to indicate the problem to the client.
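
Here’s what that looks like for a hypothetical schema where Query.hero is nullable but Hero.name is declared as String! (non-nullable):

```go
package main

import "fmt"

// Hypothetical schema: Query.hero is nullable, Hero.name is String! (non-nullable).

// What the upstream server sent back:
const upstream = `{"data":{"hero":{"name":null}}}`

// What the gateway has to return instead: because Hero.name must never be
// null, the error bubbles up to `hero` (the nearest nullable parent) and an
// entry is added to the errors array.
const toClient = `{
  "data": { "hero": null },
  "errors": [
    {
      "message": "Cannot return null for non-nullable field Hero.name.",
      "path": ["hero", "name"]
    }
  ]
}`

func main() {
	fmt.Println(upstream, "=>", toClient)
}
```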

And that’s not all… 

The problem with achieving sub-millisecond performance 

Now throw in handling federated GraphQL servers, GraphQL Schema Stitching and REST APIs, which is what our UDG will enable, and you need to add the following steps to those above:

  1. Preparing multiple child GraphQL queries and REST requests
  2. Printing all child GraphQL queries
  3. Building up the final response
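
As a purely illustrative example (none of the services or field names below come from Tyk’s docs), a single client query against a data graph that combines a users subgraph, a reviews subgraph and a REST weather API might fan out like this:

```go
package main

import "fmt"

// Purely illustrative, hypothetical data graph: a users subgraph, a reviews
// subgraph and a REST weather API. The example only shows how one client
// query fans out into several upstream requests that the gateway has to
// plan, print, send and stitch back together.

// The single query the client sends to the gateway:
const clientQuery = `{
  me {
    name
    reviews { body }
    cityWeather { tempC }
  }
}`

// Child request 1: a GraphQL query planned for the users subgraph.
const usersQuery = `{ me { id name city } }`

// Child request 2: a GraphQL query planned for the reviews subgraph, fed with
// the user id returned by the first request.
const reviewsQuery = `query ($userId: ID!) { reviewsByUser(userId: $userId) { body } }`

// Child request 3: a REST call planned for the weather API, parameterised
// with the user's city.
const weatherRequest = `GET /v1/weather?city={city}`

func main() {
	fmt.Println(clientQuery, usersQuery, reviewsQuery, weatherRequest)
}
```

Each of those child requests has to be planned, printed, sent and merged into the final response, and this is still a small graph.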

Try to execute these tasks for each individual request and you’ll quickly realise that no matter what programming language and frameworks you use, even with C++, you’ll not be able to achieve sub-millisecond performance.

Thankfully, it’s possible to make this list of tasks a lot shorter. In fact, I’ve spent the last three years immersed in doing so, focusing on what can be removed and what can’t, and on how to execute the remaining real-time work, while processing a request, as fast as possible. 

It’s a complex and fascinating area of work. One of the key issues is that you have to make a tradeoff between code that’s fast to execute and code that’s easy to understand. You can parallelise the code to speed it up, but the more you optimise it, the harder it becomes to understand and test.

How Tyk is doing things differently

To achieve the perfect blend of code that executes fast but is easy to understand and reason about, we split the execution into multiple phases:

  1. Hash the request (every request)
  2. Prepare the execution plan and store it in a hash map, keyed by that hash (only once per unique query)
  3. Execute the cached plan

Splitting the execution into these three steps means we are able to optimise the first and third phases for the computer and the second phase for humans. As part of this, we move a lot of the complexity into the planning (second) phase to make the execution (final) phase very simple and high performance.
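
Here’s a minimal sketch of that idea, assuming a plan cache keyed by a hash of the incoming operation; the Plan type and the plan/execute functions are placeholders, not Tyk’s actual API:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Plan stands in for the pre-compiled execution tree produced by the planning phase.
type Plan struct{ compiledFrom string }

var (
	mu    sync.RWMutex
	plans = map[uint64]*Plan{}
)

func hashQuery(query string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(query))
	return h.Sum64()
}

// plan is the slow, human-friendly phase: it only runs once per unique query.
func plan(query string) *Plan { return &Plan{compiledFrom: query} }

// execute is the hot path: it only ever sees a ready-made plan.
func execute(p *Plan) string { return `{"data":{}}` }

func handle(query string) string {
	key := hashQuery(query) // phase 1: hash the request (every request)

	mu.RLock()
	p, ok := plans[key]
	mu.RUnlock()

	if !ok { // phase 2: plan only on a cache miss
		p = plan(query)
		mu.Lock()
		plans[key] = p
		mu.Unlock()
	}

	return execute(p) // phase 3: execute the cached plan
}

func main() {
	fmt.Println(handle("{ hero { name } }"))
	fmt.Println(handle("{ hero { name } }")) // second call reuses the cached plan
}
```

The expensive planning code only ever runs on a cache miss, so the hot path is reduced to a hash lookup plus the execution of a pre-compiled plan.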

During the planning phase, we can build up a data structure that is optimised for extremely fast parallel execution. As we’re transforming ASTs multiple times during this process and the output is an execution tree, we call this process Query Compilation.

In the end, this architecture enables us to execute GraphQL requests with sub-millisecond performance. This comes down to the choice of technologies as well as the overall architecture. For instance, Node.js, while popular, might not be the ideal runtime for implementing a GraphQL gateway. Even if you rewrite your gateway in Rust, you might still want to take some inspiration from our architecture. Or just use Tyk’s UDG. We’re not only proxying the request but also making sure it’s valid, enforcing security policies for field level authorisation and applying rate limiting and quotas.

Conclusion

Executing GraphQL queries is complex; even more so when executing queries with Federation, Schema Stitching and wrapped REST APIs. By splitting the query execution into multiple phases, Tyk’s UDG optimises the hot path for computers, while optimising everything outside of it for humans. 

This approach keeps the code easy to maintain and test, but also extremely fast. While it’s still early days, our initial testing shows that UDG is between 37x and 50x faster than the current market leaders. Of course, we’ll be carrying out plenty more tests, so you can look forward to some detailed insights a little further down the line. Why not contact us to find out more about how you could benefit from this combination of performance and speed?