API gateways for REST APIs are a well-known topic. Solving the same problems for GraphQL APIs is a different story. However, we didn’t just make GraphQL APIs as secure as REST APIs. Instead, we took it to the next level and let you combine REST and GraphQL into one unified GraphQL API. In this post, we’ll look into how Tyk’s Universal Data Graph works and what makes it so scalable. Read on to discover:
- How GraphQL servers traditionally behave
- What GraphQL gateways need to do and why that’s a barrier to great performance
- How Tyk did things differently
Let me walk you through it…
Handling GraphQL requests at the API gateway layer
API gateways are responsible for a set of difficult but very important tasks. They take care of authorisation and authentication, make sure requests are valid, do content negotiation, mediation, caching and more.
There’s plenty of info out there about doing all these tasks for REST APIs but not so much when it comes to GraphQL. Handling GraphQL requests at the API gateway layer can be a seriously expensive and resource-intensive operation, especially if your upstream is not just a single GraphQL server.
And so Tyk decided to step in. In its current version, Tyk’s Universal Data Graph (UDG) supports combining multiple GraphQL and REST APIs into a single unified Data Graph. With our next release, you will be able to use Apollo Federation and GraphQL Schema Stitching at the same time, while also being able to add REST APIs into the mix.
The anatomy of a GraphQL request
In order to understand what makes UDG so fast, we first need to understand how GraphQL servers traditionally behave. If you want to dive deep into this topic, check out this post by Craig Taub. For brevity, we’ll keep it short and simple here. GraphQL servers handle:
- Lexing of the query (turning text into tokens)
- Parsing the GraphQL operation/query (building an Abstract Syntax Tree or AST from the tokens)
- Validating the operation (analysing the AST to see if the operation is valid)
- Executing the operation (traversing all field nodes in the AST recursively until exhausted)
Those are some fairly complex tasks and, depending on the size of the query, the AST can grow into a gigantic tree structure. On top of that, the tree has to be built for each request and traversed multiple times for validation and execution.
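To make the lexing, parsing and traversal steps concrete, here is a minimal sketch in Go. It is not how any particular GraphQL server implements them: it only understands nested selection sets (no arguments, aliases, fragments or variables), and the names `Field`, `lex`, `parseSelectionSet` and `countFields` are made up for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// Field is one node of a deliberately tiny AST: a name plus nested selections.
type Field struct {
	Name     string
	Children []*Field
}

// lex turns the query text into tokens: names, "{" and "}".
// Any other character (whitespace, commas, unsupported syntax) only separates tokens.
func lex(query string) []string {
	var tokens []string
	var name strings.Builder
	flush := func() {
		if name.Len() > 0 {
			tokens = append(tokens, name.String())
			name.Reset()
		}
	}
	for _, r := range query {
		switch {
		case r == '{' || r == '}':
			flush()
			tokens = append(tokens, string(r))
		case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_':
			name.WriteRune(r)
		default:
			flush()
		}
	}
	flush()
	return tokens
}

// parseSelectionSet consumes "{ field field { ... } }" starting at tokens[pos]
// and returns the parsed fields plus the position after the closing brace.
func parseSelectionSet(tokens []string, pos int) ([]*Field, int, error) {
	if pos >= len(tokens) || tokens[pos] != "{" {
		return nil, pos, errors.New("expected '{'")
	}
	pos++
	var fields []*Field
	for pos < len(tokens) && tokens[pos] != "}" {
		if tokens[pos] == "{" {
			return nil, pos, errors.New("selection set without a field name")
		}
		f := &Field{Name: tokens[pos]}
		pos++
		if pos < len(tokens) && tokens[pos] == "{" {
			var err error
			f.Children, pos, err = parseSelectionSet(tokens, pos)
			if err != nil {
				return nil, pos, err
			}
		}
		fields = append(fields, f)
	}
	if pos >= len(tokens) {
		return nil, pos, errors.New("missing '}'")
	}
	return fields, pos + 1, nil
}

// countFields visits every field node recursively, the same kind of
// traversal that validation and execution repeat on a real AST.
func countFields(fields []*Field) int {
	n := 0
	for _, f := range fields {
		n += 1 + countFields(f.Children)
	}
	return n
}

func main() {
	tokens := lex("{ user { id name posts { title } } }")
	ast, _, err := parseSelectionSet(tokens, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("tokens:", len(tokens), "fields:", countFields(ast))
}
```

Even in this toy version, every request pays for a full pass over the text and at least one full pass over the tree, which is the cost the rest of this post is concerned with.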
What do GraphQL gateways need to do?
A GraphQL API Gateway needs to handle:
- Lexing of the query
- Parsing
- Normalisation (removing whitespace, duplicate fields, etc.)
- Validation
- Enforcing field level authorisation
- Calculating the complexity of the query
- Enforcing rate limits and quotas
- Printing the query (because we modified and cleaned it)
- Sending the request to the upstream
- Validating that the response conforms to the GraphQL schema
- Returning the response to the client
Normalisation, calculating the complexity of the GraphQL operation and printing the outbound query all mean walking the AST (and potentially modifying it). Printing additionally means serialising the AST back into a human-readable string: the sanitised GraphQL query document.
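As a rough illustration of those three AST walks, the Go sketch below merges duplicate sibling fields, computes a complexity score and prints the cleaned-up query. The `Field` type, the function names and the choice of nesting depth as the complexity measure are assumptions made for this example; real gateways typically use richer cost models.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a hypothetical, minimal AST node: a name plus nested selections.
type Field struct {
	Name     string
	Children []*Field
}

// normalise returns a copy of the selection set in which duplicate sibling
// fields are merged, keeping the position of the first occurrence.
func normalise(fields []*Field) []*Field {
	var out []*Field
	index := map[string]*Field{}
	for _, f := range fields {
		if seen, ok := index[f.Name]; ok {
			seen.Children = append(seen.Children, f.Children...)
			continue
		}
		copied := &Field{Name: f.Name, Children: append([]*Field(nil), f.Children...)}
		index[f.Name] = copied
		out = append(out, copied)
	}
	for _, f := range out {
		f.Children = normalise(f.Children)
	}
	return out
}

// depth is the complexity measure assumed here: the deepest nesting level.
func depth(fields []*Field) int {
	deepest := 0
	for _, f := range fields {
		if d := 1 + depth(f.Children); d > deepest {
			deepest = d
		}
	}
	return deepest
}

// printQuery serialises the AST back into a compact query string.
func printQuery(fields []*Field) string {
	var sb strings.Builder
	writeSelection(&sb, fields)
	return sb.String()
}

func writeSelection(sb *strings.Builder, fields []*Field) {
	sb.WriteString("{")
	for i, f := range fields {
		if i > 0 {
			sb.WriteString(" ")
		}
		sb.WriteString(f.Name)
		if len(f.Children) > 0 {
			sb.WriteString(" ")
			writeSelection(sb, f.Children)
		}
	}
	sb.WriteString("}")
}

func main() {
	// Equivalent to: { user { id } user { name id } }
	query := []*Field{
		{Name: "user", Children: []*Field{{Name: "id"}}},
		{Name: "user", Children: []*Field{{Name: "name"}, {Name: "id"}}},
	}
	clean := normalise(query)
	fmt.Println(printQuery(clean), "depth:", depth(clean))
}
```

Each of the three functions is a separate recursive pass over the tree, which is why doing all of them on every request adds up.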
And the fun doesn’t stop there. Once the response comes back, we have to read the whole JSON document and compare all fields to the requested GraphQL query and schema to see if any unexpected errors occurred. If the server responded with null for a non-nullable field, for example, we have to bubble up the error until we reach the next parent field that is nullable and add an error to the errors array to indicate the problem to the client.
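The null-bubbling rule is easier to follow in code. The Go sketch below checks a decoded JSON response against a stripped-down schema and propagates a null upwards until it reaches a nullable parent. The `Type` struct, `complete` and `exampleSchema` are hypothetical names for this illustration, and list types, scalar checks and structured error paths are left out.

```go
package main

import "fmt"

// Type is a hypothetical, stripped-down schema type.
type Type struct {
	NonNull bool
	Fields  map[string]*Type // nil for scalar types
}

// complete checks value against t. The second result reports whether a null
// has to propagate to the parent because a non-nullable position was violated.
func complete(value any, t *Type, path string, errs *[]string) (any, bool) {
	if value == nil {
		if t.NonNull {
			// The error is recorded once, at the field where the problem starts.
			*errs = append(*errs, "null in non-nullable field "+path)
			return nil, true
		}
		return nil, false
	}
	obj, isObject := value.(map[string]any)
	if t.Fields == nil || !isObject {
		return value, false // scalars pass through unchanged in this sketch
	}
	out := make(map[string]any, len(t.Fields))
	for name, fieldType := range t.Fields {
		child, bubble := complete(obj[name], fieldType, path+"."+name, errs)
		if bubble {
			// This object becomes null itself; the null keeps bubbling
			// only if this object is non-nullable as well.
			return nil, t.NonNull
		}
		out[name] = child
	}
	return out, false
}

// exampleSchema models: type Query { user: User }
// type User { profile: Profile! }  type Profile { name: String! }
func exampleSchema() *Type {
	name := &Type{NonNull: true}
	profile := &Type{NonNull: true, Fields: map[string]*Type{"name": name}}
	user := &Type{Fields: map[string]*Type{"profile": profile}}
	return &Type{Fields: map[string]*Type{"user": user}}
}

func main() {
	// Decoded form of: {"user": {"profile": {"name": null}}}
	response := map[string]any{
		"user": map[string]any{"profile": map[string]any{"name": nil}},
	}
	var errs []string
	data, _ := complete(response, exampleSchema(), "data", &errs)
	fmt.Println(data, errs)
}
```

In this example the null for `name` invalidates `profile`, which is non-nullable too, so the null stops at `user`, the first nullable ancestor, and a single entry lands in the errors list.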
And that’s not all…
The problem with achieving sub-millisecond performance
If you were to now throw in handling federated GraphQL servers, GraphQL Schema Stitching and REST APIs, which is what our UDG will be enabling, you will need to add the following steps to those above:
- Preparing multiple child GraphQL queries and REST requests
- Printing all child GraphQL queries
- Building up the final response
Try to execute all of these tasks for each individual request and you’ll quickly realise that, no matter what programming language and frameworks you use, even with C++, you won’t be able to achieve sub-millisecond performance.
Thankfully, it’s possible to make this list of tasks a lot shorter. In fact, I’ve spent the last three years immersed in doing so, focusing on what can be removed and what can’t, as well as on how to make the tasks that must still run in real time, while a request is being processed, execute at lightning-fast speed.
It’s a complex and fascinating area of work. One of the key issues is that you have to make a tradeoff between code that’s fast to execute and code that’s easy to understand. You can parallelise the code to speed it up, but the more you optimise it, the harder it becomes to understand and test.
How Tyk is doing things differently
To achieve the perfect blend of code that executes fast but is easy to understand and reason about, we split the execution into multiple phases:
- Hash the request (every request)
- Prepare the execution plan and store it in a hash map (only once per distinct request)
- Execute the query plan
Splitting the execution into these three steps means we are able to optimise the first and third phases for the computer and the second phase for humans. As part of this, we move a lot of the complexity into the planning (second) phase to make the execution (final) phase very simple and high performance.
During the planning phase, we can build up a data structure that is optimised for extremely fast parallel execution. As we’re transforming ASTs multiple times during this process and the output is an execution tree, we call this process Query Compilation.
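The three phases can be sketched in Go as follows. This is an illustration of the general idea only, not Tyk’s actual code: `Gateway`, `Plan`, `hashQuery` and the placeholder planning and execution steps are invented for the example, and a production version would also need to guard against hash collisions, for instance by comparing the stored query text.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Plan stands in for the precompiled execution tree; here it is only a list of steps.
type Plan struct {
	Steps []string
}

// Gateway caches one plan per distinct query, keyed by the query's hash.
type Gateway struct {
	mu      sync.RWMutex
	plans   map[uint64]*Plan
	planned int // how many times the slow planning phase ran
}

func NewGateway() *Gateway {
	return &Gateway{plans: map[uint64]*Plan{}}
}

// hashQuery is phase one: cheap, and run for every request.
func hashQuery(query string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(query))
	return h.Sum64()
}

// plan is phase two: the expensive, readable part (parsing, validation,
// building the execution tree). It is a placeholder in this sketch and
// must be called with the write lock held.
func (g *Gateway) plan(query string) *Plan {
	g.planned++
	return &Plan{Steps: []string{"fetch upstream for " + query, "merge results"}}
}

// execute is phase three: it only follows the prepared plan.
func execute(p *Plan) []string {
	done := make([]string, 0, len(p.Steps))
	for _, step := range p.Steps {
		done = append(done, "done: "+step)
	}
	return done
}

// Handle ties the phases together: hash, look up or build the plan, execute.
func (g *Gateway) Handle(query string) []string {
	key := hashQuery(query)
	g.mu.RLock()
	p, ok := g.plans[key]
	g.mu.RUnlock()
	if !ok {
		g.mu.Lock()
		// Re-check under the write lock: another goroutine may have planned it.
		if p, ok = g.plans[key]; !ok {
			p = g.plan(query)
			g.plans[key] = p
		}
		g.mu.Unlock()
	}
	return execute(p)
}

func main() {
	g := NewGateway()
	for i := 0; i < 3; i++ {
		g.Handle("{ user { id } }")
	}
	fmt.Println("planning runs:", g.planned)
}
```

The design choice is that repeated requests skip the planning phase entirely: they pay only for a hash, a map lookup and the execution of a plan that was prepared earlier.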
In the end, this architecture enables us to execute GraphQL requests with sub-millisecond performance. This has to do with the choice of technologies as well as the overall architecture. For instance, Node.js, while popular, might not be the ideal language to implement a GraphQL gateway. Even if you rewrite the Gateway in Rust, you might still want to take some inspiration from our architecture. Or just use Tyk’s UDG. We’re not only proxying the request but also making sure it’s valid, enforcing security policies for field level authorisation and applying rate limiting and quotas.
Conclusion
Executing GraphQL queries is complex, and even more so when executing queries with Federation, Schema Stitching and wrapped REST APIs. By splitting the query execution into multiple phases, Tyk’s UDG optimises the hot path for computers, while optimising everything outside of it for humans.
This approach keeps the code easy to maintain and test, but also extremely fast. While it’s still early days, our initial testing shows that UDG is between 37x and 50x faster than the current market leaders. Of course, we’ll be carrying out plenty more tests, so you can look forward to some detailed insights a little further down the line. Why not contact us to find out more about how you could benefit from this combination of maintainability and speed?