How we did it: adding API management to Kubernetes

Tyk is leading the field when it comes to API management on Kubernetes. But how and why is our solution different from anything else on the market? Read on to find out, as we take an in-depth look at Tyk on Kubernetes, courtesy of Go Engineer Geofrey Ernest and Head of Research and Development Ahmet Soormally.

Let’s start at the beginning. What was the problem you were trying to solve when developing a way to add API management to Kubernetes?

Ahmet: With the adoption of microservices, we’ve got challenges such as decentralisation. We have a mantra of, “you build it, you own it” so individual teams are responsible for their own stacks – for maintaining the software that they build.

Then you’ve got DevOps teams and DevOps practices, which are all about automation, then GitOps practices as well, where the idea is that Git is the source of truth for things like configuration. You plug that into your CI/CD pipelines and you have some kind of agent that comes along, pulls that configuration from Git – treating Git as your database – and applies it.

This is all becoming increasingly important in the cloud-native world.

This brings new challenges because each tool works in a completely different way. It’s one thing to be able to automate each tool individually using every vendor’s specific tooling or scripts, but as you onboard more technologies and dependencies, and also as your services grow, it can become extremely complex to synchronise, upgrade and rollback. How do you maintain that across potentially hundreds of different tools and services, each providing their own piece of the puzzle?

API management in the Kubernetes era is different. Kubernetes was initially a framework for orchestrating containers, but it has evolved into so much more. By extending the Kubernetes API, Tyk has enabled our customers to develop their microservices as they normally would and publish them as they normally would. And they can just as easily configure an API gateway and orchestrate API management using a declarative Kubernetes native API that they are already familiar with. They can publish APIs to the development portal in the same way as they deploy their services. That’s what we mean by bringing API management to Kubernetes.

Why was it important to solve this problem?

Geofrey: There’s a traditional way of doing things, which organisations were used to before Kubernetes arrived on the scene. When Kubernetes arrived, organisations had the chance to try new ways of doing things.

So there’s a pre-Kubernetes way and a post-Kubernetes way. The way that organisations did things in the pre-Kubernetes world is obsolete in the post-Kubernetes world!

Companies need to evolve to achieve their objectives in a way that gets true value out of Kubernetes.

From Tyk’s perspective, we were one of those organisations doing things the traditional way: API management was something that we had solved in a traditional way. Kubernetes meant that we needed to adapt and evolve that solution. It introduced new requirements.

You still have to do API management with Kubernetes, just in a different way. So we developed a solution to solve a problem we had already solved – API management – just in a different way.

How big was the problem – was it just a small proportion of companies that were facing it?

Ahmet: Kubernetes and the CNCF are exploding in terms of their growth. More and more of our prospects and customers have Kubernetes on their radar.

What Kubernetes is doing is giving you a consistent way to manage everything – your entire infrastructure. It’s reliable, and more and more tech teams are becoming familiar with it. That means everyone from start-ups to large enterprises is adopting Kubernetes. The patterns and framework that it provides give good guidance – a good baseline of how you should be doing things.

When you give someone a framework, you’re removing the necessity for each individual development team to make their own decisions. So a consistent way of doing things can enable development teams to focus more on business problems and less on cross-cutting concerns.

What was your initial idea to solve this? What did you try to create? 

Ahmet: Our eventual solution was pretty close to the initial concept. We started with just an ingress controller, which was relatively straightforward and was a great way to learn the Kubernetes API – but our ingress controller was more powerful than native ingress, handling API definitions, security policies, API descriptions and portal objects.

This gives us a lot more flexibility – because there are a lot of shortcomings with native ingress. Ingress is how you get traffic into the Kubernetes cluster, but it’s very limited in what you can achieve with it, and working with annotations is really not a great experience. We’ve extended the Kubernetes API and installed a bunch of controllers as part of our Kubernetes operator, which has given us a very rich, strongly typed API, enabling Kubernetes to perform full lifecycle API management.

While our solution didn’t stray too far from our original idea, I think it’s fair to say we’ve taken it significantly further than we initially thought we could achieve.

Geofrey: When we were initially exploring Kubernetes and using the Kubernetes ingress, we were focusing just on a small part of the Kubernetes framework. But our customers quickly outgrew the initial solution that we were providing, so there was a vacuum that we needed to fill. The ingress was just one part of the Kubernetes framework and our customers wanted to use other parts – so we addressed that with the Tyk Operator project. It evolved organically into a full Kubernetes API management experience.

What was the hardest part of creating this solution?

Ahmet: The act of abstracting away the complexity of API management by codifying it into a collection of idempotent controllers, so that the end user can have a nice, clean declarative interface, is a pretty tough challenge in itself. In order to do that, we had to think about all of the scenarios that might happen and then codify them. We’ve taken a lot of responsibility away from the user and done it ourselves – and that was a challenge.

We also imposed some design-time constraints on ourselves, which made this project even more challenging. For example, we needed to ensure that the API we exposed was completely compatible with our core Tyk management APIs, so that our existing customers could enjoy a smooth, painless transition into Kubernetes.

So even if we spotted things along the way that didn’t look quite right or didn’t work quite how we wanted them to, we had to document them for addressing further down the line and just bite the bullet and get on with it. We didn’t want to change the core product, because we wanted our existing customers to migrate in, so we abstracted a lot of complexity away into the operator’s controllers, which was quite challenging. We needed to understand exactly how our customers would be using the operator, then create those scenarios and codify them, so that we could achieve our customers’ goals consistently.

Were there any complete dead ends where you thought you would need to scrap the idea and start again?

Geofrey: Overall, I think that our general direction has been on-point. We were very careful and considered in our approach right from the outset. We took time iterating the idea and proving the concept before executing it, so we haven’t found ourselves in a situation where we’ve had to rollback any changes or features.

Most of the time, our new features are based on our users’ needs. They let us know what they need, and we then find a way to address that and deliver value. So we already know what the end game is, we just have to work out how to get there. And we take the time to do that the right way, so we don’t have to backtrack.

Ahmet: There are always reasons to say, “We could do this in a better way” and that means we are constantly evolving the product. But we’ve not made a bad decision and had to start over at any point.

Was there an “Aha!” moment, when you knew you had cracked it?

Ahmet: I remember sending a little video to the rest of the team to say that I’d created my first Kubernetes controller. That was a good moment. The ability to go ahead and deploy the gateway declaratively, to be able to give Kubernetes a manifest and have an agent that sits there watching the Kubernetes API and operating on it and making that controller idempotent… as soon as the team saw that flow, things clicked and we realised that we could also bring API management to Kubernetes.

That was the “Aha!” moment for me: when we realised that, using these methods, we could fully support modern GitOps best practices for API management.

Were any users, customers or community members involved in building this feature?

Geofrey: Our users have been central in validating our ideas and our value propositions. Kubernetes was a new front for Tyk, so there was a degree of experimentation. We like to innovate and find new ways of doing things, so having a new framework to explore was a great opportunity.

Our users were central to most of the decisions we took and to which features we added and in which order. Being open source, we’ve had users participating in road maps, raising tickets, joining discussions, asking for features… that’s a hugely useful resource to have. It meant we had a powerful feedback loop in place for each new feature that we introduced. There’s always a lot of interaction with the open source community and with our customers too.

I don’t think we would have reached this point of stability so quickly without our users shaping things and providing feedback. We had a clear understanding of how and why each feature was needed by the time we shipped it.

Ahmet: I specifically want to mention one of our customers – GPS Insight – in relation to this. They’re a fleet tracking (telematics) provider and they’ve been amazing to partner with, right from the outset. They’re confident in Kubernetes and already have well-established GitOps processes in place. We opened up a shared Slack channel, which was used to bounce ideas around and get continual feedback on our approach. That’s really rewarding, to have that kind of engagement.

As Geofrey said, we don’t just build stuff that we think might be useful – it’s stuff people want. Within days of releasing any new capability, we have feedback about it.

Also, as we’re open source, our issues, our road map and our defects list are all public. We have a GitHub forum where people can discuss what features they want. That was how we knew that people wanted us to introduce portal capabilities into the operator – then we went ahead and built it.

What are other API management companies doing about Kubernetes?

Ahmet: At the moment, very few API management solutions have the level of native support for Kubernetes that Tyk has. We’re one of the more modern stacks and we have a truly cloud native architecture, which helps. Many vendors are SaaS only, or they have a very complex architecture. Their products are complicated and not easy to adapt for Kubernetes. I think most solutions don’t extend very far beyond simple API definitions and ingress. If you take a look at the Gartner Magic Quadrant for Full Lifecycle API Management, Tyk is the only vendor that allows you to publish APIs to your portal catalogue using Kubernetes custom resources. Similarly, we are the only vendor that will enable you to declaratively expose your REST API as GraphQL, using our Universal Data Graph.

A lot of API management vendors who have a Kubernetes native offering pivoted and started competing in the service mesh space. We chose to focus on core API management capabilities. I believe that service mesh and API management can happily live together; they solve a different set of problems.

What is the most innovative part of Tyk’s solution for adding API management to Kubernetes?

Geofrey: It’s an interesting question. All of our competitors who have a Kubernetes offering have some limitations around it – they don’t offer a fully featured API management solution on Kubernetes. We took a different approach for the Tyk Operator, so that’s really the innovative part.

Our competitors are constrained by the choices they’ve made about how their products get used on Kubernetes, because they rely on Kubernetes’ native resources. Those are very limiting – you can’t offer a full API management solution that way.

Tyk, however, is leveraging custom resource definitions, a Kubernetes concept that allows us to define and describe things in a rich, structured way. That means users can use it for API management. It’s a native way of doing things that gives full power to our customers. So when it comes to API management on Kubernetes, whatever they can do the traditional way, they can now do it on Kubernetes, without sacrificing anything.

That’s the real difference in what we offer – Tyk Operator means that there’s no difference to what you can do when you switch to using Kubernetes. Our customers are very happy about that!

Ahmet: We are able to leverage the Kubernetes API to do validation for us – to validate the custom resources that we create. That means that not only is everything within Tyk now completely native within Kubernetes, we’ve even got the Kubernetes API doing validation for our customers. They can have a Kubernetes native experience with the full power of Tyk – it’s a different experience.

How long did the production process take, from initial concept through to the feature reaching Tyk’s users?

Ahmet: We began by interviewing some of our existing customers and getting them involved in the project. It gave us a good overview of some of the current problems that they were having. We then started working on Tyk Operator in about August or September 2020. Our first public release was in November, so it took about 2.5 months before customers were first able to get their hands on it.

The early validation was so important to us. We iterated and iterated. We knew we’d cracked it when one of our early adopters whipped up the first version of the Operator’s Helm chart so that they could effectively deploy and use it in production. That put a big smile on our faces.

That first release, having been created in 2.5 months, was a little rough around the edges. We’ve made it a lot more stable since then! But getting it out there, and having people using it so fast, was just phenomenal.

What were the key learnings for you from this process of developing an API management solution for Kubernetes?

Ahmet: One of Tyk’s values is that the only stupid idea is the untested one. Another is that it’s ok to screw it up. In the context of the Tyk Operator project, these values gave us the autonomy to experiment, move fast, break things, identify problems, learn, fix and iterate very rapidly. Without those core values, I don’t think we would have had the freedom to achieve what we did.

The best thing we did right from the beginning was to find the right customers to partner with to make the project a success. That helped us to ensure that we were continually providing value and could proactively check in to validate assumptions and reduce risk.

If we had built the ‘kitchen sink’ of API management into Tyk Operator, we would have wasted a lot of time and introduced features that weren’t useful. By only introducing core capabilities, and incrementally introducing more complex capabilities as we saw demand from users, we were able to keep the project on time, in scope and better tested. So those were some great learnings from my perspective.

Geofrey: It was really important to be lean and composed in our approach. Focusing on providing the core features and then gradually evolving them from that baseline worked very well. That was an important part of our approach – it was a successful way to handle the project.

What impact has this made for Tyk’s customers?

Ahmet: I asked a couple of our customers about this the other day.

GPS Insight is using the operator to empower developers to manage their APIs and security policies through Tyk in a repeatable way. They deploy Tyk configurations right alongside the actual application being developed and are able to test against a locally deployed open source Tyk gateway. Being able to define Tyk API/security configurations as code ensures there is no drift between staging and production environments, and it makes things easy for developers to test locally as well. They commented:

“It has also allowed us to take API documentation to the next level. The swagger docs we write are automatically deployed and accessible by external customers and other internal developers who need to interact with APIs.”

Another customer – a global eCommerce platform that’s been built organically over the past 20 years or so – also had some great feedback. As I’m sure you can imagine, they have a lot of services, very few of which are shiny and new. The fact that Tyk has full native support for Kubernetes, ready for CI/CD automation, enabled Tyk to be simply dropped into their existing pipelines. That meant that they were able to fully onboard circa 30 public-facing APIs within a month. They said:

“We managed to migrate 40 million requests a day to it, without our customers even noticing. We’ve changed the tyres on the bus while it’s doing 90mph with a full load. No one’s even noticed. That’s how it should be!”

Thank you both for explaining Tyk’s approach to API management on Kubernetes – it’s been a fascinating topic to explore.