Open source: how and why we built the most popular cloud-native API gateway

Tyk has made a name for itself around the world as a lightweight, cloud-native API gateway. But how did the gateway come to exist in the first place and why was it open source?

We sat down with Tyk CEO Martin Buhr to take a deep dive into the origins of Tyk and put these burning questions to him, as well as to dig into the value of open source and why it’s so important to modern enterprises. Over to you, Martin… 

What was the problem you set out to solve when you first started developing Tyk?

Martin: I had this side project going on. It was a load testing engine – a system just designed to test other systems. It was working OK, it was built very quickly on a framework called Django, and it was making a bit of money.

I was getting really excited about this “API first” architecture, and it occurred to me that I should rewrite the whole thing to be API first. So I scrapped it – well, didn’t scrap it exactly, but I wrote a whole new version of the same system. It was really cool, because it gave me much more flexibility in what I wanted to do.

Then I reached the point where I thought, “Well, I’ve built this site, and I’ve built this engine, and it all works, but there’s no security. I can’t put this live without security.” 

So I looked around, knowing that I didn’t want to build all my own security – that would be a pain. So, at that point I was looking for an API gateway. I figured that’s what I needed to protect the APIs.

The system was a single page web app, all the requests were APIs, and it was very dynamic, with lots of flexibility. When I looked around, the only things on the market were MuleSoft, Apigee, 3scale and WSO2.

I didn’t want to use a cloud provider – they were very expensive, and their free tiers were lacking (or they didn’t really have one because it was just a trial). The open source ones – WSO2 and MuleSoft – were enormous systems with multiple components. You can’t just install and run them; they take forever to configure.

When I considered the journey of figuring out those platforms, it seemed that just writing my own authentication was easier. I thought, “You know what, I’ve been learning Go for a while, this is a good opportunity to work with a great, fast server-side language.”

So I wrote my own mini API gateway, which handled my authentication for me. That’s how it all started.
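
To give a sense of what a “mini API gateway” of that kind involves, here is a minimal sketch in Go: a reverse proxy that checks an API key before forwarding traffic. It is purely illustrative – the header name, the in-memory key store and the upstream address are assumptions, not Tyk’s actual code.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// The upstream API to protect (placeholder address for this sketch).
	upstream, err := url.Parse("http://localhost:8081")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// A trivial in-memory key store stands in for real credential storage.
	validKeys := map[string]bool{"example-key": true}

	// Check the API key before letting the request through to the proxy.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validKeys[r.Header.Get("Authorization")] {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

A real gateway layers rate limiting, quotas and analytics on top of the same basic proxying loop, but the core idea is no more than this.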

I had something that worked, then I thought that I needed to know more about what was going on: I needed to be notified of security issues, stuff like that. So I set it up to pull analytics out of the traffic stream. Of course, then I realised I needed a way to visualise that data.
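
To make that concrete: “pulling analytics out of the traffic stream” usually means wrapping the proxy handler in middleware that records each request as it passes through. The sketch below shows the general technique in Go – the fields captured and the use of the log as a sink are assumptions, not Tyk’s implementation.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// statusRecorder captures the response status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// withAnalytics records one analytics entry per request passing through the gateway.
func withAnalytics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		// In this sketch the record just goes to the log; a real gateway
		// would push it to a store for later visualisation.
		log.Printf("analytics: %s %s -> %d (%s)", r.Method, r.URL.Path, rec.status, time.Since(start))
	})
}

func main() {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", withAnalytics(backend)))
}
```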

So I started to look at things like Grafana and the other open source graphing tools for this kind of thing. And I thought, “OK, I’ll build my own” (because that seemed sensible at the time!).

I built the whole UI, the dashboard, the management interface, and I built it API first. That gave me the analytics and the view on that. It was a great learning experience, because I had to figure out how to do it efficiently.

That’s where it started really. The side project was OK, but all my clients were in the US, and if something failed, I’d end up having to deal with it at 3am in the UK. And it wasn’t paying me enough to deal with that level of pain!

I decided to scrap the side project and give my wife and me fewer headaches! I also decided to open source the gateway. I had the UI and the dashboard that allowed people to fiddle around with it and configure things. I figured that kind of stuff usually costs money, so I would charge for that part.

At the time, I wanted to take my wife for a fancy dinner. That’s our “thing,” we’re foodies (this was pre-children!). So I decided to take her to Gordon Ramsay in London, which would cost around £450 for a table for two with the wine flight. That’s how I set the price for the dashboard. I figured if I sold one licence then I could take my wife for dinner.

And I sold one – to a Chinese start-up. It took a while, but I did sell it.

We actually got quite a lot of traction early on. I went onto Hacker News, put up some posts about microservices and after the initial excitement of selling that licence (and taking my wife for an amazing dinner), I began to get phone calls from big companies.

I remember signing an NDA to talk to a huge Fortune 500 company about their API management requirements. Around the same time, a VP of engineering for another company’s microservices team called me and said, “Yes, it looks cool, but how are you going to maintain this? How are you going to scale your cloud-native API gateway at £450 for a lifetime licence?”

I quickly realised he had a very valid point! That’s when I turned to James, my co-founder, and said, “Help me please”. James built out our commercial proposition, and that’s how the company started.

So the software very much started out as this open source thing, and a small side hustle to take my wife to dinner!

How big a problem was the lack of an API first gateway for those big companies? How were they managing before?

Martin: Well, there were API management vendors out there, such as Apigee, MuleSoft and WSO2, and they all had API gateway products. A lot of the larger ones, like 3scale and Apigee, were not only cloud based but also seriously expensive. That meant the big companies were their typical clientele.

Even big players like IBM and Dell had their own API gateways. But they were quite large, difficult to use, and didn’t really fit the modern use case of how APIs were evolving. They definitely weren’t designed for a cloud-native world.

At the time, my own use case was very simple. I had a small application, and I needed a very simple thing that would secure my traffic and tell me what was going on with these APIs. I needed something I could just plug in, that would have a minimal footprint, and that would provide me with the data and the security – a simple, cloud-native API gateway.

The other solutions didn’t do that. They were huge monoliths that didn’t really plug into a cloud-native architecture. This is when containers started coming around. Using a language like Go was really beneficial. The whole thing was written in one go, and there was no underlying framework.

A lot of these competitors would use another web server to do the proxying – I think two of them used NGINX. The idea was that they would just configure these other proxies to work for them. They didn’t have their own system that they owned, top to bottom. That meant they weren’t as flexible as they should have been – not in this new landscape where you need containers, autoscaling, all that kind of stuff.

That kind of thing was taking off, so we were in the right place at the right time – having an API gateway that was very small. We had a laser focus on APIs, nothing else. We provided exactly what people needed for their API management requirements.

Obviously, that’s ballooned now – it’s much different. There are other small, tactical API gateways out there. But the use cases have also changed. What I initially did was something very small and tactical, and what small teams in huge enterprises need is something very small and tactical. But that then needs to grow to become an enterprise-grade solution.

That’s the problem I believe we solved. We solved an existing problem, but just in a slightly better way – a more flexible way.

We’ve been called disruptive, but all we really tried to do was make things smaller and more independent – as opposed to building gigantic systems that try to be everything to everyone.

Where did the open source part come from? Why open source?

Martin: It came down to some articles I read. I was thinking about how to make money from open source. It’s a big question, and it’s hard to do. The only company that’s really done it well is Red Hat. They did it off the back of support and licensing, as opposed to actually selling a product.

That’s why we have an open source product, but a closed source enterprise add-on.

The real reason why I went open source with the engine, the main gateway, is that the modern enterprise is moving towards open source. It’s much preferred because if you go with a proprietary vendor…say you’re putting in something like an API gateway…it’s going to be around in your company for a while.

You’ll have that system for three, five, seven years, and you won’t want to have to replace it. But in that time, there’s always the risk of a vendor being acquired, changing their business model, or stopping support for the product altogether.

With an open source product, the code is public by default, and you don’t have that risk. What you hope to do is end up with a community-run product – you’re democratising that responsibility.

So open source, particularly with something mission critical like an API gateway, is good from a buyer’s perspective. It’s also good from a transparency perspective. If the source code is open and there’s a hack or somebody trying to attack it, you have the source code available to you, so you have the lowest-level insight into how it works.

It means, essentially, that we can be more battle tested. With a proprietary piece of software, it needs to go through a private penetration test, or something like that. With open source it can be fixed by anyone – so if there’s a zero day exploit it can be immediately patched by the vendor, the buyer, whoever wants to patch it.

It’s also hugely transparent. If somebody decides, “Let’s audit this thing,” they can do it without having to talk to the vendor or go through some awkward process. That’s a big deal for a piece of software intended as a security layer for a business.

Open source is a natural place to go for that kind of confidence.

What was hardest about creating Tyk? What went wrong, and what was harder than you expected?

Martin: I think the hardest problem to solve was efficiency. Go is a fairly low-level language, but it has a garbage collector, so it allocates memory dynamically and the runtime manages that memory for you.

That’s really good, because it means – in theory – that you don’t have to worry about it. With low-level languages like C, you have to manage your own memory. You have to say, “For this variable, there’s this much space,” then say, “I’ve used that now,” and give it back to the operating system.

If you don’t do that, your application will keep taking up more and more memory – all the RAM in the machine – and crash it.

In Go, you have this garbage collector. Essentially, it identifies any memory that’s no longer referenced and automatically frees it, returning that memory to the system.

However, there are conditions where you can create something called a memory leak. This is where the garbage collector can’t clean up a specific bit of memory because your code is still holding a reference to it, even though you no longer need it. That can happen, and again you end up with the problem where memory usage keeps slowly ballooning until it crashes your server.
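
A classic shape for that kind of leak – purely as an illustration, not a bug from Tyk’s codebase – is a long-lived reference that keeps “finished” data reachable, so the garbage collector never gets a chance to free it:

```go
package main

import (
	"fmt"
	"runtime"
)

// seen grows forever: every payload ever handled stays referenced from this
// package-level slice, so the garbage collector can never reclaim it.
var seen [][]byte

func handleRequest(body []byte) {
	// Imagine this was meant to be a short-lived de-duplication check,
	// but entries are never evicted.
	seen = append(seen, body)
}

func main() {
	for i := 0; i < 100_000; i++ {
		handleRequest(make([]byte, 1024)) // roughly 1 KiB per "request"
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// Heap usage keeps climbing because everything in `seen` is still reachable.
	fmt.Printf("heap in use: %d MiB\n", m.HeapInuse/1024/1024)
}
```

In practice, leaks like this tend to be hunted down with Go’s built-in heap profiler, pprof, which shows which allocation sites are still holding on to the memory.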

The thing is, a leak like this builds up slowly – over days or weeks. It’s like a dripping tap: leave it dripping long enough and it can flood your house, but it takes months.

That kind of bug is so hard to track down. Especially for somebody like me who wasn’t an expert coding engineer at the time. I was working with this code and these things showed up, and I was like, “How do I fix this?!” One of the most frustrating bug hunts is a memory leak hunt.

So that was one of the biggest frustrations in the early days. When you’re running in the cloud and you have a memory leak, it will show up eventually. It’s boring as hell, but the thing I always fear the most is a memory leak!

Another minor thing was simply making sure we always did the right thing with the code. 

We were all – myself included – very opinionated about how we wanted it to work. But then others would say, “Well that’s not right. That’s not how it should work! You’re using European spelling; it should be US spelling.”

And, actually, in tech speak, you should be using US spelling. If you go into CSS and put in a colour code, it’s “color,” not “colour.” So there were lessons learned there.

Does being open source help with things like the memory leaks? Does the community come together to help?

Martin: A little. Open source can actually be quite thankless for the person maintaining it, especially if you’re not being paid for it.

You read a lot in the press about open source contributors reaching burnout quite quickly. The main way people contribute to open source is by filing bugs. So what developers see are bugs. 90% of the feedback is somebody complaining about the code – not fixing it, just saying it doesn’t work!

With the internet being the internet, some people are great at debugging a problem and helping you solve it. But there are others who are just rude. That can happen a lot. You end up with arguments and flame wars on your issue tracker. That can be hugely frustrating for developers because there’s a lot of expectation for free work. It takes up more and more of your time.

Where the community really comes into its own is that they can highlight problems – say, “Look, there’s a bug here.” But it’s not so great in terms of actually solving each problem, unless people are actually contributing to the source code.

In our case, we didn’t have that many core contributors fixing code. We had a few. But it was more when we started hiring a team that we were really able to work directly on those problems.

So, it’s a mixed blessing. A lot of open source projects are like that – there are just a couple of maintainers, even though there’s a huge community around the project. It can be that the community is simply asking questions. That can be slightly frustrating, and a thankless part of open source.

Can the community help with user testing and trialling new features?

Martin: We do go to the community and build out community champions, with the idea that some people are real advocates for the product. They help us with the software a lot in terms of giving feedback, feature requests and things like that.

But the open source product doesn’t have a UI, as such. It’s headless, so there’s no real user testing to do. It’s more around feature requests.

An example is the work we do on our Kubernetes operator. That’s our most popular repository at the moment in terms of community engagement. That’s weird because it’s really just an enabler for the main piece of code. But because Kubernetes is hot at the moment, it attracts more people, so you get great feedback from that community.

So really, that kind of thing depends a lot on what specific project you’re looking at.

Has anybody else created a product that’s comparable to Tyk?

Martin: Well, there are certainly loads of API gateways! When we came along, we were a small, tactical, API first gateway. There was Kong, another small, tactical API gateway.

There are three, maybe four smaller projects right now that are gaining momentum and are going back to focussing on that cloud-native API gateway component – that really small, tactical solution. There’s Ambassador, Solo…they’re very big in the service mesh space, they have a really cool product. There’s Gravity, and there’s another called KrakenD.

These are again open source projects. Gravity, Solo and Kong obviously are more mature. The smaller projects are open source, community driven. They do have revenue models attached to them, but they’re still very much solving the problem I was solving in the early days, as opposed to solving enterprise-level problems, which is what we’re doing now.

There’s just a scaling difference really, with the kind of complexity you’re trying to deal with.

Something I find very interesting is that a lot of the smaller projects are very opinionated about best practice. That’s fantastic, and it’s great to say, “This is the best way to do something, and this is the opinion I’m taking.”

But the reality is that while everybody wants to aspire to best practice, everything tends to grow organically. It grows warts, it grows over time, there are components that don’t change, components that do change.

So if somebody comes along and says you have to do things a certain way, unless you have a greenfield project where you can architect it that way, you can’t use that best practice. You can only aspire to it.

A good solution for a large enterprise needs to openly accept the fact that reality is messy, and deal with that.

One thing we’ve been very good at doing is having a very flexible solution that enables people to eventually get to best practice, but also cope very well with all the bizarre skeletons in the closet – the legacy spaghetti.

Many of the smaller players do have the attitude that there is only one way to do things properly. Thoughtworks has this thing where they think that API gateways are now “overambitious”. We definitely fall into that overambitious category. They say it has to fit into a tiny box that only does one thing.

Thoughtworks are very intelligent people with a great understanding of the market, but they put things in boxes. They say, “This is this component, and it has to work this way.” It’s like all the grey area around it doesn’t seem to exist, even though it obviously does.

If it didn’t, Tyk wouldn’t exist. The market is there, and it’s a problem we need to solve. Solo, for example, have hitched their wagon very much to service mesh. It’s a very exciting technology, but it’s not something you organically grow into. It’s a kind of tech that you have to transition everything to.

If you do that, great. But if you’re a bank with thousands of web services… different sub-units that are all doing their own thing… then that homogenisation isn’t possible, or certainly isn’t easy over a short period. You’re looking at a three to ten year project, and a lot of investment.

So when it comes to innovation, Tyk is innovative perhaps by being a little boring! We do a lot of cool stuff, but we do it for enterprise customers.

The open source user typically has a much smaller use case – their scope is much smaller. They just need something small and lightweight that will handle their traffic. We certainly want to see continued rapid growth of our open source users. It’s something we’re investing heavily in, and we want companies to have the option of the open source version if they don’t want to pay us money.

Some of the newer competitors are laser focussed on something smaller. That’s not a bad thing, and it’s what you’d expect from a market that’s maturing.

And that’s fine, there’s plenty of space.

What have been the key learnings from your open source journey?

Martin: We bootstrapped the company. We didn’t have a seed investment or anything like that, so we started off with just us and initial revenue.

To get that going, you have to focus very heavily on your paying customers – what they need and what they want.

So, in the first two or three years, we didn’t engage enough in creating an open source developer ecosystem. That’s hugely important if you want to build a really popular open source product.

We over-optimised on revenue, which meant we lost out on the momentum you can build with a big developer community.

It’s never too late to change, and we’ve done that now. We’ve repositioned our investment, time and attention, making sure we give more to the open source community – our users and the people who are contributing.

If I could start again, I’d probably try to engage the developer community better, and get more of a community around Tyk before going corporate. But you also have to think about how you’re going to pay for food!

The way to get around that is to take some seed money or some angel money, and you spend that money exclusively on building a community. But you’re burning cash at that point because you still don’t have a product to sell. However, that community is a strong foundation, ready for when you have built a bigger product and found a market fit.

I hate the term, but in start-up land they call it a “flywheel” or a “dynamo” to get you into the next phase. It’s a traditional way to spend seed capital.

The way we did it was going straight to the market and optimising for new customers – so it was rather different.

How do your open source origins impact your enterprise customers?

Martin: I think they love it! They like the fact they can go and say, “There’s the bug!” and some of the more technical people will specifically say, “Fix that!”

The impact is that they can rely on the system and that they have flexibility. When you’re trying to build a reliable product that does what it needs to do and doesn’t get in the way, being boring is good. Having an open source product helps with that.

The thing that does the critical work – not the UI or the dashboard, but the bit of software that does all the heavy lifting – everything in that package you can take, look at and use immediately. It’s all visible, warts and all.

So, what’s next?

We’re planning on building the open source out further. For example, we’re looking at our testing frameworks, and making those open source. That will mean our customers and users can see exactly what we’re testing, and how we’re testing it. That could include some of our proprietary products.

They can say, for example, “there’s an edge case you’re missing,” and contribute directly to the project. And they will also have full visibility on how we’re ensuring that the software continues to be high quality over time.

That’s another really powerful feature of open source. You can take not just the actual product, but the bits around it, and make them more transparent. That can be a superpower.

As well as that, we’re looking at a command line interface for our cloud. We have one we use internally and it’s an extremely powerful tool. The internal one needs to be polished, but we plan to open source that, so that when users are interacting with our cloud, they have a really good developer experience. And they’ll be able to influence it too.

So they’re the two big ones right now: the testing frameworks, and the CLI tool.