James "Hirsty" Hirst - Blog Post Header for article about the Native MCP Gateway

We hired an agent. It onboarded like everyone else.

In 1935, the US Army held a fly-off to choose its next bomber.

Boeing turned up with the Model 299. It was the clear favourite. Bigger, faster, carried more than anything else on the field. The contract was as good as signed.

On the demo flight, it took off, climbed, stalled, and crashed in a ball of flame. Two of the five crew died. One of them was the pilot. And not just any pilot. One of the most experienced test pilots in the country.

The cause? He’d forgotten to release a simple lock on the controls.

The papers said the plane was “too much aeroplane for one man to fly.”

The Army couldn’t have demanded better pilots. They already had the best. So they did something else.

They wrote a checklist.

A few steps, on a card, done out loud, every flight. With it, that “unflyable” aircraft went on to fly 1.8 million miles without a serious accident. It became the B-17.

The problem was never the pilot’s brains. The problem was that nobody had written the steps down.

Nobody demos this part

We have an AI agent on our engineering team.

It raises pull requests. It files tickets. It answers questions in Slack.

It has a name. People @-mention it like a colleague.

And last week, in front of everyone, an engineer had to tell it off.

It had assigned a ticket to a team that doesn’t exist anymore.

Every vendor demo of agentic AI is a triumph. You ask. It does. Applause.

Real life is messier. And more interesting. Because it’s more human.

Our agent had been inventing its own names for things in our issue tracker. Using a label of “Gateway” instead of the exact label the team filters on. Looks harmless.

It isn’t. The team filters by the standard label. Nobody checks the made-up ones. So tickets were quietly vanishing from the views people rely on.

An engineer flagged it. Publicly and plainly. Use the standard labels. Here’s why. These matter.

The agent didn’t just apologise. It opened a pull request to change its own configuration so it would get it right next time. And it told its maintainers, so the fix would stick.

A day later, same thing. This time it assigned a ticket to a disabled team. Got corrected. Updated itself again.

Then someone asked it to start crediting the human who requested each PR, with a link to the conversation. It did that too. Fixed a detail. Shipped it.

Now swap the word “agent” for “graduate hire.”

Made a reasonable mistake. Got clear feedback. Updated the checklist. Improved the process so it wouldn’t happen again.

That’s just onboarding.

The only difference is that the loop closed in hours, out in the open, and the new joiner proposed the fix to its own instructions itself, instead of waiting for someone else to write it up.

What I didn’t see coming

Onboarding an agent is onboarding a colleague. The mistakes weren’t exotic AI failures. They were the mistakes every smart new starter makes. Not knowing the local rules. The tribal knowledge. The “we always filter by component” stuff that lives in people’s heads and nowhere else.

We didn’t need a cleverer agent. We needed a checklist.

The rules were never written down. And now they have to be. Because an agent forces you to make the unwritten explicit. That’s a gift. The documentation we’re writing for the machine is the documentation our humans needed all along.
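To make that concrete, one way to write the local rules down is as a check that runs before a ticket is filed, whether the filer is an agent or a new starter. This is a minimal Python sketch, not how our agent is actually configured: the label names, team names and the `Ticket` type are all hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical conventions: placeholders, not a real issue-tracker setup.
STANDARD_LABELS = {"component:gateway", "component:dashboard"}
ACTIVE_TEAMS = {"platform", "developer-experience"}


@dataclass
class Ticket:
    title: str
    labels: set[str] = field(default_factory=set)
    team: str = ""


def check_ticket(ticket: Ticket) -> list[str]:
    """Run the written-down checklist. An empty list means the ticket passes."""
    problems = []
    # Made-up labels are the ones nobody filters on.
    for label in sorted(ticket.labels - STANDARD_LABELS):
        problems.append(f"non-standard label {label!r}: nobody filters on it")
    # Without at least one standard label, the ticket drops out of team views.
    if not ticket.labels & STANDARD_LABELS:
        problems.append("no standard label: ticket won't appear in team views")
    # Assigning to a team that no longer exists hides the ticket too.
    if ticket.team not in ACTIVE_TEAMS:
        problems.append(f"team {ticket.team!r} is not an active team")
    return problems


if __name__ == "__main__":
    bad = Ticket("Fix timeout", labels={"Gateway"}, team="old-team")
    for problem in check_ticket(bad):
        print(problem)
```

The point isn’t the code. It’s that the rule now lives somewhere other than one engineer’s head, so a human can read it too.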

Correct it in public. The temptation is to fix the agent’s mess quietly, so it all looks seamless. Don’t. My team corrected it in the open, explained the why, and let everyone see the standard reinforced. That’s good for the agent. It’s better for the people watching, who now know the right way too.

Self-updating changes everything. An agent that rewrites its own behaviour off the back of feedback is a different animal from a tool you configure by hand. The loop is near-instant and the fix is permanent.

Which is exactly where it gets dangerous.

Yes, but what about the database it deletes?

“You gave a probabilistic system the power to change its own behaviour and touch your engineering systems. And you’re writing a cheerful post about it!?”

Where’s the agent that confidently drops a production database? Or merges something subtly wrong that nobody catches?

Take that seriously. I do, and anyone who doesn’t is kidding themselves.

This works for us for one reason. And it isn’t that the agent is wise. It’s that we’ve boxed in what it can do on its own.

It can suggest whatever it likes. It just can’t be the one to press the button on the stuff that matters.

Every self-update lands as a pull request that a human reviews and merges. Take those guardrails away and my anecdote becomes an incident report.

The agent isn’t safe because it’s clever. It’s safe because we kept the blast radius small.

The moment agents act on your systems, you need to know which agent did what and what it was allowed to touch, and you need an audit trail you can check later.

We’re wrapping the agents inside our own walls in the same governance that enterprises require for agents calling outside tools.

Identity. Permissions. An audit log. They matter just as much when the actor is one of yours.
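Here is roughly what those three pieces look like together, as a minimal Python sketch. The `Agent` type, the action names and the in-memory log are hypothetical, and in practice this check would sit in a layer the agent can’t modify, not in the agent’s own code.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    agent_id: str                    # identity: who is acting
    allowed_actions: frozenset[str]  # permissions: what it may do on its own


# Hypothetical: actions that always need a human to press the button.
HUMAN_APPROVAL_REQUIRED = {"merge_pr", "drop_table", "deploy"}

# Stand-in for durable, append-only audit storage.
AUDIT_LOG: list[str] = []


def perform(agent: Agent, action: str, target: str) -> str:
    """Decide whether an agent may act, and record the attempt either way."""
    if action in HUMAN_APPROVAL_REQUIRED:
        outcome = "needs_human_approval"
    elif action in agent.allowed_actions:
        outcome = "allowed"
    else:
        outcome = "denied"
    # Log refusals as well as successes, so "which one did what" is answerable.
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "agent": agent.agent_id,
        "action": action,
        "target": target,
        "outcome": outcome,
    }))
    return outcome


if __name__ == "__main__":
    bot = Agent("eng-agent", frozenset({"open_pr", "create_ticket", "comment"}))
    print(perform(bot, "open_pr", "agent-config"))
    print(perform(bot, "merge_pr", "agent-config"))
    print(perform(bot, "delete_repo", "main"))
```

The design choice that matters is the first branch: the agent can propose a change to its own configuration as a pull request, but merging stays with a human, whatever the agent’s permissions say.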

The uncanny bit

Watching a non-human teammate take feedback gracefully, fix itself, and carry on is a strange experience. I haven’t fully made my peace with what it means for how we build teams.

There’s an exciting version of this. There’s an unsettling one. Most days I think it’s both at once.

But the thing I keep coming back to is how ordinary it felt. No singularity. No drama.

Just a teammate who didn’t know our conventions, learned them when we explained, and got a bit better at the job.

I keep waiting for the part where it gets frightening.

So far, the most unsettling thing is realising how much we’d never bothered to write down.
