By Jonas Keller · May 10, 2025 · 11 min read

Building an AI customer support agent: what we learned from doing it three times


The third time you build something, you start to see what actually matters. The first time, you're learning what the problem even is. The second time, you're applying what you learned — and discovering new problems you didn't know existed. By the third, you have opinions.

I've now built AI customer support agents for three different companies: a SaaS business, a logistics company, and a professional services firm. Here's what I would do differently on the first two, and what we've standardised since.

What most "AI support" deployments actually are

A lot of what gets called an "AI customer support agent" is a FAQ bot with a language model skin. You give it a knowledge base, it finds the closest match, it outputs a response. That's fine for very simple queries. It doesn't handle anything that requires contextual understanding or action in external systems.

What I'm describing here is more than that: an agent that can read a customer's query, look up their account status, check a policy, take a defined action (like issuing a refund below a certain threshold), and respond with accurate information — without a human touching it. The distinction matters, because the engineering and the testing look completely different.
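To make the difference concrete, here is a minimal sketch of that flow for a refund request. Everything in it is an assumption for illustration: the `Account` fields, the in-memory `ACCOUNTS` lookup, and the 50.00 threshold stand in for whatever CRM, billing system, and policy a real deployment would use.

```python
from dataclasses import dataclass

# Hypothetical threshold: refunds at or below this are issued automatically.
AUTO_REFUND_LIMIT = 50.00


@dataclass
class Account:
    customer_id: str
    status: str              # e.g. "active" or "suspended"
    last_order_total: float


# Stand-in for a real CRM or billing lookup.
ACCOUNTS = {
    "c-1": Account("c-1", "active", 20.00),
    "c-2": Account("c-2", "active", 400.00),
}


def handle_refund_request(customer_id: str) -> str:
    """Look up the account, check the policy, then act or hand off."""
    account = ACCOUNTS.get(customer_id)
    if account is None or account.status != "active":
        return "escalate: account not found or not active"
    if account.last_order_total > AUTO_REFUND_LIMIT:
        return "escalate: refund above automatic limit"
    # A real agent would call the billing system here.
    return f"refunded {account.last_order_total:.2f}"
```

Even in a toy version, two of the three branches end in a hand-off rather than an action, which is roughly why the testing looks so different from a FAQ bot.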

The first project: what I got wrong

The first AI support agent I built worked. It handled the most common queries reliably. But I underestimated how many edge cases there would be, and how much the escalation path mattered.

By "edge case," I mean anything that falls outside what the agent was built to handle. On the first project, those queries were being dropped — not escalated, not flagged, just not responded to — because I hadn't designed a clean fallback path. The agent either handled it or it didn't. There was no "I don't know, here's a human."

That's a failure mode that sounds obvious in retrospect. Design the happy path first, design the failure path second — and treat the failure path as equally important. Every AI support agent needs a clear, tested escalation mechanism.
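One way to sketch a fallback path like this is a single entry point that always returns a reply, whether it comes from a handler or from an escalation. The handler, the `escalated` list (a stand-in for a ticket queue), and the wording of the hand-off message are all hypothetical.

```python
from typing import Callable, Optional


# Hypothetical handler: returns a reply, or None if it cannot help.
def handle_order_status(query: str) -> Optional[str]:
    if "order" in query.lower():
        return "Your order is on its way."
    return None


HANDLERS: list[Callable[[str], Optional[str]]] = [handle_order_status]

escalated: list[str] = []  # stand-in for a ticket queue or on-call channel


def escalate(query: str, reason: str) -> str:
    """Record the hand-off and tell the customer a human is taking over."""
    escalated.append(f"{reason}: {query}")
    return "I can't help with that, so I'm passing you to a colleague."


def respond(query: str) -> str:
    """Always return a reply: a handler's answer or an escalation."""
    for handler in HANDLERS:
        try:
            reply = handler(query)
        except Exception as exc:  # a crashing handler must not drop the query
            return escalate(query, f"handler error ({exc})")
        if reply is not None:
            return reply
    return escalate(query, "no handler matched")
```

The point of the shape is that `respond` has no code path that returns nothing: an unmatched query and a crashing handler both end in `escalate`.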

The second project: what I still got wrong

By the second project, I had a proper escalation path. What I underestimated this time was knowledge base maintenance.

The agent's quality depends entirely on the information it has access to. That information goes stale. Pricing changes. Policies update. New products launch. Features get deprecated. If the knowledge base isn't maintained, the agent starts giving wrong answers — which is worse than not answering at all, because it looks confident doing it.

We now build knowledge base update processes into every support agent project as a standard component. Not as an afterthought, not as "you should update this occasionally", but as a defined, documented process with a responsible person attached to it. That's non-negotiable.
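A process like that can be backed by a very small check. The sketch below assumes each knowledge base entry records an owner and a last-reviewed date; the `KbEntry` shape, the 90-day interval, and the sample entries are illustrative, not a description of any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class KbEntry:
    title: str
    owner: str            # the person responsible for keeping it current
    last_reviewed: date


# Hypothetical review interval; pick one that matches how fast content changes.
REVIEW_INTERVAL = timedelta(days=90)


def stale_entries(entries: list[KbEntry], today: date) -> list[KbEntry]:
    """Return entries whose last review is older than the interval."""
    return [e for e in entries if today - e.last_reviewed > REVIEW_INTERVAL]


entries = [
    KbEntry("Refund policy", "dana", date(2025, 1, 5)),
    KbEntry("Pricing tiers", "lee", date(2025, 4, 20)),
]
for entry in stale_entries(entries, today=date(2025, 5, 10)):
    print(f"{entry.title}: review overdue, owner {entry.owner}")
```

Run on a schedule, a report like this turns "someone should update the knowledge base" into a named person with an overdue item.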

What we've standardised after three projects

Escalation logic gets designed before anything else. Before we write a single line of agent logic, we map out every scenario where the agent should not handle the query — and what happens in those cases. Every escalation triggers a notification. Nothing silently falls through.

Testing against negative examples, not just positive ones. The first project was tested against "queries the agent should handle." That's not enough. We now test explicitly against queries it shouldn't handle — ambiguous requests, hostile queries, requests for things outside its scope — and verify the escalation behaviour is correct.
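A negative test suite can be as simple as two lists and a loop. In this sketch the `route` function is a toy keyword router standing in for the real agent, and the queries are made-up examples of the three categories above.

```python
# Toy router standing in for the real agent; the keyword rules are illustrative.
IN_SCOPE_KEYWORDS = ("order", "invoice", "password")
HOSTILE_KEYWORDS = ("lawyer", "scam")


def route(query: str) -> str:
    text = query.lower()
    if any(word in text for word in HOSTILE_KEYWORDS):
        return "escalate"
    if any(word in text for word in IN_SCOPE_KEYWORDS):
        return "handle"
    return "escalate"


# Positive cases: queries the agent should handle.
SHOULD_HANDLE = ["Where is my order?", "I need a copy of my invoice"]

# Negative cases: ambiguous, hostile, or out of scope.
SHOULD_ESCALATE = [
    "It doesn't work",                        # ambiguous
    "Fix my order or I will call my lawyer",  # hostile
    "Can you give me tax advice?",            # out of scope
]


def run_suite() -> list[str]:
    """Return a description of every case that routed incorrectly."""
    failures = []
    for query in SHOULD_HANDLE:
        if route(query) != "handle":
            failures.append(f"should handle: {query}")
    for query in SHOULD_ESCALATE:
        if route(query) != "escalate":
            failures.append(f"should escalate: {query}")
    return failures
```

The second case is the interesting one: it mentions an order, so a positive-only suite would happily count it as handled.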

GDPR handling gets its own design pass. Support queries contain personal data. How that data flows through the agent, what gets stored, how long it's retained, and who has access — these questions need answers before build starts. Not after. We document this in a data flow diagram that's part of every project's scope.
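Two of those questions, what gets stored and how long it's retained, translate fairly directly into code. The following is a rough sketch only: the email pattern is deliberately simple and the 30-day period is a placeholder, since the real values come out of the data flow review, not out of a code sample.

```python
import re
from datetime import datetime, timedelta

# Simple pattern for illustration; real PII detection needs more than one regex.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical retention period; the real value comes from the data flow review.
RETENTION = timedelta(days=30)


def redact(text: str) -> str:
    """Replace email addresses before the text is stored or logged."""
    return EMAIL_PATTERN.sub("[email]", text)


def expired(stored_at: datetime, now: datetime) -> bool:
    """True if a stored transcript is past the retention period."""
    return now - stored_at > RETENTION
```

For example, `redact("Contact me at jo@example.com please")` returns `"Contact me at [email] please"`.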

Response tone review with the client. The agent speaks to your customers. Its tone needs to match your brand. This sounds obvious but it's easy to skip if you're focused on the technical parts. We now do a dedicated tone review session mid-build, with real test queries, before finalising.

One thing I'm still uncertain about

When to trust the agent to act autonomously, and when to require confirmation before taking an action. For low-stakes actions — looking up information, classifying a ticket, sending a template response — autonomous is fine. For higher-stakes actions — processing refunds, updating account information, sending external communications — I'm more cautious.

The answer varies by company, risk tolerance, and the specific action. There's no universal rule. But it's a question worth thinking through carefully before you decide how to configure your agent's autonomy level. Getting it wrong in the too-autonomous direction is harder to fix than getting it wrong in the too-cautious direction.
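Whatever you decide, it helps to write the decision down as explicit configuration rather than leave it implicit in the agent logic. A minimal sketch, with hypothetical action names taken from the examples above:

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    CONFIRM = "confirm"  # a human approves before the action runs


# Hypothetical action names, grouped by the stakes involved.
AUTONOMY = {
    "lookup_information": Mode.AUTONOMOUS,
    "classify_ticket": Mode.AUTONOMOUS,
    "send_template_response": Mode.AUTONOMOUS,
    "process_refund": Mode.CONFIRM,
    "update_account": Mode.CONFIRM,
    "send_external_communication": Mode.CONFIRM,
}


def mode_for(action: str) -> Mode:
    """Unknown actions default to confirmation, the easier mistake to fix."""
    return AUTONOMY.get(action, Mode.CONFIRM)
```

Defaulting unlisted actions to `Mode.CONFIRM` means a new capability starts out too cautious rather than too autonomous.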

Thinking about a support agent?

The things I described here — escalation design, knowledge base maintenance, GDPR handling — are part of every support agent project we scope. Start with a discovery call.
