
How to Build a Multi-Agent System (Part 2/3): From Architecture to Implementation



Welcome to Part 2: From Architecture to Implementation


In Part 1, we transformed a drowning customer service operation into a well-designed multi-agent system on paper. We identified our pain points, designed 11 specialized agents, and mapped out how they'd work together. But a blueprint isn't a building.

Now comes the exciting part: turning that design into reality. This is where many teams stumble. They have a beautiful diagram, but when they start coding, they realize they haven't thought through critical details:


  • Where does memory live?

  • How do agents actually talk to each other?

  • What happens when one agent fails mid-conversation?


In this part, we'll make the architectural decisions that determine whether your system elegantly scales to thousands of users or collapses under its own complexity.

We'll start with just three agents—not because we lack ambition, but because shipping something small that works beats planning something perfect that never launches.


Let's build.


Phase 3: Architecture Decisions


Choose Your Orchestration Model

I had a choice: build a democracy where agents coordinate themselves, or create a hierarchy with clear leadership. I chose hierarchy (a centralized orchestrator) for practical reasons:


  • Need clear audit trails for compliance

  • Complex routing rules require central logic

  • Debugging is easier when you can trace every decision


The Customer Service Orchestrator Agent acts like an air traffic controller, knowing where every request is and what each agent is doing.
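
To make that central role concrete, here is a minimal sketch of an orchestrator that tracks every in-flight request and which agent currently holds it. The class and status values are illustrative, not a prescribed implementation:

class Orchestrator:
    """Central controller: every routing decision flows through here."""
    def __init__(self):
        self.in_flight = {}  # request_id -> {"agent": ..., "status": ...}

    def assign(self, request_id, agent):
        self.in_flight[request_id] = {"agent": agent, "status": "processing"}

    def complete(self, request_id):
        self.in_flight[request_id]["status"] = "done"

    def trace(self, request_id):
        # One place to answer "where is this request right now?"
        return self.in_flight.get(request_id)

orch = Orchestrator()
orch.assign("req-001", "classifier_agent")
print(orch.trace("req-001"))  # {'agent': 'classifier_agent', 'status': 'processing'}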


Design State Management - Memory Matters More Than You Think

One of the most critical decisions when designing a multi-agent system is state management: the coordination and persistence of shared data, context, and operational information across multiple agents throughout their interactions.

State management is the architectural decision that separates demos from production systems. Without proper memory architecture, every customer interaction starts from zero, frustrating users and crippling agent effectiveness.

Multi-agent systems require two distinct types of memory, each serving critical, but different purposes. Confusing these types or implementing only one leads to systems that either forget everything between conversations or drown in unnecessary data persistence.


Working Memory or Ephemeral State

Working memory handles the operational data of the active conversation. This includes:


  • Current issue classification and confidence scores

  • Which agents are actively processing

  • Solution iterations and attempts

  • Immediate conversation context and variables


This memory lasts only for the duration of the conversation. It's optimized for speed and flexibility, typically stored in application memory or cache layers. When the conversation ends, this data should be deliberately cleared to prevent memory leaks and maintain performance.
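
As an illustration, here is a minimal sketch of working memory as an in-process store with explicit cleanup. The field names are hypothetical and would map to your own conversation model:

from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Ephemeral state for one active conversation (illustrative fields)."""
    conversation_id: str
    classification: str | None = None
    confidence: float = 0.0
    active_agents: list[str] = field(default_factory=list)
    solution_attempts: list[dict] = field(default_factory=list)
    context: dict = field(default_factory=dict)

# Keyed by conversation; lives only in application memory or a cache layer.
sessions: dict[str, WorkingMemory] = {}

def end_conversation(conversation_id: str) -> None:
    # Deliberately clear ephemeral state when the conversation ends.
    sessions.pop(conversation_id, None)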


Long-term Memory or Persistent State

Persistent memory forms the system's institutional knowledge:


  • Customer interaction history and preferences

  • Previous tickets and their resolutions

  • Identified patterns and common issues

  • Agent performance metrics and success rates


This memory must survive system restarts, deployments, and infrastructure changes. It requires a proper database design that considers query patterns, data retention policies, and scaling strategies.
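
Here is a minimal sketch of the persistent side, using SQLite purely for illustration. The table and column names are assumptions; a production system would add indexes, retention policies, and migrations:

import sqlite3

conn = sqlite3.connect("support.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS tickets (
    ticket_id    TEXT PRIMARY KEY,
    customer_id  TEXT NOT NULL,
    category     TEXT,
    resolution   TEXT,
    created_at   TEXT DEFAULT CURRENT_TIMESTAMP
)""")

def save_resolution(ticket_id, customer_id, category, resolution):
    # Persistent state survives restarts, deployments, and scaling events.
    conn.execute(
        "INSERT OR REPLACE INTO tickets VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
        (ticket_id, customer_id, category, resolution),
    )
    conn.commit()

def customer_history(customer_id):
    # The query pattern the schema is designed around: history per customer.
    return conn.execute(
        "SELECT category, resolution, created_at FROM tickets WHERE customer_id = ?",
        (customer_id,),
    ).fetchall()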


The Implementation Reality

Many teams start by storing everything in working memory because it's simpler, no database setup, no persistence layer, just variables in code. This approach works during development but fails spectacularly in production. The first server restart, deployment, or scaling event erases all context, forcing customers to explain their issues repeatedly.


Planning for Failure (Because It Will Happen)


Production systems face constant failures: LLM timeouts, malformed outputs, and rate limits.


Build resilience from day one:


Retry Logic with Exponential Backoff

When the Technical Support Agent times out, don't immediately retry. Wait 1 second, then 2 seconds, then 4 seconds. This prevents hammering a struggling service. After three failures, move to your fallback strategy. For faster agents, such as the Classifier, allow only one retry with a 500ms delay.
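
A sketch of that policy in code. The delays and retry counts come from the text above; the agent call itself is a stand-in:

import time

def call_with_backoff(agent_call, max_attempts=4, base_delay=1.0):
    # Initial try plus retries after 1s, 2s, 4s; then hand off to the fallback path.
    for attempt in range(max_attempts):
        try:
            return agent_call()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; caller moves to the fallback strategy
            time.sleep(base_delay * 2 ** attempt)

def run_technical_agent():
    # Stand-in for the real LLM call; may raise TimeoutError under load.
    return "diagnosis: clear cache, restart service"

result = call_with_backoff(run_technical_agent)
# Faster agents like the Classifier: one retry with a 500ms delay, e.g.
# call_with_backoff(run_classifier, max_attempts=2, base_delay=0.5)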


Fallback Paths That Degrade Gracefully

Every specialist needs a backup plan. When the Technical Support Agent fails, route to a General Support Agent with broader knowledge. If that fails too, escalate to a human. Each fallback should be progressively more conservative; it's better to give general guidance than confident but wrong specifics.


Technical Support Agent → General Support Agent → Human Expert
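
A sketch of that chain, assuming each handler either returns a response or raises. All three handlers here are illustrative stubs:

def technical_support_agent(query):
    raise TimeoutError          # stand-in: the specialist failed

def general_support_agent(query):
    return "General guidance: please restart the app and contact us if it persists."

def escalate_to_human(query):
    return "Escalated to a human expert with full context."

def answer_with_fallbacks(query):
    # Try specialists in order of decreasing specificity,
    # degrading to more conservative answers at each step.
    for handler in (technical_support_agent, general_support_agent):
        try:
            return handler(query)
        except Exception:
            continue
    return escalate_to_human(query)

print(answer_with_fallbacks("App crashes on startup"))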


Validation Gates Between Handoffs

Agents produce invalid outputs more often than you'd expect. Gates check for malformed JSON, incomplete responses, and exposed sensitive data before passing outputs to the next agent. Failed validation triggers either a retry with specific feedback or escalation to the fallback path.
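
A minimal sketch of such a gate. The required key and the sensitive-data pattern are illustrative:

import json, re

SENSITIVE = re.compile(r"\b\d{13,16}\b")  # e.g. raw card numbers (illustrative)

def validate_handoff(raw_output: str) -> tuple[bool, str]:
    """Check an agent's output before passing it to the next agent."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, "malformed JSON - retry with format feedback"
    if not payload.get("response"):
        return False, "incomplete response - missing 'response' field"
    if SENSITIVE.search(str(payload["response"])):
        return False, "sensitive data detected - route to fallback path"
    return True, "ok"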


Human Escalation with Clear Triggers

Define explicit conditions: confidence below 60%, both primary and fallback agents failing, sensitive scenarios (legal, health, payments), or a direct customer request. When escalating, include the full context: classification, attempted solutions, and the reason escalation was triggered. Don't make humans start from scratch.
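
One way to encode those triggers, using the thresholds from this section. The state fields in the handoff packet are assumptions:

SENSITIVE_TOPICS = {"legal", "health", "payments"}

def should_escalate(confidence, primary_failed, fallback_failed,
                    topic, customer_requested_human):
    return (
        confidence < 0.60
        or (primary_failed and fallback_failed)
        or topic in SENSITIVE_TOPICS
        or customer_requested_human
    )

def escalation_packet(state):
    # Hand the human everything; don't make them start from scratch.
    return {
        "classification": state.get("classification"),
        "attempted_solutions": state.get("solution_attempts", []),
        "escalation_reason": state.get("escalation_reason"),
        "conversation_context": state.get("context", {}),
    }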


Building Security Into Your Multi-Agent Architecture


Security in multi-agent systems starts with access control, but it doesn't end there.


Access Levels - The Foundation

We implement strict access tiers based on the principle of least privilege:


PUBLIC:


  • Response Generator Agent: Only sees sanitized, customer-ready information


INTERNAL:


  • Classifier Agent, Solution Validator Agent: Access workflow data but no PII or financial records


SENSITIVE (Read-Only):


  • Billing Agent: Views payment history and subscriptions but cannot modify anything


ADMIN:


  • Customer Service Orchestrator Agent, Escalation Agent: Coordinate workflows without direct database access


Why this granularity matters: When a customer tries prompt injection to modify their subscription, the Billing Agent literally can't; it has read-only access. The Response Generator can't leak payment details it never receives. Each agent is sandboxed with exactly the permissions it needs, nothing more. It's not paranoia; it's the principle of least privilege. Even if one agent misbehaves, it can't access data outside its clearance level, so the damage stays contained within its own boundaries.
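
A sketch of those tiers as a simple permission check. The agent keys mirror the list above, while the resource names and the read/write split are illustrative:

# Which data each agent may touch, and in which mode (illustrative).
PERMISSIONS = {
    "response_generator": {"read": {"sanitized_output"}, "write": set()},
    "classifier":         {"read": {"workflow_data"},    "write": set()},
    "billing_agent":      {"read": {"payment_history", "subscriptions"}, "write": set()},
    "orchestrator":       {"read": {"workflow_data"},    "write": {"workflow_data"}},
}

def check_access(agent, resource, mode="read"):
    allowed = PERMISSIONS.get(agent, {}).get(mode, set())
    if resource not in allowed:
        raise PermissionError(f"{agent} may not {mode} {resource}")

check_access("billing_agent", "payment_history")            # ok: read-only view
# check_access("billing_agent", "subscriptions", "write")   # raises PermissionError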


Five Security Controls You Can't Skip


Prompt Injection Protection: Every agent needs input sanitization and output validation. This mitigates the risk of a creative customer asking the Technical Agent to explain your entire infrastructure. Isolate system instructions from user input, and flag any agent that suddenly ventures outside its domain.


Agent Authentication: How do you know which agent is actually making that request? We use cryptographic signatures—every inter-agent message gets signed and verified. No signature, no processing.
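
A sketch of inter-agent message signing, using an HMAC as a stand-in for a full signature scheme. Key handling is deliberately simplified here; in production the key would come from a secret store:

import hmac, hashlib, json

SECRET = b"per-agent-key-from-your-secret-store"  # never hard-code in production

def sign(message: dict) -> str:
    body = json.dumps(message, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(message: dict, signature: str) -> bool:
    # No valid signature, no processing.
    return hmac.compare_digest(sign(message), signature)

msg = {"from": "classifier", "to": "technical_support", "payload": "ticket-123"}
sig = sign(msg)
assert verify(msg, sig)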


Audit Logging: Each request gets a unique ID that follows it through every agent interaction. We log not just what happened, but why, including confidence scores. Yes, it's a lot of data, but storage is relatively inexpensive, and compliance audits can be expensive.
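
A sketch of structured audit logging where a single request ID follows every hop. The field names are illustrative:

import json, logging, uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("audit")

def audit(request_id, agent, decision, confidence, reason):
    # Log not just what happened, but why - including confidence.
    log.info(json.dumps({
        "request_id": request_id,
        "agent": agent,
        "decision": decision,
        "confidence": confidence,
        "reason": reason,
    }))

request_id = str(uuid.uuid4())  # minted once, carried through every agent
audit(request_id, "classifier", "route:technical", 0.82, "keyword + history match")
audit(request_id, "technical_support", "solution:restart", 0.74, "matched KB article")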


Data Isolation: Beyond access controls, we enforce actual isolation: network segmentation, short-lived tokens, data masking, the works. Our Response Generator sees "****1234" not full account numbers. Think defense in depth.


Human Escalation: Your agents need to know their limits. We automatically escalate when confidence drops below 70% for sensitive operations or when we observe patterns, such as three failures within five minutes. It's better to involve a human early than to explain an incident later.


Build these controls from the start. Retrofitting security after you're in production is a nightmare that will cost you sleep and budget.


Phase 4: Implementation - Actually Building It


Start with Your Core Three

Don't build all 11 agents at once. Start with:


  • Request Classifier Agent: Proves the routing concept works

  • Technical Support Agent: Handles 40% of tickets, biggest impact

  • Response Generator Agent: Makes sure we sound professional


This minimal system delivers value in week one while you build the rest. Show progress to stakeholders while perfecting the complex pieces.


Implement Basic Orchestration


The first part starts with just a simple flow:


Customer Query → Classifier Agent (determines type) → Technical Support Agent (if technical) → Response Generator Agent → Customer


[Figure: multi-agent system simple flow]

This basic chain proves your agents can communicate and hand off information. It's deliberately simple—no parallel processing, no complex routing, just A to B to C.
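
A sketch of that A-to-B-to-C chain. The three agent functions are stand-ins for real LLM calls:

def classifier_agent(query):
    return "technical" if "error" in query.lower() else "general"

def technical_support_agent(query):
    return f"Diagnosis for '{query}': clear the cache, then restart."

def response_generator_agent(draft):
    return f"Hi! {draft} Let us know if that doesn't solve it."

def handle(query):
    # Deliberately simple: A to B to C, one handoff at a time.
    category = classifier_agent(query)
    draft = technical_support_agent(query) if category == "technical" else query
    return response_generator_agent(draft)

print(handle("I get an error when exporting reports"))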


The second part adds parallel processing:


Customer Query → Three agents working simultaneously (Classifier Agent + Urgency Detector Agent + Knowledge Retrieval Agent) → Technical Support Agent (with enriched context) → Response Generator Agent → Customer


[Figure: multi-agent system parallel processing]

Now the Technical Agent receives classification, priority level, and relevant documentation simultaneously. What took sequential steps now happens in parallel. Response time drops from minutes to seconds, and accuracy improves because the specialist has full context from the start.
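
A sketch of the same entry point with the three enrichment agents running concurrently via asyncio. The agent bodies are placeholders:

import asyncio

async def classifier_agent(query):
    return {"category": "technical", "confidence": 0.82}

async def urgency_detector_agent(query):
    return {"priority": "high"}

async def knowledge_retrieval_agent(query):
    return {"docs": ["KB-1042: export errors"]}

async def handle(query):
    # The three enrichment agents run simultaneously, not sequentially.
    classification, urgency, docs = await asyncio.gather(
        classifier_agent(query),
        urgency_detector_agent(query),
        knowledge_retrieval_agent(query),
    )
    context = {**classification, **urgency, **docs}
    return f"Technical Support Agent starts with full context: {context}"

print(asyncio.run(handle("I get an error when exporting reports")))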


Implement Validation Gates


Here's what most teams skip and regret: quality checkpoints between agents. Think of these as safety nets that catch errors before they cascade through your system.

Without validation gates, one agent's mistake becomes everyone's problem. With gates? We catch it and correct course. Set up three critical checkpoints:


Gate 1: Post-Classification - The Confidence Check


  • Check: Category confidence > 70%?

  • If no: Activate multiple specialist agents to handle ambiguity

  • Real example: "I can't access my account and was charged twice" → 60% technical, 40% billing → Activate both agents


Gate Check Example: Classification Confidence

Input: "I can't log in and also I was charged twice last month"
Classifier Output: 
{
 "primary_category": "technical",
 "primary_confidence": 0.55,
 "secondary_category": "billing",
 "secondary_confidence": 0.45
}

Gate Logic:
IF confidence < 0.70 for any category:
 ACTION: Route to BOTH Technical and Billing agents
 FLAG: "Multi-domain request - requires parallel processing"

Gate 2: Post-Solution - The Completeness Check


  • Check: Solution includes steps, outcome, timeline?

  • If no: Return to specialist with specific feedback like "Add code example"

  • Why it matters: "Restart the application" is useless. "1. Click Settings, 2. Select Advanced, 3. Click Restart (takes 30 seconds)" actually helps


Gate Check Example: Response Completeness


Technical Agent Output: "Try restarting the application"
Validation Check: FAIL - Missing required elements
Required: [Problem diagnosis, Root cause, Numbered steps, Expected outcome, Time estimate]
Action: Return to agent with feedback: "Add step-by-step instructions with expected outcomes"

Gate 3: Pre-Customer - The Safety Check


  • Check: Appropriate tone? No sensitive data exposed?

  • If no: Response Generator Agent revises before customer sees it

  • Saved us from: Accidentally sending internal URLs (caught 3 times) and wrong tone for enterprise clients (caught weekly)


These gates take milliseconds but prevent hours of cleanup. The beauty is that gates create learning opportunities that drive continuous improvement of the agents themselves.


Gate Check Example: Safety & Tone Validation


Response Generator Output: "Your subscription will renew at $99/month on 2024-03-15. Just cancel it before then if you don't want to be charged. Check your billing at dashboard.internal.staging.com/billing or contact billing@internal-support.com for help."

Validation Check: FAIL - Multiple violations 
Issues Found:
- Staging URL exposed (should be production URL)
- Internal support email shown (not customer-facing)
- Dismissive tone about cancellation ("Just cancel it")
- No empathy for potential pricing concern

Action: 
Return to Response Generator with feedback:
"Replace staging URL with production link, use public support email, adopt helpful tone regarding subscription management"

Corrected Output:
"Your subscription is set to renew at $99/month on March 15, 2024. If you'd like to make any changes to your subscription, you can manage it directly in your account settings at app.company.com/billing. 
We're here to help if you have questions about your plan options - reach out at support@company.com." 

What We've Built So Far


You now have a working multi-agent system. Not a complete one, but a real system that routes requests, solves problems, and generates responses, all while maintaining security boundaries and catching errors before they cascade.


Your three-agent foundation (Classifier Agent, Technical Support, Response Generator) is processing tickets. Your validation gates are preventing embarrassing failures. Your memory architecture distinguishes between what needs to persist and what doesn't. Most importantly, you've built this with security and failure handling baked in, not bolted on.


This is the critical insight: you don't need all 11 agents working flawlessly before you ship. You need a secure, resilient foundation that proves the concept and delivers value. Every additional agent you add now follows the same patterns: explicit permissions, validation gates, and fallback paths.


What's Next: From Evaluation to Production


In Part 3, we'll tackle the challenge that kills most AI projects: knowing if your system actually works. You'll learn how to build evaluation systems that catch failures before your customers do, implement phased rollouts that minimize risk, and create monitoring dashboards that actually tell you what's broken.


We'll also delve into advanced patterns, including Multi-Agent RAG for complex queries, continuous learning from production failures, and the metrics that determine whether you're building a valuable system or an expensive experiment.


The difference between a demo and production isn't features; it's confidence.

Part 3 will give you that confidence through rigorous evaluation and battle-tested deployment strategies.


About me

Cristian spent over 12 years leading data and software teams in delivering large-scale, complex projects and initiatives exceeding $20M for Fortune 500 companies, including FOX Corporation, Ford, Stellantis, Slalom, and Manheim. At FOX, he scaled Agile delivery across 60+ data professionals, architecting AI/ML solutions including recommendation engines and identity graph systems.


Now specializing as an AI Product & Delivery Leader focused on AI Agent solutions and enterprise-scale transformations, Cristian is currently building production-ready Multi-Agent AI systems. 


Bridging technical depth with business strategy, he completed intensive training in Agentic AI Engineering and AI Product Management, mastering multi-agent architecture design, orchestration, agentic workflows, advanced prompt engineering, and AI agent evaluations.


This unique combination of scaled enterprise delivery experience and hands-on AI agent development enables him to translate complex AI capabilities into measurable business outcomes.


Certifications: AWS Certified AI Practitioner | Agentic AI (Udacity) | AI Product Management | Databricks GenAI | Azure AI Fundamentals | SAFe 5 SPC | Data-Driven Scrum for AI/ML projects

