The previous three projects — inbox triage, SQL query, and RAG document — each solved one problem with one agent. This project asks a different question: what happens when one agent isn't enough?
Customer support is messy. Someone asks about their order status. Someone else wants to know the return policy. A third person is angry about a broken product. These are fundamentally different problems that require different knowledge, different tools, and different tones.
One monolithic agent trying to handle all of this becomes a confused generalist. It hallucinates order statuses because it doesn't have database access. It makes up policies because it doesn't have the docs. It responds to complaints with the same cheerful tone it uses for FAQs.
The answer is specialization.
The architecture
```
        Customer message
               │
               ▼
  ┌─────────────────────────┐
  │                         │
  │   ORCHESTRATOR AGENT    │
  │                         │
  │   Reads the message,    │
  │   understands intent,   │
  │   routes to the right   │
  │   specialist.           │
  │                         │
  └────┬──────┬────────┬────┘
       │      │        │
   ┌───┘      │        └──┐
   │          │           │
   ▼          ▼           ▼
┌─────┐    ┌─────┐   ┌─────────┐
│ORDER│    │ FAQ │   │COMPLAINT│
│AGENT│    │AGENT│   │  AGENT  │
└──┬──┘    └──┬──┘   └────┬────┘
   │          │           │
   ▼          ▼           ▼
  SQL        RAG       Empathy
 Query     Search     + Logging
 on DB     on Docs  + Escalation
```
The orchestrator doesn't answer anything itself. It's a router. It reads the customer's message, figures out what kind of problem it is, and hands it to the agent that's built for that exact job.
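The router shape can be sketched in a few lines of plain Java. All names here are illustrative, not the project's actual classes, and the small `Llm` interface stands in for Spring AI's `ChatClient`:

```java
import java.util.Map;

// Stand-in for the project's LLM client (Spring AI's ChatClient);
// a hypothetical interface just for this sketch.
interface Llm {
    String complete(String prompt);
}

// Every specialist exposes the same narrow surface to the orchestrator.
interface SupportAgent {
    String handle(String customerMessage);
}

enum Intent { ORDER, FAQ, COMPLAINT }

class Orchestrator {
    private final Llm llm;
    private final Map<Intent, SupportAgent> specialists;

    Orchestrator(Llm llm, Map<Intent, SupportAgent> specialists) {
        this.llm = llm;
        this.specialists = specialists;
    }

    // The orchestrator never answers the customer itself: classify, then delegate.
    String route(String message) {
        Intent intent = classify(message);
        return specialists.get(intent).handle(message);
    }

    // Assumes the model returns a clean one-word label for simplicity.
    private Intent classify(String message) {
        String label = llm.complete(
            "Classify the customer's intent as ORDER, FAQ, or COMPLAINT. "
            + "Reply with exactly one word.\n\nMessage: " + message);
        return Intent.valueOf(label.trim().toUpperCase());
    }
}
```

Because the specialists hide behind one interface, swapping an agent's internals (a different database, a different vector store) never touches the routing layer.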
The three specialists
Order Agent — has database access. When someone asks "where's my order?" it doesn't guess. It runs a real SQL query against the orders table, finds the actual status, and responds with facts. This is the SQL Query Agent pattern from the previous project, repurposed as a worker.
FAQ Agent — has document access. Company policies, return windows, shipping info, working hours — all embedded as vectors in pgvector. When someone asks "what's your return policy?" it retrieves the relevant chunks and answers from real documentation. This is the RAG pattern, repurposed as a worker.
Complaint Agent — has no tools. Its job is emotional, not informational. It acknowledges the frustration, apologizes sincerely, logs the complaint, and escalates to a human. The system prompt is tuned for empathy, not efficiency. Sometimes the right answer isn't an answer — it's being heard.
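A hedged sketch of the three workers, each with the same `handle(String)` surface but completely different internals. The in-memory map, list, and log below stand in for the real JDBC, pgvector, and ticketing dependencies, and every name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OrderAgent {
    private final Map<Integer, String> orders; // stand-in for the SQL orders table

    OrderAgent(Map<Integer, String> orders) { this.orders = orders; }

    String handle(String msg) {
        // Real version: the LLM generates SQL, the agent executes it,
        // the LLM phrases the result. Here: a direct lookup by order number.
        var m = java.util.regex.Pattern.compile("#(\\d+)").matcher(msg);
        if (!m.find()) return "Could you share your order number?";
        String status = orders.get(Integer.parseInt(m.group(1)));
        return status != null ? "Your order is currently " + status + "."
                              : "I couldn't find that order.";
    }
}

class FaqAgent {
    private final List<String> chunks; // stand-in for pgvector similarity search

    FaqAgent(List<String> chunks) { this.chunks = chunks; }

    String handle(String msg) {
        // Naive "retrieval": the chunk sharing the most words with the question.
        String[] q = msg.toLowerCase().split("\\W+");
        return chunks.stream()
            .max(java.util.Comparator.comparingLong(
                c -> java.util.Arrays.stream(q)
                        .filter(c.toLowerCase()::contains).count()))
            .orElse("No documentation available.");
    }
}

class ComplaintAgent {
    final List<String> complaintLog = new ArrayList<>(); // stand-in for ticketing

    String handle(String msg) {
        complaintLog.add(msg); // real version fires the escalation hook here
        return "I'm really sorry about this experience. "
             + "I've escalated it to our support team.";
    }
}
```

Note the asymmetry: two agents are wrappers around data access, while the third is pure conversation plus a side effect. The shared surface is what lets the orchestrator treat them identically.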
How routing works
The orchestrator uses the LLM to classify intent. Not keywords — intent. "My package never arrived" and "order #4521 status" both route to the Order Agent, even though they share zero words. "This is unacceptable, I've been waiting three weeks" routes to Complaint, not Order — because the intent is frustration, not information.
```
"Where is my order #4521?"
  → Orchestrator: this is an order inquiry
  → Routes to: Order Agent
  → Order Agent: SELECT * FROM orders WHERE id = 4521
  → "Your order #4521 is currently in transit.
     Expected delivery: April 5th."

"What's your return policy?"
  → Orchestrator: this is a policy question
  → Routes to: FAQ Agent
  → FAQ Agent: retrieves return policy chunks
  → "You can return any item within 30 days
     of delivery for a full refund."

"I've been waiting 3 weeks and nobody responds"
  → Orchestrator: this is a complaint
  → Routes to: Complaint Agent
  → "I'm really sorry about this experience.
     Three weeks without a response is not
     acceptable. I've escalated this to our
     support team — someone will reach out
     to you within 24 hours."
```
Why this matters
Single-agent systems hit a ceiling fast. The moment you need database access AND document retrieval AND emotional intelligence in the same conversation, one agent with one system prompt can't do it well.
Multi-agent systems solve this by letting each agent be excellent at one thing. The orchestrator is excellent at routing. The order agent is excellent at SQL. The FAQ agent is excellent at retrieval. The complaint agent is excellent at empathy.
This is how real production AI systems work. Not one giant model doing everything — a team of focused agents, each with their own tools, their own context, their own personality.
Architecture decisions
Why an orchestrator instead of a classifier?
A traditional classifier maps input to a fixed set of categories. An LLM orchestrator understands nuance — it can handle "I ordered the wrong size, can I exchange it?" which is both an order question AND a policy question. The orchestrator can route to FAQ first (exchange policy), then to Order (initiate the exchange). Sequential routing, not just classification.
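Sequential routing can be sketched as classifying into an ordered list of steps instead of a single label, with each specialist seeing what the previous one found. This is a minimal illustration, with the LLM classifier stubbed as a function and hypothetical names throughout:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SequentialRouter {
    private final Function<String, List<String>> classify;       // LLM call, stubbed
    private final Map<String, Function<String, String>> agents;  // name -> specialist

    SequentialRouter(Function<String, List<String>> classify,
                     Map<String, Function<String, String>> agents) {
        this.classify = classify;
        this.agents = agents;
    }

    // "I ordered the wrong size, can I exchange it?" -> [FAQ, ORDER]:
    // the FAQ agent's answer becomes context for the Order agent.
    String route(String message) {
        String context = message;
        for (String step : classify.apply(message)) {
            context = context + "\n---\n" + agents.get(step).apply(context);
        }
        return context;
    }
}
```

Threading the accumulated context through each hop is what makes this routing rather than classification: the second agent acts on the first agent's answer, not just the original message.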
Why separate system prompts per agent?
The complaint agent needs to be warm and apologetic. The order agent needs to be precise and factual. These are contradictory personalities. One system prompt can't be both empathetic and clinical. Separation lets each agent have the exact tone its job requires.
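To make the contrast concrete, here is what per-agent system prompts might look like. The wording below is illustrative, not the project's actual prompts:

```java
import java.util.Map;

class AgentPrompts {
    // One system prompt per specialist: these personalities contradict
    // each other and cannot coexist in a single prompt.
    static final Map<String, String> SYSTEM = Map.of(
        "order",
        "You are a precise order-status assistant. Answer only from query "
        + "results. Never guess a status. Be brief and factual.",
        "faq",
        "You answer policy questions strictly from the retrieved documents. "
        + "If the documents don't cover it, say so plainly.",
        "complaint",
        "You are warm and apologetic. Acknowledge the customer's frustration "
        + "before anything else. Never quote policy at an angry customer. "
        + "Always confirm the issue has been escalated to a human.");
}
```

Each prompt actively forbids the other agents' behavior ("never guess", "never quote policy"), which is exactly the kind of constraint a single merged prompt would have to contradict itself to express.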
Why reuse the patterns from previous projects?
The Order Agent is literally the SQL Query Agent with a customer support system prompt. The FAQ Agent is literally the RAG Document Agent pointed at support docs. Building agents as composable patterns — not monolithic applications — means every new system comes together faster, because you're assembling, not rebuilding.
What this demonstrates
- Multi-agent orchestration — routing, not just responding
- Agent specialization — each agent has its own tools, context, and personality
- Pattern composition — SQL agent + RAG agent + empathy agent, assembled into a system
- Intent-based routing — LLM understands what the customer needs, not just what they said
- The ceiling of single-agent systems — and how to break through it
Tech stack
| Layer | Technology |
|---|---|
| Language | Java 17 |
| Framework | Spring Boot 3.3.4 |
| AI integration | Spring AI 1.0 |
| LLM | Llama 3.3 via Groq |
| Embeddings | nomic-embed-text via Ollama |
| Vector store | pgvector |
| Database | PostgreSQL |
| Build | Maven |
What's next
- Conversation memory — maintain context across multiple messages in a session
- Agent handoff protocol — let agents transfer context to each other mid-conversation
- Feedback loop — track which routes were correct, improve orchestrator over time
- Human escalation API — complaint agent triggers real support ticket creation
- Load balancing — route to different LLM providers based on latency and cost