Maya: A Multi-Agent Architecture for Conversational AI
Best resource I found on this topic: Shrivu’s post on building multi-agent systems.
The monolithic prompt problem
Maya started as one mega-prompt: “You are a yoga instructor who can search poses, create sequences, understand anatomy, be warm and encouraging, don’t forget to validate input, oh and also remember to…”
2000 tokens of instructions. The AI would forget what it was supposed to do halfway through conversations.
Approaches I considered
- The Monolith - One prompt does everything. Simple to start, nightmare to maintain. Where I began.
- Tool-Calling Agent - Single agent that calls different functions. What OpenAI’s Assistants API demonstrates. Good for simpler use cases.
- Orchestrated Specialists - Separate agents for separate concerns. Intent recognition split from response generation. What Maya became.
- Agent Frameworks - LangChain, LangGraph. Agents delegating to agents. Powerful but often overkill.
I landed on orchestrated specialists. AI models evolve fast, and heavy abstraction layers become tech debt when the underlying models improve every few months. I wanted something I could understand and swap out individually.
The architecture
Maya is now five agents, each with one job:
// The conductor: Main orchestration service
export async function handleMessage(
  message: string,
  context?: { recentMessages?: string[] },
  userId?: string
): Promise<MayaResponse> {
  // 1. Intent Agent: What does the user want?
  const intent = await analyzeIntent(message);

  // 2. Tool Agents: Execute the appropriate action
  const data = await executeAction(intent);

  // 3. Response Agent: Generate conversational response
  const response = await generateResponse({ userMessage: message, action: intent.action, data });

  // 4. Suggestions Agent: Create contextual follow-ups
  const suggestions = await generateSuggestions(message, response, intent.action);

  return { response, suggestions, data };
}
The flow follows the call-center pattern from Shrivu’s blog: an intent agent routes requests to specialists, each building on the previous agent’s output.
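The `MayaResponse` type isn’t shown above. A plausible shape, inferred from what the orchestrator returns (the exact interface and field names beyond `response`/`suggestions`/`data` are my assumptions, not the real codebase):

```typescript
// Hypothetical shape of the orchestrator's return value, inferred from
// the return statement above — nested field names are assumptions.
interface MayaResponse {
  response: string;      // the Response Agent's conversational reply
  suggestions: string[]; // follow-ups from the Suggestions Agent
  data?: {               // optional payload from the Tool Agents
    foundPoses?: unknown[];
    foundSequences?: unknown[];
    shouldGenerateSequence?: boolean;
  };
}
```

Keeping the return type explicit means every downstream consumer (the chat UI, logging, tests) agrees on what a turn produces.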
The intent agent
The most important agent. Instead of one prompt trying to understand AND respond, this one does nothing but classify what the user wants.
import { Type } from "@google/genai";

const INTENT_ANALYSIS_SCHEMA = {
  type: Type.OBJECT,
  properties: {
    action: {
      type: Type.STRING,
      enum: ["create_sequence", "search_poses", "search_sequences",
             "search_classes", "general_chat"],
      description: "The primary intent of the user message"
    },
    params: {
      type: Type.OBJECT,
      properties: {
        name: { type: Type.STRING },
        sequenceType: { type: Type.STRING, enum: SEQUENCE_TYPES },
        difficulty: { type: Type.STRING, enum: DIFFICULTY_LEVELS },
        targetDurationMinutes: { type: Type.NUMBER },
        intensity: { type: Type.STRING, enum: INTENSITY_LEVELS },
        focusAreas: { type: Type.ARRAY, items: { type: Type.STRING, enum: FOCUS_AREAS } }
      }
    },
    searchParams: {
      type: Type.OBJECT,
      properties: {
        nameSearch: { type: Type.STRING },
        difficulty: { type: Type.STRING, enum: DIFFICULTY_LEVELS },
        poseType: { type: Type.STRING, enum: POSE_TYPES },
        // ... other search parameters
      }
    }
  }
};
The prompt itself is structured like this:
const INTENT_PROMPT = `Analyze this yoga-related message and determine the user's intent.
User message: "${message}"
Determine ONE of these intents:
1. create_sequence - User wants to CREATE/MAKE/BUILD a NEW sequence
2. search_poses - User asking about specific poses
3. search_sequences - User looking for existing sequences
4. search_classes - User looking for yoga classes
5. general_chat - General conversation or questions
Important distinctions:
- "make me a new sequence" = create_sequence
- "show me sequences" = search_sequences
- "what sequences do you have" = search_sequences
- "create a flow for hamstrings" = create_sequence
- "find hip opening poses" = search_poses
Return JSON matching the schema...`;
Structured output schemas make classification reliable. One thing I didn’t expect: even with constrained JSON output, adding concrete examples to the prompt (“make me a sequence” = create, “show me sequences” = search) improved accuracy a lot.
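Even with constrained output, I treat the model’s JSON as untrusted input. A minimal sketch of the kind of defensive parse I mean, falling back to `general_chat` on anything unexpected (the helper name and fallback behavior are illustrative, not the real codebase):

```typescript
// Hypothetical defensive parser for the intent agent's raw JSON output.
// VALID_ACTIONS mirrors the enum in INTENT_ANALYSIS_SCHEMA.
const VALID_ACTIONS = new Set([
  "create_sequence", "search_poses", "search_sequences",
  "search_classes", "general_chat",
]);

function parseIntent(raw: string): { action: string; params: Record<string, unknown> } {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.action === "string" && VALID_ACTIONS.has(parsed.action)) {
      return { action: parsed.action, params: parsed.params ?? {} };
    }
  } catch {
    // malformed JSON — fall through to the safe default
  }
  return { action: "general_chat", params: {} };
}
```

The safe default matters: a misclassified message becomes an ordinary chat turn instead of a crash or a wrong database query.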
The other agents
Intent Agent
Classifies user intent and extracts parameters
- Input: Raw user message
- Output: Structured intent with typed parameters
- Prompt size: ~500 tokens
Search Agents
Query the database with extracted parameters
- Input: Typed search parameters
- Output: Database results
- No AI involved - pure database operations
Response Agent
Generates Maya’s conversational response
- Input: User message + action taken + data found
- Output: 2-3 sentence warm response
- Prompt size: ~300 tokens
Suggestions Agent
Creates contextual follow-ups
- Input: User message + Maya’s response + context
- Output: 4 natural follow-up suggestions
- Prompt size: ~200 tokens
Generation Agent
Creates new yoga sequences
- Input: Sequence parameters
- Output: Complete sequence with poses, transitions, and cues
- Uses function calling to search pose database first
- Prompt size: ~800 tokens
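Because the search agents involve no AI, they are ordinary functions. A sketch of how the intent agent’s typed parameters might translate into a database filter (the field names and filter shape here are illustrative, not Maya’s real schema):

```typescript
// Hypothetical filter builder for the pose search agent. The input fields
// mirror the searchParams schema; the output shape is an assumption.
interface PoseSearchParams {
  nameSearch?: string;
  difficulty?: string;
  poseType?: string;
}

function buildPoseFilter(params: PoseSearchParams): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  if (params.nameSearch) filter.nameEn = { contains: params.nameSearch };
  if (params.difficulty) filter.difficulty = params.difficulty;
  if (params.poseType) filter.poseType = params.poseType;
  return filter; // empty object = no constraints
}
```

Keeping this layer AI-free means search results are deterministic and testable in isolation.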
Orchestration
The coordinator is boring on purpose. Route to the right agent, pass context, handle errors:
switch (intent.action) {
  case 'create_sequence':
    responseData.shouldGenerateSequence = true;
    responseData.sequenceParams = intent.params;
    break;
  case 'search_poses': {
    const poseResults = await searchPoses(intent.searchParams);
    responseData.foundPoses = poseResults.poses;
    usedDatabase = true;
    break;
  }
  case 'search_sequences': {
    const sequenceResults = await searchSequences(intent.searchParams, userId);
    responseData.foundSequences = sequenceResults.sequences;
    usedDatabase = true;
    break;
  }
  // ... other cases
}
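“Handle errors” deserves one concrete illustration: no single agent failure should take down the whole reply. A sketch of the kind of fallback wrapper I mean (the helper name and fallback values are mine, not from the real codebase):

```typescript
// Hypothetical guard: run one agent, and on failure log the error and
// return a fallback value instead of letting the whole pipeline throw.
async function withFallback<T>(
  label: string,
  agent: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await agent();
  } catch (err) {
    console.error(`[${label}] agent failed:`, err);
    return fallback;
  }
}

// e.g. suggestions are nice-to-have — an empty list beats a 500:
// const suggestions = await withFallback("suggestions",
//   () => generateSuggestions(message, response, action), []);
```

Degrading one agent at a time is another payoff of splitting the monolith: each specialist has an obvious “safe” output.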
Validation
After the generation agent creates a sequence, I validate every pose against the database:
export async function validateSequencePoses(
  aiPoses: AIPoseInput[],
  databasePoses: FormattedPoseData[],
) {
  const databasePoseNames = new Set(databasePoses.map(p => p.nameEn));
  const validatedPoses = [];
  const invalidPoses = [];

  for (const aiPose of aiPoses) {
    if (databasePoseNames.has(aiPose.nameEn)) {
      validatedPoses.push(/* validated pose */);
    } else {
      invalidPoses.push(aiPose.nameEn);
    }
  }

  if (validatedPoses.length < 3) {
    throw new AIResponseError(
      `Only ${validatedPoses.length} valid poses found (minimum 3 required)`
    );
  }

  return validatedPoses;
}
Why not constrain with enums? With 300+ poses, Gemini’s enum constraints become unreliable beyond 80-100 values (and the token cost is painful). So the generation agent searches the database via function calling first, then I validate its output as a safety net.
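As a self-contained sketch of that safety net (names only, where the real version carries full pose records):

```typescript
// Simplified, runnable version of the pose-validation step: keep pose
// names that exist in the database, drop the rest, and fail loudly
// when too few survive.
function validatePoseNames(aiNames: string[], dbNames: string[]): string[] {
  const known = new Set(dbNames);
  const valid = aiNames.filter((name) => known.has(name));
  if (valid.length < 3) {
    throw new Error(`Only ${valid.length} valid poses found (minimum 3 required)`);
  }
  return valid;
}
```

A hallucinated “Downward Facing Elephant” simply never reaches the user, and a sequence that loses too many poses fails fast instead of shipping half-empty.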
Results
While I didn’t rigorously benchmark everything, the improvements were clear:
- Responses feel faster since agents can work in parallel
- Token usage is noticeably lower with smaller, focused prompts
- Intent classification works reliably (no more “I don’t understand” loops)
- Changes are isolated - I can update Maya’s personality without breaking search
- Debugging is straightforward - errors point to specific agents
What I learned
Structured output schemas from day one. I wasted time parsing free-text LLM responses before switching to typed JSON. The intent agent alone eliminated the “I don’t understand” loops that plagued the monolithic prompt.
Five focused agents outperform one generalist, especially with smaller context windows. The coordinator doesn’t need to be clever; it needs to be reliable. And 300-token prompts are easier to debug than a 2000-token monster.
Always validate AI output against your source of truth. The generation agent invents plausible-sounding poses. The validation layer catches them before they reach the user.
What’s next
Adding a new agent doesn’t require rewriting anything; you just add another specialist to the chain. The ones I’m planning:
- Memory Agent - track user preferences and progress over time
- Recommendation Agent - suggest practices based on history and goals
- Personalization Agent - adapt Maya’s tone to user preferences
When Maya suggests “Downward Facing Elephant” as a real pose, I know exactly which agent to fix. That’s the whole point of splitting it up.
Maya is live at joshdesk.live. Built with Bun, Elysia, Google’s Gemini, and a lot of careful orchestration.