Maya: A Multi-Agent Architecture for Conversational AI
Best resource I found on this topic: Shrivu’s post on building multi-agent systems.
The monolithic prompt problem
Maya started as one mega-prompt: “You are a yoga instructor who can search poses, create sequences, understand anatomy, be warm and encouraging, don’t forget to validate input, oh and also remember to…”
2000 tokens of instructions. The AI would forget what it was supposed to do halfway through conversations.
Approaches I considered
- The Monolith - One prompt does everything. Simple to start, nightmare to maintain. Where I began.
- Tool-Calling Agent - Single agent that calls different functions. What OpenAI’s Assistants API demonstrates. Good for simpler use cases.
- Orchestrated Specialists - Separate agents for separate concerns. Intent recognition split from response generation. What Maya became.
- Agent Frameworks - LangChain, LangGraph. Agents delegating to agents. Powerful but often overkill.
I landed on orchestrated specialists. AI models evolve fast, and heavy abstraction layers become tech debt when the underlying models improve every few months. I wanted something I could understand and swap out individually.
The architecture
Maya is now five agents, each with one job:
// The conductor: Main orchestration service
export async function handleMessage(
  message: string,
  context?: { recentMessages?: string[] },
  userId?: string
): Promise<MayaResponse> {
  // 1. Intent Agent: What does the user want?
  const intent = await analyzeIntent(message);

  // 2. Tool Agents: Execute the appropriate action
  const data = await executeAction(intent);

  // 3. Response Agent: Generate conversational response
  const response = await generateResponse({ userMessage: message, action: intent.action, data });

  // 4. Suggestions Agent: Create contextual follow-ups
  const suggestions = await generateSuggestions(message, response, intent.action);

  return { response, suggestions, data };
}
The flow follows the call-center pattern from Shrivu’s blog: an intent agent routes requests to specialists, each building on the previous agent’s output.
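The `MayaResponse` type isn’t shown above. A plausible shape, inferred from what the orchestrator returns (the exact interface and field names beyond `response`/`suggestions`/`data` are my assumptions, not the real codebase):

```typescript
// Hypothetical shape of the orchestrator's return value, inferred from
// the return statement above — nested field names are assumptions.
interface MayaResponse {
  response: string;      // the Response Agent's conversational reply
  suggestions: string[]; // follow-ups from the Suggestions Agent
  data?: {               // optional payload from the Tool Agents
    foundPoses?: unknown[];
    foundSequences?: unknown[];
    shouldGenerateSequence?: boolean;
  };
}
```

Keeping the return type explicit means every downstream consumer (the chat UI, logging, tests) agrees on what a turn produces.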
The intent agent
The most important agent. Instead of one prompt trying to understand AND respond, this one does nothing but classify what the user wants.
import { Type } from "@google/genai";

const INTENT_ANALYSIS_SCHEMA = {
  type: Type.OBJECT,
  properties: {
    action: {
      type: Type.STRING,
      enum: ["create_sequence", "search_poses", "search_sequences",
             "search_classes", "general_chat"],
      description: "The primary intent of the user message"
    },
    params: {
      type: Type.OBJECT,
      properties: {
        name: { type: Type.STRING },
        sequenceType: { type: Type.STRING, enum: SEQUENCE_TYPES },
        difficulty: { type: Type.STRING, enum: DIFFICULTY_LEVELS },
        targetDurationMinutes: { type: Type.NUMBER },
        intensity: { type: Type.STRING, enum: INTENSITY_LEVELS },
        focusAreas: { type: Type.ARRAY, items: { type: Type.STRING, enum: FOCUS_AREAS } }
      }
    },
    searchParams: {
      type: Type.OBJECT,
      properties: {
        nameSearch: { type: Type.STRING },
        difficulty: { type: Type.STRING, enum: DIFFICULTY_LEVELS },
        poseType: { type: Type.STRING, enum: POSE_TYPES },
        // ... other search parameters
      }
    }
  }
};
The prompt itself is structured like this:
const INTENT_PROMPT = `Analyze this yoga-related message and determine the user's intent.
User message: "${message}"
Determine ONE of these intents:
1. create_sequence - User wants to CREATE/MAKE/BUILD a NEW sequence
2. search_poses - User asking about specific poses
3. search_sequences - User looking for existing sequences
4. search_classes - User looking for yoga classes
5. general_chat - General conversation or questions
Important distinctions:
- "make me a new sequence" = create_sequence
- "show me sequences" = search_sequences
- "what sequences do you have" = search_sequences
- "create a flow for hamstrings" = create_sequence
- "find hip opening poses" = search_poses
Return JSON matching the schema...`;
Structured output schemas make classification reliable. One thing I didn’t expect: even with constrained JSON output, adding concrete examples to the prompt (“make me a sequence” = create, “show me sequences” = search) improved accuracy a lot.
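Even with constrained output, I treat the model’s JSON as untrusted input. A minimal sketch of the kind of defensive parse I mean, falling back to `general_chat` on anything unexpected (the helper name and fallback behavior are illustrative, not the real codebase):

```typescript
// Hypothetical defensive parser for the intent agent's raw JSON output.
// VALID_ACTIONS mirrors the enum in INTENT_ANALYSIS_SCHEMA.
const VALID_ACTIONS = new Set([
  "create_sequence", "search_poses", "search_sequences",
  "search_classes", "general_chat",
]);

function parseIntent(raw: string): { action: string; params: Record<string, unknown> } {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.action === "string" && VALID_ACTIONS.has(parsed.action)) {
      return { action: parsed.action, params: parsed.params ?? {} };
    }
  } catch {
    // malformed JSON — fall through to the safe default
  }
  return { action: "general_chat", params: {} };
}
```

The safe default matters: a misclassified message becomes an ordinary chat turn instead of a crash or a wrong database query.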
The other agents
Intent Agent
Classifies user intent and extracts parameters
- Input: Raw user message
- Output: Structured intent with typed parameters
- Prompt size: ~500 tokens
Search Agents
Query the database with extracted parameters
- Input: Typed search parameters
- Output: Database results
- No AI involved - pure database operations
Response Agent
Generates Maya’s conversational response
- Input: User message + action taken + data found
- Output: 2-3 sentence warm response
- Prompt size: ~300 tokens
Suggestions Agent
Creates contextual follow-ups
- Input: User message + Maya’s response + context
- Output: 4 natural follow-up suggestions
- Prompt size: ~200 tokens
Generation Agent
Creates new yoga sequences
- Input: Sequence parameters
- Output: Complete sequence with poses, transitions, and cues
- Uses function calling to search pose database first
- Prompt size: ~800 tokens
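Because the search agents involve no AI, they are ordinary functions. A sketch of how the intent agent’s typed parameters might translate into a database filter (the field names and filter shape here are illustrative, not Maya’s real schema):

```typescript
// Hypothetical filter builder for the pose search agent. The input fields
// mirror the searchParams schema; the output shape is an assumption.
interface PoseSearchParams {
  nameSearch?: string;
  difficulty?: string;
  poseType?: string;
}

function buildPoseFilter(params: PoseSearchParams): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  if (params.nameSearch) filter.nameEn = { contains: params.nameSearch };
  if (params.difficulty) filter.difficulty = params.difficulty;
  if (params.poseType) filter.poseType = params.poseType;
  return filter; // empty object = no constraints
}
```

Keeping this layer AI-free means search results are deterministic and testable in isolation.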
Orchestration
The coordinator is boring on purpose. Route to the right agent, pass context, handle errors:
switch (intent.action) {
  case 'create_sequence':
    responseData.shouldGenerateSequence = true;
    responseData.sequenceParams = intent.params;
    break;
  case 'search_poses': {
    const poseResults = await searchPoses(intent.searchParams);
    responseData.foundPoses = poseResults.poses;
    usedDatabase = true;
    break;
  }
  case 'search_sequences': {
    const sequenceResults = await searchSequences(intent.searchParams, userId);
    responseData.foundSequences = sequenceResults.sequences;
    usedDatabase = true;
    break;
  }
  // ... other cases
}
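“Handle errors” deserves one concrete illustration: no single agent failure should take down the whole reply. A sketch of the kind of fallback wrapper I mean (the helper name and fallback values are mine, not from the real codebase):

```typescript
// Hypothetical guard: run one agent, and on failure log the error and
// return a fallback value instead of letting the whole pipeline throw.
async function withFallback<T>(
  label: string,
  agent: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await agent();
  } catch (err) {
    console.error(`[${label}] agent failed:`, err);
    return fallback;
  }
}

// e.g. suggestions are nice-to-have — an empty list beats a 500:
// const suggestions = await withFallback("suggestions",
//   () => generateSuggestions(message, response, action), []);
```

Degrading one agent at a time is another payoff of splitting the monolith: each specialist has an obvious “safe” output.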
Validation
After the generation agent creates a sequence, I validate every pose against the database:
export async function validateSequencePoses(
  aiPoses: AIPoseInput[],
  databasePoses: FormattedPoseData[],
) {
  const databasePoseNames = new Set(databasePoses.map(p => p.nameEn));
  const validatedPoses = [];
  const invalidPoses = [];

  for (const aiPose of aiPoses) {
    if (databasePoseNames.has(aiPose.nameEn)) {
      validatedPoses.push(/* validated pose */);
    } else {
      invalidPoses.push(aiPose.nameEn);
    }
  }

  if (validatedPoses.length < 3) {
    throw new AIResponseError(
      `Only ${validatedPoses.length} valid poses found (minimum 3 required)`
    );
  }

  return validatedPoses;
}
Why not constrain with enums? With 300+ poses, Gemini’s enum constraints become unreliable beyond 80-100 values (and the token cost is painful). So the generation agent searches the database via function calling first, then I validate its output as a safety net.
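As a self-contained sketch of that safety net (names only, where the real version carries full pose records):

```typescript
// Simplified, runnable version of the pose-validation step: keep pose
// names that exist in the database, drop the rest, and fail loudly
// when too few survive.
function validatePoseNames(aiNames: string[], dbNames: string[]): string[] {
  const known = new Set(dbNames);
  const valid = aiNames.filter((name) => known.has(name));
  if (valid.length < 3) {
    throw new Error(`Only ${valid.length} valid poses found (minimum 3 required)`);
  }
  return valid;
}
```

A hallucinated “Downward Facing Elephant” simply never reaches the user, and a sequence that loses too many poses fails fast instead of shipping half-empty.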
Results
While I didn’t rigorously benchmark everything, the improvements were clear:
- Responses feel faster since agents can work in parallel
- Token usage is noticeably lower with smaller, focused prompts
- Intent classification works reliably (no more “I don’t understand” loops)
- Changes are isolated - I can update Maya’s personality without breaking search
- Debugging is straightforward - errors point to specific agents
What I learned
Structured output schemas from day one. I wasted time parsing free-text LLM responses before switching to typed JSON. The intent agent alone eliminated the “I don’t understand” loops that plagued the monolithic prompt.
Five focused agents outperform one generalist, especially with smaller context windows. The coordinator doesn’t need to be clever; it needs to be reliable. And 300-token prompts are easier to debug than a 2000-token monster.
Always validate AI output against your source of truth. The generation agent invents plausible-sounding poses. The validation layer catches them before they reach the user.
What’s next
Adding a new agent doesn’t require rewriting anything; you just add another specialist to the chain. The ones I’m planning:
- Memory Agent - track user preferences and progress over time
- Recommendation Agent - suggest practices based on history and goals
- Personalization Agent - adapt Maya’s tone to user preferences
When Maya suggests “Downward Facing Elephant” as a real pose, I know exactly which agent to fix. That’s the whole point of splitting it up.
Maya is live at joshdesk.live. Built with Bun, Elysia, Google’s Gemini, and a lot of careful orchestration.