The Cost Problem Nobody Talks About
Last October, I looked at my Anthropic bill and felt my stomach drop. $487 for a single month. I was using Claude Opus for everything: research, code reviews, simple questions, file exploration. Every task, regardless of complexity, went straight to the most expensive model.
The thing is, Claude Opus is genuinely incredible. It's the best reasoning model I've used. But here's what I realized: 80% of my tasks didn't need that level of capability. Asking Opus to find files in my codebase is like hiring a Formula 1 driver to pick up groceries.
That realization started a three-month journey that changed how I work with AI. I built a system called Claude Swarm that routes tasks to the cheapest capable model, orchestrates multiple LLMs working in parallel, and reduced my monthly AI costs from ~$500 to ~$40.
This isn't about being cheap. It's about being smart. When you're paying per token, cost optimization becomes architecture design.
The Evolution: From Copy-Paste to Orchestration
My AI journey mirrors what I've seen across the industry:
2023: The ChatGPT Era. Copy-paste prompts. "Write me a function that does X." Context lost between conversations. Useful, but primitive.
2024: Claude Code Changes Everything. Anthropic released Claude Code, and suddenly AI could see my entire codebase. Custom instructions in CLAUDE.md. MCP servers connecting to GitHub, databases, filesystems. This was when AI went from "helpful tool" to "pair programmer."
Early 2025: The Complexity Ceiling. Projects got bigger. I found myself running missions that took hours: "Research this technology, design a system, implement it, test it, document it." A single Claude conversation couldn't maintain context across all those phases. I needed something more.
Late 2025: Claude Swarm. The hybrid multi-LLM orchestration system. Claude Opus coordinates. Claude Sonnet implements. Gemini Flash researches. Each model does what it's best at. Workers run in parallel. Missions execute across multiple phases with automatic checkpointing.
The Stack: What I Actually Use
Here's my complete AI-assisted development setup:
Claude Code Foundation
Everything starts with CLAUDE.md, a file in my project root that Claude Code reads automatically. Mine is over 2,000 lines, containing:
- Project architecture and conventions
- Available commands and workflows
- Cost optimization guidelines
- Integration instructions for the swarm system
This persistent context means Claude starts every conversation understanding my project deeply.
MCP Servers (8 Integrations)
Model Context Protocol servers extend Claude's capabilities:
| Server | What It Does |
|---|---|
| GitHub | PR creation, issue management, code search |
| Git | Commits, branches, diffs without leaving the conversation |
| Filesystem | File operations with proper sandboxing |
| PostgreSQL | Natural language database queries |
| Redis | Cache operations |
| Memory | Persistent knowledge graph across sessions |
| Brave Search | Web research when needed |
| Context7 | Documentation lookup for libraries |
Each server adds capabilities without me copying and pasting data around.
Custom Skills (5 Slash Commands)
I built slash commands for repetitive workflows:
/commit-push-pr: After finishing work, one command stages changes, generates a commit message following conventional commits, pushes the branch, and creates a PR with a proper description. What used to take 5 minutes now takes 30 seconds.
/verify-build: Runs the entire CI pipeline locally (type-check, lint, test, build), and if anything fails, automatically fixes it and re-runs. It loops until everything passes.
/code-simplify: Analyzes recently modified files for code smells: unnecessary complexity, redundancy, poor naming. Applies fixes while verifying tests still pass.
/research-web: Routes research tasks to Gemini Flash (186x cheaper than Claude Opus) and returns structured findings with citations.
/spawn-workers: Dispatches multiple parallel tasks to different models simultaneously. "Research auth patterns, implement login, review security" becomes three workers running in parallel.
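The dispatch pattern behind /spawn-workers can be sketched in a few lines of Python. This is my illustration, not the swarm's actual code: `call_model` is a stand-in for the real Anthropic/Gemini API clients, and the function names are assumptions.

```python
# Hypothetical sketch of parallel worker dispatch: fan (model, prompt)
# pairs out to threads and collect results in submission order.
from concurrent.futures import ThreadPoolExecutor


def call_model(model: str, task: str) -> str:
    # Placeholder for a real Anthropic/Gemini API call.
    return f"[{model}] result for: {task}"


def spawn_workers(tasks: list[tuple[str, str]]) -> list[str]:
    """tasks: (model, prompt) pairs, executed concurrently."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(call_model, m, t) for m, t in tasks]
        return [f.result() for f in futures]  # preserves input order
```

The real system adds timeouts, retries, and result synthesis, but the core is just a thread pool over API calls.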
PostToolUse Hooks
A hook runs after every file edit, auto-formatting code based on language:
| Language | Formatter |
|---|---|
| TypeScript/JS | Prettier (fallback: Biome) |
| Python | Black (fallback: Ruff) |
| Rust | rustfmt |
| Go | gofmt |
| JSON | jq |
| YAML | yq |
This means AI-generated code is always formatted correctly. No more style inconsistencies.
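The hook's core logic is a table lookup with a fallback check. A minimal Python sketch, assuming the formatter table above; `shutil.which` decides whether the primary tool is installed before falling back:

```python
# Illustrative PostToolUse hook logic: pick a formatter command by file
# extension, preferring the primary tool, falling back if it's missing.
import shutil

FORMATTERS = {
    ".ts": [["prettier", "--write"], ["biome", "format", "--write"]],
    ".js": [["prettier", "--write"], ["biome", "format", "--write"]],
    ".py": [["black"], ["ruff", "format"]],
    ".rs": [["rustfmt"]],
    ".go": [["gofmt", "-w"]],
}


def pick_formatter(path: str):
    """Return the first available formatter command for this file, or None."""
    ext = "." + path.rsplit(".", 1)[-1]
    for cmd in FORMATTERS.get(ext, []):
        if shutil.which(cmd[0]):  # is the tool on PATH?
            return cmd + [path]
    return None
```

The real hook then runs the returned command with `subprocess` after every file edit.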
Claude Swarm: The Orchestration Layer
The swarm system coordinates everything:
- Mission files define multi-phase projects in markdown
- The coordinator (Claude Opus) reads missions and decomposes work
- Workers (various models) execute tasks in parallel
- Results are synthesized and phases advance automatically
- Checkpoints save state, allowing mission resumption after interruptions
A typical mission has 3-5 phases: Discovery, Design, Implementation, Testing, Documentation. The coordinator spawns workers for each phase, waits for completion, and advances.
The Cost Optimization Strategy
This is where things get interesting. Here's the per-million-token pricing:
| Model | Input | Output | vs Opus |
|---|---|---|---|
| Gemini Flash | $0.10 | $0.40 | 186x cheaper |
| Claude Haiku | $0.25 | $1.25 | 60x cheaper |
| Gemini Pro | $1.25 | $5.00 | 12x cheaper |
| Claude Sonnet | $3.00 | $15.00 | 5x cheaper |
| Claude Opus | $15.00 | $75.00 | baseline |
Gemini Flash is 186 times cheaper than Claude Opus. For research tasks, exploration, and summarization, it's nearly as capable. That insight changed my architecture.
The Hybrid Routing System
Every task gets classified by keywords:
**Research Tasks → Gemini Flash.** Keywords: "research", "find", "explore", "documentation", "what is". Cost per typical task: ~$0.01

**Code Review → Gemini Pro.** Keywords: "review", "analyze", "evaluate", "quality". Cost per typical task: ~$0.05

**Implementation → Claude Sonnet.** Keywords: "implement", "write", "create", "fix", "edit". Cost per typical task: ~$0.15

**Architecture → Claude Opus.** Keywords: "design", "architect", "strategic", "critical". Cost per typical task: ~$0.50
The system automatically routes based on task type. I don't have to think about which model to use.
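A simplified version of that keyword classifier. The keyword lists come from above; first match wins, and routing everything unmatched to Haiku is my assumption about the default:

```python
# Keyword-based model router: first matching tier wins, checked in the
# order the article lists them; unmatched tasks fall through to Haiku.
ROUTES = [
    ("gemini-flash",  ["research", "find", "explore", "documentation", "what is"]),
    ("gemini-pro",    ["review", "analyze", "evaluate", "quality"]),
    ("claude-sonnet", ["implement", "write", "create", "fix", "edit"]),
    ("claude-opus",   ["design", "architect", "strategic", "critical"]),
]


def route(task: str) -> str:
    text = task.lower()
    for model, keywords in ROUTES:
        if any(kw in text for kw in keywords):
            return model
    return "claude-haiku"  # cheap default for simple queries
```

Naive substring matching misfires occasionally ("fix" matches "prefix"), which is one reason the real routing logic is more involved.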
Real Numbers
- Before (all Claude Opus): ~$500/month
- After (hybrid routing): ~$40/month
- Savings: 92%
The breakdown:
- 35% of tasks are research/exploration → Gemini Flash
- 15% are code review → Gemini Pro
- 35% are implementation → Claude Sonnet
- 10% are simple queries → Claude Haiku
- 5% are strategic/architecture → Claude Opus
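You can sanity-check the savings with the task mix above. The per-task costs are the article's rough estimates, except the Haiku figure (~$0.03), which is my assumption since no number is given; the blended model lands near 82%, in the same ballpark as the 92% the actual bills show (real savings also depend on token volume per tier):

```python
# Weighted blended cost per task across the routing mix, compared against
# sending every task to Opus. Per-task costs are rough estimates; the
# Haiku cost is an assumption not stated in the article.
mix = {                     # (share of tasks, cost per typical task, $)
    "gemini-flash":  (0.35, 0.01),
    "gemini-pro":    (0.15, 0.05),
    "claude-sonnet": (0.35, 0.15),
    "claude-haiku":  (0.10, 0.03),
    "claude-opus":   (0.05, 0.50),
}
blended = sum(share * cost for share, cost in mix.values())
opus_only = 0.50            # if every task went to Opus at ~$0.50 each
savings = 1 - blended / opus_only
print(f"blended ${blended:.3f}/task, savings {savings:.0%}")
```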
A Day in the Life
Here's how this actually works during a typical development day:
Morning (Research Phase)
```
/spawn-workers research React 19 features, explore current auth implementation, summarize competitor APIs
```

Three Gemini Flash workers spin up in parallel. Within 2 minutes, I have:
- Structured research on React 19 with citations
- A map of my current auth code
- Competitor API analysis
Cost: ~$0.03 total. With Claude Opus alone, this would have been ~$1.50.
Midday (Implementation)
Now I implement based on the research. Claude Sonnet handles the actual code writing. I describe what I want, it writes the code, the PostToolUse hook formats it, and I review.
For complex features, I write a mission file:
```markdown
# Mission: Add OAuth Integration

## Phase 1: Design
- Review research from morning
- Design OAuth flow architecture
- Define API contracts

## Phase 2: Implementation
- Implement OAuth provider abstraction
- Add Google/GitHub providers
- Write integration tests

## Phase 3: Testing
- Run full test suite
- Security review
- Performance testing
```

The coordinator executes each phase, spawning workers as needed.
Afternoon (Validation)
```
/verify-build
```

The CI loop runs. If tests fail, it fixes them. If linting fails, it fixes that too. Loops until everything passes.
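Under the hood this is just a retry loop. A minimal Python sketch, not the actual skill: `run_step` and `attempt_fix` are injected, so in practice `run_step` would shell out via `subprocess` and `attempt_fix` would hand the failure output to the model.

```python
# Minimal sketch of a fix-and-retry CI loop: re-run the whole pipeline
# until every step passes or the round budget is exhausted.
def verify_build(run_step, steps, attempt_fix, max_rounds=5):
    """run_step(step) -> exit code; attempt_fix(step) tries to repair it."""
    for _ in range(max_rounds):
        failed = next((s for s in steps if run_step(s) != 0), None)
        if failed is None:
            return True  # all steps green
        attempt_fix(failed)  # e.g. the model reads stderr and edits code
    return False
```

The round budget matters: without it, an unfixable failure would loop forever.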
End of Day
```
/commit-push-pr Add OAuth integration
```

One command. Clean PR with proper description. Done.
Results: What Actually Changed
After three months with this system:
Cost: $487/month → $38/month (92% reduction)
Speed: Research tasks that took 10 minutes now take 2 minutes with parallel workers. Overall, I estimate a 3-5x speedup for research-heavy work.
Quality: Using the right model for each task actually improves outputs. Gemini Flash is excellent at research. Claude Sonnet is excellent at implementation. Opus is excellent at architecture. Playing to strengths beats using one model for everything.
Consistency: Auto-formatting eliminates style debates. CI validation loops catch issues before they reach PR review. The workflow is predictable.
The Honest Limitations
This system isn't for everyone. Here's what's hard:
Setup Complexity: This took months to build. MCP servers need configuration. Custom skills need development. The coordinator system has failure modes. It's not beginner-friendly.
Debugging Multi-LLM Failures: When something goes wrong, was it Claude? Gemini? The routing logic? A timeout? The handoff between models? Debugging distributed systems is harder than debugging a single agent.
Different Model Strengths: Claude and Gemini aren't interchangeable. Claude is better at nuanced code generation. Gemini is faster but sometimes less thorough. Knowing when routing fails you matters.
Context Fragmentation: Each model maintains its own context. The coordinator synthesizes, but information can get lost between workers. Long missions require careful context engineering.
Overkill for Simple Projects: If you're building a todo app, you don't need this. The overhead of mission files and worker coordination isn't worth it for small projects.
How to Get Started
If this interests you, here's the progression I'd recommend:
Week 1: Claude Code + CLAUDE.md
Start with Claude Code and a well-written CLAUDE.md file. Document your project architecture, coding conventions, and common workflows. This alone is a 10x improvement over raw ChatGPT.
Week 2-3: Add MCP Servers
One at a time. Start with GitHub (if you use GitHub). Then Git for direct commits. Then Filesystem for file operations. Each integration reduces friction.
Week 4+: Build Your First Custom Skill
Pick your most repetitive workflow. For me it was the git commit→push→PR flow. Encode it as a slash command. Save 5 minutes every time you use it.
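In Claude Code, a custom slash command is just a markdown prompt file under `.claude/commands/`; `$ARGUMENTS` is replaced with whatever you type after the command. A simplified sketch of what a commit-flow command might look like (filename and wording are illustrative, not my actual skill):

```markdown
<!-- .claude/commands/commit-push-pr.md (simplified sketch) -->
Stage all changes, write a Conventional Commits message summarizing them,
push the current branch, and open a PR with gh. PR title: $ARGUMENTS

Rules:
- Never force-push to main/master.
- Never skip pre-commit hooks.
- Stop and report if there are merge conflicts.
```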
When Costs Matter: Consider Hybrid
If you're spending $100+/month on AI and doing significant research or exploration, hybrid routing pays off. The initial investment is high, but the monthly savings are real.
What's Next
I'm continuing to evolve Claude Swarm. Current experiments:
- Gemini Conductor for automated planning phases
- Jules (Gemini's async coding agent) for routine refactoring
- Better context handoff between models
- Mission templates for common project types
The future isn't one AI assistant. It's an orchestrated team of specialized models, each doing what it's best at, coordinated by systems we build.
If you've built something similar, I'd love to hear about it. And if you have questions about any part of this setup, ask away.
Appendix: Reproducible Examples
Here are concrete snippets you can adapt for your own setup.
Sample CLAUDE.md Configuration
This is a simplified version of my model tier configuration:
```markdown
## Model Tiers

### Claude Models

| Tier | Model | Input | Output | Use For |
|------|-------|-------|--------|---------|
| **Haiku** | claude-haiku-3-5 | $0.25/M | $1.25/M | Read, search, summarize |
| **Sonnet** | claude-sonnet-4 | $3.00/M | $15.00/M | Write code, edit, debug |
| **Opus** | claude-opus-4 | $15.00/M | $75.00/M | Architecture, coordination |

### Gemini Models (Hybrid Mode)

| Tier | Model | Input | Output | Use For |
|------|-------|-------|--------|---------|
| **Flash** | gemini-2.5-flash | $0.10/M | $0.40/M | Research, exploration |
| **Pro** | gemini-2.5-pro | $1.25/M | $5.00/M | Deep analysis, synthesis |

## Cost Optimization Strategy

- Use Gemini Flash for research/explore (~35% of tasks)
- Use Gemini Pro for review/audit (~15% of tasks)
- Use Claude Haiku for simple tasks (~10%)
- Use Claude Sonnet for implementation (~35%)
- Reserve Claude Opus for coordination (~5%)
- **Target: 92% cost savings** vs Claude-only workflows
```

Sample Mission File
Here's a real mission file structure:
```markdown
# Mission: Fix Icon Display Bug

## Objective
Fix the issue where icon strings like "solar:calendar-bold" display
as text instead of rendering as actual icons.

## Background
The settings page uses Solar icons, but the SkillSettingsSection
component displays the icon prop as raw text.

## Phase 1: Investigate & Fix
**Objective**: Find and fix the component

### Tasks
1. Locate SkillSettingsSection component
2. Check how the `icon` prop is being used
3. Update to use the Icon component for rendering
4. Verify the fix works in the UI

### Deliverable
- Fixed component that properly renders icons

## Success Criteria
- [ ] Icons render visually instead of showing text
- [ ] All skill sections show proper icons
- [ ] No TypeScript errors

## Constraints
- Use existing Icon component
- Minimal changes - only fix the rendering issue
```

Run it with: `./swarm/bin/swarm run swarm/missions/FIX-ICONS.md`
How /commit-push-pr Works
The custom skill automates this 6-step workflow:
```shell
# Step 1: Check state
git status --porcelain
git branch --show-current

# Step 2: Stage changes
git add -A
git diff --cached --stat

# Step 3: Create commit with conventional message
git commit -m "feat: Add user authentication

Co-Authored-By: Claude <noreply@anthropic.com>"

# Step 4: Push to remote
git push -u origin feature/auth

# Step 5: Create PR
gh pr create --title "Add user authentication" --body "
## Summary
- Implement login form
- Add password validation
- Create session management
"

# Step 6: Return PR URL
```

Safety guardrails built in:
- Never force-push to main/master
- Never skip pre-commit hooks
- Stop if merge conflicts detected
This collapses a six-step manual process into one command.
Want to see the actual code? The Claude Swarm system is at github.com/joeyhipolito/orchestrator-swarm. The CLAUDE.md file alone is worth reading if you're setting up Claude Code for the first time.
Related Reading
- Building EA: Architecture Decisions for a Production AI Assistant — The production AI system I built using these workflows
- LifeOS: Building an AI-Powered Personal Operating System — Applying the same AI skill architecture to personal knowledge management