The Cost Problem Nobody Talks About
Last October, I looked at my Anthropic bill and felt my stomach drop. $487 for a single month. I was using Claude Opus for everything: research, code reviews, simple questions, file exploration. Every task, regardless of complexity, went straight to the most expensive model.
The thing is, Claude Opus is genuinely incredible. It's the best reasoning model I've used. But here's what I realized: 80% of my tasks didn't need that level of capability. Asking Opus to find files in my codebase is like hiring a Formula 1 driver to pick up groceries.
That realization started a three-month journey that changed how I work with AI. I built a system called Claude Swarm that routes tasks to the cheapest capable model, orchestrates multiple LLMs working in parallel, and reduced my monthly AI costs from ~$500 to ~$40.
This isn't about being cheap. It's about being smart. When you're paying per token, cost optimization becomes architecture design.
The Evolution: From Copy-Paste to Orchestration
My AI journey mirrors what I've seen across the industry:
2023: The ChatGPT Era. Copy-paste prompts. "Write me a function that does X." Context lost between conversations. Useful, but primitive.
2024: Claude Code Changes Everything. Anthropic released Claude Code, and suddenly AI could see my entire codebase. Custom instructions in CLAUDE.md. MCP servers connecting to GitHub, databases, filesystems. This was when AI went from "helpful tool" to "pair programmer."
Early 2025: The Complexity Ceiling. Projects got bigger. I found myself running missions that took hours: "Research this technology, design a system, implement it, test it, document it." A single Claude conversation couldn't maintain context across all those phases. I needed something more.
Late 2025: Claude Swarm. The hybrid multi-LLM orchestration system. Claude Opus coordinates. Claude Sonnet implements. Gemini Flash researches. Each model does what it's best at. Workers run in parallel. Missions execute across multiple phases with automatic checkpointing.
The Stack: What I Actually Use
Here's my complete AI-assisted development setup:
Claude Code Foundation
Everything starts with CLAUDE.md, a file in my project root that Claude Code reads automatically. Mine is over 2,000 lines, containing:
- Project architecture and conventions
- Available commands and workflows
- Cost optimization guidelines
- Integration instructions for the swarm system
This persistent context means Claude starts every conversation understanding my project deeply.
MCP Servers (8 Integrations)
Model Context Protocol servers extend Claude's capabilities:
| Server | What It Does |
|---|---|
| GitHub | PR creation, issue management, code search |
| Git | Commits, branches, diffs without leaving the conversation |
| Filesystem | File operations with proper sandboxing |
| PostgreSQL | Natural language database queries |
| Redis | Cache operations |
| Memory | Persistent knowledge graph across sessions |
| Brave Search | Web research when needed |
| Context7 | Documentation lookup for libraries |
Each server adds capabilities without me copying and pasting data around.
Custom Skills (5 Slash Commands)
I built slash commands for repetitive workflows:
/commit-push-pr: After finishing work, one command stages changes, generates a commit message following conventional commits, pushes the branch, and creates a PR with a proper description. What used to take 5 minutes now takes 30 seconds.
/verify-build: Runs the entire CI pipeline locally (type-check, lint, test, build), and if anything fails, automatically fixes it and re-runs. It loops until everything passes.
/code-simplify: Analyzes recently modified files for code smells: unnecessary complexity, redundancy, poor naming. Applies fixes while verifying tests still pass.
/research-web: Routes research tasks to Gemini Flash (186x cheaper than Claude Opus) and returns structured findings with citations.
/spawn-workers: Dispatches multiple parallel tasks to different models simultaneously. "Research auth patterns, implement login, review security" becomes three workers running in parallel.
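The dispatch pattern behind /spawn-workers can be sketched in a few lines of Python. This is my illustration, not the swarm's actual code: `call_model` is a stand-in for the real Anthropic/Gemini API clients, and the function names are assumptions.

```python
# Hypothetical sketch of parallel worker dispatch: fan (model, prompt)
# pairs out to threads and collect results in submission order.
from concurrent.futures import ThreadPoolExecutor


def call_model(model: str, task: str) -> str:
    # Placeholder for a real Anthropic/Gemini API call.
    return f"[{model}] result for: {task}"


def spawn_workers(tasks: list[tuple[str, str]]) -> list[str]:
    """tasks: (model, prompt) pairs, executed concurrently."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(call_model, m, t) for m, t in tasks]
        return [f.result() for f in futures]  # preserves input order
```

The real system adds timeouts, retries, and result synthesis, but the core is just a thread pool over API calls.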
PostToolUse Hooks
A hook runs after every file edit, auto-formatting code based on language:
| Language | Formatter |
|---|---|
| TypeScript/JS | Prettier (fallback: Biome) |
| Python | Black (fallback: Ruff) |
| Rust | rustfmt |
| Go | gofmt |
| JSON | jq |
| YAML | yq |
This means AI-generated code is always formatted correctly. No more style inconsistencies.
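The hook's core logic is a table lookup with a fallback check. A minimal Python sketch, assuming the formatter table above; `shutil.which` decides whether the primary tool is installed before falling back:

```python
# Illustrative PostToolUse hook logic: pick a formatter command by file
# extension, preferring the primary tool, falling back if it's missing.
import shutil

FORMATTERS = {
    ".ts": [["prettier", "--write"], ["biome", "format", "--write"]],
    ".js": [["prettier", "--write"], ["biome", "format", "--write"]],
    ".py": [["black"], ["ruff", "format"]],
    ".rs": [["rustfmt"]],
    ".go": [["gofmt", "-w"]],
}


def pick_formatter(path: str):
    """Return the first available formatter command for this file, or None."""
    ext = "." + path.rsplit(".", 1)[-1]
    for cmd in FORMATTERS.get(ext, []):
        if shutil.which(cmd[0]):  # is the tool on PATH?
            return cmd + [path]
    return None
```

The real hook then runs the returned command with `subprocess` after every file edit.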
Claude Swarm: The Orchestration Layer
The swarm system coordinates everything:
- Mission files define multi-phase projects in markdown
- The coordinator (Claude Opus) reads missions and decomposes work
- Workers (various models) execute tasks in parallel
- Results are synthesized and phases advance automatically
- Checkpoints save state, allowing mission resumption after interruptions
A typical mission has 3-5 phases: Discovery, Design, Implementation, Testing, Documentation. The coordinator spawns workers for each phase, waits for completion, and advances.
The Cost Optimization Strategy
This is where things get interesting. Here's the per-million-token pricing:
| Model | Input | Output | vs Opus |
|---|---|---|---|
| Gemini Flash | $0.10 | $0.40 | 186x cheaper |
| Claude Haiku | $0.25 | $1.25 | 60x cheaper |
| Gemini Pro | $1.25 | $5.00 | 12x cheaper |
| Claude Sonnet | $3.00 | $15.00 | 5x cheaper |
| Claude Opus | $15.00 | $75.00 | baseline |
Gemini Flash is 186 times cheaper than Claude Opus. For research tasks, exploration, and summarization, it's nearly as capable. That insight changed my architecture.
The Hybrid Routing System
Every task gets classified by keywords:
**Research Tasks → Gemini Flash.** Keywords: "research", "find", "explore", "documentation", "what is". Cost per typical task: ~$0.01

**Code Review → Gemini Pro.** Keywords: "review", "analyze", "evaluate", "quality". Cost per typical task: ~$0.05

**Implementation → Claude Sonnet.** Keywords: "implement", "write", "create", "fix", "edit". Cost per typical task: ~$0.15

**Architecture → Claude Opus.** Keywords: "design", "architect", "strategic", "critical". Cost per typical task: ~$0.50
The system automatically routes based on task type. I don't have to think about which model to use.
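A simplified version of that keyword classifier. The keyword lists come from above; first match wins, and routing everything unmatched to Haiku is my assumption about the default:

```python
# Keyword-based model router: first matching tier wins, checked in the
# order the article lists them; unmatched tasks fall through to Haiku.
ROUTES = [
    ("gemini-flash",  ["research", "find", "explore", "documentation", "what is"]),
    ("gemini-pro",    ["review", "analyze", "evaluate", "quality"]),
    ("claude-sonnet", ["implement", "write", "create", "fix", "edit"]),
    ("claude-opus",   ["design", "architect", "strategic", "critical"]),
]


def route(task: str) -> str:
    text = task.lower()
    for model, keywords in ROUTES:
        if any(kw in text for kw in keywords):
            return model
    return "claude-haiku"  # cheap default for simple queries
```

Naive substring matching misfires occasionally ("fix" matches "prefix"), which is one reason the real routing logic is more involved.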
Real Numbers
- Before (all Claude Opus): ~$500/month
- After (hybrid routing): ~$40/month
- Savings: 92%
The breakdown:
- 35% of tasks are research/exploration → Gemini Flash
- 15% are code review → Gemini Pro
- 35% are implementation → Claude Sonnet
- 10% are simple queries → Claude Haiku
- 5% are strategic/architecture → Claude Opus
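You can sanity-check the savings with the task mix above. The per-task costs are the article's rough estimates, except the Haiku figure (~$0.03), which is my assumption since no number is given; the blended model lands near 82%, in the same ballpark as the 92% the actual bills show (real savings also depend on token volume per tier):

```python
# Weighted blended cost per task across the routing mix, compared against
# sending every task to Opus. Per-task costs are rough estimates; the
# Haiku cost is an assumption not stated in the article.
mix = {                     # (share of tasks, cost per typical task, $)
    "gemini-flash":  (0.35, 0.01),
    "gemini-pro":    (0.15, 0.05),
    "claude-sonnet": (0.35, 0.15),
    "claude-haiku":  (0.10, 0.03),
    "claude-opus":   (0.05, 0.50),
}
blended = sum(share * cost for share, cost in mix.values())
opus_only = 0.50            # if every task went to Opus at ~$0.50 each
savings = 1 - blended / opus_only
print(f"blended ${blended:.3f}/task, savings {savings:.0%}")
```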
A Day in the Life
Here's how this actually works during a typical development day:
Morning (Research Phase)
```
/spawn-workers research React 19 features, explore current auth implementation, summarize competitor APIs
```

Three Gemini Flash workers spin up in parallel. Within 2 minutes, I have:
- Structured research on React 19 with citations
- A map of my current auth code
- Competitor API analysis
Cost: ~$0.03 total. With Claude Opus alone, this would have been ~$1.50.
Midday (Implementation)
Now I implement based on the research. Claude Sonnet handles the actual code writing. I describe what I want, it writes the code, the PostToolUse hook formats it, and I review.
For complex features, I write a mission file:
```markdown
# Mission: Add OAuth Integration

## Phase 1: Design
- Review research from morning
- Design OAuth flow architecture
- Define API contracts

## Phase 2: Implementation
- Implement OAuth provider abstraction
- Add Google/GitHub providers
- Write integration tests

## Phase 3: Testing
- Run full test suite
- Security review
- Performance testing
```

The coordinator executes each phase, spawning workers as needed.
Afternoon (Validation)
```
/verify-build
```

The CI loop runs. If tests fail, it fixes them. If linting fails, it fixes that too. Loops until everything passes.
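Under the hood this is just a retry loop. A minimal Python sketch, not the actual skill: `run_step` and `attempt_fix` are injected, so in practice `run_step` would shell out via `subprocess` and `attempt_fix` would hand the failure output to the model.

```python
# Minimal sketch of a fix-and-retry CI loop: re-run the whole pipeline
# until every step passes or the round budget is exhausted.
def verify_build(run_step, steps, attempt_fix, max_rounds=5):
    """run_step(step) -> exit code; attempt_fix(step) tries to repair it."""
    for _ in range(max_rounds):
        failed = next((s for s in steps if run_step(s) != 0), None)
        if failed is None:
            return True  # all steps green
        attempt_fix(failed)  # e.g. the model reads stderr and edits code
    return False
```

The round budget matters: without it, an unfixable failure would loop forever.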
End of Day
```
/commit-push-pr Add OAuth integration
```

One command. Clean PR with proper description. Done.
Results: What Actually Changed
After three months with this system:
Cost: $487/month → $38/month (92% reduction)
Speed: Research tasks that took 10 minutes now take 2 minutes with parallel workers. Overall, I estimate a 3-5x speedup for research-heavy work.
Quality: Using the right model for each task actually improves outputs. Gemini Flash is excellent at research. Claude Sonnet is excellent at implementation. Opus is excellent at architecture. Playing to strengths beats using one model for everything.
Consistency: Auto-formatting eliminates style debates. CI validation loops catch issues before they reach PR review. The workflow is predictable.
The Honest Limitations
This system isn't for everyone. Here's what's hard:
Setup Complexity: This took months to build. MCP servers need configuration. Custom skills need development. The coordinator system has failure modes. It's not beginner-friendly.
Debugging Multi-LLM Failures: When something goes wrong, was it Claude? Gemini? The routing logic? A timeout? The handoff between models? Debugging distributed systems is harder than debugging a single agent.
Different Model Strengths: Claude and Gemini aren't interchangeable. Claude is better at nuanced code generation. Gemini is faster but sometimes less thorough. Knowing when routing fails you matters.
Context Fragmentation: Each model maintains its own context. The coordinator synthesizes, but information can get lost between workers. Long missions require careful context engineering.
Overkill for Simple Projects: If you're building a todo app, you don't need this. The overhead of mission files and worker coordination isn't worth it for small projects.
How to Get Started
If this interests you, here's the progression I'd recommend:
Week 1: Claude Code + CLAUDE.md
Start with Claude Code and a well-written CLAUDE.md file. Document your project architecture, coding conventions, and common workflows. This alone is a 10x improvement over raw ChatGPT.
Week 2-3: Add MCP Servers
One at a time. Start with GitHub (if you use GitHub). Then Git for direct commits. Then Filesystem for file operations. Each integration reduces friction.
Week 4+: Build Your First Custom Skill
Pick your most repetitive workflow. For me it was the git commit→push→PR flow. Encode it as a slash command. Save 5 minutes every time you use it.
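In Claude Code, a custom slash command is just a markdown prompt file under `.claude/commands/`; `$ARGUMENTS` is replaced with whatever you type after the command. A simplified sketch of what a commit-flow command might look like (filename and wording are illustrative, not my actual skill):

```markdown
<!-- .claude/commands/commit-push-pr.md (simplified sketch) -->
Stage all changes, write a Conventional Commits message summarizing them,
push the current branch, and open a PR with gh. PR title: $ARGUMENTS

Rules:
- Never force-push to main/master.
- Never skip pre-commit hooks.
- Stop and report if there are merge conflicts.
```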
When Costs Matter: Consider Hybrid
If you're spending $100+/month on AI and doing significant research or exploration, hybrid routing pays off. The initial investment is high, but the monthly savings are real.
What's Next
I'm continuing to evolve Claude Swarm. Current experiments:
- Gemini Conductor for automated planning phases
- Jules (Gemini's async coding agent) for routine refactoring
- Better context handoff between models
- Mission templates for common project types
The future isn't one AI assistant. It's an orchestrated team of specialized models, each doing what it's best at, coordinated by systems we build.
If you've built something similar, I'd love to hear about it. And if you have questions about any part of this setup, ask away.
Appendix: Reproducible Examples
Here are concrete snippets you can adapt for your own setup.
Sample CLAUDE.md Configuration
This is a simplified version of my model tier configuration:
```markdown
## Model Tiers

### Claude Models

| Tier | Model | Input | Output | Use For |
|------|-------|-------|--------|---------|
| **Haiku** | claude-haiku-3-5 | $0.25/M | $1.25/M | Read, search, summarize |
| **Sonnet** | claude-sonnet-4 | $3.00/M | $15.00/M | Write code, edit, debug |
| **Opus** | claude-opus-4 | $15.00/M | $75.00/M | Architecture, coordination |

### Gemini Models (Hybrid Mode)

| Tier | Model | Input | Output | Use For |
|------|-------|-------|--------|---------|
| **Flash** | gemini-2.5-flash | $0.10/M | $0.40/M | Research, exploration |
| **Pro** | gemini-2.5-pro | $1.25/M | $5.00/M | Deep analysis, synthesis |

## Cost Optimization Strategy

- Use Gemini Flash for research/explore (~35% of tasks)
- Use Gemini Pro for review/audit (~15% of tasks)
- Use Claude Haiku for simple tasks (~10%)
- Use Claude Sonnet for implementation (~35%)
- Reserve Claude Opus for coordination (~5%)
- **Target: 92% cost savings** vs Claude-only workflows
```

Sample Mission File
Here's a real mission file structure:
```markdown
# Mission: Fix Icon Display Bug

## Objective
Fix the issue where icon strings like "solar:calendar-bold" display
as text instead of rendering as actual icons.

## Background
The settings page uses Solar icons, but the SkillSettingsSection
component displays the icon prop as raw text.

## Phase 1: Investigate & Fix
**Objective**: Find and fix the component

### Tasks
1. Locate SkillSettingsSection component
2. Check how the `icon` prop is being used
3. Update to use the Icon component for rendering
4. Verify the fix works in the UI

### Deliverable
- Fixed component that properly renders icons

## Success Criteria
- [ ] Icons render visually instead of showing text
- [ ] All skill sections show proper icons
- [ ] No TypeScript errors

## Constraints
- Use existing Icon component
- Minimal changes - only fix the rendering issue
```

Run it with: `./swarm/bin/swarm run swarm/missions/FIX-ICONS.md`
How /commit-push-pr Works
The custom skill automates this 6-step workflow:
```shell
# Step 1: Check state
git status --porcelain
git branch --show-current

# Step 2: Stage changes
git add -A
git diff --cached --stat

# Step 3: Create commit with conventional message
git commit -m "feat: Add user authentication

Co-Authored-By: Claude <noreply@anthropic.com>"

# Step 4: Push to remote
git push -u origin feature/auth

# Step 5: Create PR
gh pr create --title "Add user authentication" --body "
## Summary
- Implement login form
- Add password validation
- Create session management
"

# Step 6: Return PR URL
```

Safety guardrails built in:
- Never force-push to main/master
- Never skip pre-commit hooks
- Stop if merge conflicts detected
This collapses a six-step manual process into one command.
Want to see the actual code? The Claude Swarm system is at github.com/joeyhipolito/orchestrator-swarm. The CLAUDE.md file alone is worth reading if you're setting up Claude Code for the first time.
Related Reading
- Building EA: Architecture Decisions for a Production AI Assistant — The production AI system I built using these workflows
- LifeOS: Building an AI-Powered Personal Operating System — Applying the same AI skill architecture to personal knowledge management