🏗️ Agentic Dev System: The Architecture

Part 3: A multi-agent system that actually works on a Mac Mini
Debate: Hermes orchestrates, @Dubtsbot can be an agent if it ever replies

🏛️ System Overview: The 5-Layer Cake

Forget single-agent prompting. This is a full system architecture with separation of concerns. Each layer handles one job well.

👤 Layer 1: User Interface
Where you interact: Telegram, CLI, or Web Dashboard. You describe what you want in natural language.
🧠 Layer 2: Orchestrator
The brain. Decomposes tasks, assigns agents, manages state, handles failures. This is where the magic happens.
🤖 Layer 3: Specialized Agents
Focused workers β€” Coder, Reviewer, Tester, Researcher. Each is an instance of Claude Code with a specific role.
💾 Layer 4: Memory and Context
Persistent knowledge β€” project memory, skills, session history, codebase index. Feeds context to agents.
🔧 Layer 5: Tools and Infrastructure
Terminal, Git, Docker, CI/CD, Cloudflare deployment. The execution environment.

🤖 The Agent Roster

Six specialized roles. Each spawned as a sub-agent via Claude Code's ACP protocol.

🎯 Orchestrator Agent
Receives user requests, decomposes them into tasks, assigns those tasks to worker agents, monitors progress, and integrates results. The PM of the operation.
🔍 Researcher Agent
Reads codebases, searches documentation, indexes files, builds context. Provides the knowledge that other agents need.
💻 Coder Agent(s)
Writes code. Can spawn multiple instances for parallel work. Each gets a focused task with full context from the Researcher.
🧪 Tester Agent
Runs tests, generates test cases, validates output, reports failures. The quality gate.
👀 Reviewer Agent
Code review: checks for bugs, security issues, style violations, architectural consistency. The second pair of eyes.
🚀 Deployer Agent
Handles builds, Docker, deployment, monitoring. Takes tested code and ships it.
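
If you want the roster as data rather than prose, a small registry is enough. This is a minimal sketch, not how Claude Code defines sub-agents: `AgentRole`, `ROSTER`, the `parallel` flag, and `workers()` are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    name: str
    responsibility: str
    parallel: bool = False  # hypothetical flag: may several instances run at once?


# The six roles from the roster above. Only the Coder is described as multi-instance.
ROSTER = {
    role.name: role
    for role in (
        AgentRole("orchestrator", "decompose requests, assign tasks, integrate results"),
        AgentRole("researcher", "read the codebase and docs, build context"),
        AgentRole("coder", "write code for one focused task", parallel=True),
        AgentRole("tester", "run and generate tests, report failures"),
        AgentRole("reviewer", "check bugs, security, style, architecture"),
        AgentRole("deployer", "build, containerize, deploy, monitor"),
    )
}


def workers() -> list[str]:
    """Every role except the orchestrator, in roster order."""
    return [name for name in ROSTER if name != "orchestrator"]
```

Keeping the roles in one table means the orchestrator can check "can I fan this out?" by reading `ROSTER["coder"].parallel` instead of hard-coding the answer.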

🔄 The Agentic Workflow

Here's how a task flows through the system:

User Request (Telegram/CLI)
              │
              ▼
┌─────────────────────────────────────────┐
│              ORCHESTRATOR               │
│  1. Parse request                       │
│  2. Load project memory + skills        │
│  3. Decompose into subtasks             │
│  4. Create execution plan               │
└─────────────┬───────────────────────────┘
              │
    ┌─────────┼─────────┐
    ▼         ▼         ▼
┌────────┐ ┌────────┐ ┌────────┐
│RESEARCH│ │ CODER  │ │ CODER  │  ← Parallel execution
│ Agent  │ │ Agent  │ │ Agent  │
└───┬────┘ └───┬────┘ └───┬────┘
    │          │          │
    └──────────┼──────────┘
               ▼
       ┌──────────────┐
       │    TESTER    │  ← Automated verification
       │    Agent     │
       └──────┬───────┘
              │
       ┌──────┴───────┐
       ▼              ▼
  ┌─────────┐    ┌──────────┐
  │ PASS ✅ │    │ FAIL ❌  │──→ Back to Coder
  └────┬────┘    └──────────┘
       ▼
  ┌──────────┐
  │ REVIEWER │  ← Human-like review
  │  Agent   │
  └────┬─────┘
       ▼
  ┌──────────┐
  │ DEPLOYER │  ← Ship it
  │  Agent   │
  └──────────┘
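
The control flow in that diagram fits in one function. The sketch below is an assumption-heavy stand-in: `run_pipeline` and `max_attempts` are hypothetical names, each agent is passed in as a plain callable rather than a real agent session, and the Researcher runs before the Coders (the diagram draws them side by side) so that every Coder starts with context.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_pipeline(
    subtasks: list[str],
    research: Callable[[list[str]], str],
    code: Callable[[str, str], str],
    test: Callable[[list[str]], bool],
    review: Callable[[list[str]], bool],
    deploy: Callable[[list[str]], str],
    max_attempts: int = 3,
) -> str:
    """Research, code in parallel, test with retries, review, then deploy."""
    context = research(subtasks)
    for attempt in range(1, max_attempts + 1):
        # One Coder call per subtask, run concurrently; map() keeps subtask order.
        with ThreadPoolExecutor() as pool:
            patches = list(pool.map(lambda task: code(task, context), subtasks))
        if test(patches):
            break
        # FAIL branch: record the failure so the next Coder round can react to it.
        context += f"\nattempt {attempt} failed tests"
    else:
        # The loop ended without a passing test run.
        return "failed: tests never passed"
    if not review(patches):
        return "rejected: reviewer blocked the change"
    return deploy(patches)
```

The retry cap matters: without `max_attempts`, a Coder that keeps producing the same broken patch would loop between FAIL and "Back to Coder" forever.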

💾 Memory Architecture

The memory system is what separates this from "just using Claude Code." Three tiers:

🧠 Tier 1: Project Memory

  • Architecture decisions
  • Tech stack choices
  • Coding conventions
  • Known gotchas

Stored: MEMORY.md in project root

📚 Tier 2: Skills Library

  • Reusable patterns
  • API documentation
  • Tool-specific guides
  • Proven workflows

Stored: ~/.hermes/skills/

🕐 Tier 3: Session History

  • Recent decisions
  • Code changes made
  • Test results
  • User feedback

Stored: session transcripts

Key design decision: Memory is FILE-BASED, not database-based. This means any Claude Code session can read it, you can edit it manually, and Git tracks changes. Simple beats complex on a Mac Mini.
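
Because everything is a file, "loading memory" can be as plain as reading and concatenating text. Here is a minimal sketch under a few assumptions: skills are stored as `.md` files, the session-history tier is left out, and `load_context`, `remember`, and the `max_chars` budget are illustrative names rather than anything Hermes or Claude Code provides.

```python
from pathlib import Path


def load_context(project_root: Path, skills_dir: Path, max_chars: int = 8000) -> str:
    """Concatenate project memory and skill files into one prompt preamble."""
    sections = []
    memory_file = project_root / "MEMORY.md"
    if memory_file.is_file():
        sections.append("# Project memory\n" + memory_file.read_text(encoding="utf-8"))
    if skills_dir.is_dir():
        # Sorted so the same files always produce the same context.
        for skill in sorted(skills_dir.glob("*.md")):
            sections.append(f"# Skill: {skill.stem}\n" + skill.read_text(encoding="utf-8"))
    # Crude budget: keep the head of the text, where project memory sits.
    return "\n\n".join(sections)[:max_chars]


def remember(project_root: Path, note: str) -> None:
    """Append one bullet to MEMORY.md so the next session (and Git) sees it."""
    with (project_root / "MEMORY.md").open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")
```

An agent would call `remember(root, "Use pytest for all tests")` after a decision and `load_context(root, Path.home() / ".hermes" / "skills")` at the start of the next session. The character cut-off is the weakest part of the sketch; a real system would probably rank or summarize instead of truncating.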

📁 Project Structure

The complete directory layout for the agentic system:

agentic-dev/
├── orchestrator.yaml        # Agent config + task routing rules
├── agents.yaml              # Agent definitions and capabilities
├── memory/
│   ├── MEMORY.md            # Project-level persistent memory
│   ├── DECISIONS.md         # Architecture Decision Records
│   └── skills/              # Reusable workflow patterns
├── agents/
│   ├── orchestrator.py      # Task decomposition + routing
│   ├── researcher.py        # Codebase analysis + indexing
│   ├── coder.py             # Code generation + editing
│   ├── tester.py            # Test execution + generation
│   ├── reviewer.py          # Code review pipeline
│   └── deployer.py          # Build + deploy automation
├── tools/
│   ├── context_manager.py   # RAG-based context injection
│   ├── git_ops.py           # Git workflow automation
│   ├── test_runner.py       # Multi-framework test runner
│   └── notifier.py          # Telegram/desk notifications
├── pipelines/
│   ├── feature.py           # Full feature development pipeline
│   ├── bugfix.py            # Bug investigation + fix pipeline
│   └── refactor.py          # Safe refactoring pipeline
├── dashboard/
│   └── index.html           # Real-time task monitoring
└── config/
    ├── models.yaml          # Model routing (cheap vs powerful)
    └── quality.yaml         # Quality gate thresholds

⚑ Model Routing Strategy

Not every task needs the most expensive model. Route intelligently:

🟒 Fast/Cheap (Haiku/GPT-4o-mini)

  • Code formatting
  • Simple file edits
  • Test execution
  • Documentation updates
  • Search and indexing

🔴 Powerful (Sonnet/Opus)

  • Architecture decisions
  • Complex debugging
  • Multi-file refactoring
  • Code review
  • Security analysis
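
The two lists translate almost directly into a lookup. This is a sketch, not the contents of `config/models.yaml`: the tier labels `"fast"` and `"powerful"`, the `route` function, and the choice to send unknown task types to the powerful tier are all assumptions.

```python
# Task categories copied from the two lists above.
CHEAP_TASKS = {
    "code formatting",
    "simple file edits",
    "test execution",
    "documentation updates",
    "search and indexing",
}
POWERFUL_TASKS = {
    "architecture decisions",
    "complex debugging",
    "multi-file refactoring",
    "code review",
    "security analysis",
}


def route(task_type: str) -> str:
    """Return the model tier for a task type (case-insensitive)."""
    kind = task_type.strip().lower()
    if kind in CHEAP_TASKS:
        return "fast"
    # Listed powerful tasks and anything unrecognized both land here:
    # overspending on an unknown task is assumed to be the safer mistake.
    return "powerful"
```

Defaulting unknowns to the powerful tier costs money but avoids handing a security review to the cheapest model by accident; flip the default if your budget matters more than your risk.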

Cost optimization: Route 70% of tasks to cheap models and 30% to powerful ones. Estimated savings: 60% versus using Sonnet for everything. Your Mac Mini runs the routing logic locally, so the router itself adds no cloud dependency.
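
The 60% figure can be sanity-checked with a little arithmetic. The sketch below assumes every task uses the same number of tokens and that all "powerful" tasks are billed at one price; both are simplifications, and `blended_savings` and `ratio_needed` are illustrative helpers, not measured results.

```python
def blended_savings(cheap_share: float, cheap_price_ratio: float) -> float:
    """Fraction saved versus sending every task to the powerful model.

    cheap_price_ratio is (cost of a task on the cheap model) divided by
    (cost of the same task on the powerful model).
    """
    blended_cost = cheap_share * cheap_price_ratio + (1.0 - cheap_share)
    return 1.0 - blended_cost  # simplifies to cheap_share * (1 - cheap_price_ratio)


def ratio_needed(cheap_share: float, target_savings: float) -> float:
    """Price ratio the cheap model must reach to hit target_savings."""
    return 1.0 - target_savings / cheap_share
```

With a 70/30 split, savings can never exceed 70%, and reaching 60% requires the cheap model to cost about one seventh of the powerful one per task (`ratio_needed(0.7, 0.6)` is roughly 0.143). Check that ratio against current pricing before trusting the estimate.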

🔍 Quality Gates

Every piece of code passes through these gates before it is merged:

Code Generated by Coder Agent
            │
            ▼
┌──────────────────────────┐
│ Gate 1: Syntax + Types   │ → ruff, mypy, typescript
├──────────────────────────┤
│ Gate 2: Unit Tests       │ → pytest, jest, go test
├──────────────────────────┤
│ Gate 3: Integration      │ → docker-compose test env
├──────────────────────────┤
│ Gate 4: Security Scan    │ → bandit, semgrep
├──────────────────────────┤
│ Gate 5: Code Review      │ → Reviewer Agent
├──────────────────────────┤
│ Gate 6: Human Approval   │ → You (optional for minor)
└──────────────────────────┘
            │
            ▼
         MERGED
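
The gate sequence is a fail-fast loop: run each check in order and stop at the first one that fails. A minimal sketch, assuming each automated gate is a command whose exit status signals pass or fail; `shell_gate`, `run_gates`, and `EXAMPLE_GATES` are hypothetical names, and the Reviewer and human gates would be plugged in as ordinary callables.

```python
import subprocess
from typing import Callable, Optional

Gate = tuple[str, Callable[[], bool]]


def shell_gate(*cmd: str) -> Callable[[], bool]:
    """Build a gate that passes when the command exits with status 0."""
    def check() -> bool:
        try:
            return subprocess.run(list(cmd), capture_output=True).returncode == 0
        except FileNotFoundError:
            # A tool that is not installed counts as a failed gate.
            return False
    return check


def run_gates(gates: list[Gate]) -> Optional[str]:
    """Run gates in order; return the first failing gate's name, or None."""
    for name, check in gates:
        if not check():
            return name
    return None


# Example wiring for a Python project; nothing runs until run_gates() is called.
EXAMPLE_GATES: list[Gate] = [
    ("syntax + types", shell_gate("ruff", "check", ".")),
    ("unit tests", shell_gate("pytest", "-q")),
    ("security scan", shell_gate("bandit", "-r", ".")),
]
```

Ordering the cheap checks first is the point: a lint failure that takes a second should stop the run before anyone pays for a Reviewer Agent pass or pings you for approval.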