🚀 Agentic Dev System: Implementation Guide

Part 4: Build it on your Mac Mini with Claude Code — step by step
Coding Agent Ready — feed this to Claude Code

📋 Prerequisites

| Requirement | Spec | Notes |
| --- | --- | --- |
| Hardware | Mac Mini (Apple Silicon) | M1/M2/M3 — 16GB+ RAM recommended |
| OS | macOS 14+ | Sonoma or later |
| Claude Code | Latest version | `npm install -g @anthropic-ai/claude-code` |
| Node.js | 20+ | For Claude Code runtime |
| Python | 3.11+ | For orchestrator and tools |
| Git | 2.40+ | Version control |
| Docker | Desktop 4.x | Optional — for integration tests |

🔨 Phase 1: Foundation (Day 1)

Set Up Project Structure + Memory System ~2 hours

Create the directory structure and initialize the memory system that all agents will use.

Step 1.1: Create Project Scaffold

```bash
#!/bin/bash
# Run this to create the project structure
mkdir -p ~/agentic-dev/{agents,tools,pipelines,memory/skills,config,dashboard}
cd ~/agentic-dev

# Initialize git
git init
echo "__pycache__/" >> .gitignore
echo ".env" >> .gitignore
echo "node_modules/" >> .gitignore

# Create core config files
touch orchestrator.yaml agents.yaml
touch memory/MEMORY.md memory/DECISIONS.md
touch config/models.yaml config/quality.yaml
```

Step 1.2: Initialize Memory System

```markdown
# memory/MEMORY.md — This is the brain file
# Every agent reads this before starting work

# Project Memory

## Project: [YOUR PROJECT NAME]
## Stack: [e.g., Python/FastAPI/PostgreSQL/React]

## Conventions:
- Use type hints everywhere
- pytest for testing, always with fixtures
- 100 char line limit
- Docstrings: Google style

## Known Gotchas:
- [Document issues you've hit before]

## Active Tasks:
- [Current work in progress]
```

Step 1.3: Create Agent Configuration

```yaml
# agents.yaml
agents:
  orchestrator:
    model: claude-sonnet-4-20250514
    role: "Task decomposition and coordination"
    system_prompt: "You are the orchestrator. Break tasks into subtasks."
    tools: [file, terminal, session_search, delegate_task]
  researcher:
    model: claude-sonnet-4-20250514
    role: "Codebase analysis and context building"
    tools: [file, terminal, search_files]
  coder:
    model: claude-sonnet-4-20250514
    role: "Write, edit, and refactor code"
    tools: [file, terminal, patch]
    max_parallel: 3
  tester:
    model: claude-haiku-4-20250514  # cheaper model OK
    role: "Run tests, generate test cases"
    tools: [terminal, file]
  reviewer:
    model: claude-sonnet-4-20250514
    role: "Code review and security analysis"
    tools: [file, terminal]
```
Start simple: You don't need all 6 agents on Day 1. Start with Orchestrator + Coder + Tester. Add Reviewer and Deployer once the basics work.

🧠 Phase 2: Context Management (Day 2)

Build the RAG-like Context Injection System ~3 hours

This is THE key differentiator. Instead of dumping entire files into context, intelligently select what each agent needs.

Step 2.1: Codebase Indexer

````python
# tools/context_manager.py
import re
from pathlib import Path

RELEVANT_SUFFIXES = {".py", ".js", ".ts", ".md", ".yaml", ".yml", ".toml", ".json"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


class ContextManager:
    """Smart context injection for coding agents."""

    def __init__(self, project_root: str):
        self.root = Path(project_root)
        self.index = {}
        self._build_index()

    def _build_index(self):
        """Index all relevant files with metadata."""
        for f in self.root.rglob("*"):
            if self._is_relevant(f):
                self.index[str(f)] = {
                    "type": f.suffix,
                    "size": f.stat().st_size,
                    "modified": f.stat().st_mtime,
                    "imports": self._extract_imports(f),
                }

    def _is_relevant(self, f: Path) -> bool:
        """Keep source files; skip vendored code and build artifacts."""
        if not f.is_file() or f.suffix not in RELEVANT_SUFFIXES:
            return False
        return not any(part in SKIP_DIRS for part in f.parts)

    def _extract_imports(self, f: Path) -> list:
        """Pull imported module names (Python only; extend per language)."""
        if f.suffix != ".py":
            return []
        text = f.read_text(errors="ignore")
        return re.findall(r"^(?:from|import)\s+(\w+)", text, re.MULTILINE)

    def get_context(self, task: str, max_tokens: int = 50000) -> str:
        """Get relevant files for a task, ranked by relevance."""
        relevant = self._rank_files(task)
        context = []
        tokens_used = 0
        for filepath, score in relevant:
            content = Path(filepath).read_text(errors="ignore")
            file_tokens = len(content) // 4  # rough estimate
            if tokens_used + file_tokens > max_tokens:
                break
            context.append(f"### {filepath}\n```\n{content}\n```")
            tokens_used += file_tokens
        return "\n\n".join(context)

    def _rank_files(self, task: str) -> list:
        """Rank files by relevance to task. Simple keyword matching."""
        scores = []
        task_words = set(task.lower().split())
        for path, meta in self.index.items():
            # Score based on: filename match, imports, recency
            score = 0
            name_words = set(Path(path).stem.lower().split("_"))
            score += len(task_words & name_words) * 10
            score += len(set(meta["imports"]) & task_words) * 5
            scores.append((path, score))
        return sorted(scores, key=lambda x: x[1], reverse=True)
````
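The ranking heuristic is the heart of this file and is easy to sanity-check in isolation. Below is a minimal, self-contained version of the keyword-overlap scoring; the file index is a hypothetical hand-built dict, not a real project scan:

```python
from pathlib import Path

def rank_files(task: str, index: dict) -> list:
    """Rank indexed files by keyword overlap with the task description."""
    task_words = set(task.lower().split())
    scores = []
    for path, meta in index.items():
        name_words = set(Path(path).stem.lower().split("_"))
        score = len(task_words & name_words) * 10             # filename match
        score += len(set(meta["imports"]) & task_words) * 5   # import match
        scores.append((path, score))
    return sorted(scores, key=lambda x: x[1], reverse=True)

# Hypothetical index for illustration
index = {
    "src/auth_service.py": {"imports": ["jwt", "hashlib"]},
    "src/billing.py": {"imports": ["stripe"]},
}
ranked = rank_files("add jwt auth token refresh", index)
print(ranked[0][0])  # src/auth_service.py outranks billing.py
```

Keyword overlap is deliberately crude; it can be swapped for embedding similarity later without touching the rest of the pipeline.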

Step 2.2: Skill Loader

```python
# tools/skill_loader.py
from pathlib import Path


class SkillLoader:
    """Load relevant skills based on task type."""

    SKILL_DIR = Path.home() / ".hermes" / "skills"

    def get_skills_for_task(self, task: str) -> list[str]:
        """Find and load skills relevant to the task."""
        skills = []
        for skill_file in self.SKILL_DIR.rglob("SKILL.md"):
            content = skill_file.read_text()
            # Match task keywords against skill tags/description
            if self._matches(task, content):
                skills.append(content)
        return skills

    def _matches(self, task: str, content: str) -> bool:
        """Naive overlap check: any task keyword appears in the skill text."""
        task_words = set(task.lower().split())
        return bool(task_words & set(content.lower().split()))
```

🔗 Phase 3: Orchestrator (Day 3-4)

Build the Task Decomposition + Agent Spawning System ~6 hours

The orchestrator is the brain. It takes a high-level request and turns it into parallel agent workstreams.

Step 3.1: Task Decomposition

```python
# agents/orchestrator.py
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    FEATURE = "feature"
    BUGFIX = "bugfix"
    REFACTOR = "refactor"
    RESEARCH = "research"
    DEPLOY = "deploy"


@dataclass
class SubTask:
    description: str
    agent: str           # which agent handles this
    depends_on: list     # task IDs this waits for
    context_files: list  # files to inject into context
    priority: int        # execution order hint


class Orchestrator:
    """Decomposes requests into subtasks and manages execution."""

    def __init__(self, context_mgr):
        self.context_mgr = context_mgr  # a ContextManager from Phase 2

    def plan(self, request: str) -> list[SubTask]:
        """
        Use Claude to decompose request into subtasks.

        Example input: "Add user authentication with JWT"
        Example output:
        [
            SubTask("Design auth schema", "researcher", [], [...], 1),
            SubTask("Create User model", "coder", [0], [...], 2),
            SubTask("Implement JWT service", "coder", [0], [...], 2),
            SubTask("Write auth tests", "tester", [1, 2], [...], 3),
            SubTask("Review auth code", "reviewer", [2], [...], 4),
        ]
        """
        # Load memory + context
        memory = self._load_memory()
        context = self.context_mgr.get_context(request)

        # Call Claude with full context to generate plan
        plan_prompt = f"""
        Task: {request}
        Project Context: {context}
        Memory: {memory}

        Break this into subtasks. Each subtask needs:
        - description: what to do
        - agent: researcher/coder/tester/reviewer/deployer
        - depends_on: which subtasks must complete first
        - context_files: which files the agent needs
        """
        return self._call_llm(plan_prompt)

    def execute(self, plan: list[SubTask]):
        """Execute plan, running independent tasks in parallel."""
        # Topological sort for dependency resolution
        # Spawn agents via Claude Code ACP for each subtask
        # Collect results and feed to dependent tasks
        pass
```
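The `execute` stub only names its three steps. Assuming each `SubTask`'s `depends_on` list holds indices into the plan, the dependency-ordered scheduling loop can be sketched like this (self-contained, with the actual agent call replaced by collecting task descriptions):

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    description: str
    agent: str
    depends_on: list = field(default_factory=list)

def execute(plan: list[SubTask]) -> list[str]:
    """Run tasks in dependency order; tasks in the same wave are parallel-safe."""
    done: set[int] = set()
    order: list[str] = []
    while len(done) < len(plan):
        # A wave = every not-yet-run task whose dependencies are all satisfied
        wave = [i for i, t in enumerate(plan)
                if i not in done and all(d in done for d in t.depends_on)]
        if not wave:
            raise ValueError("Cycle in task dependencies")
        for i in wave:  # in the real system, spawn these agents concurrently
            order.append(plan[i].description)
            done.add(i)
    return order

plan = [
    SubTask("Design auth schema", "researcher"),
    SubTask("Create User model", "coder", [0]),
    SubTask("Implement JWT service", "coder", [0]),
    SubTask("Write auth tests", "tester", [1, 2]),
]
print(execute(plan))
```

Everything inside one wave has no edges between its members, which is exactly what makes it safe to hand those subtasks to parallel coder agents.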

Step 3.2: Agent Spawning (via Claude Code)

```python
# agents/spawner.py
# Uses Claude Code's delegate_task API to spawn sub-agents
import asyncio


async def spawn_agent(agent_type: str, task: str, context: str) -> str:
    """Spawn a Claude Code sub-agent for a specific task."""
    # Map agent types to system prompts
    prompts = {
        "coder": "You are a coding agent. Write clean, tested code.",
        "tester": "You are a testing agent. Find and fix bugs.",
        "reviewer": "You are a code reviewer. Be thorough and critical.",
    }

    # Spawn via Claude Code CLI. Note: subprocess.run is not awaitable,
    # so use asyncio's subprocess API instead.
    proc = await asyncio.create_subprocess_exec(
        "claude", "--print",
        "--system", prompts[agent_type],
        "--model", "claude-sonnet-4-20250514",
        f"{task}\n\nContext:\n{context}",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode()
```
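The non-blocking spawn pattern is what lets several agents run at once. A minimal, runnable sketch of just that pattern, using `echo` as a stand-in for the `claude` binary so it works without an API key:

```python
import asyncio

async def spawn(cmd: list[str]) -> str:
    """Launch one subprocess without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()

async def main() -> list[str]:
    # Three "agents" spawned concurrently; swap echo for the real claude CLI
    jobs = [spawn(["echo", f"agent-{i} done"]) for i in range(3)]
    return list(await asyncio.gather(*jobs))

print(asyncio.run(main()))  # ['agent-0 done', 'agent-1 done', 'agent-2 done']
```

`asyncio.gather` keeps results in submission order even though the processes finish in any order, which makes wiring outputs back to dependent subtasks straightforward.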

✅ Phase 4: Quality Gates (Day 5)

Automated Verification Pipeline ~3 hours

Every piece of code goes through these gates. No exceptions.

Step 4.1: Quality Gate Pipeline

```python
# tools/quality_gates.py
class QualityPipeline:
    """Run code through quality gates before accepting it."""

    def run_all(self, changed_files: list[str]) -> dict:
        results = {}
        results["syntax"] = self.check_syntax(changed_files)
        results["types"] = self.check_types(changed_files)
        results["lint"] = self.run_linter(changed_files)
        results["tests"] = self.run_tests(changed_files)
        results["security"] = self.security_scan(changed_files)
        return results

    def check_syntax(self, files):
        """Quick syntax check — catches obvious errors."""
        # python: python -m py_compile
        # js/ts: npx tsc --noEmit
        pass

    def check_types(self, files):
        """Static type check."""
        # mypy for Python, tsc for TypeScript
        pass

    def run_linter(self, files):
        """Style and lint checks."""
        # ruff for Python, eslint for JS/TS
        pass

    def run_tests(self, files):
        """Run tests related to changed files."""
        # pytest with coverage report
        # Fail if coverage drops below threshold
        pass

    def security_scan(self, files):
        """Static security analysis."""
        # bandit for Python
        # npm audit for JS
        # semgrep for cross-language
        pass
```
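To make one gate concrete: the syntax check can be implemented for Python with the standard-library `py_compile` module, which compiles without executing. A self-contained sketch (the demo files are throwaways created on the spot):

```python
import py_compile
import tempfile
from pathlib import Path

def check_syntax(files: list[str]) -> dict[str, bool]:
    """Return per-file pass/fail from a compile-only check (no code is run)."""
    results = {}
    for f in files:
        try:
            py_compile.compile(f, doraise=True)
            results[f] = True
        except py_compile.PyCompileError:
            results[f] = False
    return results

# Demo against two throwaway files
tmp = Path(tempfile.mkdtemp())
good = tmp / "good.py"
good.write_text("x = 1\n")
bad = tmp / "bad.py"
bad.write_text("def broken(:\n")
print(check_syntax([str(good), str(bad)]))  # good.py -> True, bad.py -> False
```

Because this gate is cheap (milliseconds per file), it belongs first in `run_all` so obviously broken agent output never reaches the slower type, test, and security gates.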

Quality Gate Checklist

📊 Phase 5: Monitoring Dashboard (Day 6)

Real-time Task Monitoring ~2 hours

See what your agents are doing. Track tasks, costs, and quality metrics.

What to Monitor

📈 Task Metrics

  • Active/completed/failed tasks
  • Average completion time
  • Agent utilization
  • Parallelism ratio

💰 Cost Metrics

  • Tokens used per task
  • Cost per agent type
  • Model routing distribution
  • Daily/weekly spend trends

✅ Quality Metrics

  • Gate pass/fail rates
  • Bugs found post-deploy
  • Test coverage trends
  • Review rejection rate

🔧 System Health

  • Agent spawn success rate
  • Context window utilization
  • Memory hit rate
  • API latency
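None of these metrics need a heavy stack to start. One lightweight option is to append one JSON line per finished task and aggregate on demand; the sketch below shows that pattern (the field names and log path are illustrative, not a fixed schema):

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

LOG = Path(tempfile.mkdtemp()) / "metrics.jsonl"

def record(agent: str, tokens: int, status: str) -> None:
    """Append one metric event per finished task."""
    with LOG.open("a") as f:
        f.write(json.dumps({"agent": agent, "tokens": tokens, "status": status}) + "\n")

def tokens_by_agent() -> dict:
    """Aggregate token spend per agent type from the log."""
    totals = defaultdict(int)
    for line in LOG.read_text().splitlines():
        event = json.loads(line)
        totals[event["agent"]] += event["tokens"]
    return dict(totals)

record("coder", 12000, "passed")
record("coder", 8000, "failed")
record("tester", 3000, "passed")
print(tokens_by_agent())  # {'coder': 20000, 'tester': 3000}
```

A JSONL file is append-only and crash-safe enough for a single-machine setup; the dashboard can tail the same file, and you can graduate to SQLite later without changing the `record` call sites.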

📅 Implementation Timeline

| Phase | Duration | Deliverable | Can Use After |
| --- | --- | --- | --- |
| 1. Foundation | Day 1 (2h) | Project structure + memory | Immediate |
| 2. Context Mgmt | Day 2 (3h) | Smart file injection | Day 2 |
| 3. Orchestrator | Day 3-4 (6h) | Multi-agent coordination | Day 4 |
| 4. Quality Gates | Day 5 (3h) | Automated verification | Day 5 |
| 5. Dashboard | Day 6 (2h) | Monitoring UI | Day 6 |
| 6. Polish | Ongoing | Bug fixes + new skills | — |

Total: ~16 hours of setup. Then it runs forever.

🎯 Quick Start: What to Do RIGHT NOW

Don't want to build the whole system? Start with just this:

```bash
# 1. Create memory file for your project
cd /path/to/your/project
cat > MEMORY.md << 'EOF'
# Project Memory
## Stack: [your stack]
## Conventions: [your rules]
## Gotchas: [things to remember]
EOF

# 2. Start Claude Code with memory injection
claude --system "Read MEMORY.md first. Follow all conventions."

# 3. That's it. You now have persistent memory.
# Add skills/ directory as you discover reusable patterns.
```
The single most impactful thing: Create a MEMORY.md file in your project root. Write your conventions, stack choices, and gotchas in it. Tell Claude Code to read it first. This alone solves 50% of the "context amnesia" problem.