🚀 Agentic Dev System: Implementation Guide

Part 4: Build it on your Mac Mini with Claude Code — step by step
Coding Agent Ready — feed this to Claude Code

📋 Prerequisites

| Requirement | Spec | Notes |
| --- | --- | --- |
| Hardware | Mac Mini (Apple Silicon) | M1/M2/M3 — 16GB+ RAM recommended |
| OS | macOS 14+ | Sonoma or later |
| Claude Code | Latest version | `npm install -g @anthropic-ai/claude-code` |
| Node.js | 20+ | For Claude Code runtime |
| Python | 3.11+ | For orchestrator and tools |
| Git | 2.40+ | Version control |
| Docker | Desktop 4.x | Optional — for integration tests |

🔨 Phase 1: Foundation (Day 1)

Set Up Project Structure + Memory System ~2 hours

Create the directory structure and initialize the memory system that all agents will use.

Step 1.1: Create Project Scaffold

```bash
#!/bin/bash
# Run this to create the project structure
mkdir -p ~/agentic-dev/{agents,tools,pipelines,memory/skills,config,dashboard}
cd ~/agentic-dev

# Initialize git
git init
echo "__pycache__/" >> .gitignore
echo ".env" >> .gitignore
echo "node_modules/" >> .gitignore

# Create core config files
touch orchestrator.yaml agents.yaml
touch memory/MEMORY.md memory/DECISIONS.md
touch config/models.yaml config/quality.yaml
```

Step 1.2: Initialize Memory System

```markdown
# memory/MEMORY.md — This is the brain file
# Every agent reads this before starting work

# Project Memory

## Project: [YOUR PROJECT NAME]
## Stack: [e.g., Python/FastAPI/PostgreSQL/React]

## Conventions:
- Use type hints everywhere
- pytest for testing, always with fixtures
- 100 char line limit
- Docstrings: Google style

## Known Gotchas:
- [Document issues you've hit before]

## Active Tasks:
- [Current work in progress]
```

Step 1.3: Create Agent Configuration

```yaml
# agents.yaml
agents:
  orchestrator:
    model: claude-sonnet-4-20250514
    role: "Task decomposition and coordination"
    system_prompt: "You are the orchestrator. Break tasks into subtasks."
    tools: [file, terminal, session_search, delegate_task]
  researcher:
    model: claude-sonnet-4-20250514
    role: "Codebase analysis and context building"
    tools: [file, terminal, search_files]
  coder:
    model: claude-sonnet-4-20250514
    role: "Write, edit, and refactor code"
    tools: [file, terminal, patch]
    max_parallel: 3
  tester:
    model: claude-haiku-4-20250514  # cheaper model OK
    role: "Run tests, generate test cases"
    tools: [terminal, file]
  reviewer:
    model: claude-sonnet-4-20250514
    role: "Code review and security analysis"
    tools: [file, terminal]
```
Start simple: You don't need all 6 agents on Day 1. Start with Orchestrator + Coder + Tester. Add Reviewer and Deployer once the basics work.

🧠 Phase 2: Context Management (Day 2)

Build the RAG-like Context Injection System ~3 hours

This is THE key differentiator. Instead of dumping entire files into context, intelligently select what each agent needs.

Step 2.1: Codebase Indexer

````python
# tools/context_manager.py
import re
from pathlib import Path

RELEVANT_SUFFIXES = {".py", ".js", ".ts", ".md", ".yaml", ".yml", ".toml", ".json"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


class ContextManager:
    """Smart context injection for coding agents."""

    def __init__(self, project_root: str):
        self.root = Path(project_root)
        self.index = {}
        self._build_index()

    def _build_index(self):
        """Index all relevant files with metadata."""
        for f in self.root.rglob("*"):
            if self._is_relevant(f):
                self.index[str(f)] = {
                    "type": f.suffix,
                    "size": f.stat().st_size,
                    "modified": f.stat().st_mtime,
                    "imports": self._extract_imports(f),
                }

    def _is_relevant(self, f: Path) -> bool:
        """Keep source files; skip vendored code and build artifacts."""
        if not f.is_file() or f.suffix not in RELEVANT_SUFFIXES:
            return False
        return not any(part in SKIP_DIRS for part in f.parts)

    def _extract_imports(self, f: Path) -> list:
        """Pull imported module names (Python only; extend per language)."""
        if f.suffix != ".py":
            return []
        text = f.read_text(errors="ignore")
        return re.findall(r"^(?:from|import)\s+(\w+)", text, re.MULTILINE)

    def get_context(self, task: str, max_tokens: int = 50000) -> str:
        """Get relevant files for a task, ranked by relevance."""
        relevant = self._rank_files(task)
        context = []
        tokens_used = 0
        for filepath, score in relevant:
            content = Path(filepath).read_text(errors="ignore")
            file_tokens = len(content) // 4  # rough estimate
            if tokens_used + file_tokens > max_tokens:
                break
            context.append(f"### {filepath}\n```\n{content}\n```")
            tokens_used += file_tokens
        return "\n\n".join(context)

    def _rank_files(self, task: str) -> list:
        """Rank files by relevance to task. Simple keyword matching."""
        scores = []
        task_words = set(task.lower().split())
        for path, meta in self.index.items():
            # Score based on: filename match, imports, recency
            score = 0
            name_words = set(Path(path).stem.lower().split("_"))
            score += len(task_words & name_words) * 10
            score += len(set(meta["imports"]) & task_words) * 5
            scores.append((path, score))
        return sorted(scores, key=lambda x: x[1], reverse=True)
````
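The ranking heuristic is the heart of this file and is easy to sanity-check in isolation. Below is a minimal, self-contained version of the keyword-overlap scoring; the file index is a hypothetical hand-built dict, not a real project scan:

```python
from pathlib import Path

def rank_files(task: str, index: dict) -> list:
    """Rank indexed files by keyword overlap with the task description."""
    task_words = set(task.lower().split())
    scores = []
    for path, meta in index.items():
        name_words = set(Path(path).stem.lower().split("_"))
        score = len(task_words & name_words) * 10             # filename match
        score += len(set(meta["imports"]) & task_words) * 5   # import match
        scores.append((path, score))
    return sorted(scores, key=lambda x: x[1], reverse=True)

# Hypothetical index for illustration
index = {
    "src/auth_service.py": {"imports": ["jwt", "hashlib"]},
    "src/billing.py": {"imports": ["stripe"]},
}
ranked = rank_files("add jwt auth token refresh", index)
print(ranked[0][0])  # src/auth_service.py outranks billing.py
```

Keyword overlap is deliberately crude; it can be swapped for embedding similarity later without touching the rest of the pipeline.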

Step 2.2: Skill Loader

```python
# tools/skill_loader.py
from pathlib import Path


class SkillLoader:
    """Load relevant skills based on task type."""

    SKILL_DIR = Path.home() / ".hermes" / "skills"

    def get_skills_for_task(self, task: str) -> list[str]:
        """Find and load skills relevant to the task."""
        skills = []
        for skill_file in self.SKILL_DIR.rglob("SKILL.md"):
            content = skill_file.read_text()
            # Match task keywords against skill tags/description
            if self._matches(task, content):
                skills.append(content)
        return skills

    def _matches(self, task: str, content: str) -> bool:
        """Naive overlap check: any task keyword appears in the skill text."""
        task_words = set(task.lower().split())
        return bool(task_words & set(content.lower().split()))
```

🔗 Phase 3: Orchestrator (Day 3-4)

Build the Task Decomposition + Agent Spawning System ~6 hours

The orchestrator is the brain. It takes a high-level request and turns it into parallel agent workstreams.

Step 3.1: Task Decomposition

```python
# agents/orchestrator.py
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    FEATURE = "feature"
    BUGFIX = "bugfix"
    REFACTOR = "refactor"
    RESEARCH = "research"
    DEPLOY = "deploy"


@dataclass
class SubTask:
    description: str
    agent: str           # which agent handles this
    depends_on: list     # task IDs this waits for
    context_files: list  # files to inject into context
    priority: int        # execution order hint


class Orchestrator:
    """Decomposes requests into subtasks and manages execution."""

    def __init__(self, context_mgr):
        self.context_mgr = context_mgr  # a ContextManager from Phase 2

    def plan(self, request: str) -> list[SubTask]:
        """
        Use Claude to decompose request into subtasks.

        Example input: "Add user authentication with JWT"
        Example output:
        [
            SubTask("Design auth schema", "researcher", [], [...], 1),
            SubTask("Create User model", "coder", [0], [...], 2),
            SubTask("Implement JWT service", "coder", [0], [...], 2),
            SubTask("Write auth tests", "tester", [1, 2], [...], 3),
            SubTask("Review auth code", "reviewer", [2], [...], 4),
        ]
        """
        # Load memory + context
        memory = self._load_memory()
        context = self.context_mgr.get_context(request)

        # Call Claude with full context to generate plan
        plan_prompt = f"""
        Task: {request}
        Project Context: {context}
        Memory: {memory}

        Break this into subtasks. Each subtask needs:
        - description: what to do
        - agent: researcher/coder/tester/reviewer/deployer
        - depends_on: which subtasks must complete first
        - context_files: which files the agent needs
        """
        return self._call_llm(plan_prompt)

    def execute(self, plan: list[SubTask]):
        """Execute plan, running independent tasks in parallel."""
        # Topological sort for dependency resolution
        # Spawn agents via Claude Code ACP for each subtask
        # Collect results and feed to dependent tasks
        pass
```
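The `execute` stub only names its three steps. Assuming each `SubTask`'s `depends_on` list holds indices into the plan, the dependency-ordered scheduling loop can be sketched like this (self-contained, with the actual agent call replaced by collecting task descriptions):

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    description: str
    agent: str
    depends_on: list = field(default_factory=list)

def execute(plan: list[SubTask]) -> list[str]:
    """Run tasks in dependency order; tasks in the same wave are parallel-safe."""
    done: set[int] = set()
    order: list[str] = []
    while len(done) < len(plan):
        # A wave = every not-yet-run task whose dependencies are all satisfied
        wave = [i for i, t in enumerate(plan)
                if i not in done and all(d in done for d in t.depends_on)]
        if not wave:
            raise ValueError("Cycle in task dependencies")
        for i in wave:  # in the real system, spawn these agents concurrently
            order.append(plan[i].description)
            done.add(i)
    return order

plan = [
    SubTask("Design auth schema", "researcher"),
    SubTask("Create User model", "coder", [0]),
    SubTask("Implement JWT service", "coder", [0]),
    SubTask("Write auth tests", "tester", [1, 2]),
]
print(execute(plan))
```

Everything inside one wave has no edges between its members, which is exactly what makes it safe to hand those subtasks to parallel coder agents.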

Step 3.2: Agent Spawning (via Claude Code)

```python
# agents/spawner.py
# Uses Claude Code's delegate_task API to spawn sub-agents
import asyncio


async def spawn_agent(agent_type: str, task: str, context: str) -> str:
    """Spawn a Claude Code sub-agent for a specific task."""
    # Map agent types to system prompts
    prompts = {
        "coder": "You are a coding agent. Write clean, tested code.",
        "tester": "You are a testing agent. Find and fix bugs.",
        "reviewer": "You are a code reviewer. Be thorough and critical.",
    }

    # Spawn via Claude Code CLI. Note: subprocess.run is not awaitable,
    # so use asyncio's subprocess API instead.
    proc = await asyncio.create_subprocess_exec(
        "claude", "--print",
        "--system", prompts[agent_type],
        "--model", "claude-sonnet-4-20250514",
        f"{task}\n\nContext:\n{context}",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode()
```
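The non-blocking spawn pattern is what lets several agents run at once. A minimal, runnable sketch of just that pattern, using `echo` as a stand-in for the `claude` binary so it works without an API key:

```python
import asyncio

async def spawn(cmd: list[str]) -> str:
    """Launch one subprocess without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()

async def main() -> list[str]:
    # Three "agents" spawned concurrently; swap echo for the real claude CLI
    jobs = [spawn(["echo", f"agent-{i} done"]) for i in range(3)]
    return list(await asyncio.gather(*jobs))

print(asyncio.run(main()))  # ['agent-0 done', 'agent-1 done', 'agent-2 done']
```

`asyncio.gather` keeps results in submission order even though the processes finish in any order, which makes wiring outputs back to dependent subtasks straightforward.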

✅ Phase 4: Quality Gates (Day 5)

Automated Verification Pipeline ~3 hours

Every piece of code goes through these gates. No exceptions.

Step 4.1: Quality Gate Pipeline

```python
# tools/quality_gates.py
class QualityPipeline:
    """Run code through quality gates before accepting it."""

    def run_all(self, changed_files: list[str]) -> dict:
        results = {}
        results["syntax"] = self.check_syntax(changed_files)
        results["types"] = self.check_types(changed_files)
        results["lint"] = self.run_linter(changed_files)
        results["tests"] = self.run_tests(changed_files)
        results["security"] = self.security_scan(changed_files)
        return results

    def check_syntax(self, files):
        """Quick syntax check — catches obvious errors."""
        # python: python -m py_compile
        # js/ts: npx tsc --noEmit
        pass

    def check_types(self, files):
        """Static type check."""
        # mypy for Python, tsc for TypeScript
        pass

    def run_linter(self, files):
        """Style and lint checks."""
        # ruff for Python, eslint for JS/TS
        pass

    def run_tests(self, files):
        """Run tests related to changed files."""
        # pytest with coverage report
        # Fail if coverage drops below threshold
        pass

    def security_scan(self, files):
        """Static security analysis."""
        # bandit for Python
        # npm audit for JS
        # semgrep for cross-language
        pass
```
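To make one gate concrete: the syntax check can be implemented for Python with the standard-library `py_compile` module, which compiles without executing. A self-contained sketch (the demo files are throwaways created on the spot):

```python
import py_compile
import tempfile
from pathlib import Path

def check_syntax(files: list[str]) -> dict[str, bool]:
    """Return per-file pass/fail from a compile-only check (no code is run)."""
    results = {}
    for f in files:
        try:
            py_compile.compile(f, doraise=True)
            results[f] = True
        except py_compile.PyCompileError:
            results[f] = False
    return results

# Demo against two throwaway files
tmp = Path(tempfile.mkdtemp())
good = tmp / "good.py"
good.write_text("x = 1\n")
bad = tmp / "bad.py"
bad.write_text("def broken(:\n")
print(check_syntax([str(good), str(bad)]))  # good.py -> True, bad.py -> False
```

Because this gate is cheap (milliseconds per file), it belongs first in `run_all` so obviously broken agent output never reaches the slower type, test, and security gates.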

Quality Gate Checklist

📊 Phase 5: Monitoring Dashboard (Day 6)

Real-time Task Monitoring ~2 hours

See what your agents are doing. Track tasks, costs, and quality metrics.

What to Monitor

📈 Task Metrics

  • Active/completed/failed tasks
  • Average completion time
  • Agent utilization
  • Parallelism ratio

💰 Cost Metrics

  • Tokens used per task
  • Cost per agent type
  • Model routing distribution
  • Daily/weekly spend trends

✅ Quality Metrics

  • Gate pass/fail rates
  • Bugs found post-deploy
  • Test coverage trends
  • Review rejection rate

🔧 System Health

  • Agent spawn success rate
  • Context window utilization
  • Memory hit rate
  • API latency
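None of these metrics need a heavy stack to start. One lightweight option is to append one JSON line per finished task and aggregate on demand; the sketch below shows that pattern (the field names and log path are illustrative, not a fixed schema):

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

LOG = Path(tempfile.mkdtemp()) / "metrics.jsonl"

def record(agent: str, tokens: int, status: str) -> None:
    """Append one metric event per finished task."""
    with LOG.open("a") as f:
        f.write(json.dumps({"agent": agent, "tokens": tokens, "status": status}) + "\n")

def tokens_by_agent() -> dict:
    """Aggregate token spend per agent type from the log."""
    totals = defaultdict(int)
    for line in LOG.read_text().splitlines():
        event = json.loads(line)
        totals[event["agent"]] += event["tokens"]
    return dict(totals)

record("coder", 12000, "passed")
record("coder", 8000, "failed")
record("tester", 3000, "passed")
print(tokens_by_agent())  # {'coder': 20000, 'tester': 3000}
```

A JSONL file is append-only and crash-safe enough for a single-machine setup; the dashboard can tail the same file, and you can graduate to SQLite later without changing the `record` call sites.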

📅 Implementation Timeline

| Phase | Duration | Deliverable | Can Use After |
| --- | --- | --- | --- |
| 1. Foundation | Day 1 (2h) | Project structure + memory | Immediate |
| 2. Context Mgmt | Day 2 (3h) | Smart file injection | Day 2 |
| 3. Orchestrator | Day 3-4 (6h) | Multi-agent coordination | Day 4 |
| 4. Quality Gates | Day 5 (3h) | Automated verification | Day 5 |
| 5. Dashboard | Day 6 (2h) | Monitoring UI | Day 6 |
| 6. Polish | Ongoing | Bug fixes + new skills | — |

Total: ~16 hours of setup. Then it runs forever.

🎯 Quick Start: What to Do RIGHT NOW

Don't want to build the whole system? Start with just this:

```bash
# 1. Create memory file for your project
cd /path/to/your/project
cat > MEMORY.md << 'EOF'
# Project Memory
## Stack: [your stack]
## Conventions: [your rules]
## Gotchas: [things to remember]
EOF

# 2. Start Claude Code with memory injection
claude --system "Read MEMORY.md first. Follow all conventions."

# 3. That's it. You now have persistent memory.
# Add skills/ directory as you discover reusable patterns.
```
The single most impactful thing: Create a MEMORY.md file in your project root. Write your conventions, stack choices, and gotchas in it. Tell Claude Code to read it first. This alone solves 50% of the "context amnesia" problem.