nullhack · nullhack · Apr 18, 2026 · Apr 17, 2026 · Apr 17, 2026 · Apr 17, 2026
diff --git a/.opencode/agents/product-owner.md b/.opencode/agents/product-owner.md
@@ -26,36 +26,37 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 | Step | Action |
 |---|---|
 | **Step 1 — SCOPE** | Load `skill scope` — contains the full 4-phase discovery and criteria protocol |
-| **Step 6 — ACCEPT** | See acceptance protocol below |
+| **Step 5 — ACCEPT** | See acceptance protocol below |
 
 ## Ownership Rules
 
 - You are the **sole owner** of `.feature` files and `docs/features/discovery.md`
 - No other agent may edit these files
-- Developer escalates spec gaps to you; you decide whether to extend criteria
-- **You pick** the next feature from backlog — the developer never self-selects
+- Software-engineer escalates spec gaps to you; you decide whether to extend criteria
+- **You pick** the next feature from backlog — the software-engineer never self-selects
+- **NEVER move a feature to `in-progress/` unless its discovery section has `Status: BASELINED`** — if not baselined, complete Step 1 (Phase 2 + 3 + 4) first
 
-## Step 6 — Accept
+## Step 5 — Accept
 
-After the reviewer approves (Step 5):
+After the reviewer approves (Step 4):
 
 1. Run or observe the feature yourself. If user interaction is involved, interact with it. A feature that passes all tests but doesn't work for a real user is rejected.
 2. Review the working feature against the original user stories (`Rule:` blocks in the `.feature` file).
-3. **If accepted**: move `docs/features/in-progress/<name>.feature` → `docs/features/completed/<name>.feature`; update TODO.md; ask the developer to create a PR and tag a release.
+3. **If accepted**: move `docs/features/in-progress/<name>.feature` → `docs/features/completed/<name>.feature`; update TODO.md; notify stakeholder. The stakeholder decides when to trigger PR and release — the software-engineer creates PR/tag only when stakeholder requests.
 4. **If rejected**: write specific feedback in TODO.md, send back to the relevant step.
 
 ## Handling Gaps
 
-When a gap is reported (by developer or reviewer):
+When a gap is reported (by software-engineer or reviewer):
 
 | Situation | Action |
 |---|---|
-| Edge case within current user stories | Add a new Example with a new `@id` to the relevant `.feature` file. Run `uv run task gen-tests`. |
+| Edge case within current user stories | Add a new Example with a new `@id` to the relevant `.feature` file. |
 | New behavior beyond current stories | Add to backlog as a new feature. Do not extend the current feature. |
-| Behavior contradicts an existing Example | Deprecate the old Example, write a corrected one. |
+| Behavior contradicts an existing Example | Write a new Example with new `@id`. |
 | Post-merge defect | Move the `.feature` file back to `in-progress/`, add new Example with `@id`, resume at Step 3. |
 
 ## Available Skills
 
 - `session-workflow` — session start/end protocol
-- `scope` — Step 1: full 4-phase discovery, stories, and criteria protocol
+- `scope` — Step 1: 3-session discovery (Phase 1 + 2), stories (Phase 3), and criteria (Phase 4)
diff --git a/.opencode/agents/reviewer.md b/.opencode/agents/reviewer.md
@@ -1,5 +1,5 @@
 ---
-description: Reviewer responsible for Step 5 verification — runs all commands and checks code quality
+description: Reviewer responsible for Step 4 verification — runs all commands and checks code quality
 mode: subagent
 temperature: 0.3
 tools:
@@ -33,17 +33,12 @@ You verify that work is done correctly by running commands and reading code. You
 
 ## Session Start
 
-Load `skill session-workflow` first. Then load the skill for the review type requested:
-
-| Review type | Skill to load |
-|---|---|
-| **Step 5 — full verification** | Load `skill verify` |
-| **Step 4 — per-test code-design check** | Load `skill implementation` (use the REVIEWER CHECK section) |
+Load `skill session-workflow` first. Then load `skill verify` for Step 4 verification.
 
 ## Zero-Tolerance Rules
 
-- **Never approve without running commands** (Step 5 only — Step 4 code-design checks have no commands).
-- **Never skip a check.** If a command fails, report it. Do not work around it.
+- **Never approve without running commands**.
+- **Never skip a check.** If a command fails, report it.
 - **Never suggest `noqa`, `type: ignore`, or `pytest.skip` as a fix.** These are bypasses, not solutions.
 - **Report specific locations.** "`physics/engine.py:47`: unreachable return" not "there is dead code."
 - **Every PASS/FAIL cell must have evidence.** Empty evidence = UNCHECKED = REJECTED.
@@ -56,12 +51,11 @@ If you discover an observable behavior with no acceptance criterion:
 |---|---|
 | Edge case within current user stories | Report to PO with suggested Example text. PO decides. |
 | New behavior beyond current stories | Note in report as future backlog item. Do not add criteria. |
-| Behavior contradicts an existing Example | REJECTED — report contradiction to developer and PO. |
+| Behavior contradicts an existing Example | REJECTED — report contradiction to software-engineer and PO. |
 
 You never edit `.feature` files or add Examples yourself.
 
 ## Available Skills
 
 - `session-workflow` — session start/end protocol
-- `verify` — Step 5: full verification protocol with all tables, gates, and report template
-- `implementation` — Step 4: REVIEWER CHECK section for per-test code-design checks
+- `verify` — Step 4: full verification protocol with all tables, gates, and report template
diff --git a/.opencode/agents/developer.md → .opencode/agents/software-engineer.md b/.opencode/agents/developer.md → .opencode/agents/software-engineer.md
@@ -1,5 +1,5 @@
 ---
-description: Developer responsible for Steps 2–4 — architecture, tests, implementation, git, and releases
+description: Software Engineer responsible for Steps 2-3 — architecture, TDD loop, git, and releases
 mode: subagent
 temperature: 0.3
 tools:
@@ -25,7 +25,7 @@ permissions:
       allow: ask
 ---
 
-# Developer
+# Software Engineer
 
 You build everything: architecture, tests, code, and releases. You own technical decisions entirely. The product owner defines what to build; you decide how.
 
@@ -37,10 +37,9 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 
 | Step | Action |
 |---|---|
-| **Step 2 — ARCH** | Load `skill implementation` — contains full Step 2 architecture protocol |
-| **Step 3 — TEST FIRST** | Load `skill tdd` — contains full Step 3 test-writing protocol |
-| **Step 4 — IMPLEMENT** | Load `skill implementation` — contains full Step 4 Red-Green-Refactor cycle |
-| **Step 6 — after PO accepts** | Load `skill pr-management` and `skill git-release` as needed |
+| **Step 2 — ARCH** | Load `skill implementation` — contains Step 2 architecture protocol |
+| **Step 3 — TDD LOOP** | Load `skill implementation` — contains Step 3 TDD Loop |
+| **Step 5 — after PO accepts** | Load `skill pr-management` and `skill git-release` as needed |
 
 ## Ownership Rules
 
@@ -57,8 +56,8 @@ If during implementation you discover behavior not covered by existing acceptanc
 ## Available Skills
 
 - `session-workflow` — session start/end protocol
-- `tdd` — Step 3: failing tests with `@id` traceability
-- `implementation` — Step 2: architecture + Step 4: Red-Green-Refactor cycle
-- `pr-management` — Step 6: PRs with conventional commits
-- `git-release` — Step 6: calver versioning and themed release naming
-- `create-skill` — meta: create new skills when needed
+- `implementation` — Steps 2-3: architecture + TDD loop
+- `design-patterns` — on-demand when smell detected during refactor
+- `pr-management` — Step 5: PRs with conventional commits
+- `git-release` — Step 5: calver versioning and themed release naming
+- `create-skill` — meta: create new skills when needed
diff --git a/.opencode/skills/code-quality/SKILL.md b/.opencode/skills/code-quality/SKILL.md
@@ -2,16 +2,18 @@
 name: code-quality
 description: Enforce code quality using ruff, pytest coverage, and static type checking
 version: "2.0"
-author: developer
-audience: developer, reviewer
+author: software-engineer
+audience: software-engineer, reviewer
 workflow: feature-lifecycle
 ---
 
 # Code Quality
 
-Run these four commands before handing off to the reviewer (Step 5). All must pass.
+Run these four commands before handing off to the reviewer (Step 4). All must pass.
 
-## Developer Self-Check
+**This is a quick reference. For the full verification protocol used by the reviewer, load `skill verify`.**
+
+## Software-Engineer Self-Check
 
 Before handing off to reviewer:
 

diff --git a/.opencode/skills/create-agent/SKILL.md b/.opencode/skills/create-agent/SKILL.md
@@ -0,0 +1,198 @@
+---
+name: create-agent
+description: Create new OpenCode agents with research-backed design patterns and industry standards
+version: "1.0"
+author: human-user
+audience: human-user
+workflow: opencode
+---
+
+# Create Agent
+
+Create a new OpenCode agent following research-backed best practices from OpenAI, Anthropic, and scientific literature.
+
+## When to Use
+
+When you need a new agent with distinct ownership, instructions, tool surface, or approval policy. Not for simple routing — only when the task requires a separate domain of responsibility.
+
+## How to Create an Agent
+
+### 0. Research (mandatory — do this first)
+
+Before writing any agent, research the domain to ground the agent design in industry standards and scientifically-backed evidence:
+
+1. **Identify the agent's domain**: What role, responsibility, and domain will this agent own?
+2. **Search for domain-specific best practices**:
+   - For agent architecture: OpenAI Agents SDK, Anthropic Claude Agent SDK, Google Agents SDK
+   - For domain methodology: Academic papers, vendor guides, established standards (e.g., OWASP for security, IEEE for software engineering)
+   - For known failure modes: Post-mortems, case studies, industry reports
+3. **Synthesize conclusions**: What ownership boundaries work? What tool design patterns? What escalation rules?
+4. **Embed as design decisions**: Write the agent's ownership definition, instruction patterns, tool surface, and escalation rules based on those conclusions — not as citations but as direct guidance
+
+**Example research synthesis:**
+```
+Agent domain: Security reviewer agent
+Research: OWASP Testing Guide, NIST security controls, Anthropic's adversarial verification patterns
+Conclusion: Security agents should assume breach by default, escalate on any critical finding, use defense-in-depth checklist.
+→ Agent design: "role: reviewer", "escalation: any critical = human", "tool: security-scan + vuln-check"
+```
+
+### 1. Create the agent file
+
+```bash
+mkdir -p .opencode/agents/
+```
+
+Create `.opencode/agents/<agent-name>.md`:
+
+```markdown
+---
+name: <agent-name>
+description: <1-sentence description of what this agent does>
+role: <product-owner | software-engineer | reviewer | setup-project | human-user>
+steps: <step numbers this agent owns, e.g., "2, 3">
+---
+
+# <Agent Name>
+
+[Brief description of the agent's purpose and when it's invoked.]
+
+## Role
+
+<What this agent does in the workflow.>
+
+## Available Skills
+
+| Skill | When to Load | Purpose |
+|---|---|---|
+| `session-workflow` | Every session | Session start/end protocol |
+| `<skill-name>` | When needed | <What the skill provides> |
+
+## Instructions
+
+<Detailed instructions for this agent. Include:>
+
+- When to invoke this agent (trigger conditions)
+- What steps it owns
+- How to use tools
+- When to escalate or hand off
+```
+
+### 2. Follow the structural rules
+
+Apply the research conclusions about file organization:
+
+| File | When Loaded | Content | Avoid |
+|---|---|---|---|
+| `AGENTS.md` | Always | Shared conventions, commands | Workflow details |
+| `.opencode/agents/*.md` | When role invoked | Role identity, step ownership, skill loads, tool permissions | Duplication |
+| `.opencode/skills/*.md` | On demand | Full procedural instructions | Duplication |
+
+**Why**: Keeping always-loaded files lean preserves attention budget for the task at hand.
+
+### 3. Define clear ownership boundaries
+
+**Split criteria**:
+- Separate ownership (different domain responsibility)
+- Different instructions (not just more detail)
+- Different tool surface (distinct actions)
+- Different approval policy (escalation rules)
+
+**Anti-pattern**: Creating agents just to organize instructions. A single agent with more tools is usually better than multiple agents.
+
+### 4. Write effective instructions
+
+Write instructions that work in practice:
+
+- **Specific triggers**: "Load this skill when X" not "use judgment"
+- **Clear actions**: Every step corresponds to a specific output
+- **Concrete examples**: Include before/after code where helpful
+- **Verification criteria**: How does the agent know it's done?
+
+### 5. Define tool permissions
+
+Design the tool surface based on what the agent needs to accomplish:
+
+- **Start with bash** for breadth
+- **Promote to dedicated tools** when you need to:
+  - Gate security-sensitive actions
+  - Render structured output
+  - Audit usage patterns
+  - Serialize vs. parallelize
+
+### 6. Add to AGENTS.md
+
+Register the agent in the workflow section of `AGENTS.md`:
+
+```markdown
+## Agents
+
+| Agent | Role | Steps | Skills |
+|-------|------|-------|--------|
+| <name> | <role> | <steps> | <skills> |
+```
+
+## Agent Template
+
+```markdown
+---
+name: <agent-name>
+description: <what this agent does, 1 sentence>
+role: <product-owner | software-engineer | reviewer | setup-project | human-user>
+steps: <owned steps, e.g., "2-3">
+---
+
+# <Agent Title>
+
+<2-3 paragraphs: what this agent does, when invoked, what it delivers.>
+
+## Context
+
+<What this agent knows/has access to>
+
+## Available Skills
+
+- `session-workflow` — always
+- `<skill>` — when <trigger>
+
+## Instructions
+
+### Step <N>: <Step Name>
+
+1. <Specific action>
+2. <Specific action>
+3. <Verification>
+
+### Hand-off
+
+When to transfer to <other agent>: <condition>
+
+## Tool Permissions
+
+- Read files: <scope>
+- Write files: <scope>
+- Execute commands: <scope>
+- Network access: <yes/no>
+
+## Escalation
+
+When to escalate to human: <conditions>
+```
+
+## Existing Agents in This Project
+
+| Agent | Role | Steps | Purpose |
+|---|---|---|---|
+| `product-owner` | product-owner | 1, 5 | Scope discovery, acceptance |
+| `software-engineer` | software-engineer | 2, 3, 5 | Architecture, TDD, releases |
+| `reviewer` | reviewer | 4 | Adversarial verification |
+| `setup-project` | setup-project | meta | Initialize new projects |
+
+## Best Practices Summary
+
+1. **Start with a single agent** — add more only when ownership boundaries are clear
+2. **Define ownership, not volume** — separate domains, not instruction sets
+3. **Keep instructions specific** — concrete triggers, not vague guidance
+4. **Match tools to security needs** — bash for flexibility, dedicated tools for gating
+5. **Test with real usage** — iterate based on failures
+6. **Reference, don't duplicate** — link to skills and AGENTS.md, don't copy content