Where this came from

This approach was inspired by the claw-code article by Sigrid Jin, which describes how the oh-my-codex (OmX) system rebuilt an entire codebase while the developer slept. The key insight:

"The code is a byproduct. The thing worth studying is the system that produced it."

The OmX system uses three tools working together: oh-my-codex for workflow orchestration, clawhip for event routing, and oh-my-openagent for multi-agent coordination. We adapted the core patterns — role-based agents, closed development loops, and GitHub-first coordination — into a lightweight TypeScript harness built on the Claude Agent SDK.

The execution model — spawning Claude Code as an isolated subprocess per task with Docker sandboxing — was inspired by Sandcastle by Matt Pocock, which demonstrated that the Claude CLI can be driven programmatically as a headless agent inside containers, with the harness managing orchestration and the agent managing the work.

Core principles

1. Roles, not prompts

Each agent has a distinct identity with explicit constraints and output formats. An Architect can only read. An Executor must verify before claiming done. A Reviewer has no shared context with the builder. The role defines the behaviour — not a vague instruction to "be careful."

2. No agent verifies its own work

The agent that writes the code never reviews it. A fresh context catches what accumulated familiarity misses. This is the same reason humans do code review — but enforced structurally, not by convention.

3. GitHub is the coordination layer

All work is tracked through GitHub issues, regardless of where the request originates — Slack, CLI, or webhook. Build requests from chat platforms must create a GitHub issue first; the issue is the authorisation gate. Every phase posts progress. The result is a complete, auditable trail.

4. Evidence over assertion

No "should work" or "looks correct." Every claim cites file:line evidence. Every completion requires fresh test output. Architects ground their analysis in code they actually read. Executors prove their changes with verification output.

The YAML workflow engine

Every behaviour — triage, review, build, health, chat — is a YAML workflow. The harness is workflow-agnostic: it reads a file, executes phases in order (or as a DAG), and writes the results to SQLite and the session JSONLs. Adding a new behaviour means dropping a new YAML file into workflows/, not editing TypeScript.

name: build
profile: repo-write

phases:
  - name: guardrails
    skill: assure-guardrails
  - name: architect
    prompt: prompts/architect.md
    approval_gate: post_architect
  - name: executor
    prompt: prompts/executor.md
    needs: [architect]
  - name: reviewer
    prompt: prompts/reviewer.md
    needs: [executor]
    loop:
      fix_prompt: prompts/fix.md
      max_cycles: 2
  - name: pr
    skill: github-pr

Phase types

A phase is either context (a no-op checkpoint for the dashboard), agent (runs an Agent SDK session with a prompt: or skill:), or loop (an agent phase that iterates on reviewer feedback up to max_cycles times, each fix tracked as reviewer_fix_1, reviewer_fix_2…).
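The three phase types can be sketched as a discriminated union. The field names mirror the YAML example above, but the type and function names (`PhaseSpec`, `phaseKind`) are illustrative, not the harness's actual API:

```typescript
// Sketch of a workflow phase as declared in YAML (names are assumptions).
type PhaseSpec = {
  name: string;
  prompt?: string;          // agent phase: inline prompt file
  skill?: string;           // agent phase: delegate to a skill
  needs?: string[];         // DAG dependencies
  approval_gate?: string;
  loop?: { fix_prompt: string; max_cycles: number };
};

type PhaseKind = "context" | "agent" | "loop";

function phaseKind(p: PhaseSpec): PhaseKind {
  if (p.loop) return "loop";                 // iterates on reviewer feedback
  if (p.prompt || p.skill) return "agent";   // runs an Agent SDK session
  return "context";                          // no-op checkpoint for the dashboard
}
```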

Permission profiles

Each workflow declares a profile: read, issues-write, review-write, or repo-write. The harness mints a downscoped GitHub App installation token with exactly those scopes and passes it into the sandbox. A triage run literally cannot push code, even if a compromised prompt tried to.
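A plausible shape for that profile-to-scopes mapping — the profile names come from this document, but the GitHub App permission keys shown here are assumptions, not the harness's real scope table:

```typescript
// Hypothetical mapping from workflow profiles to downscoped GitHub App
// installation token permissions. Actual scope names may differ.
const PROFILE_SCOPES: Record<string, Record<string, "read" | "write">> = {
  "read":         { contents: "read" },
  "issues-write": { contents: "read", issues: "write" },
  "review-write": { contents: "read", pull_requests: "write" },
  "repo-write":   { contents: "write", issues: "write", pull_requests: "write" },
};

// A triage run minted with "issues-write" has no contents:write scope,
// so it cannot push code even if its prompt is compromised.
function scopesFor(profile: string): Record<string, string> {
  const scopes = PROFILE_SCOPES[profile];
  if (!scopes) throw new Error(`unknown profile: ${profile}`);
  return scopes;
}
```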

Approval gates

A phase can declare approval_gate: post_architect. When that gate is enabled (via the APPROVAL_GATES env var) the runner persists the paused state, writes a row to workflow_approvals, and waits for a decision via GitHub comment (@last-light approve), a Slack slash command (/approve), or the dashboard. Resume picks up exactly where it stopped.
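The gate check itself can be as small as an env-var lookup. The comma-separated parsing below is an assumption about how APPROVAL_GATES is formatted:

```typescript
// Minimal sketch: a gate is enabled if it appears in the APPROVAL_GATES
// env var. The delimiter and trimming rules here are assumptions.
function gateEnabled(gate: string, env: Record<string, string | undefined>): boolean {
  const raw = env["APPROVAL_GATES"] ?? "";
  return raw
    .split(",")
    .map((g) => g.trim())
    .filter(Boolean)
    .includes(gate);
}
```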

Two execution modes

Sandbox — YAML workflows run inside a Docker container with a fresh worktree and scoped token.

In-process — the Slack chat skill runs directly in the harness process for low-latency replies, resuming the same Agent SDK session per Slack thread so one conversation = one growing JSONL.

The three roles

Every significant piece of work flows through three specialised agents, each with strict boundaries on what they can and cannot do. Each role is defined in a skill file.

Architect (read-only)

Diagnoses, analyses, and recommends. Forms hypotheses, then cross-checks every one against the actual code. Never edits a file.

Can do

  • Read files, search code, view git history
  • Run read-only terminal commands
  • Produce structured analysis with file:line citations

Cannot do

  • Edit, create, or delete any file
  • Commit, push, or run mutating commands
  • Speculate without evidence

Output

Summary, root cause, recommendations with effort/impact, tradeoffs table, references.

Executor (full access)

Implements the Architect's plan. Writes code, runs tests, commits. Keeps going until the task is fully resolved — no partial completion.

Can do

  • Read, write, and edit files
  • Run tests, builds, and linters
  • Commit and push changes

Cannot do

  • Claim "done" without fresh verification output
  • Broaden scope beyond the architect's plan
  • Skip TDD (test-driven development)

Output

Files changed, test results, commit hash. Uses Lore-style commits: intent-first message + Tested: and Scope-risk: trailers.

Reviewer (independent)

Verifies the Executor's work with zero shared context. Checks the code against the Architect's plan, runs tests, reports issues. Never fixes — only reports.

Can do

  • Read all files and run tests
  • Compare implementation against the plan
  • Flag security, logic, and quality issues

Cannot do

  • Edit or fix any code
  • Access the Executor's reasoning or context
  • Approve without running verification

Output

Verdict: APPROVED or REQUEST_CHANGES with specific issues and file:line references.

The development loop

The build cycle only starts when a repository maintainer explicitly @mentions the bot on a GitHub issue. Requests from Slack are never executed directly — the bot creates a GitHub issue first, and the build runs against that issue. This is a deliberate safety constraint: the GitHub issue is the authorisation gate, and all code changes have a traceable origin.

[Diagram] Phase 0 Context → Phase 1 Architect → Phase 2 Executor → Phase 3 Reviewer → OK? yes → Phase 5 Create PR; no → back to Executor (max 2×). Progress posted to the GitHub issue at every phase.

0. Pre-context intake + guardrails

Ensure a GitHub issue exists (create one if the request came from Slack or CLI). Clone the repo, read its CLAUDE.md/AGENTS.md, and assemble a context snapshot: what needs doing, what success looks like, what constraints exist, and what's unknown.

Then run the guardrails check: verify the repo has a working test framework, linting, and type checking. If any are missing, the bot creates a separate issue for the gaps, links it to the original task, and fixes the foundations first. You can't do TDD without tests, and the Reviewer can't verify without a test suite to run.

1. Architect analysis

A read-only agent analyses the codebase, identifies the files to change, plans the approach, flags risks, and estimates complexity. Every recommendation cites file:line evidence. The summary is posted to the GitHub issue.

2. Executor implementation

A fresh agent receives the Architect's plan and implements it. It follows test-driven development: write a failing test, implement, verify, commit. Commits use the Lore format with semantic trailers (Tested:, Scope-risk:). Progress is posted to the issue.

3. Reviewer verification

An independent agent — with no shared context from the Executor — verifies the implementation. It runs the tests, checks the code against the plan, and looks for security and logic issues. It returns APPROVED or REQUEST_CHANGES.

4. Fix loop

If the Reviewer rejects, a new Executor fixes only the reported issues (fresh context, not the original builder). The Reviewer checks again. Maximum two cycles — after that, remaining issues are noted in the PR description for human review.
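The fix loop above reduces to a small control loop. `runReviewer` and `runExecutorFix` below are stand-ins for fresh Agent SDK sessions, not the real runner API — a sketch of the control flow only:

```typescript
// Review/fix cycle: re-run the reviewer after each fix, up to maxCycles.
function reviewLoop(
  runReviewer: () => "APPROVED" | "REQUEST_CHANGES",
  runExecutorFix: (cycle: number) => void,
  maxCycles = 2,
): { verdict: string; cycles: number } {
  let verdict = runReviewer();
  let cycles = 0;
  while (verdict === "REQUEST_CHANGES" && cycles < maxCycles) {
    cycles += 1;                  // tracked as reviewer_fix_<n>
    runExecutorFix(cycles);       // fresh context, not the original builder
    verdict = runReviewer();
  }
  // After maxCycles, remaining issues go into the PR description for humans.
  return { verdict, cycles };
}
```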

5. Create PR

The PR is opened, linked to the original issue, and the issue gets a comment with the PR link. The full audit trail — from context snapshot to review verdict — lives on the GitHub issue.

Events and workflows

Every connector (GitHub webhook, Slack socket, CLI) normalises its platform events into a canonical EventEnvelope. A deterministic router maps that envelope to a YAML workflow and a context object — no LLM decides the routing. Build-intent classification on @mention comments is the one place an LLM is called, and only to distinguish "build this" from "respond to this".
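A sketch of that deterministic mapping. The EventEnvelope fields and the switch cases are illustrative; only the workflow filenames come from this document:

```typescript
// Canonical envelope every connector normalises into (shape is an assumption).
type EventEnvelope = {
  source: "github" | "slack" | "cli";
  kind: string;                 // e.g. "issue.opened", "pr.opened", "mention"
};

// Pure lookup — no LLM involved in routing.
function route(e: EventEnvelope): string | null {
  switch (e.kind) {
    case "issue.opened": return "issue-triage.yaml";
    case "pr.opened":    return "pr-review.yaml";
    case "mention":      return "build.yaml";   // maintainer check happens downstream
    default:             return null;           // unrouted events are dropped
  }
}
```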

Issue opened (webhook)

Classifies the issue (bug, feature, question), adds labels, checks for duplicates, and asks for missing information if needed. Runs with the issues-write permission profile.

Workflow: issue-triage.yaml

PR opened (webhook)

Reviews the diff with structured feedback: critical issues first, then important, suggestions, and nits. Complex PRs (>300 lines) get deep analysis with local clone and data flow tracing. Runs with the review-write permission profile.

Workflow: pr-review.yaml

@mention from maintainer (webhook)

Only triggers when a maintainer @mentions the bot — non-maintainers get a polite decline. The bot reacts with 🚀 on the triggering comment so you get instant visual feedback, then runs the full Guardrails → Architect → Executor → Reviewer → PR cycle with up to 2 reviewer-fix loops. Runs with the repo-write permission profile.

Workflow: build.yaml

Scheduled triage (cron, every 15 min)

Polls for new unlabelled issues every 15 minutes. Same workflow as the webhook handler, but for setups without a public webhook endpoint. Skipped automatically when webhooks are enabled.

Workflow: cron-triage.yaml

Weekly health report (cron, Mondays 9am)

Summarises open issues by priority, stale items, PRs awaiting review, and recently closed work. Flags anything that needs attention and posts the result to Slack (SLACK_DELIVERY_CHANNEL).

Workflow: cron-health.yaml

Slack message (socket mode)

A DM or @mention in a channel runs the in-process chat skill — no Docker sandbox, so replies are low-latency. The Agent SDK session is resumed per Slack thread, so a thread grows into one coherent conversation over hours or days. Every turn is persisted to SQLite and visible in the dashboard's Chat Sessions tab.

Skill: in-process chat (src/engine/chat.ts)
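The per-thread session resume can be sketched with a simple map from Slack thread to Agent SDK session id. All names here are hypothetical, not the chat skill's actual API:

```typescript
// One Slack thread = one Agent SDK session (and one growing JSONL).
const sessions = new Map<string, string>();

function sessionKeyFor(channel: string, threadTs: string): string {
  return `${channel}:${threadTs}`;
}

function resumeOrCreate(channel: string, threadTs: string, newId: () => string): string {
  const key = sessionKeyFor(channel, threadTs);
  let id = sessions.get(key);
  if (!id) {
    id = newId();             // first message in the thread: fresh session
    sessions.set(key, id);    // later turns resume the same session
  }
  return id;
}
```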

Lore commits

When Last Light commits code, it uses a format adapted from the OmX "Lore commit" convention. The goal: give future humans and agents enough context to understand the decision without reading the diff.

[verified] feat: add rate limiting to webhook endpoint (#42)

Tested: npm test -> 23 passed, 0 failed
Scope-risk: medium
Constraint: must stay under 100ms p99 to avoid GitHub webhook timeout
[verified]

An independent Reviewer agent approved this change.

Tested:

What test command was run and the result. Not "tests pass" — the actual output.

Scope-risk:

How much of the codebase this touches. Helps humans decide how urgently to review.

Constraint:

Optional. External forces that shaped the decision — things a future agent might not know.
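Assembling such a message is mechanical. This hypothetical helper shows the trailer order described above; the function name and options shape are assumptions:

```typescript
// Build a Lore-style commit message: subject line, blank line, trailers.
function loreCommit(opts: {
  verified: boolean;
  subject: string;
  tested: string;                          // actual command + result, not "tests pass"
  scopeRisk: "low" | "medium" | "high";
  constraint?: string;                     // optional external force
}): string {
  const lines = [
    `${opts.verified ? "[verified] " : ""}${opts.subject}`,
    "",
    `Tested: ${opts.tested}`,
    `Scope-risk: ${opts.scopeRisk}`,
  ];
  if (opts.constraint) lines.push(`Constraint: ${opts.constraint}`);
  return lines.join("\n");
}
```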

Known limitations

This is an evolving system. Here's what we know doesn't work yet and what we're exploring.

Sequential execution only

The workflow runner executes phases sequentially (or as a DAG when needs: is declared), one phase at a time, each in its own Docker sandbox container. OmX achieves parallelism through tmux-based workers and shared state files. Last Light's DAG support is the groundwork for future parallel phases, but today every phase waits for its dependencies.

Future: We're exploring parallel agent dispatch for independent subtasks within a single phase.

No persistent execution loops

OmX's $ralph mode keeps an agent working across iterations until the task is architect-verified complete, with automatic retry on failure. Last Light approximates this with an architect completion gate (max 2 retry cycles), but doesn't yet have true persistent loops that survive session resets.

Iteration budget pressure

Webhook sessions have a timeout (~30 minutes) and a tool-call limit. The role-based cycle costs 3–5 subagent calls minimum. For simple, well-scoped changes this works well; for complex multi-file refactors, the budget can get tight. Simple requests can skip the Architect phase to save iterations.

No cross-run learning

Each sandbox run starts with a fresh Agent SDK session. Within one build, phases do share state — see "How phases hand off to each other" below — but across builds the reviewer can't learn from patterns in previous reviews. Memory is per-build, not cumulative across the repo's history.

How phases hand off to each other

Each phase runs in its own fresh Agent SDK session — no shared context, no shared memory. But phases still need to coordinate, so they hand off through two channels: the git branch (committed code and tests) and a plain-text folder on that branch called .lastlight/issue-<N>/. Every phase commits its own outputs before exiting; every subsequent phase clones the branch and reads them.

.lastlight/issue-42/ (after a full build cycle)
guardrails-report.md   ← test/lint/typecheck commands the repo uses
architect-plan.md      ← problem statement, files to change, test strategy
status.md              ← current_phase, reviewer_status, loop counters
executor-summary.md    ← files changed, test output, lint output, deviations
reviewer-verdict.md    ← APPROVED or REQUEST_CHANGES, issues list, test results

Guardrails → Architect

Guardrails runs first and writes guardrails-report.md with the exact test/lint/typecheck commands it discovered in the repo. The Architect reads it so its plan can assume a real verification pipeline exists.

Architect → Executor

The Architect produces architect-plan.md: problem statement with file:line evidence, files to modify, step-by-step implementation approach, risks, test strategy. The Executor reads this as its ground truth instead of re-deriving intent from the issue body.

Executor → Reviewer

The Executor writes executor-summary.md (files changed, actual test/lint/typecheck output, deviations from plan) and commits the code. The Reviewer clones the branch, runs git diff main...HEAD to see exactly what changed, and reads the plan + summary to understand intent. It can see the code and the claims about the code — not the Executor's reasoning.

Reviewer → Fix loop

The Reviewer writes reviewer-verdict.md with exactly one of VERDICT: APPROVED / VERDICT: REQUEST_CHANGES as the first line. The runner parses that marker. If changes are requested, a fresh Executor session reads the verdict and the original plan, fixes the issues, and the Reviewer runs again (up to 2 fix cycles).
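Because the runner only trusts the marker on the first line, the parsing can stay strict. A sketch, with a hypothetical `parseVerdict` name; a malformed verdict yields null rather than a guess:

```typescript
type Verdict = "APPROVED" | "REQUEST_CHANGES";

// Parse reviewer-verdict.md: only the first line carries the machine-readable marker.
function parseVerdict(markdown: string): Verdict | null {
  const first = markdown.split("\n", 1)[0].trim();
  if (first === "VERDICT: APPROVED") return "APPROVED";
  if (first === "VERDICT: REQUEST_CHANGES") return "REQUEST_CHANGES";
  return null; // malformed — surface to a human instead of guessing
}
```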

This is what makes "no agent verifies its own work" practical. The Reviewer has no shared session with the Executor, so its context isn't contaminated by the Executor's framing — but it's not flying blind either, because the plan, the summary, and the diff are all right there on the branch.

How this compares to OmX

Last Light adapts OmX's core ideas to a different runtime. Here's what maps directly and what we had to rethink.

| | OmX / claw-code | Last Light |
| --- | --- | --- |
| Runtime | OpenAI Codex CLI + tmux | Claude Agent SDK + Docker sandbox |
| Role system | 33 prompt templates ($architect, $executor, etc.) | 3 core roles (Architect, Executor, Reviewer) as prompt templates invoked by YAML workflow phases |
| Parallelism | tmux panes with shared state files + mailbox dispatch | Sequential; workflow runner executes phases one at a time (DAG support pending parallel phases) |
| Coordination | Discord channels via clawhip daemon | GitHub issues as authorisation gate; input from Slack, webhooks, CLI |
| Persistence | $ralph — loops until architect-verified complete | Architect completion gate with max 2 retry cycles |
| Event routing | clawhip daemon (background process) | Deterministic TypeScript router (Hono webhooks) |
| Commits | Lore format (full trailer set) | Lore format (reduced: Tested, Scope-risk, Constraint) |

Skills are plain text

Workflows invoke skills by name. A skill is a SKILL.md file under skills/ — no framework APIs, no plugin system, just instructions the agent follows. Phases can either inline a prompt (prompt: prompts/architect.md) or delegate to a skill (skill: assure-guardrails). Change the file, change the behavior.

---
name: architect
description: >
  Read-only deep analysis role. Diagnose problems, analyze
  codebases, and recommend approaches with file:line evidence.
---

## Identity
You are the Architect. You diagnose, analyze, and recommend
with file-backed evidence. You are strictly read-only.

## Constraints
- Never write or edit files
- Never judge code you have not opened
- Always cite file:line for every important claim
...
| Skill | Purpose |
| --- | --- |
| architect | Read-only deep analysis with file:line evidence. Plans the approach before implementation. |
| assure-guardrails | Pre-flight check for test framework, linting, and type checking. Blocks builds on repos without foundations. |
| workflows/*.yaml | The build cycle is no longer a monolithic coordinator skill. Each workflow (build.yaml, issue-triage.yaml, pr-review.yaml, cron-health.yaml…) declares its phases in YAML; the harness runner executes them, and the skills in this table are what individual phases invoke. |
| subagent-driven-development | Dispatches fresh subagents per task with role-based prompts and two-stage review (spec then quality). |
| requesting-code-review | Pre-commit verification pipeline: static security scan, independent reviewer, auto-fix loop, Lore commits. |
| test-driven-development | Enforces RED-GREEN-REFACTOR: write failing test first, implement, verify, commit. |
| systematic-debugging | 4-phase root cause investigation; no fixes without understanding the problem first. |
| plan | Read-only planning mode. Produces a markdown plan without executing any changes. |
| pr-review | Structured PR review: critical > important > suggestions > nits. Deep analysis for complex PRs. |
| issue-triage | Classifies, labels, deduplicates issues. Chases missing info, closes stale items. |
| repo-health | Weekly health report: open issues, stale items, PRs awaiting review, action items. |

All skill files live in the skills/ directory. The agent personality is defined in agent-context/soul.md.

This page will evolve

Last Light is in active development. As we test these patterns in the real world — on the drizzle-cube and drizby repositories — we'll update this page with what works, what doesn't, and what we've changed. Check the roadmap for what's next.