Events from GitHub, Slack, or the CLI flow through a normalizer into a YAML-driven workflow engine. Each workflow runs its phases — Architect, Executor, Reviewer, and more — inside Docker sandboxes with downscoped GitHub tokens and optional human approval gates. Everything is logged to an admin dashboard you control.
This approach was inspired by the claw-code article by Sigrid Jin, which describes how the oh-my-codex (OmX) system rebuilt an entire codebase while the developer slept. The key insight:
"The code is a byproduct. The thing worth studying is the system that produced it."
The OmX system uses three tools working together: oh-my-codex for workflow orchestration, clawhip for event routing, and oh-my-openagent for multi-agent coordination. We adapted the core patterns — role-based agents, closed development loops, and GitHub-first coordination — into a lightweight TypeScript harness built on the Claude Agent SDK.
The execution model — spawning Claude Code as an isolated subprocess per task with Docker sandboxing — was inspired by Sandcastle by Matt Pocock, which demonstrated that the Claude CLI can be driven programmatically as a headless agent inside containers, with the harness managing orchestration and the agent managing the work.
Each agent has a distinct identity with explicit constraints and output formats. An Architect can only read. An Executor must verify before claiming done. A Reviewer has no shared context with the builder. The role defines the behaviour — not a vague instruction to "be careful."
The agent that writes the code never reviews it. A fresh context catches what accumulated familiarity misses. This is the same reason humans do code review — but enforced structurally, not by convention.
All work is tracked through GitHub issues, regardless of where the request originates — Slack, CLI, or webhook. Build requests from chat platforms must create a GitHub issue first; the issue is the authorisation gate. Every phase posts progress. The result is a complete, auditable trail.
No "should work" or "looks correct." Every claim cites file:line evidence. Every completion requires fresh test output. Architects ground their analysis in code they actually read. Executors prove their changes with verification output.
Every behavior — triage, review, build, health, chat — is a YAML workflow. The harness is workflow-agnostic: it reads a file, executes phases in order (or as a DAG), and writes the results to SQLite and the session JSONLs. Adding a new behavior means dropping a new YAML file into workflows/, not editing TypeScript.
name: build
profile: repo-write
phases:
  - name: guardrails
    skill: assure-guardrails
  - name: architect
    prompt: prompts/architect.md
    approval_gate: post_architect
  - name: executor
    prompt: prompts/executor.md
    needs: [architect]
  - name: reviewer
    prompt: prompts/reviewer.md
    needs: [executor]
    loop:
      fix_prompt: prompts/fix.md
      max_cycles: 2
  - name: pr
    skill: github-pr

A phase is either context (a no-op checkpoint for the dashboard), agent (runs an Agent SDK session with a prompt: or skill:), or loop (an agent phase that iterates on reviewer feedback up to max_cycles times, each fix tracked as reviewer_fix_1, reviewer_fix_2…).
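To make the "in order (or as a DAG)" behaviour concrete, here is a minimal sketch of how a runner could derive an execution order from the needs: declarations. The names Phase and orderPhases are hypothetical, and the tie-breaking rule (fall back to declared order) is an assumption rather than a description of the actual runner.

```typescript
// Hypothetical phase shape: only the fields needed for ordering.
interface Phase {
  name: string;
  needs?: string[];
}

// Returns phase names in an order that respects `needs`.
// Among runnable phases, the one declared first in the YAML goes first.
function orderPhases(phases: Phase[]): string[] {
  const known = new Set(phases.map((p) => p.name));
  for (const p of phases) {
    for (const dep of p.needs ?? []) {
      if (!known.has(dep)) {
        throw new Error(`phase "${p.name}" needs unknown phase "${dep}"`);
      }
    }
  }

  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < phases.length) {
    // First phase, in declared order, whose dependencies have all run.
    const next = phases.find(
      (p) => !done.has(p.name) && (p.needs ?? []).every((d) => done.has(d)),
    );
    if (next === undefined) {
      throw new Error("no runnable phase left: cycle or duplicate name in phases");
    }
    done.add(next.name);
    order.push(next.name);
  }
  return order;
}

// The build workflow above, reduced to names and dependencies.
const buildPhases: Phase[] = [
  { name: "guardrails" },
  { name: "architect" },
  { name: "executor", needs: ["architect"] },
  { name: "reviewer", needs: ["executor"] },
  { name: "pr" },
];
const buildOrder = orderPhases(buildPhases);
```

For the build workflow this yields the declared order, because every dependency already appears before the phase that needs it.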
Each workflow declares a profile: read, issues-write, review-write, or repo-write. The harness mints a downscoped GitHub App installation token with exactly those scopes and passes it into the sandbox. A triage run cannot push code, even if a compromised prompt tries to: its token carries no write access to repository contents.
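One way to picture the profiles is as a lookup table from profile name to token permissions. The sketch below is illustrative only: the permission keys (contents, issues, pull_requests) are real GitHub App permission names, but the exact mapping per profile and the helper names permissionsFor and canPushCode are assumptions.

```typescript
type Profile = "read" | "issues-write" | "review-write" | "repo-write";
type Access = "read" | "write";

// Keys follow GitHub App permission names; the values per profile are assumed.
interface TokenPermissions {
  contents: Access;
  issues: Access;
  pull_requests: Access;
}

const PROFILE_PERMISSIONS: Record<Profile, TokenPermissions> = {
  read: { contents: "read", issues: "read", pull_requests: "read" },
  "issues-write": { contents: "read", issues: "write", pull_requests: "read" },
  "review-write": { contents: "read", issues: "read", pull_requests: "write" },
  "repo-write": { contents: "write", issues: "write", pull_requests: "write" },
};

// Permissions to request when minting the installation token for a workflow.
function permissionsFor(profile: Profile): TokenPermissions {
  return PROFILE_PERMISSIONS[profile];
}

// Pushing commits requires write access to repository contents.
function canPushCode(profile: Profile): boolean {
  return permissionsFor(profile).contents === "write";
}
```

An object of this shape could be passed as the permissions field when requesting an installation access token, so the restriction is enforced by GitHub rather than by the prompt.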
A phase can declare approval_gate: post_architect. When that gate is enabled (via the APPROVAL_GATES env var) the runner persists the paused state, writes a row to workflow_approvals, and waits for a decision via GitHub comment (@last-light approve), a Slack slash command (/approve), or the dashboard. Resume picks up exactly where it stopped.
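The gate check can be sketched as two small pure functions. The format of APPROVAL_GATES is not specified above, so a comma-separated list of gate names is assumed here; parseEnabledGates, shouldPause, and isApproval are hypothetical names, and only the approve commands mentioned in the text are recognised.

```typescript
// Assumption: APPROVAL_GATES is a comma-separated list such as "post_architect".
function parseEnabledGates(envValue: string | undefined): Set<string> {
  const gates = new Set<string>();
  for (const part of (envValue ?? "").split(",")) {
    const name = part.trim();
    if (name !== "") gates.add(name);
  }
  return gates;
}

// A phase pauses only if it declares a gate and that gate is enabled.
function shouldPause(
  phase: { approval_gate?: string },
  enabled: Set<string>,
): boolean {
  return phase.approval_gate !== undefined && enabled.has(phase.approval_gate);
}

// Recognises the approve commands named in the text for each channel.
function isApproval(source: "github" | "slack", text: string): boolean {
  const t = text.trim();
  return source === "github"
    ? /^@last-light\s+approve\b/.test(t)
    : /^\/approve\b/.test(t);
}
```

In a Node-based harness the first function would typically be called with process.env.APPROVAL_GATES; it takes a plain string here so the sketch stays independent of the runtime.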
Sandbox: YAML workflows run inside a Docker container with a fresh worktree and a scoped token.
In-process: the Slack chat skill runs directly in the harness process for low-latency replies, resuming the same Agent SDK session per Slack thread, so each conversation accumulates in a single JSONL file.
Every significant piece of work flows through three specialised agents, each with strict boundaries on what they can and cannot do. Each role is defined in a skill file.
Diagnoses, analyses, and recommends. Forms hypotheses, then cross-checks every one against the actual code. Never edits a file.
Can do
file:line citations
Cannot do
Output
Summary, root cause, recommendations with effort/impact, tradeoffs table, references.
Implements the Architect's plan. Writes code, runs tests, commits. Keeps going until the task is fully resolved — no partial completion.
Can do
Cannot do
Output
Files changed, test results, commit hash. Uses Lore-style commits: intent-first message + Tested: and Scope-risk: trailers.
Verifies the Executor's work with zero shared context. Checks the code against the Architect's plan, runs tests, reports issues. Never fixes — only reports.
Can do
Cannot do
Output
Verdict: APPROVED or REQUEST_CHANGES with specific issues and file:line references.
The build cycle only starts when a repository maintainer explicitly @mentions the bot on a GitHub issue. Requests from Slack are never executed directly — the bot creates a GitHub issue first, and the build runs against that issue. This is a deliberate safety constraint: the GitHub issue is the authorisation gate, and all code changes have a traceable origin.
Ensure a GitHub issue exists (create one if the request came from Slack or CLI). Clone the repo, read its CLAUDE.md/AGENTS.md, and assemble a context snapshot: what needs doing, what success looks like, what constraints exist, and what's unknown.
Then run the guardrails check: verify the repo has a working test framework, linting, and type checking. If any are missing, the bot creates a separate issue for the gaps, links it to the original task, and fixes the foundations first. You can't do TDD without tests, and the Reviewer can't verify without a test suite to run.
A read-only agent analyses the codebase, identifies the files to change, plans the approach, flags risks, and estimates complexity. Every recommendation cites file:line evidence. The summary is posted to the GitHub issue.
A fresh agent receives the Architect's plan and implements it. It follows test-driven development: write a failing test, implement, verify, commit. Commits use the Lore format with semantic trailers (Tested:, Scope-risk:). Progress is posted to the issue.
An independent agent — with no shared context from the Executor — verifies the implementation. It runs the tests, checks the code against the plan, and looks for security and logic issues. It returns APPROVED or REQUEST_CHANGES.
If the Reviewer rejects, a new Executor fixes only the reported issues (fresh context, not the original builder). The Reviewer checks again. Maximum two cycles — after that, remaining issues are noted in the PR description for human review.
The PR is opened, linked to the original issue, and the issue gets a comment with the PR link. The full audit trail — from context snapshot to review verdict — lives on the GitHub issue.
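The review-and-fix loop in the steps above can be sketched as a bounded loop. This is a simplified, synchronous model with injected callbacks: in the real harness each call would presumably start a fresh agent session asynchronously. The names runReviewLoop and LoopResult are hypothetical; the reviewer_fix_N labels follow the naming used for loop phases earlier on this page.

```typescript
type Verdict = "APPROVED" | "REQUEST_CHANGES";

interface LoopResult {
  verdict: Verdict;      // final verdict after the last review
  fixPhases: string[];   // e.g. ["reviewer_fix_1", "reviewer_fix_2"]
}

// review(): runs an independent review and returns its verdict.
// fix(cycle): runs a fresh executor that addresses only the reported issues.
function runReviewLoop(
  review: () => Verdict,
  fix: (cycle: number) => void,
  maxCycles: number,
): LoopResult {
  const fixPhases: string[] = [];
  let verdict = review();
  let cycle = 0;
  while (verdict === "REQUEST_CHANGES" && cycle < maxCycles) {
    cycle += 1;
    fix(cycle);
    fixPhases.push(`reviewer_fix_${cycle}`);
    verdict = review(); // the reviewer checks again after every fix
  }
  // If the verdict is still REQUEST_CHANGES here, the remaining issues
  // would be noted in the PR description for human review.
  return { verdict, fixPhases };
}
```

With maxCycles set to 2, the reviewer runs at most three times: once on the original implementation and once after each fix.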
Every connector (GitHub webhook, Slack socket, CLI) normalizes its platform events into a canonical EventEnvelope. A deterministic router maps that envelope to a YAML workflow and a context object — no LLM decides the routing. Build-intent classification on @mention comments is the one place an LLM is called, and only to distinguish "build this" from "respond to this".
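A deterministic router of this kind is essentially a lookup over envelope fields. The sketch below is an illustration under stated assumptions: the text names EventEnvelope but not its fields, so every field, the Route type, and the route function are hypothetical. The buildIntent flag stands in for the result of the single LLM classification step, computed before routing.

```typescript
// Hypothetical envelope fields; only the name EventEnvelope comes from the text.
interface EventEnvelope {
  source: "github" | "slack" | "cli";
  kind: "issue_opened" | "pr_opened" | "mention" | "message";
  repo?: string;
  issueNumber?: number;
  actorIsMaintainer?: boolean;
  buildIntent?: boolean; // output of the one LLM-backed classification
}

interface RouteContext {
  repo: string | undefined;
  issueNumber: number | undefined;
}

type Route =
  | { type: "workflow"; workflow: string; context: RouteContext }
  | { type: "chat" }     // in-process Slack chat skill
  | { type: "respond" }  // answer the mention without building
  | { type: "decline"; reason: string }
  | { type: "ignore" };

// Pure function: the same envelope always maps to the same route.
function route(e: EventEnvelope): Route {
  const context: RouteContext = { repo: e.repo, issueNumber: e.issueNumber };
  if (e.source === "slack") return { type: "chat" }; // simplification: all Slack traffic
  if (e.source === "github") {
    if (e.kind === "issue_opened") {
      return { type: "workflow", workflow: "issue-triage.yaml", context };
    }
    if (e.kind === "pr_opened") {
      return { type: "workflow", workflow: "pr-review.yaml", context };
    }
    if (e.kind === "mention") {
      if (!e.buildIntent) return { type: "respond" };
      return e.actorIsMaintainer
        ? { type: "workflow", workflow: "build.yaml", context }
        : { type: "decline", reason: "only maintainers can start a build" };
    }
  }
  return { type: "ignore" };
}
```

Keeping the router a pure function makes routing decisions easy to unit-test and to replay from logged envelopes.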
Classifies the issue (bug, feature, question), adds labels, checks for duplicates, and asks for missing information if needed. Runs with the issues-write permission profile.
Workflow: issue-triage.yaml
Reviews the diff with structured feedback: critical issues first, then important, suggestions, and nits. Complex PRs (>300 lines) get deep analysis with local clone and data flow tracing. Runs with the review-write permission profile.
Workflow: pr-review.yaml
Only triggers when a maintainer @mentions the bot — non-maintainers get a polite decline. The bot reacts with 🚀 on the triggering comment so you get instant visual feedback, then runs the full Guardrails → Architect → Executor → Reviewer → PR cycle with up to 2 reviewer-fix loops. Runs with the repo-write permission profile.
Workflow: build.yaml
Polls for new unlabelled issues every 15 minutes. Same workflow as the webhook handler, but for setups without a public webhook endpoint. Skipped automatically when webhooks are enabled.
Workflow: cron-triage.yaml
Summarises open issues by priority, stale items, PRs awaiting review, and recently closed work. Flags anything that needs attention and posts the result to Slack (SLACK_DELIVERY_CHANNEL).
Workflow: cron-health.yaml
A DM or @mention in a channel runs the in-process chat skill — no Docker sandbox, so replies are low-latency. The Agent SDK session is resumed per Slack thread, so a thread grows into one coherent conversation over hours or days. Every turn is persisted to SQLite and visible in the dashboard's Chat Sessions tab.
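Resuming one session per thread amounts to keeping a map from a thread key to a session ID. The following is a minimal in-memory sketch: ChatSessions, threadKey, and the createSession callback are hypothetical names, and the real harness persists this state to SQLite rather than holding it in memory. Slack itself identifies a thread by its channel ID plus the root message timestamp (thread_ts).

```typescript
// A Slack thread is identified by channel ID and root timestamp (thread_ts).
function threadKey(channel: string, threadTs: string): string {
  return `${channel}:${threadTs}`;
}

class ChatSessions {
  private sessions = new Map<string, string>();

  // Returns the session for this thread, creating one on the first message.
  resolve(
    channel: string,
    threadTs: string,
    createSession: () => string,
  ): { sessionId: string; resumed: boolean } {
    const key = threadKey(channel, threadTs);
    const existing = this.sessions.get(key);
    if (existing !== undefined) {
      return { sessionId: existing, resumed: true };
    }
    const sessionId = createSession();
    this.sessions.set(key, sessionId);
    return { sessionId, resumed: false };
  }
}
```

Every later message in the same thread resolves to the same session ID, which is what lets a conversation continue over hours or days.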
Skill: in-process chat (src/engine/chat.ts)
When Last Light commits code, it uses a format adapted from the OmX "Lore commit" convention. The goal: give future humans and agents enough context to understand the decision without reading the diff.
[verified] feat: add rate limiting to webhook endpoint (#42)
Tested: npm test -> 23 passed, 0 failed
Scope-risk: medium
Constraint: must stay under 100ms p99 to avoid GitHub webhook timeout

[verified]: An independent Reviewer agent approved this change.
Tested: What test command was run and the result. Not "tests pass" — the actual output.
Scope-risk: How much of the codebase this touches. Helps humans decide how urgently to review.
Constraint: Optional. External forces that shaped the decision — things a future agent might not know.
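As an illustration, a commit message in this format could be assembled by a small helper. The names LoreCommit and formatLoreCommit are hypothetical. Separating the subject from the trailers with a blank line is an assumption made here because git's trailer parsing expects trailers in the final paragraph of the message.

```typescript
interface LoreCommit {
  subject: string;      // intent-first summary line
  tested: string;       // test command and its actual result
  scopeRisk: string;    // e.g. "medium"
  constraint?: string;  // optional external constraint
  verified?: boolean;   // true once an independent Reviewer approved
}

function formatLoreCommit(c: LoreCommit): string {
  const subject = c.verified ? `[verified] ${c.subject}` : c.subject;
  const trailers = [`Tested: ${c.tested}`, `Scope-risk: ${c.scopeRisk}`];
  if (c.constraint !== undefined) {
    trailers.push(`Constraint: ${c.constraint}`);
  }
  // Blank line between subject and trailers (assumed, per git convention).
  return `${subject}\n\n${trailers.join("\n")}`;
}

const exampleMessage = formatLoreCommit({
  subject: "feat: add rate limiting to webhook endpoint (#42)",
  tested: "npm test -> 23 passed, 0 failed",
  scopeRisk: "medium",
  constraint: "must stay under 100ms p99 to avoid GitHub webhook timeout",
  verified: true,
});
```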
This is an evolving system. Here's what we know doesn't work yet and what we're exploring.
The workflow runner executes phases sequentially (or as a DAG when needs: is declared), one phase at a time, each in its own Docker sandbox container. OmX achieves parallelism through tmux-based workers and shared state files. Last Light's DAG support is the groundwork for future parallel phases, but today every phase waits for its dependencies.
Future: We're exploring parallel agent dispatch for independent subtasks within a single phase.
OmX's $ralph mode keeps an agent working across iterations until the task is architect-verified complete, with automatic retry on failure. Last Light approximates this with an architect completion gate (max 2 retry cycles), but doesn't yet have true persistent loops that survive session resets.
Webhook sessions have a timeout (~30 minutes) and tool-call limit. The role-based cycle costs 3-5 subagent calls minimum. For simple, well-scoped changes this works well. For complex multi-file refactors, the budget can get tight. Simple requests can skip the Architect phase to save iterations.
Each sandbox run starts with a fresh Agent SDK session. Within one build, phases DO share state — see the "Cross-phase handoff" section below — but across builds the reviewer can't learn from patterns in previous reviews. Memory is per-build, not cumulative across the repo's history.
Each phase runs in its own fresh Agent SDK session — no shared context, no shared memory. But phases still need to coordinate, so they hand off through two channels: the git branch (committed code and tests) and a plain-text folder on that branch called .lastlight/issue-<N>/. Every phase commits its own outputs before exiting; every subsequent phase clones the branch and reads them.
guardrails-report.md ← test/lint/typecheck commands the repo uses
architect-plan.md ← problem statement, files to change, test strategy
status.md ← current_phase, reviewer_status, loop counters
executor-summary.md ← files changed, test output, lint output, deviations
reviewer-verdict.md ← APPROVED or REQUEST_CHANGES, issues list, test results

Guardrails runs first and writes guardrails-report.md with the exact test/lint/typecheck commands it discovered in the repo. The Architect reads it so its plan can assume a real verification pipeline exists.
The Architect produces architect-plan.md: problem statement with file:line evidence, files to modify, step-by-step implementation approach, risks, test strategy. The Executor reads this as its ground truth instead of re-deriving intent from the issue body.
The Executor writes executor-summary.md (files changed, actual test/lint/typecheck output, deviations from plan) and commits the code. The Reviewer clones the branch, runs git diff main...HEAD to see exactly what changed, and reads the plan + summary to understand intent. It can see the code and the claims about the code — not the Executor's reasoning.
The Reviewer writes reviewer-verdict.md with exactly one of VERDICT: APPROVED / VERDICT: REQUEST_CHANGES as the first line. The runner parses that marker. If changes are requested, a fresh Executor session reads the verdict and the original plan, fixes the issues, and the Reviewer runs again (up to 2 fix cycles).
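Parsing the verdict marker can be kept strict so that a malformed file fails loudly instead of being read as an approval. A minimal sketch, assuming the marker must match exactly on the first line; parseVerdict is a hypothetical name, not the runner's actual function.

```typescript
type Verdict = "APPROVED" | "REQUEST_CHANGES";

// Reads the first line of reviewer-verdict.md and extracts the verdict.
// Throws if the first line is not a well-formed VERDICT marker.
function parseVerdict(markdown: string): Verdict {
  const [first = ""] = markdown.split(/\r?\n/, 1);
  const match = /^VERDICT:\s*(APPROVED|REQUEST_CHANGES)$/.exec(first.trim());
  if (match === null) {
    throw new Error(`first line is not a verdict marker: "${first}"`);
  }
  return match[1] as Verdict;
}
```

Treating anything other than an exact marker as an error means the fix loop never starts, and a PR never opens, on the strength of an ambiguous review.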
This is what makes "no agent verifies its own work" practical. The Reviewer has no shared session with the Executor, so its context isn't contaminated by the Executor's framing — but it's not flying blind either, because the plan, the summary, and the diff are all right there on the branch.
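The handoff contract described above can be written down as data: which files each phase reads and which it writes. The sketch below is illustrative, with hypothetical names (HANDOFFS, handoffDir, missingInputs). It leaves out status.md because the text does not say which phase owns it.

```typescript
type PhaseName = "guardrails" | "architect" | "executor" | "reviewer";

interface Handoff {
  reads: string[];
  writes: string[];
}

// Reads and writes per phase, as described in the handoff walkthrough.
const HANDOFFS: Record<PhaseName, Handoff> = {
  guardrails: { reads: [], writes: ["guardrails-report.md"] },
  architect: { reads: ["guardrails-report.md"], writes: ["architect-plan.md"] },
  executor: { reads: ["architect-plan.md"], writes: ["executor-summary.md"] },
  reviewer: {
    reads: ["architect-plan.md", "executor-summary.md"],
    writes: ["reviewer-verdict.md"],
  },
};

// Folder on the branch that holds the handoff files for one issue.
function handoffDir(issue: number): string {
  return `.lastlight/issue-${issue}`;
}

// Returns the input files a phase expects but that are not on the branch yet.
function missingInputs(
  phase: PhaseName,
  issue: number,
  existingPaths: Set<string>,
): string[] {
  return HANDOFFS[phase].reads
    .map((file) => `${handoffDir(issue)}/${file}`)
    .filter((path) => !existingPaths.has(path));
}
```

A runner could call missingInputs before starting a phase and stop early if an upstream phase never committed its output.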
Last Light adapts OmX's core ideas to a different runtime. Here's what maps directly and what we had to rethink.
Workflows invoke skills by name. A skill is a SKILL.md file under skills/ — no framework APIs, no plugin system, just instructions the agent follows. Phases can either inline a prompt (prompt: prompts/architect.md) or delegate to a skill (skill: assure-guardrails). Change the file, change the behavior.
---
name: architect
description: >
  Read-only deep analysis role. Diagnose problems, analyze
  codebases, and recommend approaches with file:line evidence.
---
## Identity
You are the Architect. You diagnose, analyze, and recommend
with file-backed evidence. You are strictly read-only.
## Constraints
- Never write or edit files
- Never judge code you have not opened
- Always cite file:line for every important claim
...

architect: Read-only deep analysis with file:line evidence. Plans the approach before implementation.
assure-guardrails: Pre-flight check for test framework, linting, and type checking. Blocks builds on repos without foundations.
workflows/*.yaml: The build cycle is no longer a monolithic coordinator skill. Each workflow (build.yaml, issue-triage.yaml, pr-review.yaml, cron-health.yaml…) declares its phases in YAML; the harness runner executes them and the skills table above is what individual phases invoke.
subagent-driven-development: Dispatches fresh subagents per task with role-based prompts and two-stage review (spec then quality).
requesting-code-review: Pre-commit verification pipeline: static security scan, independent reviewer, auto-fix loop, Lore commits.
test-driven-development: Enforces RED-GREEN-REFACTOR: write failing test first, implement, verify, commit.
systematic-debugging: 4-phase root cause investigation; no fixes without understanding the problem first.
plan: Read-only planning mode. Produces a markdown plan without executing any changes.
pr-review: Structured PR review: critical > important > suggestions > nits. Deep analysis for complex PRs.
issue-triage: Classifies, labels, deduplicates issues. Chases missing info, closes stale items.
repo-health: Weekly health report covering open issues, stale items, PRs awaiting review, and action items.

All skill files live in the skills/ directory. The agent personality is defined in agent-context/soul.md.
Last Light is in active development. As we test these patterns in the real world — on the drizzle-cube and drizby repositories — we'll update this page with what works, what doesn't, and what we've changed. Check the roadmap for what's next.