Spec
State
Purpose
State is split deliberately between two stores:
- SQLite (
$STATE_DIR/lastlight.db) — the resume substrate. Indexed, mutable, small. Tracks what’s running, what’s paused, what to do next. - JSONL (per-session files under
$STATE_DIR/agent-sessions/projects/) — the event log. Append-only, large, streamable. Captures every event the agent emitted, in order.
The split rule is load-bearing: unbounded text never lands in
workflow_runs blobs. Large LLM outputs live in JSONL or in
executions.output_text (a row the runner points at), never inlined
into the resume state read by every dashboard query.
SQLite tables
src/state/db.ts defines six tables. All rows are append-only unless
marked mutable. Migrations are additive — CREATE TABLE IF NOT EXISTS
plus ALTER TABLE ADD COLUMN blocks wrapped in try/catch.
executions
One row per phase execution (sandbox) or chat turn. The bridge between the resume state and the JSONL.
CREATE TABLE IF NOT EXISTS executions (
id TEXT PRIMARY KEY,
trigger_type TEXT NOT NULL, -- "webhook" | "cron" | "chat" | "api"
trigger_id TEXT NOT NULL, -- issue URL, Slack thread id, etc.
skill TEXT NOT NULL, -- "workflow-name:phase-name" or "chat"
repo TEXT,
issue_number INTEGER,
started_at TEXT NOT NULL,
finished_at TEXT,
success INTEGER, -- 1 | 0 | NULL (still running)
error TEXT,
turns INTEGER,
duration_ms INTEGER,
session_id TEXT, -- agentic-pi session id; key into JSONL filename
cost_usd REAL,
input_tokens INTEGER,
cache_creation_input_tokens INTEGER,
cache_read_input_tokens INTEGER,
output_tokens INTEGER,
api_duration_ms INTEGER,
stop_reason TEXT,
workflow_run_id TEXT, -- → workflow_runs.id
output_text TEXT -- large final assistant text for loop iterations
);
CREATE INDEX idx_executions_trigger ON executions(trigger_type, trigger_id);
CREATE INDEX idx_executions_skill ON executions(skill, started_at);
CREATE INDEX idx_executions_workflow_run ON executions(workflow_run_id, skill);
output_text is only populated when a loop iteration’s
scratch.<key>.lastOutputExecutionId points at this row. The full
event stream lives in the JSONL; output_text is the cached final
assistant message the next iteration needs to read without rehydrating
the full conversation.
workflow_runs
One row per workflow dispatch. The resume substrate.
CREATE TABLE IF NOT EXISTS workflow_runs (
id TEXT PRIMARY KEY,
workflow_name TEXT NOT NULL,
trigger_id TEXT NOT NULL,
repo TEXT,
issue_number INTEGER,
current_phase TEXT NOT NULL,
phase_history TEXT NOT NULL DEFAULT '[]', -- JSON array of completed phases
status TEXT NOT NULL DEFAULT 'running', -- running | paused | succeeded | failed | cancelled
context TEXT, -- immutable trigger context (JSON)
scratch TEXT, -- mutable phase-to-phase state (JSON)
node_statuses TEXT, -- DAG node status map (JSON)
restart_count INTEGER NOT NULL DEFAULT 0,
started_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
finished_at TEXT
);
CREATE INDEX idx_workflow_runs_trigger ON workflow_runs(trigger_id, status);
CREATE INDEX idx_workflow_runs_status ON workflow_runs(status);
CREATE INDEX idx_workflow_runs_started_at ON workflow_runs(started_at DESC);
CREATE INDEX idx_workflow_runs_name_started ON workflow_runs(workflow_name, started_at DESC);
scratch is the only mutable JSON. context is set on creation and
never changed. phase_history is technically a JSON array that the
runner appends to. restart_count is the Workflow Engine
crash-loop circuit breaker.
workflow_approvals
CREATE TABLE IF NOT EXISTS workflow_approvals (
id TEXT PRIMARY KEY,
workflow_run_id TEXT NOT NULL, -- → workflow_runs.id
gate TEXT NOT NULL, -- "post_architect", "post_reviewer", etc.
summary TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending | approved | rejected
kind TEXT NOT NULL DEFAULT 'approve', -- "approve" or "reply" (Socratic loop)
requested_by TEXT,
responded_by TEXT,
response TEXT,
responded_at TEXT,
created_at TEXT NOT NULL
);
kind: "reply" is the Socratic loop’s reply gate — any free-form
message resolves it; no explicit approve / reject needed.
cron_overrides
CREATE TABLE IF NOT EXISTS cron_overrides (
name TEXT PRIMARY KEY,
enabled INTEGER NOT NULL DEFAULT 1,
schedule TEXT, -- override the YAML schedule
updated_at TEXT NOT NULL,
updated_by TEXT
);
Mutable. Deletion reverts to YAML defaults.
workflow_overrides
CREATE TABLE IF NOT EXISTS workflow_overrides (
name TEXT PRIMARY KEY,
enabled INTEGER NOT NULL DEFAULT 1,
updated_at TEXT NOT NULL,
updated_by TEXT
);
Workflow-level kill switch. Absence of a row = enabled by default.
messaging_sessions + messaging_messages
CREATE TABLE IF NOT EXISTS messaging_sessions (
id TEXT PRIMARY KEY,
platform TEXT NOT NULL, -- "slack"
channel_id TEXT NOT NULL,
thread_id TEXT,
user_id TEXT NOT NULL,
agent_session_id TEXT, -- pi-ai session id → JSONL filename
created_at TEXT NOT NULL,
last_activity_at TEXT NOT NULL,
message_count INTEGER DEFAULT 0,
active INTEGER DEFAULT 1
);
CREATE INDEX idx_msg_sessions_lookup ON messaging_sessions(platform, channel_id, thread_id, user_id);
CREATE UNIQUE INDEX idx_msg_sessions_unique_active
ON messaging_sessions(platform, channel_id, thread_id, user_id) WHERE active = 1;
CREATE TABLE IF NOT EXISTS messaging_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL REFERENCES messaging_sessions(id),
role TEXT NOT NULL, -- "user" | "assistant"
content TEXT NOT NULL,
timestamp TEXT NOT NULL,
platform_message_id TEXT
);
CREATE INDEX idx_msg_messages_session ON messaging_messages(session_id, timestamp);
The partial unique index enforces “one active session per (platform, channel, thread, user)” while allowing old inactive rows to stack. See Chat for the session lifecycle.
JSONL event log
Per-session, append-only, one file per agent session.
Paths
$STATE_DIR/agent-sessions/projects/
├── -<sanitized-cwd>/<sessionId>.jsonl ← sandboxed workflow phases
│ (e.g. -home-agent-workspace/<id>.jsonl)
└── -app/<sessionId>.jsonl ← chat turns (cwd = /app)
Sanitization: slashes in the agent’s cwd become dashes via
projectSlugForCwd() (src/engine/event-shim.ts). The leading dash is
the convention agentic-pi expects; the dashboard’s SessionReader and
ChatSessionReader scan these directories.
Line format
Each line is a JSON object in Claude-SDK envelope shape:
{"type":"user","message":{"role":"user","content":"..."},"timestamp":"...","sessionId":"..."}
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"..."}],"model":"..."},"timestamp":"...","sessionId":"..."}
{"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"...","content":"..."}]},"timestamp":"...","sessionId":"..."}
{"type":"result","subtype":"success","num_turns":7,"total_cost_usd":0.13,"total_input_tokens":...,"stop_reason":"end_turn","timestamp":"..."}
The format predates the agentic-pi migration — it’s the Claude SDK
shape because that’s what the dashboard already knew how to render.
The translation lives in AgenticShim.
Translation rules (AgenticShim)
src/engine/event-shim.ts:
| agentic-pi event | JSONL envelope |
|---|---|
session | (opens the file; emits the initial user envelope with the prompt) |
message_end (assistant) | assistant envelope with text + tool_use blocks (thinking blocks dropped) |
tool_execution_end | user envelope with tool_result block |
usage_snapshot | result envelope with cost, tokens, turns, stop_reason |
fatal_error | assistant envelope with isApiErrorMessage: true |
Tool results > 64 KB are truncated with a …[truncated N chars]
marker. The raw output remains in workspace files / stdout — only the
JSONL is capped, for dashboard render efficiency.
Append-only
Lines are never edited or deleted. A resumed workflow that re-enters the same session id appends to the existing file. No file rotation.
The split rule
| Store | What goes here | Why |
|---|---|---|
| SQLite | Execution lifecycles, costs, phase history, approvals, scratch keys + pointers, schedule overrides, messaging session metadata | Indexed, fast list queries, small rows. The dashboard’s list-view query is ORDER BY started_at DESC LIMIT 20 polled every 5 s — it must return cheaply. |
| JSONL | Every agent event in order — assistant messages, tool calls, tool results, usage snapshots, errors | Append-only event stream, unbounded length, one file per session. Lets the dashboard render the full conversation without paging through SQLite blobs. |
The dashboard’s workflow-runs list endpoint excludes context,
scratch, and node_statuses from the SELECT so the list query
stays small even when individual runs accumulate megabytes of state.
The detail endpoint uses SELECT *.
output_text is the bridge. When a loop iteration needs to read
its prior output, it doesn’t rehydrate the JSONL — it looks up
scratch.<key>.lastOutputExecutionId, joins on executions, and reads
output_text directly. One row, one column, bounded size.
Migrations
migrate() runs on every new StateDb() call:
CREATE TABLE IF NOT EXISTS …for every table.CREATE INDEX IF NOT EXISTS …for every index.- Additive
ALTER TABLE … ADD COLUMN …in try/catch for fields added since v0.0.1. Old rows have NULLs; new rows respect defaults.
Strategy: never drop, never narrow. Long-running deployments accumulate schema; SQLite handles it.
PRAGMA foreign_keys = ON is set at connection time (better-sqlite3
default behaviour depends on version — the harness sets it explicitly).
A one-shot rebuild of messaging_sessions was needed once to remove
an overly strict table-level UNIQUE constraint that blocked legitimate
session recreation after timeouts.
Invariants
- No unbounded text in
workflow_runs.scratch. Loop iterations store anexecutions.idreference; the text lives inoutput_textor in JSONL. session_idis the join key between the two stores. Everyexecutionsrow that ran an agent has one; matching the JSONL filename joins them.- Append-only by default. Only
cron_overridesandworkflow_overridespermit deletion; everything else accumulates. Audit trail trumps disk usage. - JSONL truncation is for display, not retention. The raw output is still on disk somewhere (workspace, stdout). A re-implementation that deletes the original content based on JSONL truncation is losing data.
- Partial unique index on
messaging_sessionsallows multiple inactive rows but exactly one active per key. - List queries exclude blob columns. The dashboard polls every
5 s; reading
context+scratch+node_statusesfor every row would dominate the query cost. The list endpoint’s projection is deliberate.
Current implementation
| Piece | File |
|---|---|
| All tables, indexes, accessors | src/state/db.ts |
| JSONL writer + envelope translation | src/engine/event-shim.ts |
| Sandbox session reader (dashboard) | src/admin/SessionReader.ts |
| Chat session reader (dashboard, DB-backed) | src/admin/ChatSessionReader.ts |
| Session manager (messaging) | src/connectors/messaging/session-manager.ts |
Rebuild notes
- Pick the split. Resume state goes to a small, indexed store (SQLite, Postgres, any KV). Event stream goes to append-only files (JSONL, NDJSON, anything line-oriented). Don’t put the event stream in the relational store.
- Don’t grow the resume state by accident. Every blob column you
add will end up read by the list query. If you find yourself adding
large_output TEXTto a frequently-listed table, you have the wrong shape — write it to JSONL or to a separate small table the list endpoint doesn’t read. - Index the list query, not everything. The hot path is “recent
rows, status filter, name filter”. One descending index by
started_atis doing most of the work. - Make
session_idthe join. It’s the only stable id the agent runtime hands you; everything else (taskId, workflow_run_id) is harness state. - Migrate additively. Drops, narrowings, renames are all high-risk on a running system. Adding a column with a NULL default is safe.
- Plan for
restart_countfrom day one. Crash loops are a certainty. Cap them at the schema level so a stuck workflow can’t consume the database.