Spec
Chat
Purpose
Chat is the alternative execution surface to the Sandbox. It exists because Slack threads need low-latency, multi-turn conversations that don’t benefit from container isolation — and because handing a tool-rich, write-capable agent to every casual question is overkill.
The deliberate split: workflows do work and need isolation; chat answers questions and needs latency. pi-ai is the right runtime for the latter; agentic-pi is the right runtime for the former.
Public contract
// src/engine/chat-runner.ts:75
export class ChatRunner {
  constructor(cfg: ChatRunnerConfig, sessionManager: SessionManager);
  async turn(messagingSessionId: string, prompt: string): Promise<ChatRunnerTurnResult>;
}

interface ChatRunnerConfig {
  model: string;                // resolved via resolveModel(config.models, "chat")
  thinking?: string;            // off | minimal | low | medium | high | xhigh
  systemPrompt: string;         // loadAgentContext() + CHAT_SYSTEM_SUFFIX + skill catalogue XML
  github?: ChatGitHubAuth;
  extraTools?: ChatExtraToolset; // additional tools (read_skill); merged with github tools
  timeoutMs?: number;           // per-turn; default 120 s
}
The runner is constructed once at Harness boot
(src/index.ts:103–111) and lives for the lifetime of the process. Each
inbound Slack message becomes one turn() call.
pi-ai vs agentic-pi
Both are exported from @earendil-works/pi-ai. They serve different
purposes:
- pi-ai: completeSimple() is a single-turn-loop chat runtime with tool support. No sandbox. No supervisor. Suitable for low-latency conversational replies. Used here.
- agentic-pi: the sandboxed agent supervisor used by Workflow Engine phases. Higher overhead per session, full isolation, full tool surface.
The runtimes share the provider abstraction and JSONL event-emission shape, which is why both can write to the same dashboard via the Event Shim.
Session model
One pi-ai session per Slack thread, mapped through the
messaging_sessions table.
Flow per turn (chat-runner.ts:129–170):
- Resolve or mint agentSessionId for the messaging session. New threads get a fresh id; existing threads reuse the stored one.
- getHistory() rehydrates the last 50 user/assistant message pairs from messaging_messages (rolling window, no token-aware truncation).
- The new user message is appended to the in-memory turn payload.
- completeSimple() runs the model with the read-only tool kit (line 197).
- The final user prompt and the final assistant text are persisted via addMessage() (messaging_messages insert). Intermediate tool-loop output is discarded; only the surface conversation is stored.
- touchSession() updates last_activity_at (session-manager.ts:197).
The agent_session_id is the join key into the JSONL — Slack thread
↔ messaging_session ↔ agent_session_id ↔
projects/-app/<agent_session_id>.jsonl. See State.
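The per-turn flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in chat-runner.ts: MemoryStore, Msg, runTurn, and the complete callback (standing in for completeSimple()) are hypothetical names, and the store is in-memory rather than backed by the messaging tables. The sketch counts the window in messages; the spec describes it both as 50 pairs and as 50 messages, so treat the unit as an open detail.

```typescript
// Hypothetical sketch of one chat turn. MemoryStore, Msg, runTurn and the
// `complete` callback are illustrative stand-ins, not the real API.
type Msg = { role: "user" | "assistant"; content: string };

const HISTORY_LIMIT = 50; // rolling window size taken from the spec

class MemoryStore {
  private sessions = new Map<string, string>(); // messagingSessionId -> agentSessionId
  private messages = new Map<string, Msg[]>();  // messagingSessionId -> stored messages
  private counter = 0;

  // Reuse the stored agent session id, or mint a fresh one for a new thread.
  resolveAgentSessionId(messagingSessionId: string): string {
    let id = this.sessions.get(messagingSessionId);
    if (id === undefined) {
      this.counter += 1;
      id = "agent-" + this.counter;
      this.sessions.set(messagingSessionId, id);
    }
    return id;
  }

  // Rolling window over stored history; no token-aware truncation.
  getHistory(messagingSessionId: string, limit: number): Msg[] {
    const all = this.messages.get(messagingSessionId) ?? [];
    return all.slice(Math.max(0, all.length - limit));
  }

  // Only surface messages are persisted; tool-loop output never reaches here.
  addMessage(messagingSessionId: string, msg: Msg): void {
    const all = this.messages.get(messagingSessionId) ?? [];
    all.push(msg);
    this.messages.set(messagingSessionId, all);
  }
}

async function runTurn(
  store: MemoryStore,
  messagingSessionId: string,
  userPrompt: string,
  complete: (context: Msg[]) => Promise<string>, // stands in for completeSimple()
): Promise<{ agentSessionId: string; text: string }> {
  const agentSessionId = store.resolveAgentSessionId(messagingSessionId);
  const past = store.getHistory(messagingSessionId, HISTORY_LIMIT);
  const context: Msg[] = [...past, { role: "user", content: userPrompt }];
  const text = await complete(context);
  store.addMessage(messagingSessionId, { role: "user", content: userPrompt });
  store.addMessage(messagingSessionId, { role: "assistant", content: text });
  return { agentSessionId, text };
}
```

Two turns on the same thread share one agent session id, and the second turn sees the first turn's user and assistant messages in its context.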
Tools
Two toolsets, merged into a single tool list at construction time
(chat-runner.ts mergedTools):
GitHub (read-only)
Eleven functions wired into pi-ai at src/engine/github-tools.ts:
| Tool | Purpose |
|---|---|
| github_get_repository | Repo metadata, default branch, language stats |
| github_get_issue | Issue body + metadata |
| github_list_issue_comments | Comments on an issue or PR |
| github_list_issues | Filter by state, labels, etc. |
| github_get_pull_request | PR body + metadata |
| github_list_pull_requests | PR list |
| github_get_pull_request_diff | The unified diff |
| github_get_file_contents | File from a ref |
| github_list_commits | Commit log |
| github_search_issues | GitHub search API |
| github_search_code | GitHub code search |
Skills (read_skill)
One tool wired in via extraTools, defined in
src/engine/chat-skills.ts:
| Tool | Purpose |
|---|---|
| read_skill | Read the full SKILL.md for one of the curated chat skills. Parameters: { name: <enum of CHAT_SKILL_NAMES> }. |
The chat agent’s system prompt contains an XML <available_skills>
catalogue (name + description per curated skill — same shape
pi-coding-agent emits for sandbox phases). When a user’s request
matches a skill’s description, the agent calls read_skill to load
the body — pi’s progressive-disclosure model. See
Skills §Chat path.
No bash, no edit, no write, no MCP. Chat physically cannot
modify code or open issues. A user asking chat to “fix that bug” is
gently redirected to the build workflow path, which goes through the
Router classifier and dispatches via
Workflow Engine.
Tool execution loop (chat-runner.ts dispatchTool): the model emits
a toolCall, the runner tries the github toolset first, then the
extra (read_skill) toolset; the JSON result is appended to context
and the loop repeats. Capped at MAX_TOOL_ROUNDS = 8 — hitting the
limit ends the turn with finishReason: "max-rounds".
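A minimal sketch of the loop just described, assuming a step callback that stands in for one model call. Only MAX_TOOL_ROUNDS = 8, the github-then-extra lookup order, and the "max-rounds" finish reason come from the text; ModelStep, Toolset, runToolLoop, the "stop" finish reason, the unknown-tool error shape, and the exact point at which the cap is checked are assumptions made for illustration.

```typescript
// Hypothetical sketch of the tool loop; type and function names are illustrative.
const MAX_TOOL_ROUNDS = 8;

type ToolArgs = Record<string, unknown>;
type ToolCall = { name: string; args: ToolArgs };
type ModelStep = { text: string; toolCall?: ToolCall };
type Toolset = Map<string, (args: ToolArgs) => Promise<unknown>>;

async function dispatchTool(call: ToolCall, github: Toolset, extra: Toolset): Promise<unknown> {
  // The GitHub toolset is tried first, then the extra (read_skill) toolset.
  const handler = github.get(call.name) ?? extra.get(call.name);
  if (handler === undefined) return { error: "unknown tool: " + call.name }; // assumed shape
  return handler(call.args);
}

async function runToolLoop(
  step: (context: string[]) => Promise<ModelStep>, // one model call per invocation
  github: Toolset,
  extra: Toolset,
): Promise<{ text: string; finishReason: string; rounds: number }> {
  const context: string[] = [];
  let rounds = 0;
  for (;;) {
    const out = await step(context);
    if (out.toolCall === undefined) {
      return { text: out.text, finishReason: "stop", rounds }; // "stop" is an assumed label
    }
    if (rounds >= MAX_TOOL_ROUNDS) {
      return { text: out.text, finishReason: "max-rounds", rounds };
    }
    const result = await dispatchTool(out.toolCall, github, extra);
    context.push(JSON.stringify(result)); // the JSON result is appended to context
    rounds += 1;
  }
}
```

In this sketch a model that never stops calling tools is cut off after eight executed tool calls, and the ninth request ends the turn.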
No sandbox — implications
Chat runs in the harness process itself. Real consequences:
- Shared memory and env. A pi-ai memory blow-up takes the harness with it. Production deployments should size the host accordingly.
- No filesystem isolation. Chat tools are network-only (GitHub API); the agent has no file-write capability. The sandbox-less design doesn’t grant filesystem access — it just doesn’t fence it off.
- Lowest possible latency. No container spin-up, no VM boot, no per-turn workspace clone. A turn is roughly one HTTP round-trip plus the LLM call.
- Same crash blast radius as the rest of the harness. A pi-ai error is a harness error — surfaced via the same logs, recovered by the same supervisor.
System prompt
Built once at boot (src/index.ts):
systemPrompt = loadAgentContext() + CHAT_SYSTEM_SUFFIX + chatSkills.catalogueXml
Three layers:
- loadAgentContext() (src/engine/profiles.ts) concatenates all .md files under agent-context/ in alphabetical order, joined with \n\n---\n\n (see Skills §AGENTS.md).
- CHAT_SYSTEM_SUFFIX (src/engine/chat.ts) adds the chat-specific constraints (read-only tools, no write actions, hand off to the build workflow for code changes) so the same persona file (soul.md) can serve both surfaces without contradicting itself.
- chatSkills.catalogueXml (src/engine/chat-skills.ts → loadChatSkillCatalogue) is the XML <available_skills> block listing each curated chat skill’s name + description. Mirrors the catalogue pi-coding-agent emits for sandbox phases. The agent uses it to decide which read_skill call (if any) to make.
The curated skill list is CHAT_SKILL_NAMES — currently ["chat", "issue-triage", "pr-review", "repo-health"]. v1 is hard-coded; lift
to env / settings if it ever needs runtime configurability.
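The three-layer assembly can be sketched as plain string concatenation. This is a simplified illustration: files are passed in as a name-to-content map instead of being read from agent-context/ on disk, the suffix and catalogue are taken as ready-made strings, and the function names loadAgentContextFrom and buildSystemPrompt are hypothetical. The default sort() compares by code unit, which is assumed here to be close enough to the alphabetical order the spec mentions.

```typescript
// Hypothetical sketch of system prompt assembly; not the code in src/index.ts.
const CONTEXT_SEPARATOR = "\n\n---\n\n";

// Concatenate every .md file in name order, joined with the separator.
function loadAgentContextFrom(files: Map<string, string>): string {
  return Array.from(files.keys())
    .filter((fileName) => fileName.endsWith(".md"))
    .sort()
    .map((fileName) => files.get(fileName) ?? "")
    .join(CONTEXT_SEPARATOR);
}

// Layer order follows the spec: agent context, then chat suffix, then catalogue XML.
function buildSystemPrompt(
  files: Map<string, string>,
  chatSystemSuffix: string,
  catalogueXml: string,
): string {
  return loadAgentContextFrom(files) + chatSystemSuffix + catalogueXml;
}
```

Because the result is computed once at boot, edits to the underlying files only take effect on the next restart.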
LLM provider routing
Same providers as the sandbox path. Model and reasoning effort resolve via:
- Model: resolveModel(config.models, "chat") → config.models.chat or config.models.default or the global LASTLIGHT_MODEL.
- Thinking: resolveVariant(config.variants, "chat") → config.variants.chat or the global LASTLIGHT_THINKING.
Provider keys (ANTHROPIC_API_KEY etc.) are read from the harness’s
own env — chat doesn’t need the sandbox’s forwarding dance.
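The fallback order can be sketched as a chain of nullish checks. The real resolveModel and resolveVariant take (config.models, "chat") and (config.variants, "chat"); the simplified signatures below, the Sketch suffix on the names, and the error thrown when nothing is configured are assumptions for illustration only.

```typescript
// Hypothetical sketch of the chat fallback chain; signatures are simplified.
type ModelConfig = { chat?: string; default?: string };
type VariantConfig = { chat?: string };

// chat-specific model, then the default model, then the global env value.
function resolveModelSketch(models: ModelConfig | undefined, envModel: string | undefined): string {
  const model = models?.chat ?? models?.default ?? envModel;
  if (model === undefined) {
    throw new Error("no chat model configured"); // assumed behaviour
  }
  return model;
}

// chat-specific thinking level, then the global env value; may stay unset.
function resolveVariantSketch(
  variants: VariantConfig | undefined,
  envThinking: string | undefined,
): string | undefined {
  return variants?.chat ?? envThinking;
}
```

Here envModel and envThinking stand for the values of LASTLIGHT_MODEL and LASTLIGHT_THINKING read from the harness environment.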
Session reset and status
Two adjacent skills routed by the Router:
- chat-reset (src/index.ts:654–661): deactivates the current messaging session (session-manager.ts:206). The next user message starts a new pi-ai session with empty history. Confirmation is sent via envelope.reply().
- status-report (src/index.ts:664–675): lists currently running executions. Not a pi-ai call at all; it queries the State directly and replies with a formatted summary.
Both are harness-level skills, not pi-ai tools — they need DB write or admin-level state access that read-only chat tools cannot provide.
JSONL log
Chat turns log the same way sandboxed phases do (see State):
- One JSONL file per Slack thread, at $STATE_DIR/agent-sessions/projects/-app/<agentSessionId>.jsonl.
- Each turn emits assistant + tool-result envelopes plus a final result envelope with cost / token stats.
- The dashboard’s ChatSessionReader looks up the agent_session_id from messaging_sessions and reads the single file. It does not scan the -app/ directory blindly; that would return JSONL from every Slack thread mixed together.
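A rough sketch of the lookup-then-read approach, assuming a map in place of the messaging_sessions table and leaving envelopes untyped because their schema is not given here. The function names are hypothetical and this is not the ChatSessionReader implementation.

```typescript
// Hypothetical sketch: resolve one thread's log path, then parse its JSONL.
function sessionLogPath(stateDir: string, agentSessionId: string): string {
  const base = stateDir.replace(/\/+$/, ""); // tolerate a trailing slash
  return base + "/agent-sessions/projects/-app/" + agentSessionId + ".jsonl";
}

// Look the id up first so only a single file is ever read for a thread.
function logPathForThread(
  sessions: Map<string, string>, // messagingSessionId -> agent_session_id
  messagingSessionId: string,
  stateDir: string,
): string | undefined {
  const agentSessionId = sessions.get(messagingSessionId);
  return agentSessionId === undefined ? undefined : sessionLogPath(stateDir, agentSessionId);
}

// One JSON document per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
}
```

Resolving the path from the stored id is what keeps the reader from mixing envelopes that belong to different Slack threads.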
Concurrency
chains: Map<sessionId, Promise> in ChatRunner (chat-runner.ts:86, 115–127)
serializes turns on a single Slack thread — two messages arriving in
the same thread within milliseconds are guaranteed to run one after
the other. Different threads run in parallel without bound.
A turn that throws still resolves the chain promise (in a finally)
so the next turn isn’t blocked by a prior crash.
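The per-thread serialisation can be illustrated with a small promise-chain map. This is a sketch of the general technique, not the ChatRunner code: the TurnChains class and its run method are hypothetical names, and the stored tail swallows errors through a two-argument then() where the source describes a finally.

```typescript
// Hypothetical sketch of per-session turn serialisation.
class TurnChains {
  private chains = new Map<string, Promise<void>>();

  run<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
    const prev = this.chains.get(sessionId) ?? Promise.resolve();
    // The task starts only after the previous turn on this session has settled.
    const result = prev.then(task);
    // The stored tail always resolves, so a failed turn cannot block the next one.
    const tail: Promise<void> = result.then(
      () => undefined,
      () => undefined,
    );
    this.chains.set(sessionId, tail);
    // Drop the entry once idle so the map does not grow with every thread seen.
    void tail.then(() => {
      if (this.chains.get(sessionId) === tail) this.chains.delete(sessionId);
    });
    return result; // the caller still observes the real outcome, including errors
  }
}
```

Tasks queued under the same key run strictly in order, while tasks under different keys never wait on each other.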
Invariants
- Chat is read-only on the world. Every tool is a GET. Inserts into messaging_messages are the only writes chat makes, and they go through the session manager, not the agent’s tool surface.
- Same Slack thread → same agent session id. Always. A reset is the only way to get a new id for an existing thread.
- Tool rounds are capped. Eight is enough; a chat that wants to exceed this should be redirected to a workflow.
- History is a rolling 50-message window. No token-aware truncation. A re-implementation that adds it should be careful to preserve assistant ↔ user pairing.
- Screened messages reach chat with a flag, not a block. A [lastlight-flag: ...] prefix on the user content tells the agent to treat it as data per agent-context/security.md. Chat does not refuse flagged content; it processes it with appropriate skepticism.
- The system prompt is constructed once. A change to agent-context/*.md does not propagate until the harness restarts.
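For the history invariant, one possible way to keep assistant ↔ user pairing intact when trimming is to cut the window and then drop any leading assistant message. This sketch is not what getHistory() is documented to do; rollingWindow and ChatMsg are hypothetical names, and the same idea would apply to a token-budgeted cut.

```typescript
// Hypothetical sketch: a rolling window that never opens on an orphaned reply.
type ChatMsg = { role: "user" | "assistant"; content: string };

function rollingWindow(messages: ChatMsg[], limit: number): ChatMsg[] {
  const recent = messages.slice(Math.max(0, messages.length - limit));
  // If the cut landed between a user message and its reply, drop the reply.
  for (;;) {
    const first = recent[0];
    if (first === undefined || first.role !== "assistant") break;
    recent.shift();
  }
  return recent;
}
```

With five alternating messages starting on a user turn and a limit of four, the cut lands on an assistant reply, so the window shrinks to the last three messages.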
Current implementation
| Piece | File |
|---|---|
| ChatRunner class | src/engine/chat-runner.ts |
| System prompt assembly, screening | src/engine/chat.ts |
| Read-only GitHub tools | src/engine/github-tools.ts |
| Session manager + DB | src/connectors/messaging/session-manager.ts |
| chat-reset handler | src/index.ts:654–661 |
| status-report handler | src/index.ts:664–675 |
| Dashboard reader | src/admin/ChatSessionReader.ts |
Rebuild notes
- Two runtimes, one persona file. The same agent-context/*.md drives both chat and workflows. A re-implementation that bifurcates the persona will drift quickly.
- In-process for chat is the right call. Container spin-up per turn would dwarf the LLM call latency. The trade-off is shared blast radius, which is acceptable for a read-only surface.
- Resist adding write tools to chat. “Just one tool to create the issue” is how surfaces drift. The contract (chat answers questions, workflows do work) keeps both clean.
- Per-thread serialisation is required. Two simultaneous turns on one Slack thread would corrupt session state. The chains map is load-bearing.
- Rolling history window over token budget, for now. The 50-message window is simple and predictable. Switching to a token-budgeted approach is fine but needs care around partial assistant messages.
- JSONL is shared infrastructure. Both surfaces write to the same shim, the same envelope format, the same project-slug convention. A re-implementation that gives chat its own log format makes the dashboard harder.