Observability
Last Light has first-class OpenTelemetry support: distributed traces and metrics across every workflow run, phase, agent execution, PI event stream, and chat turn. It's disabled by default — nothing leaves the harness until you opt in.
Enabling it
Set LASTLIGHT_OTEL_ENABLED=true and point the standard OTLP
exporter env vars at your collector:
LASTLIGHT_OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.example.com
OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer <token>
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod
The export uses the standard OTLP/HTTP protocol, so any OTEL-compatible
backend works — Grafana Tempo/Mimir, Honeycomb, Datadog, Jaeger,
Grafana Cloud, or a self-hosted OpenTelemetry Collector. Every
LASTLIGHT_OTEL_* and OTEL_* variable is listed on the
Configuration page.
OTEL_* env vars alone do not turn telemetry on;
LASTLIGHT_OTEL_ENABLED=true is the master switch.
This keeps a stray OTEL_EXPORTER_OTLP_ENDPOINT in your
environment from silently shipping data.
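The master-switch rule above can be sketched in a few lines. This is an illustration only, not Last Light's actual code: the function names are hypothetical, and the case-insensitive match on "true" is an assumption.

```python
def telemetry_enabled(env):
    """Return True only when the master switch is explicitly set."""
    # Assumption: the value is compared case-insensitively against "true".
    return env.get("LASTLIGHT_OTEL_ENABLED", "").strip().lower() == "true"


def resolve_endpoint(env):
    """Return the OTLP endpoint to export to, or None when telemetry is off."""
    if not telemetry_enabled(env):
        # A stray OTEL_EXPORTER_OTLP_ENDPOINT is ignored without the switch.
        return None
    return env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
```

With only OTEL_EXPORTER_OTLP_ENDPOINT set, resolve_endpoint returns None, so nothing would be exported.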
What's exported
Instrumentation spans the whole execution path so you can trace a single issue or PR from event to outcome:
- Traces: a span per workflow phase (lastlight.workflow.phase) and per agent execution (lastlight.agent.execute), tagged with the workflow name, phase, repo, issue number, model, sandbox backend, and stop reason.
- Metrics: execution duration, cost (USD), input/output tokens, run and error counts, dimensioned by workflow, phase, repo, model, backend, and success/stop-reason.
- PI event streams: the agent's own tool-use / text / error events, recorded as span events.
- Chat turns: each in-process Slack chat turn, alongside the same execution metrics.
Content redaction
By default Last Light exports metadata only: workflow and phase names, repo, sandbox backend, model, success / stop reason, timing, token counts, and cost. Prompt text, message content, tool-call arguments, and tool outputs are redacted.
Set LASTLIGHT_OTEL_INCLUDE_CONTENT=true to include that content
(truncated) — useful for debugging agent behaviour, but it can export
sensitive data.
Exporter auth headers (OTEL_EXPORTER_OTLP_HEADERS) are always
treated as secrets and never appear in the dashboard Config tab.
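The metadata-only default and the opt-in content mode described in this section could look roughly like the following. It is a minimal sketch under stated assumptions: the attribute key names, the "[redacted]" placeholder, and the 256-character truncation limit are all hypothetical and not taken from Last Light.

```python
# Hypothetical attribute keys that carry content rather than metadata.
CONTENT_KEYS = {"prompt", "message", "tool_args", "tool_output"}
# Hypothetical truncation limit; the real limit is not documented here.
MAX_CONTENT_CHARS = 256


def export_attributes(attrs, include_content=False):
    """Return the attributes that would be exported for one span."""
    exported = {}
    for key, value in attrs.items():
        if key not in CONTENT_KEYS:
            exported[key] = value  # metadata always passes through
        elif include_content:
            exported[key] = str(value)[:MAX_CONTENT_CHARS]  # truncated content
        else:
            exported[key] = "[redacted]"  # assumed placeholder
    return exported
```

Here include_content stands in for LASTLIGHT_OTEL_INCLUDE_CONTENT=true: metadata such as the model name is exported either way, while content fields are only exported, truncated, when the flag is set.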
Sandbox telemetry
With LASTLIGHT_OTEL_FORWARD_TO_SANDBOX=true (the default), the
agent emits its own telemetry from inside each workflow sandbox too — not
just the harness. How that reaches your collector depends on the
sandbox backend:
docker backend — in-network collector
Docker sandboxes run with a default-deny egress firewall, so they can't dial an arbitrary collector directly. Instead, Last Light runs a small OpenTelemetry Collector inside the sandbox network. Sandboxes export to it by a fixed internal address; it re-exports to your real backend over a separate, trusted outbound network:
sandbox ──OTLP──▶ in-network collector ──OTLP──▶ your backend
                  (fixed internal IP)   (trusted outbound leg)
This design has three properties worth calling out:
- Secrets stay host-side. The real backend endpoint and its auth headers live only on the collector's config (on the host). They are never forwarded into the untrusted sandbox.
- Any backend works. The sandbox only ever dials one fixed internal endpoint, so a collector on any port or scheme (e.g. https://collector:4318) just works, with no egress-allowlist or firewall changes needed.
- No new exfil surface. The collector only ever forwards to its configured backend; sandbox traffic can't redirect where it exports, only influence the span content sent to your own backend.
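The split between what the sandbox sees and what stays on the host can be illustrated as follows. This is a sketch, not the real implementation: the internal address and both function names are hypothetical, and the actual set of forwarded variables may differ.

```python
# Hypothetical fixed internal address of the in-network collector.
INTERNAL_COLLECTOR = "http://10.0.0.2:4318"

# Variables that describe the real backend and must stay host-side.
HOST_ONLY_VARS = ("OTEL_EXPORTER_OTLP_ENDPOINT", "OTEL_EXPORTER_OTLP_HEADERS")


def sandbox_env(host_env):
    """Env handed to the untrusted sandbox: internal endpoint, no secrets."""
    # host_env is deliberately not copied; only the fixed address is exposed.
    return {"OTEL_EXPORTER_OTLP_ENDPOINT": INTERNAL_COLLECTOR}


def collector_env(host_env):
    """Env for the host-side collector: the real endpoint and auth headers."""
    return {k: host_env[k] for k in HOST_ONLY_VARS if k in host_env}
```

Because sandbox_env never reads the host's endpoint or headers, changing the real backend does not change anything the sandbox can observe.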
gondolin / none backends
On the gondolin (micro-VM) and none backends the agent
runs in the harness process and already inherits the harness's OTEL setup, so
the allowlisted OTEL_* env vars are forwarded directly. The
gondolin VM additionally adds your collector hosts — parsed from
the endpoint vars and LASTLIGHT_OTEL_COLLECTOR_HOSTS — to its
egress allowlist. Private and cloud-metadata hosts remain blocked.
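Deriving the allowlist entries might work along these lines. This is a sketch under assumptions: the set of endpoint variables inspected, the comma-separated format of LASTLIGHT_OTEL_COLLECTOR_HOSTS, and the function names are not confirmed by this page, and hostnames are not resolved, so only IP literals are checked against the blocked ranges.

```python
import ipaddress
from urllib.parse import urlparse

# Standard OTLP endpoint variables; which ones Last Light reads is assumed.
ENDPOINT_VARS = (
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
)


def is_blocked(host):
    """True for private, loopback, or link-local (cloud-metadata) IP literals."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; this sketch skips DNS
    return ip.is_private or ip.is_loopback or ip.is_link_local


def collector_hosts(env):
    """Collect allowlist candidates from the endpoint vars and the extra list."""
    hosts = []
    for var in ENDPOINT_VARS:
        host = urlparse(env.get(var, "")).hostname
        if host:
            hosts.append(host)
    # Assumption: LASTLIGHT_OTEL_COLLECTOR_HOSTS is a comma-separated list.
    extra = env.get("LASTLIGHT_OTEL_COLLECTOR_HOSTS", "")
    hosts.extend(h.strip() for h in extra.split(",") if h.strip())
    allowed = []
    for host in hosts:
        if host not in allowed and not is_blocked(host):
            allowed.append(host)
    return allowed
```

A metadata address such as 169.254.169.254 is dropped even if someone lists it explicitly, which mirrors the rule that private and cloud-metadata hosts remain blocked.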
To keep all telemetry harness-only and never touch the sandbox, set
LASTLIGHT_OTEL_FORWARD_TO_SANDBOX=false.
Failure handling
If the OTEL SDK fails to initialise (bad endpoint, unreachable collector),
Last Light logs a warning and continues without telemetry —
observability never takes the harness down. Set
LASTLIGHT_OTEL_STRICT=true to make such a failure fatal at
startup instead, which is useful when telemetry is a hard requirement.
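The two failure modes can be sketched as a small wrapper around SDK setup. The names here (init_telemetry, the logger name, the setup callable) are hypothetical; the strict parameter stands in for LASTLIGHT_OTEL_STRICT=true.

```python
import logging

logger = logging.getLogger("lastlight.otel")  # hypothetical logger name


def init_telemetry(setup, strict=False):
    """Run setup(); on failure, warn and continue unless strict is set."""
    try:
        setup()
        return True
    except Exception as exc:
        if strict:
            # Strict mode: surface the failure so startup aborts.
            raise RuntimeError(f"telemetry initialisation failed: {exc}") from exc
        # Default mode: log a warning and carry on without telemetry.
        logger.warning("telemetry disabled: %s", exc)
        return False
```

In the default mode the caller gets False back and keeps running; in strict mode the exception propagates and can stop the process at startup.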