Observability

Last Light has first-class OpenTelemetry support: distributed traces and metrics across every workflow run, phase, agent execution, PI event stream, and chat turn. It's disabled by default — nothing leaves the harness until you opt in.

Enabling it

Set LASTLIGHT_OTEL_ENABLED=true and point the standard OTLP exporter env vars at your collector:

LASTLIGHT_OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.example.com
OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer <token>
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod

Telemetry is exported over the standard OTLP/HTTP protocol, so any backend that ingests OTLP works: Grafana Tempo/Mimir, Honeycomb, Datadog, Jaeger, Grafana Cloud, or a self-hosted OpenTelemetry Collector. Every LASTLIGHT_OTEL_* and OTEL_* variable is listed on the Configuration page.

Setting OTEL_* env vars alone does not turn telemetry on — LASTLIGHT_OTEL_ENABLED=true is the master switch. This keeps a stray OTEL_EXPORTER_OTLP_ENDPOINT in your environment from silently shipping data.
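
The gating rule above can be pictured with a minimal Python sketch. It is not Last Light's actual code: the function name telemetry_enabled and the case-insensitive comparison are assumptions made for illustration. Only the variable names come from this page.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the master switch is explicitly set to "true"."""
    env = os.environ if env is None else env
    # Assumption: the value is trimmed and compared case-insensitively.
    return env.get("LASTLIGHT_OTEL_ENABLED", "").strip().lower() == "true"


# A stray endpoint on its own must not enable export.
stray = {"OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel-collector.example.com"}
# Adding the master switch is what opts in.
opted_in = {**stray, "LASTLIGHT_OTEL_ENABLED": "true"}

print(telemetry_enabled(stray), telemetry_enabled(opted_in))
```

The point of the design is that the standard OTEL_* variables only describe where data would go; the decision to send it is a separate, explicit flag.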

What's exported

Instrumentation spans the whole execution path so you can trace a single issue or PR from event to outcome:

  • Traces — a span per workflow phase (lastlight.workflow.phase) and per agent execution (lastlight.agent.execute), tagged with the workflow name, phase, repo, issue number, model, sandbox backend, and stop reason.
  • Metrics — execution duration, cost (USD), input/output tokens, run and error counts, dimensioned by workflow, phase, repo, model, backend, and success/stop-reason.
  • PI event streams — the agent's own tool-use / text / error events, recorded as span events.
  • Chat turns — each in-process Slack chat turn, alongside the same execution metrics.
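
To illustrate what "dimensioned by" means in practice, here is a small Python sketch that sums one metric over any combination of dimensions. The record keys, the helper total_by, and the sample values are all made up for illustration; the attribute names Last Light actually emits may differ.

```python
from collections import defaultdict


def total_by(records, metric, *dimensions):
    """Sum one metric over records, grouped by the given dimension keys."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[metric]
    return dict(totals)


# Illustrative data points only; keys and values are invented for this sketch.
records = [
    {"workflow": "triage", "phase": "plan", "model": "model-a", "input_tokens": 1200},
    {"workflow": "triage", "phase": "execute", "model": "model-a", "input_tokens": 800},
    {"workflow": "review", "phase": "plan", "model": "model-b", "input_tokens": 500},
]

per_workflow = total_by(records, "input_tokens", "workflow")
per_phase = total_by(records, "input_tokens", "workflow", "phase")
```

A metrics backend performs the same kind of grouping when you break a chart down by workflow, phase, or model.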

Content redaction

By default Last Light exports metadata only: workflow and phase names, repo, sandbox backend, model, success / stop reason, timing, token counts, and cost. Prompt text, message content, tool-call arguments, and tool outputs are redacted.

Set LASTLIGHT_OTEL_INCLUDE_CONTENT=true to include that content (truncated) — useful for debugging agent behaviour, but it can export sensitive data.

Only enable content with a trusted collector. It can put source code, secrets that appear in tool output, and full prompts on the wire. Auth headers (OTEL_EXPORTER_OTLP_HEADERS) are always treated as secrets and never appear in the dashboard Config tab.
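
The metadata-only default can be sketched in Python as follows. This is an illustration under stated assumptions, not the real implementation: the field names in CONTENT_FIELDS, the 64-character limit, and the choice to drop redacted fields rather than replace them with a placeholder are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical content field names and truncation limit, chosen for illustration.
CONTENT_FIELDS = {"prompt", "message", "tool_args", "tool_output"}
MAX_CONTENT_CHARS = 64


def export_attributes(event: Dict[str, Any], include_content: bool = False) -> Dict[str, Any]:
    """Return the attributes that would be exported for one event."""
    exported: Dict[str, Any] = {}
    for key, value in event.items():
        if key not in CONTENT_FIELDS:
            exported[key] = value  # metadata always passes through
        elif include_content:
            exported[key] = str(value)[:MAX_CONTENT_CHARS]  # content, truncated
        # otherwise the content field is left out entirely
    return exported


event = {"workflow": "triage", "model": "model-a", "input_tokens": 10, "prompt": "p" * 200}
metadata_only = export_attributes(event)
with_content = export_attributes(event, include_content=True)
```

Here include_content stands in for LASTLIGHT_OTEL_INCLUDE_CONTENT=true. Note that truncation limits the size of exported content but does not make it safe: a secret can sit in the first few characters of a tool output.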

Sandbox telemetry

With LASTLIGHT_OTEL_FORWARD_TO_SANDBOX=true (the default), the agent also emits its own telemetry from inside each workflow sandbox, in addition to what the harness emits. How that telemetry reaches your collector depends on the sandbox backend:

docker backend — in-network collector

Docker sandboxes run with a default-deny egress firewall, so they can't dial an arbitrary collector directly. Instead, Last Light runs a small OpenTelemetry Collector inside the sandbox network. Sandboxes export to it by a fixed internal address; it re-exports to your real backend over a separate, trusted outbound network:

sandbox ──OTLP──▶ in-network collector ──OTLP──▶ your backend
              (fixed internal IP)        (trusted outbound leg)

This design has three properties worth calling out:

  • Secrets stay host-side. The real backend endpoint and its auth headers live only on the collector's config (on the host). They are never forwarded into the untrusted sandbox.
  • Any backend works. The sandbox only ever dials one fixed internal endpoint, so a collector on any port or scheme (e.g. https://collector:4318) just works — no egress-allowlist or firewall changes needed.
  • No new exfil surface. The collector only ever forwards to its configured backend; sandbox traffic can't redirect where it exports, only influence the span content sent to your own backend.
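
The split between what the sandbox sees and what stays on the host can be sketched in Python. Everything named here is an assumption for illustration: the internal address 10.0.0.2:4318, the split_config helper, and the shape of the collector config are hypothetical, not Last Light's real values.

```python
from typing import Dict, Tuple

# Hypothetical fixed address of the in-network collector.
INTERNAL_COLLECTOR = "http://10.0.0.2:4318"


def split_config(host_env: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Derive the sandbox environment and the collector's export config."""
    # The sandbox only ever learns the fixed internal endpoint.
    sandbox_env = {"OTEL_EXPORTER_OTLP_ENDPOINT": INTERNAL_COLLECTOR}
    # The real endpoint and auth headers stay in the host-side collector config.
    collector_cfg = {
        "endpoint": host_env["OTEL_EXPORTER_OTLP_ENDPOINT"],
        "headers": host_env.get("OTEL_EXPORTER_OTLP_HEADERS", ""),
    }
    return sandbox_env, collector_cfg


host_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel-collector.example.com",
    "OTEL_EXPORTER_OTLP_HEADERS": "authorization=Bearer example-token",
}
sandbox_env, collector_cfg = split_config(host_env)
```

Because sandbox_env does not depend on the backend at all, changing the backend's host, port, or scheme never requires touching the sandbox or its firewall rules.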

gondolin / none backends

On the gondolin (micro-VM) and none backends, the agent runs in the harness process and already inherits the harness's OTEL setup, so the allowlisted OTEL_* env vars are forwarded directly. The gondolin VM also adds your collector hosts (parsed from the endpoint vars and LASTLIGHT_OTEL_COLLECTOR_HOSTS) to its egress allowlist. Private and cloud-metadata hosts remain blocked.
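
A rough Python sketch of how such an allowlist could be derived is shown below. It rests on several assumptions: that the "endpoint vars" are the three standard OTLP endpoint variables, that LASTLIGHT_OTEL_COLLECTOR_HOSTS is a comma-separated list, and that blocking is done on literal IP addresses. The helper names are hypothetical.

```python
import ipaddress
from typing import Mapping, Set
from urllib.parse import urlparse

# Standard OTLP endpoint variables; which ones Last Light parses is an assumption.
ENDPOINT_VARS = (
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
)


def is_blocked(host: str) -> bool:
    """Block literal private, loopback, and link-local IPs (e.g. 169.254.169.254)."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname: this sketch lets it through; a real firewall
        # would also have to check the addresses it resolves to.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local


def collector_hosts(env: Mapping[str, str]) -> Set[str]:
    """Collect allowlist candidates from the endpoint vars and the extra-hosts var."""
    hosts: Set[str] = set()
    for var in ENDPOINT_VARS:
        host = urlparse(env.get(var, "")).hostname  # None when unset or unparsable
        if host:
            hosts.add(host)
    # Assumption: the extra hosts are given as a comma-separated list.
    for item in env.get("LASTLIGHT_OTEL_COLLECTOR_HOSTS", "").split(","):
        if item.strip():
            hosts.add(item.strip().lower())
    return {h for h in hosts if not is_blocked(h)}


env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel-collector.example.com:4318",
    "LASTLIGHT_OTEL_COLLECTOR_HOSTS": "metrics.example.com, 169.254.169.254, 10.0.0.5",
}
allowed = collector_hosts(env)
```

In this example the cloud-metadata address and the private address are filtered out even though they were listed explicitly, which mirrors the rule that such hosts remain blocked.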

To keep all telemetry harness-only and never touch the sandbox, set LASTLIGHT_OTEL_FORWARD_TO_SANDBOX=false.

Failure handling

If the OTEL SDK fails to initialise (bad endpoint, unreachable collector), Last Light logs a warning and continues without telemetry — observability never takes the harness down. Set LASTLIGHT_OTEL_STRICT=true to make such a failure fatal at startup instead, which is useful when telemetry is a hard requirement.
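
The two failure modes can be sketched in Python. This is an illustrative pattern, not Last Light's code: init_telemetry, the logger name, and the unreachable_collector stand-in are hypothetical, and strict plays the role of LASTLIGHT_OTEL_STRICT=true.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("lastlight.otel")  # hypothetical logger name


def init_telemetry(setup: Callable[[], T], strict: bool = False) -> Optional[T]:
    """Run the SDK setup; degrade to no telemetry unless strict mode is on."""
    try:
        return setup()
    except Exception as exc:
        if strict:
            # Strict mode: a telemetry failure aborts startup.
            raise RuntimeError("telemetry is required but failed to initialise") from exc
        # Default mode: warn and carry on without telemetry.
        log.warning("telemetry disabled: %s", exc)
        return None


def unreachable_collector() -> object:
    """Stand-in for an SDK setup call that fails."""
    raise ConnectionError("collector unreachable")


handle = init_telemetry(unreachable_collector)  # returns None, logs a warning
```

Calling init_telemetry(unreachable_collector, strict=True) raises instead, so a misconfigured collector is caught at startup rather than discovered later as missing data.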