Monitor
Execution traces
Open any run to see exactly what the agent did — every prompt sent, every tool called, every decision made, with timing and cost attached to each span.
What you'll learn
- What each span type represents
- How to inspect LLM prompts and completions
- How to read the decision tree for agent reasoning
- How to diagnose errors and slow steps
Anatomy of a trace
A trace is an ordered tree of spans. The root span is the run itself; children are LLM calls, tool invocations, guardrail checks, and sub-agent calls.
1. LLM spans
Show the full prompt sent to the model, the completion received, model name, token counts, and dollar cost. Click to expand the raw text.
2. Tool spans
Show the tool name, action, typed arguments, raw response, and latency. Failed calls expose the error payload.
3. Guardrail spans
Show which guardrail ran, what it inspected, and the verdict (pass, redact, block). Blocked runs include the policy that fired.
4. Sub-agent spans
When one agent delegates to another, the child run is nested inline. Click to drill into the child trace without leaving the page.
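If it helps to picture the structure, here is a minimal sketch of a span tree as nested records. The field names (type, name, duration_ms, cost_usd, children) are illustrative assumptions, not the product's export schema.

```python
# Illustrative span tree -- field names are assumptions, not the real export format.
trace = {
    "type": "run", "name": "support-agent", "duration_ms": 8400, "cost_usd": 0.031,
    "children": [
        {"type": "llm", "name": "gpt-4o", "duration_ms": 2100, "cost_usd": 0.021, "children": []},
        {"type": "tool", "name": "search_orders", "duration_ms": 900, "cost_usd": 0.0, "children": []},
        {"type": "guardrail", "name": "pii_redaction", "duration_ms": 40, "cost_usd": 0.0, "children": []},
        {"type": "agent", "name": "refund-subagent", "duration_ms": 5200, "cost_usd": 0.010, "children": [
            {"type": "llm", "name": "gpt-4o-mini", "duration_ms": 1800, "cost_usd": 0.004, "children": []},
        ]},
    ],
}

def print_tree(span, depth=0):
    """Walk the span tree depth-first, printing one indented line per span."""
    print(f"{'  ' * depth}{span['type']:<9} {span['name']:<20} {span['duration_ms']:>6} ms  ${span['cost_usd']:.3f}")
    for child in span["children"]:
        print_tree(child, depth + 1)

print_tree(trace)
```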
Reading the decision tree
The decision-tree tab reconstructs the agent's reasoning loop — thought, action, observation — as a collapsible tree. Useful when you want to understand why the agent chose a tool rather than what it did. Each node links back to the underlying span.
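As a rough mental model, one node per reasoning step might look like the following sketch; the field names and the span_id link are assumptions for illustration, not an API the product exposes.

```python
from dataclasses import dataclass, field
from typing import Optional

# One thought -> action -> observation step; span_id links back to the underlying span.
# All field names here are illustrative assumptions.
@dataclass
class DecisionNode:
    thought: str                       # why the agent chose the next step
    action: Optional[str]              # tool or sub-agent invoked, if any
    observation: Optional[str]         # what came back from that action
    span_id: str                       # reference to the span that carried it out
    children: list["DecisionNode"] = field(default_factory=list)

step = DecisionNode(
    thought="The user asked about an order, so look it up before answering.",
    action="search_orders",
    observation="Order 1042 found, status: shipped.",
    span_id="span_7f3a",
)
```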
Timing breakdown
1. Waterfall
Each span is plotted on a horizontal timeline. Long bars surface slow steps at a glance.
2. Self time vs total time
Total includes children; self excludes them. A span whose total is mostly self time is a leaf bottleneck, usually a slow LLM call or tool; see the sketch after this list for how the numbers relate.
3. Critical path
Highlight the longest dependent chain to see where to optimize first.
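As a rough sketch of the arithmetic, assuming each span records a duration_ms and a children list (illustrative names, children treated as sequential), self time and an approximate critical path can be derived like this:

```python
# Hedged sketch: derive self time and an approximate critical path from a span tree.
# Field names are assumptions; real spans may carry start/end timestamps instead.

def self_time(span):
    """Total duration minus time spent in direct children (assumes children run sequentially)."""
    return span["duration_ms"] - sum(c["duration_ms"] for c in span["children"])

def critical_path(span):
    """Approximate longest dependent chain: follow the slowest child at each level."""
    if not span["children"]:
        return [span["name"]], span["duration_ms"]
    child_path, child_ms = max((critical_path(c) for c in span["children"]), key=lambda r: r[1])
    return [span["name"]] + child_path, self_time(span) + child_ms

run = {"name": "run", "duration_ms": 5000, "children": [
    {"name": "llm_call", "duration_ms": 3200, "children": []},
    {"name": "tool_call", "duration_ms": 1500, "children": []},
]}
print(self_time(run))        # 300 ms spent outside both children
print(critical_path(run))    # (['run', 'llm_call'], 3500)
```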
Error inspection
Failed runs land on the failing span by default. You will see the exception type, message, the input that triggered it, and the upstream context that led there. Use Open in Eval to add the failing input to a regression dataset so you can verify the next agent version fixes it.
Frequently asked questions
- Are prompts redacted before storage?
- Guardrails can redact PII before storage when the relevant policy is on. Otherwise prompts are stored verbatim and access is restricted by role.
- Can I export a single trace?
- Yes. The Export menu produces JSON or OpenTelemetry-compatible output you can pipe into your own tools; a short post-processing sketch follows these questions.
- Why are some spans missing token counts?
- Token counts come from the model provider. Self-hosted or local models may not report them — in that case the span shows estimated counts based on the input length.
- Can I leave a comment on a span?
- Yes. Annotations attach to a span and show up for everyone who opens the run. Useful for marking known failure modes during eval review.
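To illustrate the export answer above: a minimal post-processing sketch, assuming the JSON export contains a flat spans list with name, type, duration_ms, and cost_usd fields. Check the real export for the actual field names before relying on them.

```python
import json

# Hedged sketch: summarize an exported trace.
# The filename and field names below are illustrative assumptions.
with open("trace_export.json") as f:
    export = json.load(f)

spans = export["spans"]
total_cost = sum(s.get("cost_usd", 0.0) for s in spans)
slowest = sorted(spans, key=lambda s: s["duration_ms"], reverse=True)[:5]

print(f"total cost: ${total_cost:.4f}")
for s in slowest:
    print(f"{s['duration_ms']:>7} ms  {s['type']:<10} {s['name']}")
```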