Guardrails

Runtime safety for every agent action

Guardrails sit on the hot path. Each prompt, tool invocation, and model output is scanned in real time against your active rules (abusive content, PII leaks, off-topic responses), and every match is blocked, redacted, or flagged according to the rule's action.

What you'll learn
  • What a guardrail inspects and when it fires
  • The four places enforcement happens: inbound prompts, tool call arguments, tool responses, and the final response
  • How presets compose with custom rules
  • How guardrails differ from policies

What guardrails inspect

A guardrail runs on every interaction the agent has with the outside world.
  1. Inbound prompts

    User messages are scanned before they reach the model. Jailbreak attempts, abusive language, and out-of-scope topics are caught here.
  2. Tool call arguments

    Arguments the agent is about to pass to a tool are inspected before the call is made. This stage blocks malformed SQL, prompt injection in tool payloads, and PII that would be written to logs.
  3. Tool responses

    Data returned from a tool (a database row, API payload, or retrieved document) is scanned before it enters the model context. This catches leaks at retrieval time.
  4. Final response

    The model output is checked before delivery. This is the last line of defense for PII, profanity, and topic drift.
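
The four inspection points above can be pictured as checks wrapped around a single agent turn. The following is a minimal sketch under stated assumptions, not the product's API: Stage, Violation, inspect, and run_turn are hypothetical names, and a simple SSN-like regex stands in for a real PII rule.

```python
import re
from enum import Enum
from typing import Callable


class Stage(Enum):
    INBOUND_PROMPT = "inbound_prompt"
    TOOL_ARGS = "tool_args"
    TOOL_RESPONSE = "tool_response"
    FINAL_RESPONSE = "final_response"


# Hypothetical rule: an SSN-like pattern stands in for a real PII detector.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class Violation(Exception):
    """Raised when a rule matches; records where and what matched."""

    def __init__(self, stage: Stage, span: str):
        super().__init__(f"{stage.value}: matched {span!r}")
        self.stage = stage
        self.span = span


def inspect(stage: Stage, text: str) -> str:
    """Raise Violation if the rule matches, otherwise pass the text through."""
    match = SSN.search(text)
    if match:
        raise Violation(stage, match.group())
    return text


def run_turn(prompt: str, model: Callable[[str], str], tool: Callable[[str], str]) -> str:
    """One simplified turn: the model proposes tool arguments, the tool runs, the model answers."""
    prompt = inspect(Stage.INBOUND_PROMPT, prompt)
    tool_args = inspect(Stage.TOOL_ARGS, model(prompt))
    tool_result = inspect(Stage.TOOL_RESPONSE, tool(tool_args))
    return inspect(Stage.FINAL_RESPONSE, model(tool_result))


def echo_model(text: str) -> str:
    return f"answer based on: {text}"


def leaky_tool(args: str) -> str:
    return "name=Ada ssn=123-45-6789"


try:
    run_turn("look up Ada", echo_model, leaky_tool)
except Violation as violation:
    print(violation.stage.value)  # prints "tool_response"
```

In this sketch the leak is caught at the tool response stage, before the data would enter the model context, so the final response check never has to see it.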

Block, redact, or warn

Each rule has an action: block halts the run and emits a violation event; redact masks the matching span and lets the run continue; warn records the match and takes no other action. Choose the action per rule based on severity.
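
As an illustration of how the three actions differ, here is a small sketch. It assumes a regex-based rule and an in-memory event list; Rule, Result, and apply_rules are hypothetical names chosen for this example and do not describe the product's rule format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    rule_id: str
    pattern: re.Pattern
    action: str  # "block", "redact", or "warn"


@dataclass
class Result:
    text: str
    blocked: bool = False
    events: list = field(default_factory=list)


def apply_rules(text: str, rules: list[Rule]) -> Result:
    """Apply rules in order, recording one event per rule that matches."""
    result = Result(text=text)
    for rule in rules:
        match = rule.pattern.search(result.text)
        if not match:
            continue
        result.events.append((rule.rule_id, rule.action, match.group()))
        if rule.action == "block":
            # Halt immediately: later rules are not evaluated.
            result.blocked = True
            return result
        if rule.action == "redact":
            # Mask every matching span and keep going.
            result.text = rule.pattern.sub("[REDACTED]", result.text)
        # "warn": the recorded event is the only effect.
    return result


rules = [
    Rule("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "redact"),
    Rule("slang", re.compile(r"\bdarn\b", re.IGNORECASE), "warn"),
    Rule("secret", re.compile(r"API_KEY=\S+"), "block"),
]

outcome = apply_rules("darn, mail bob@example.com", rules)
print(outcome.text)     # prints "darn, mail [REDACTED]"
print(outcome.blocked)  # prints "False"
```

The redact and warn rules both fire here, but only redact changes the text; a message containing API_KEY=... would instead stop at the block rule with blocked set to True.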

Guardrail vs policy

A policy is structural — what the agent can do (which tools, which data, what budget). A guardrail is content-level — what the agent says or sees. They compose: a policy might allow the database tool while a guardrail redacts PII from its result.
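
The composition described above can be sketched as two separate checks on the same tool call: the policy decides whether the call may happen at all, and the guardrail filters what comes back. This is an illustrative sketch only; Policy, redact_pii, and call_tool are hypothetical names, and an email regex stands in for a real PII rule.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Structural control: which tools the agent may call."""

    allowed_tools: frozenset

    def permits(self, tool_name: str) -> bool:
        return tool_name in self.allowed_tools


# Hypothetical content-level rule: mask email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def redact_pii(text: str) -> str:
    """Content control: what the agent is allowed to see."""
    return EMAIL.sub("[REDACTED]", text)


def call_tool(policy: Policy, tools: dict[str, Callable[[str], str]], name: str, arg: str) -> str:
    # The policy is checked first: a disallowed tool never runs.
    if not policy.permits(name):
        raise PermissionError(f"policy does not allow tool {name!r}")
    # The guardrail then inspects the content the tool returned.
    return redact_pii(tools[name](arg))


tools = {
    "database": lambda query: "id=7 email=ada@example.com",
    "shell": lambda cmd: "not reachable under this policy",
}
policy = Policy(allowed_tools=frozenset({"database"}))

print(call_tool(policy, tools, "database", "SELECT * FROM users"))
# prints "id=7 email=[REDACTED]"
```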

Frequently asked questions

Where do I attach a guardrail to an agent?
In the agent builder, at the Guardrails Configuration step. You can attach one or more guardrail profiles; they are evaluated in order on every interaction.
Does a guardrail slow down responses?
Inspection runs in parallel with model generation where possible. Typical added latency is in the low tens of milliseconds. Plugin-backed checks (e.g. external moderation APIs) add the round-trip time of the plugin.
Can I see what a guardrail blocked?
Yes. Open the run in Monitor: each block is recorded with the rule that fired, the matched span, and the action taken. To see history, filter the Audit Log by guardrail ID.
Can a guardrail fail open?
By default, a guardrail that errors during inspection fails closed — the request is blocked and a system event is logged. You can configure fail-open per profile if availability matters more than safety for that surface.
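
The fail-closed default and the per-profile fail-open option can be sketched as a wrapper around the inspection call. This is a minimal sketch under stated assumptions: Profile, its fail_open field, and is_allowed are hypothetical names for illustration, not the product's configuration keys.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("guardrails")


@dataclass
class Profile:
    check: Callable[[str], bool]  # returns True when the text is allowed
    fail_open: bool = False       # assumed default: fail closed


def is_allowed(profile: Profile, text: str) -> bool:
    """Run the check; if the check itself errors, fall back to the profile setting."""
    try:
        return profile.check(text)
    except Exception:
        # The inspection failed, which is different from a rule matching.
        log.exception("guardrail inspection failed")
        return profile.fail_open


def broken_check(text: str) -> bool:
    raise TimeoutError("moderation plugin did not respond")


print(is_allowed(Profile(check=broken_check), "hello"))                  # prints "False"
print(is_allowed(Profile(check=broken_check, fail_open=True), "hello"))  # prints "True"
```

The trade-off is visible in the two calls: with the default, an outage in the checker blocks traffic; with fail-open, the same outage lets uninspected content through.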