Guardrails

Runtime safety for every agent action

Guardrails sit on the hot path. Every prompt, tool invocation, and model output is scanned against your active rules (abusive content, PII leaks, off-topic responses), and violations are blocked, redacted, or flagged in real time.

What you'll learn

What a guardrail inspects and when it fires
The four places enforcement happens: inbound prompts, tool call arguments, tool responses, and the final response
How presets compose with custom rules
How guardrails differ from policies

What guardrails inspect

A guardrail runs on every interaction the agent has with the outside world.

1. Inbound prompts: User messages are scanned before they reach the model. Jailbreak attempts, abusive language, and out-of-scope topics are caught here.
2. Tool call arguments: Arguments the agent is about to pass to a tool are inspected. Blocks malformed SQL, prompt injection in tool payloads, and PII written to logs.
3. Tool responses: Data returned from a tool (a database row, an API payload, a retrieved doc) is scanned before it enters the model context. Catches leaks at retrieval time.
4. Final response: The model output is checked before delivery. Last line of defense for PII, profanity, and topic drift.
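To make the ordering of the four enforcement points concrete, here is a minimal sketch of one agent turn with a check at each point. It assumes a turn makes exactly one tool call, and every name in it (`Stage`, `GuardrailViolation`, `run_turn`, `plan_tool_call`, `call_tool`, `answer`) is hypothetical; this is not the product's API, only an illustration of where inspection sits relative to the model and the tool.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    # Hypothetical labels for the four enforcement points.
    INBOUND_PROMPT = "inbound_prompt"
    TOOL_ARGS = "tool_args"
    TOOL_RESPONSE = "tool_response"
    FINAL_RESPONSE = "final_response"


class GuardrailViolation(Exception):
    """Raised by a check to halt the run (the 'block' behavior)."""


def run_turn(
    prompt: str,
    check: Callable[[Stage, str], None],      # raises GuardrailViolation to block
    plan_tool_call: Callable[[str], str],     # stub: model chooses tool arguments
    call_tool: Callable[[str], str],          # stub: executes the tool
    answer: Callable[[str, str], str],        # stub: model writes the reply
) -> str:
    check(Stage.INBOUND_PROMPT, prompt)       # 1. before the prompt reaches the model
    args = plan_tool_call(prompt)
    check(Stage.TOOL_ARGS, args)              # 2. before the tool executes
    result = call_tool(args)
    check(Stage.TOOL_RESPONSE, result)        # 3. before the result enters model context
    reply = answer(prompt, result)
    check(Stage.FINAL_RESPONSE, reply)        # 4. before delivery to the user
    return reply
```

Because each check runs before the next step consumes its input, a violation raised at stage 3 (for example, a tool result containing PII) stops the turn before the model ever sees that data.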

Block, redact, or warn

Each rule has one of three actions: block halts the run and emits a violation event; redact masks the matching span and lets the run continue; warn records the match and takes no other action. Choose the action per rule based on the severity of what it catches.
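A minimal sketch of how the three actions differ when rules are applied to a piece of text. It assumes rules are regular expressions evaluated in list order and that a block stops evaluation immediately; `Action`, `Rule`, `Result`, and `apply_rules` are hypothetical names, not the product's rule engine.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    REDACT = "redact"
    WARN = "warn"


@dataclass
class Rule:
    rule_id: str
    pattern: re.Pattern
    action: Action


@dataclass
class Result:
    text: str
    blocked: bool = False
    events: list = field(default_factory=list)  # (rule_id, action, matched span)


def apply_rules(text: str, rules: list) -> Result:
    result = Result(text=text)
    for rule in rules:
        matches = list(rule.pattern.finditer(result.text))
        if not matches:
            continue
        # Every match is recorded, whatever the action.
        for m in matches:
            result.events.append((rule.rule_id, rule.action.value, m.group(0)))
        if rule.action is Action.BLOCK:
            result.blocked = True
            return result  # block halts the run
        if rule.action is Action.REDACT:
            # Mask the matching spans and keep going.
            result.text = rule.pattern.sub("[REDACTED]", result.text)
        # WARN: nothing else to do; the match is already recorded.
    return result
```

With a redact rule for email addresses and a warn rule for mild profanity, a message passes through with the address masked and both matches logged; with a block rule for a jailbreak phrase, the result comes back with `blocked=True` and no later rules run.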

Guardrail vs policy

A policy is structural — what the agent can do (which tools, which data, what budget). A guardrail is content-level — what the agent says or sees. They compose: a policy might allow the database tool while a guardrail redacts PII from its result.
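The database example above can be sketched as two separate checks around a single tool call: a structural allowlist decides whether the tool may run at all, and a content rule decides what the agent sees from its result. `ALLOWED_TOOLS`, `PolicyDenied`, and `call_tool` are hypothetical, and an email pattern stands in for PII detection; a real PII rule would cover far more than email addresses.

```python
import re
from typing import Callable

# Hypothetical policy: the structural allowlist of tools this agent may call.
ALLOWED_TOOLS = {"database"}

# Hypothetical guardrail rule: email addresses as a stand-in for PII.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class PolicyDenied(Exception):
    """The policy forbids this action outright."""


def call_tool(name: str, run: Callable[..., str], *args) -> str:
    # Policy (structural): may the agent call this tool at all?
    if name not in ALLOWED_TOOLS:
        raise PolicyDenied(f"tool {name!r} is not allowed by policy")
    raw = run(*args)
    # Guardrail (content-level): redact PII before the result enters context.
    return EMAIL.sub("[REDACTED_EMAIL]", raw)
```

Keeping the two checks separate means the allowlist never has to reason about content, and the redaction rule can be reused for any tool the policy permits.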

Frequently asked questions

Where do I attach a guardrail to an agent?: In the agent builder, at the Guardrails Configuration step. You can attach one or more guardrail profiles; they are evaluated in order on every interaction.
Does a guardrail slow down responses?: Inspection runs in parallel with model generation where possible. Typical added latency is in the low tens of milliseconds. Plugin-backed checks (e.g. external moderation APIs) add the round-trip time of the plugin.
Can I see what a guardrail blocked?: Yes. Open the run in Monitor: each block is recorded with the rule that fired, the matched span, and the action taken. Filter the Audit Log by guardrail ID to see its history.
Can a guardrail fail open?: By default, a guardrail that errors during inspection fails closed — the request is blocked and a system event is logged. You can configure fail-open per profile if availability matters more than safety for that surface.
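Ordered evaluation and the fail-closed default can be sketched together. This assumes each profile returns allow/deny or raises when its check errors (for example, a moderation plugin timing out), and that evaluation stops at the first violation; the stop-at-first-violation behavior, `Profile`, `Decision`, and `evaluate` are all assumptions for illustration, not the product's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Profile:
    name: str
    inspect: Callable[[str], bool]  # True = allowed, False = violation; may raise
    fail_open: bool = False         # default: fail closed


@dataclass
class Decision:
    allowed: bool
    events: list = field(default_factory=list)  # (kind, profile name, detail)


def evaluate(text: str, profiles: list) -> Decision:
    events = []
    for profile in profiles:  # profiles run in the order they were attached
        try:
            ok = profile.inspect(text)
        except Exception as exc:  # the guardrail itself failed
            events.append(("system", profile.name, repr(exc)))
            if profile.fail_open:
                continue  # availability over safety for this surface
            return Decision(False, events)  # fail closed: block the request
        if not ok:
            events.append(("violation", profile.name, text))
            return Decision(False, events)
    return Decision(True, events)
```

Either way a system event is logged when a check errors; `fail_open` only changes whether the request proceeds afterwards.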

Guardrail presets

Standard, PII, Content Filter, Custom.

Create a guardrail

Author rules end-to-end.

Testing guardrails

Verify a phrase is blocked.

Policies overview

Structural governance that pairs with guardrails.