Guardrails
Runtime safety for every agent action
Guardrails sit on the hot path. Every prompt, tool call, tool response, and model output is scanned against your active rules (abusive content, PII leaks, off-topic responses), and violations are blocked, redacted, or flagged in real time.
What you'll learn
- What a guardrail inspects and when it fires
- The four places enforcement happens: inbound prompts, tool call arguments, tool responses, and the final response
- How presets compose with custom rules
- How guardrails differ from policies
What guardrails inspect
A guardrail runs on every interaction the agent has with the outside world.
1. Inbound prompts
User messages are scanned before they reach the model. Jailbreak attempts, abusive language, and out-of-scope topics are caught here.
2. Tool call arguments
Arguments the agent is about to pass to a tool are inspected before the call is made. This catches malformed SQL, prompt injection in tool payloads, and PII written to logs.
3. Tool responses
Data returned from a tool (a database row, an API payload, a retrieved document) is scanned before it enters the model context, so leaks are caught at retrieval time.
4. Final response
The model's output is checked before it is delivered. This is the last line of defense against PII, profanity, and topic drift.
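The four checkpoints above can be sketched as hooks around a single pass of an agent loop. This is a minimal illustration under stated assumptions, not the product's implementation: `run_agent`, `keyword_check`, `GuardrailViolation`, and the stage names are hypothetical, and a case-insensitive substring match stands in for real rule evaluation.

```python
# Illustrative sketch only; every name here is hypothetical, not the product API.
from collections.abc import Callable


class GuardrailViolation(Exception):
    """Hypothetical error raised when a check decides to halt the run."""

    def __init__(self, stage: str, reason: str) -> None:
        super().__init__(f"{stage}: {reason}")
        self.stage = stage
        self.reason = reason


# A check receives the stage name and the text to inspect; it raises to block.
Check = Callable[[str, str], None]


def keyword_check(banned: list[str]) -> Check:
    """Toy check: block if any banned substring appears (case-insensitive)."""
    def check(stage: str, text: str) -> None:
        lowered = text.lower()
        for term in banned:
            if term in lowered:
                raise GuardrailViolation(stage, f"matched {term!r}")
    return check


def run_agent(prompt: str,
              call_tool: Callable[[str], str],
              generate: Callable[[str, str], str],
              check: Check) -> str:
    """One pass through a simplified agent loop with the four checkpoints."""
    check("input", prompt)                # 1. inbound prompt, before the model
    tool_args = f"search: {prompt}"       # stand-in for model-chosen arguments
    check("tool_args", tool_args)         # 2. arguments, before the tool runs
    tool_result = call_tool(tool_args)
    check("tool_response", tool_result)   # 3. tool output, before it enters context
    answer = generate(prompt, tool_result)
    check("final_response", answer)       # 4. model output, before delivery
    return answer


if __name__ == "__main__":
    check = keyword_check(["ssn"])
    print(run_agent("weather in Oslo", lambda a: "sunny",
                    lambda p, r: f"It is {r}.", check))  # It is sunny.
    try:
        run_agent("weather", lambda a: "SSN 123-45-6789",
                  lambda p, r: r, check)
    except GuardrailViolation as v:
        print("halted at", v.stage)                      # halted at tool_response
```

Because each check runs before its data moves to the next stage, the SSN in the second call is stopped at the tool-response checkpoint and never reaches the model context.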
Block, redact, or warn
Each rule has one of three actions. Choose the action per rule based on severity:
- block halts the run with a violation event.
- redact masks the matching span and lets the run continue.
- warn records the match but takes no other action.
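A minimal sketch of how the three actions might be applied to a piece of text, assuming rules are regex-based and evaluated in list order. `Rule`, `Outcome`, `apply_rules`, `EXAMPLE_RULES`, the `[REDACTED]` mask, and the event strings are all hypothetical; real rules are not necessarily regular expressions.

```python
# Illustrative sketch only; names, mask text, and event strings are hypothetical.
import re
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class Rule:
    name: str
    pattern: str                                  # regex for content to match
    action: Literal["block", "redact", "warn"]


@dataclass
class Outcome:
    text: str                                     # text after any redactions
    blocked: bool = False
    events: list[str] = field(default_factory=list)


def apply_rules(text: str, rules: list[Rule]) -> Outcome:
    """Evaluate rules in order and apply each matching rule's action."""
    outcome = Outcome(text)
    for rule in rules:
        regex = re.compile(rule.pattern)
        if not regex.search(outcome.text):
            continue
        if rule.action == "block":
            outcome.blocked = True
            outcome.events.append(f"violation:{rule.name}")
            return outcome                        # halt: later rules never run
        if rule.action == "redact":
            outcome.text = regex.sub("[REDACTED]", outcome.text)
            outcome.events.append(f"redacted:{rule.name}")
        else:                                     # "warn": record the match only
            outcome.events.append(f"warning:{rule.name}")
    return outcome


EXAMPLE_RULES = [
    Rule("email", r"[\w.+-]+@[\w-]+\.[\w.]+", "redact"),
    Rule("mild_profanity", r"(?i)\bdarn\b", "warn"),
    Rule("us_ssn", r"\b\d{3}-\d{2}-\d{4}\b", "block"),
]

if __name__ == "__main__":
    print(apply_rules("Contact bob@example.com, darn it", EXAMPLE_RULES))
    print(apply_rules("SSN 123-45-6789", EXAMPLE_RULES).blocked)  # True
```

In the first call the email address is masked and the profanity is only recorded; in the second, the SSN halts evaluation outright. In this sketch rule order matters: a redact rule placed ahead of a block rule could mask a span before the block rule ever sees it.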
Guardrail vs policy
A policy is structural — what the agent can do (which tools, which data, what budget). A guardrail is content-level — what the agent says or sees. They compose: a policy might allow the database tool while a guardrail redacts PII from its result.
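The database example above can be sketched in a few lines: the policy acts before the call (is this tool allowed at all?), and the guardrail acts on the content that comes back. `Policy`, `call_tool`, `pii_guardrail`, and the email-only PII pattern are hypothetical simplifications, not the product's API.

```python
# Illustrative sketch only; Policy, call_tool, and pii_guardrail are hypothetical.
import re
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Structural control: which tools the agent may call."""
    allowed_tools: frozenset[str]


# Simplified PII pattern: email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def pii_guardrail(text: str) -> str:
    """Content-level control: redact email addresses in whatever passes through."""
    return EMAIL.sub("[REDACTED]", text)


def call_tool(policy: Policy,
              tools: dict[str, Callable[[str], str]],
              name: str,
              arg: str) -> str:
    # The policy decides whether the call may happen at all.
    if name not in policy.allowed_tools:
        raise PermissionError(f"policy does not allow tool {name!r}")
    raw = tools[name](arg)
    # The guardrail decides what the agent is allowed to see from the result.
    return pii_guardrail(raw)


if __name__ == "__main__":
    policy = Policy(allowed_tools=frozenset({"database"}))
    tools = {
        "database": lambda q: "id=7 email=ann@example.com",
        "shell": lambda cmd: "ok",
    }
    print(call_tool(policy, tools, "database", "SELECT * FROM users"))
    # id=7 email=[REDACTED]
```

Calling the `shell` tool with the same policy raises `PermissionError` before any content is produced, so the guardrail never runs; the two controls answer different questions.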
Frequently asked questions
- Where do I attach a guardrail to an agent?
- Attach guardrails in the Guardrails Configuration step of the agent builder. You can attach one or more guardrail profiles; they are evaluated in order on every interaction.
- Does a guardrail slow down responses?
- Inspection runs in parallel with model generation where possible. Typical added latency is in the low tens of milliseconds. Plugin-backed checks (e.g. external moderation APIs) add the round-trip time of the plugin.
- Can I see what a guardrail blocked?
- Yes. Open the run in Monitor: each block is recorded with the rule that fired, the matched span, and the action taken. To see a guardrail's history across runs, filter the Audit Log by guardrail ID.
- Can a guardrail fail open?
- By default, a guardrail that errors during inspection fails closed — the request is blocked and a system event is logged. You can configure fail-open per profile if availability matters more than safety for that surface.
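The ordering and failure behavior described in these answers can be sketched as a loop over attached profiles. This assumes each profile exposes a single pass/fail check; `Profile`, `evaluate`, the `fail_open` flag, and the log format are hypothetical names for illustration, and logging a system event on fail-open is an assumption (the answers above only state it for fail-closed).

```python
# Illustrative sketch only; Profile, evaluate, and the log format are hypothetical.
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    check: Callable[[str], bool]   # returns True when the content passes
    fail_open: bool = False        # default mirrors the docs: fail closed


def evaluate(text: str, profiles: list[Profile], log: list[str]) -> bool:
    """Run profiles in attachment order; return True if the content may proceed."""
    for profile in profiles:
        try:
            passed = profile.check(text)
        except Exception as exc:   # the check itself failed, e.g. a plugin timeout
            log.append(f"system:{profile.name}:error:{type(exc).__name__}")
            if profile.fail_open:
                continue           # fail open: skip this profile, keep going
            return False           # fail closed: block the request
        if not passed:
            log.append(f"block:{profile.name}")
            return False           # first blocking profile stops evaluation
    return True


def flaky_moderation(text: str) -> bool:
    """Hypothetical stand-in for an external moderation plugin that is down."""
    raise TimeoutError("moderation API timed out")


if __name__ == "__main__":
    log: list[str] = []
    print(evaluate("hello", [Profile("moderation", flaky_moderation)], log))  # False
    print(evaluate("hello", [Profile("moderation", flaky_moderation,
                                     fail_open=True)], log))                 # True
```

With the default, an erroring profile blocks the request; setting `fail_open=True` on that one profile lets later profiles still run, which is the per-profile trade-off between availability and safety.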