Guardrails

Test a guardrail before you trust it

The Test dialog runs your profile against any phrase you paste. You see which rule fired, the action, and the matched span — all in under a second.

What you'll learn
  • Where the Test dialog lives and what it shows
  • How to verify a block, redact, or warn
  • How to iterate when a phrase should match but does not
  • How to test edge cases before promoting to agents

Open the Test dialog

  1. 1

    Open the profile

    In Guardrails, click the profile you want to test.
  2. 2

    Click Test

    A dialog opens with a text area, a Run button, and a results panel.
  3. 3

    Paste a phrase

    Paste a sample prompt, tool argument, or response. Realistic strings test better than synthetic ones — pull from past runs if you have them.

Read the result

Each run returns a verdict and a breakdown.
  1. 1

    Verdict

    Pass, Warn, Redact, or Block. The verdict is the strongest action triggered by any matching rule.
  2. 2

    Matching rules

    Every rule that fired is listed with its category (topic, regex, sensitive-data, plugin), the matched span, and its individual action.
  3. 3

    Latency

    Inspection time in milliseconds, broken down per rule. Use this to spot slow plugins before they hit production.

Refine until the verdict is right

  1. 1

    Phrase should block but does not

    Add a closer example phrase to the matching topic, or tighten the regex. Re-run. Keep iterating until the verdict is Block.
  2. 2

    Phrase blocks but should not

    You have a false positive. Loosen the regex, or remove the over-broad example phrase from the topic. Test a benign neighboring phrase to confirm.
  3. 3

    Build a regression list

    Keep a list of representative phrases — known-bad, known-good, edge cases. Re-run them every time you edit the profile. Catches regressions instantly.

Frequently asked questions

Does the Test dialog count against my LLM quota?
Inspection runs locally inside the guardrail engine. Topic detection uses the platform classifier; only plugins that call external moderation APIs incur a per-call cost. Plugin cost is reported in the latency breakdown.
Can I batch-test many phrases at once?
The dialog accepts one phrase at a time today. For batch evaluation, use Eval with a guardrail dataset — it scores the profile across an entire set and reports precision and recall.
How is the Test dialog different from agent testing?
Agent testing runs the full agent loop — model, tools, guardrails — against a prompt. The guardrail Test dialog isolates just the guardrail layer, so you can iterate on rules without paying for model calls.
Do I need to re-test after attaching the profile to an agent?
Yes. Run the agent in Test mode with a few representative prompts to confirm the guardrail behaves the same in the full execution path. Tool responses and model outputs sometimes trigger rules that an isolated input did not.