Guardrails
Configure safety controls, content moderation, token budgets, and prompt injection defense for your AI agents.
Guardrails are safety controls that protect your AI agents from misuse, enforce token budgets, and ensure responses meet your quality standards. Every project can configure a guardrail policy that applies to all agents.
Overview
ThinkFleet's guardrail system operates at three stages of every agent interaction:
```
User Message → [Input Guardrails] → Agent Processing → [Output Guardrails] → Response
                                           ↓
                                   [Tool Guardrails]
```
| Stage | What It Checks |
|---|---|
| Input | Content moderation, prompt injection defense, token budget |
| Tool | Tool restriction enforcement |
| Output | Content moderation, PII detection and redaction |
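The three stages fit together as a simple pipeline. The sketch below is illustrative only — it is not ThinkFleet's actual API, and every name in it (`run_with_guardrails`, `use_tool`, the callback signatures) is ours:

```python
# A sketch of the three guardrail stages (illustrative; not ThinkFleet's API).
def run_with_guardrails(message, process, allowed_tools, check_input, check_output):
    """check_input/check_output return True when content passes moderation."""
    if not check_input(message):                   # input stage
        return "Blocked: input violated the guardrail policy."

    def use_tool(name):                            # tool stage
        if name not in allowed_tools:
            raise PermissionError(f"tool {name!r} is restricted")
        return name

    response = process(message, use_tool)          # agent processing
    if not check_output(response):                 # output stage
        return "Blocked: output violated the guardrail policy."
    return response
```

Note that the input check runs before any agent processing, which is why blocked messages consume no tokens.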
Input Moderation
Input moderation scans user messages before they reach the agent. Messages that violate your policy are blocked before any tokens are consumed.
Sensitivity Levels
| Level | Behavior |
|---|---|
| Low | Only blocks clearly harmful content (violence, illegal activity) |
| Medium | Blocks harmful content plus explicit material and harassment |
| High | Blocks all of the above plus borderline or ambiguous content |
Actions
When a violation is detected, you can configure one of three actions:
- Block — Reject the message entirely and return a configurable error message
- Flag — Allow the message but log an audit event for review
- Redact — Remove the violating content and pass the sanitized message to the agent
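The three actions differ in what reaches the agent and what gets audited. A minimal sketch of the dispatch logic, assuming a simple in-memory audit log (the function and event shapes are illustrative, not ThinkFleet's internals):

```python
# Illustrative dispatch for the three violation actions.
def apply_action(action, message, violating_span, audit_log):
    """Returns the message to forward to the agent, or None if blocked."""
    audit_log.append(("guardrail.violation", action))  # every action is audited
    if action == "block":
        return None                                    # reject entirely
    if action == "flag":
        return message                                 # pass through, logged for review
    if action == "redact":
        return message.replace(violating_span, "[REDACTED]")
    raise ValueError(f"unknown action: {action!r}")
```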
Prompt Injection Defense
Prompt injection attacks attempt to override the agent's system prompt with malicious instructions embedded in user messages. ThinkFleet defends against this with a layered approach:
- Pattern matching — Detects common injection patterns like "ignore previous instructions", "you are now", and system prompt extraction attempts
- LLM classification — For messages the pattern matcher does not flag, a lightweight classifier evaluates whether the message carries adversarial intent
When an injection attempt is detected, the message is blocked and an audit event is recorded.
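The first layer can be as simple as a set of compiled patterns. The patterns below are simplified illustrations of the examples mentioned above, not ThinkFleet's production rule set:

```python
import re

# Illustrative first-layer injection patterns (simplified; not production rules).
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"(print|reveal|show).{0,30}system prompt",   # system prompt extraction attempts
]

def looks_like_injection(message: str) -> bool:
    text = message.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)
```

Pattern matching is cheap, so it runs on every message; only messages it clears incur the extra classifier call.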
Output Moderation
Output moderation scans agent responses before they are delivered to the user.
PII Detection
ThinkFleet can detect and redact personally identifiable information in agent responses:
| PII Type | Example | Redacted As |
|---|---|---|
| Social Security Numbers | 123-45-6789 | [SSN REDACTED] |
| Credit Card Numbers | 4111-1111-1111-1111 | [CARD REDACTED] |
| Phone Numbers | (555) 123-4567 | [PHONE REDACTED] |
| Email Addresses | user@example.com | [EMAIL REDACTED] |
PII detection relies on regex pattern matching, which yields high-confidence matches with minimal latency overhead.
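Regex-based redaction for the types in the table can be sketched as follows. These patterns are simplified illustrations (real SSN, card, and phone formats have more variants), not ThinkFleet's production patterns:

```python
import re

# Simplified redaction patterns mirroring the table above (illustrative only).
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN REDACTED]"),
    (re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"), "[CARD REDACTED]"),
    (re.compile(r"\(\d{3}\) \d{3}-\d{4}"), "[PHONE REDACTED]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL REDACTED]"),
]

def redact_pii(text: str) -> str:
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```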
Token Budgets
Token budgets prevent runaway costs by capping how many tokens an agent can consume.
Budget Levels
| Level | Scope |
|---|---|
| Per Message | Maximum tokens for a single agent response |
| Per Session | Maximum tokens across an entire conversation session |
| Per Day | Maximum tokens per user per calendar day |
When a budget is exceeded, the agent returns a friendly message explaining the limit has been reached. Token usage is tracked in the token_usage_daily table and visible in the Observability dashboard.
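The three budget levels check against progressively wider usage windows. A sketch of the enforcement logic, assuming usage counters are tracked per session and per day (the dict shapes and function name are ours, not ThinkFleet's internals):

```python
# Illustrative layered budget check; usage/limits use 'message', 'session',
# and 'day' keys matching the three levels above.
def check_budget(tokens_requested, usage, limits):
    """Return (allowed, reason) for a request of tokens_requested tokens."""
    if tokens_requested > limits["message"]:
        return False, "per-message token budget exceeded"
    if usage["session"] + tokens_requested > limits["session"]:
        return False, "per-session token budget exceeded"
    if usage["day"] + tokens_requested > limits["day"]:
        return False, "daily token budget exceeded"
    return True, "ok"
```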
Monitoring Usage
Navigate to Settings → Guardrails → Token Usage to view:
- Daily token consumption by user
- Trend charts over time
- Users approaching their daily limits
Execution Timeout
Set a maximum duration for agent processing. If an agent takes longer than the configured timeout (in seconds), the request is aborted and the user receives a timeout error.
This prevents:
- Infinite loops in tool chains
- Hanging requests from unresponsive external services
- Excessive token consumption from overly long reasoning chains
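ThinkFleet applies the timeout internally; a sketch of the idea using Python's standard library (the helper name is ours, and note that a Python thread cannot be force-killed, so a hung worker keeps running in the background after the timeout fires):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

# Illustrative timeout wrapper around an agent-processing function.
def run_with_timeout(fn, timeout_seconds, *args):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except FuturesTimeout:
        return "Error: agent execution timed out"
    finally:
        pool.shutdown(wait=False)  # don't block on a possibly hung worker
```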
Tool Restrictions
Control which tools agents are allowed to use. You can maintain an allowlist or blocklist of tool names.
Use Cases
- Prevent agents from accessing sensitive tools (e.g., database write operations) in production
- Restrict demo agents to read-only tools
- Limit specific agents to their designated toolset
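Since a policy maintains either an allowlist or a blocklist, the enforcement check is a simple set membership test. This sketch assumes the allowlist wins when both are somehow set (an assumption on our part; the docs describe them as alternatives):

```python
# Illustrative allowlist/blocklist check (names and precedence are ours).
def tool_permitted(tool_name, allowlist=None, blocklist=None):
    if allowlist is not None:
        return tool_name in allowlist       # only listed tools are allowed
    if blocklist is not None:
        return tool_name not in blocklist   # listed tools are denied
    return True                             # no restrictions configured
```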
Audit Trail
Every guardrail action is recorded as an audit event:
| Event | Description |
|---|---|
| guardrail.violation | A message was blocked, flagged, or redacted |
| guardrail.token_budget_exceeded | A user hit their token budget |
| guardrail.timeout | An agent execution timed out |
These events are visible in the Observability dashboard under the Alerts tab.
Configuration
Navigate to Settings → Guardrails in your project to configure:
- Input Moderation — Toggle on/off, set sensitivity level and action
- Output Moderation — Toggle on/off, set sensitivity level and action
- Prompt Injection Defense — Toggle on/off
- PII Detection — Toggle on/off, select which PII types to detect
- Token Budget — Set per-message, per-session, and per-day limits
- Execution Timeout — Set timeout in seconds (default: 120)
- Tool Restrictions — Add tools to the allowlist or blocklist
Best Practices
- Start with Medium sensitivity and adjust based on false positive rates
- Enable PII detection for any agent that handles customer data
- Set daily token budgets to prevent unexpected costs during development
- Use tool restrictions for production agents — only expose the tools they need
- Monitor the audit trail weekly to catch emerging patterns
Next Steps
- Agent Memory — Configure per-user memory
- Crews — Multi-agent orchestration
- MCP — Connect external tools