Guardrails

Configure safety controls, content moderation, token budgets, and prompt injection defense for your AI agents.

Guardrails are safety controls that protect your AI agents from misuse, enforce token budgets, and ensure responses meet your quality standards. Every project can configure a guardrail policy that applies to all agents.

Overview

ThinkFleet's guardrail system operates at three stages of every agent interaction:

User Message → [Input Guardrails] → Agent Processing → [Output Guardrails] → Response
                                          ↓
                                   [Tool Guardrails]
Stage   What It Checks
------  ----------------------------------------------------------
Input   Content moderation, prompt injection defense, token budget
Tool    Tool restriction enforcement
Output  Content moderation, PII detection and redaction
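The three-stage flow can be sketched as a wrapper around the agent call. Everything here (`Verdict`, the check callables) is illustrative only, not ThinkFleet's actual API; tool guardrails run inside the agent call and are not shown separately.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    allowed: bool
    text: str  # sanitized message, or the error message when blocked

def run_with_guardrails(
    message: str,
    agent: Callable[[str], str],
    check_input: Callable[[str], Verdict],
    check_output: Callable[[str], str],
) -> str:
    """Illustrative three-stage guardrail flow (hypothetical names)."""
    verdict = check_input(message)   # stage 1: runs before any tokens are consumed
    if not verdict.allowed:
        return verdict.text          # blocked with a configurable error message
    raw = agent(verdict.text)        # stage 2: tool checks happen inside the run
    return check_output(raw)         # stage 3: moderation + PII redaction
```

The key property to preserve in any real implementation is the ordering: input checks must fire before the model is invoked, so blocked messages cost zero tokens.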

Input Moderation

Input moderation scans user messages before they reach the agent. Messages that violate your policy are blocked before any tokens are consumed.

Sensitivity Levels

Level   Behavior
------  ----------------------------------------------------------------
Low     Only blocks clearly harmful content (violence, illegal activity)
Medium  Blocks harmful content plus explicit material and harassment
High    Blocks all of the above plus borderline or ambiguous content

Actions

When a violation is detected, you can configure one of three actions:

  • Block — Reject the message entirely and return a configurable error message
  • Flag — Allow the message but log an audit event for review
  • Redact — Remove the violating content and pass the sanitized message to the agent
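The three actions can be modeled as a single dispatch on the detected violation. This is a hypothetical helper, not part of ThinkFleet's SDK; it returns the message to forward to the agent, or `None` when the message is blocked.

```python
import logging
from typing import Optional

log = logging.getLogger("guardrails.audit")

def apply_action(action: str, message: str, violation_span: str) -> Optional[str]:
    """Illustrative handling of the block / flag / redact actions."""
    if action == "block":
        return None                    # reject the message entirely
    if action == "flag":
        # allow the message through, but record an audit event for review
        log.warning("guardrail.violation: %r", violation_span)
        return message
    if action == "redact":
        # strip the violating span and pass the sanitized message along
        return message.replace(violation_span, "[REDACTED]")
    raise ValueError(f"unknown action: {action}")
```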

Prompt Injection Defense

Prompt injection attacks attempt to override the agent's system prompt with malicious instructions embedded in user messages. ThinkFleet defends against this with a layered approach:

  1. Pattern matching — Detects common injection patterns like "ignore previous instructions", "you are now", and system prompt extraction attempts
  2. LLM classification — For messages that clear the pattern check, a lightweight classifier evaluates whether the message contains adversarial intent

When an injection attempt is detected, the message is blocked and an audit event is recorded.
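The first layer can be approximated with a handful of regexes. These patterns are illustrative (ThinkFleet's actual pattern set is not published), and a real deployment would pair them with the LLM classifier described above, which is not shown here.

```python
import re

# Illustrative layer-1 patterns covering the examples named above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat) (your|the) system prompt", re.IGNORECASE),
]

def looks_like_injection(message: str) -> bool:
    """Cheap regex screen; messages that pass go on to the LLM classifier."""
    return any(p.search(message) for p in INJECTION_PATTERNS)
```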

Output Moderation

Output moderation scans agent responses before they are delivered to the user.

PII Detection

ThinkFleet can detect and redact personally identifiable information in agent responses:

PII Type                 Example              Redacted As
-----------------------  -------------------  ----------------
Social Security Numbers  123-45-6789          [SSN REDACTED]
Credit Card Numbers      4111-1111-1111-1111  [CARD REDACTED]
Phone Numbers            (555) 123-4567       [PHONE REDACTED]
Email Addresses          user@example.com     [EMAIL REDACTED]

PII detection uses regex pattern matching for high-confidence detection with minimal latency overhead.
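A minimal sketch of regex-based redaction for the formats in the table above. The patterns are simplified for illustration; production detectors handle many more format variants (and validate candidates, e.g. card checksums) to keep false positives down.

```python
import re

# Illustrative patterns matching only the example formats shown above.
PII_PATTERNS = {
    "[CARD REDACTED]":  re.compile(r"\b(?:\d{4}-){3}\d{4}\b"),
    "[SSN REDACTED]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[PHONE REDACTED]": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "[EMAIL REDACTED]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_pii(text: str) -> str:
    """Replace each detected PII span with its redaction marker."""
    for replacement, pattern in PII_PATTERNS.items():
        text = pattern.sub(replacement, text)
    return text
```

Card numbers are checked before SSNs so that the longer pattern claims its digits first; ordering like this matters whenever patterns can overlap.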

Token Budgets

Token budgets prevent runaway costs by capping how many tokens an agent can consume.

Budget Levels

Level        Scope
-----------  ----------------------------------------------------
Per Message  Maximum tokens for a single agent response
Per Session  Maximum tokens across an entire conversation session
Per Day      Maximum tokens per user per calendar day

When a budget is exceeded, the agent returns a friendly message explaining the limit has been reached. Token usage is tracked in the token_usage_daily table and visible in the Observability dashboard.
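The three levels compose into a single pre-flight check. This is a hypothetical helper (the names and the exact boundary semantics are assumptions, not ThinkFleet's implementation): a request is refused if completing it would cross any of the three caps.

```python
from dataclasses import dataclass

@dataclass
class TokenBudget:
    """Illustrative budget levels mirroring the table above."""
    per_message: int
    per_session: int
    per_day: int

def budget_exceeded(budget: TokenBudget, message_tokens: int,
                    session_tokens: int, day_tokens: int) -> bool:
    # session/day checks add the pending message to tokens already consumed
    return (message_tokens > budget.per_message
            or session_tokens + message_tokens > budget.per_session
            or day_tokens + message_tokens > budget.per_day)
```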

Monitoring Usage

Navigate to Settings → Guardrails → Token Usage to view:

  • Daily token consumption by user
  • Trend charts over time
  • Users approaching their daily limits

Execution Timeout

Set a maximum duration for agent processing. If an agent takes longer than the configured timeout (in seconds), the request is aborted and the user receives a timeout error.

This prevents:

  • Infinite loops in tool chains
  • Hanging requests from unresponsive external services
  • Excessive token consumption from overly long reasoning chains

Tool Restrictions

Control which tools agents are allowed to use. You can maintain an allowlist or blocklist of tool names.

Use Cases

  • Prevent agents from accessing sensitive tools (e.g., database write operations) in production
  • Restrict demo agents to read-only tools
  • Limit specific agents to their designated toolset
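The enforcement check itself is a one-liner per mode. A sketch, with one assumption called out: when both lists are configured, this version gives the allowlist precedence (anything not on it is denied); the source does not specify how ThinkFleet resolves that conflict.

```python
from typing import Iterable, Optional

def tool_allowed(tool: str,
                 allowlist: Optional[Iterable[str]] = None,
                 blocklist: Optional[Iterable[str]] = None) -> bool:
    """Illustrative allowlist/blocklist check (hypothetical helper)."""
    if allowlist is not None:
        return tool in set(allowlist)   # allowlist: deny anything unlisted
    if blocklist is not None:
        return tool not in set(blocklist)
    return True                         # no restrictions configured
```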

Audit Trail

Every guardrail action is recorded as an audit event:

Event                            Description
-------------------------------  -------------------------------------------
guardrail.violation              A message was blocked, flagged, or redacted
guardrail.token_budget_exceeded  A user hit their token budget
guardrail.timeout                An agent execution timed out

These events are visible in the Observability dashboard under the Alerts tab.

Configuration

Navigate to Settings → Guardrails in your project to configure:

  1. Input Moderation — Toggle on/off, set sensitivity level and action
  2. Output Moderation — Toggle on/off, set sensitivity level and action
  3. Prompt Injection Defense — Toggle on/off
  4. PII Detection — Toggle on/off, select which PII types to detect
  5. Token Budget — Set per-message, per-session, and per-day limits
  6. Execution Timeout — Set timeout in seconds (default: 120)
  7. Tool Restrictions — Add tools to the allowlist or blocklist
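Pulled together, a full policy covering the seven settings might look like the following. The field names and values are made up for the sketch; they are not ThinkFleet's actual configuration schema.

```python
# Hypothetical guardrail policy mirroring the seven settings above.
GUARDRAIL_POLICY = {
    "input_moderation":  {"enabled": True, "sensitivity": "medium", "action": "block"},
    "output_moderation": {"enabled": True, "sensitivity": "medium", "action": "redact"},
    "prompt_injection_defense": {"enabled": True},
    "pii_detection": {"enabled": True, "types": ["ssn", "card", "phone", "email"]},
    "token_budget": {"per_message": 4000, "per_session": 50000, "per_day": 200000},
    "execution_timeout_seconds": 120,   # documented default
    "tool_restrictions": {"mode": "allowlist", "tools": ["search", "calculator"]},
}
```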

Best Practices

  • Start with Medium sensitivity and adjust based on false positive rates
  • Enable PII detection for any agent that handles customer data
  • Set daily token budgets to prevent unexpected costs during development
  • Use tool restrictions for production agents — only expose the tools they need
  • Monitor the audit trail weekly to catch emerging patterns

Next Steps

  • Agent Memory — Configure per-user memory
  • Crews — Multi-agent orchestration
  • MCP — Connect external tools