Guardrails

Configure safety controls, content moderation, token budgets, and prompt injection defense for your AI agents.

Guardrails are safety controls that protect your AI agents from misuse, enforce token budgets, and ensure responses meet your quality standards. Every project can configure a guardrail policy that applies to all agents.

Overview

ThinkFleet's guardrail system operates at three stages of every agent interaction:

User Message → [Input Guardrails] → Agent Processing → [Output Guardrails] → Response
                                          ↓
                                   [Tool Guardrails]
Stage   What It Checks
------  ----------------------------------------------------------
Input   Content moderation, prompt injection defense, token budget
Tool    Tool restriction enforcement
Output  Content moderation, PII detection and redaction
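The three-stage flow can be sketched as a wrapper around the agent call. Everything here (`Verdict`, the check callables) is illustrative only, not ThinkFleet's actual API; tool guardrails run inside the agent call and are not shown separately.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    allowed: bool
    text: str  # sanitized message, or the error message when blocked

def run_with_guardrails(
    message: str,
    agent: Callable[[str], str],
    check_input: Callable[[str], Verdict],
    check_output: Callable[[str], str],
) -> str:
    """Illustrative three-stage guardrail flow (hypothetical names)."""
    verdict = check_input(message)   # stage 1: runs before any tokens are consumed
    if not verdict.allowed:
        return verdict.text          # blocked with a configurable error message
    raw = agent(verdict.text)        # stage 2: tool checks happen inside the run
    return check_output(raw)         # stage 3: moderation + PII redaction
```

The key property to preserve in any real implementation is the ordering: input checks must fire before the model is invoked, so blocked messages cost zero tokens.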

Input Moderation

Input moderation scans user messages before they reach the agent. Messages that violate your policy are blocked before any tokens are consumed.

Sensitivity Levels

Level   Behavior
------  ----------------------------------------------------------------
Low     Only blocks clearly harmful content (violence, illegal activity)
Medium  Blocks harmful content plus explicit material and harassment
High    Blocks all of the above plus borderline or ambiguous content

Actions

When a violation is detected, you can configure one of three actions:

  • Block — Reject the message entirely and return a configurable error message
  • Flag — Allow the message but log an audit event for review
  • Redact — Remove the violating content and pass the sanitized message to the agent
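The three actions can be modeled as a single dispatch on the detected violation. This is a hypothetical helper, not part of ThinkFleet's SDK; it returns the message to forward to the agent, or `None` when the message is blocked.

```python
import logging
from typing import Optional

log = logging.getLogger("guardrails.audit")

def apply_action(action: str, message: str, violation_span: str) -> Optional[str]:
    """Illustrative handling of the block / flag / redact actions."""
    if action == "block":
        return None                    # reject the message entirely
    if action == "flag":
        # allow the message through, but record an audit event for review
        log.warning("guardrail.violation: %r", violation_span)
        return message
    if action == "redact":
        # strip the violating span and pass the sanitized message along
        return message.replace(violation_span, "[REDACTED]")
    raise ValueError(f"unknown action: {action}")
```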

Prompt Injection Defense

Prompt injection attacks attempt to override the agent's system prompt with malicious instructions embedded in user messages. ThinkFleet defends against this with a layered approach:

  1. Pattern matching — Detects common injection patterns like "ignore previous instructions", "you are now", and system prompt extraction attempts
  2. LLM classification — For messages that clear the pattern check, a lightweight classifier evaluates whether the message contains adversarial intent

When an injection attempt is detected, the message is blocked and an audit event is recorded.
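The first layer can be approximated with a handful of regexes. These patterns are illustrative (ThinkFleet's actual pattern set is not published), and a real deployment would pair them with the LLM classifier described above, which is not shown here.

```python
import re

# Illustrative layer-1 patterns covering the examples named above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat) (your|the) system prompt", re.IGNORECASE),
]

def looks_like_injection(message: str) -> bool:
    """Cheap regex screen; messages that pass go on to the LLM classifier."""
    return any(p.search(message) for p in INJECTION_PATTERNS)
```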

Output Moderation

Output moderation scans agent responses before they are delivered to the user.

PII Detection

ThinkFleet can detect and redact personally identifiable information in agent responses:

PII Type                 Example              Redacted As
-----------------------  -------------------  ----------------
Social Security Numbers  123-45-6789          [SSN REDACTED]
Credit Card Numbers      4111-1111-1111-1111  [CARD REDACTED]
Phone Numbers            (555) 123-4567       [PHONE REDACTED]
Email Addresses          user@example.com     [EMAIL REDACTED]

PII detection uses regex pattern matching for high-confidence detection with minimal latency overhead.
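A minimal sketch of regex-based redaction for the formats in the table above. The patterns are simplified for illustration; production detectors handle many more format variants (and validate candidates, e.g. card checksums) to keep false positives down.

```python
import re

# Illustrative patterns matching only the example formats shown above.
PII_PATTERNS = {
    "[CARD REDACTED]":  re.compile(r"\b(?:\d{4}-){3}\d{4}\b"),
    "[SSN REDACTED]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[PHONE REDACTED]": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "[EMAIL REDACTED]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_pii(text: str) -> str:
    """Replace each detected PII span with its redaction marker."""
    for replacement, pattern in PII_PATTERNS.items():
        text = pattern.sub(replacement, text)
    return text
```

Card numbers are checked before SSNs so that the longer pattern claims its digits first; ordering like this matters whenever patterns can overlap.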

Token Budgets

Token budgets prevent runaway costs by capping how many tokens an agent can consume.

Budget Levels

Level        Scope
-----------  ----------------------------------------------------
Per Message  Maximum tokens for a single agent response
Per Session  Maximum tokens across an entire conversation session
Per Day      Maximum tokens per user per calendar day

When a budget is exceeded, the agent returns a friendly message explaining the limit has been reached. Token usage is tracked in the token_usage_daily table and visible in the Observability dashboard.
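The three levels compose into a single pre-flight check. This is a hypothetical helper (the names and the exact boundary semantics are assumptions, not ThinkFleet's implementation): a request is refused if completing it would cross any of the three caps.

```python
from dataclasses import dataclass

@dataclass
class TokenBudget:
    """Illustrative budget levels mirroring the table above."""
    per_message: int
    per_session: int
    per_day: int

def budget_exceeded(budget: TokenBudget, message_tokens: int,
                    session_tokens: int, day_tokens: int) -> bool:
    # session/day checks add the pending message to tokens already consumed
    return (message_tokens > budget.per_message
            or session_tokens + message_tokens > budget.per_session
            or day_tokens + message_tokens > budget.per_day)
```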

Monitoring Usage

Navigate to Settings → Guardrails → Token Usage to view:

  • Daily token consumption by user
  • Trend charts over time
  • Users approaching their daily limits

Execution Timeout

Set a maximum duration for agent processing. If an agent takes longer than the configured timeout (in seconds), the request is aborted and the user receives a timeout error.

This prevents:

  • Infinite loops in tool chains
  • Hanging requests from unresponsive external services
  • Excessive token consumption from overly long reasoning chains

Tool Restrictions

Control which tools agents are allowed to use. You can maintain an allowlist or blocklist of tool names.

Use Cases

  • Prevent agents from accessing sensitive tools (e.g., database write operations) in production
  • Restrict demo agents to read-only tools
  • Limit specific agents to their designated toolset
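The enforcement check itself is a one-liner per mode. A sketch, with one assumption called out: when both lists are configured, this version gives the allowlist precedence (anything not on it is denied); the source does not specify how ThinkFleet resolves that conflict.

```python
from typing import Iterable, Optional

def tool_allowed(tool: str,
                 allowlist: Optional[Iterable[str]] = None,
                 blocklist: Optional[Iterable[str]] = None) -> bool:
    """Illustrative allowlist/blocklist check (hypothetical helper)."""
    if allowlist is not None:
        return tool in set(allowlist)   # allowlist: deny anything unlisted
    if blocklist is not None:
        return tool not in set(blocklist)
    return True                         # no restrictions configured
```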

Audit Trail

Every guardrail action is recorded as an audit event:

Event                            Description
-------------------------------  -------------------------------------------
guardrail.violation              A message was blocked, flagged, or redacted
guardrail.token_budget_exceeded  A user hit their token budget
guardrail.timeout                An agent execution timed out

These events are visible in the Observability dashboard under the Alerts tab.

Configuration

Navigate to Settings → Guardrails in your project to configure:

  1. Input Moderation — Toggle on/off, set sensitivity level and action
  2. Output Moderation — Toggle on/off, set sensitivity level and action
  3. Prompt Injection Defense — Toggle on/off
  4. PII Detection — Toggle on/off, select which PII types to detect
  5. Token Budget — Set per-message, per-session, and per-day limits
  6. Execution Timeout — Set timeout in seconds (default: 120)
  7. Tool Restrictions — Add tools to the allowlist or blocklist
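Pulled together, a full policy covering the seven settings might look like the following. The field names and values are made up for the sketch; they are not ThinkFleet's actual configuration schema.

```python
# Hypothetical guardrail policy mirroring the seven settings above.
GUARDRAIL_POLICY = {
    "input_moderation":  {"enabled": True, "sensitivity": "medium", "action": "block"},
    "output_moderation": {"enabled": True, "sensitivity": "medium", "action": "redact"},
    "prompt_injection_defense": {"enabled": True},
    "pii_detection": {"enabled": True, "types": ["ssn", "card", "phone", "email"]},
    "token_budget": {"per_message": 4000, "per_session": 50000, "per_day": 200000},
    "execution_timeout_seconds": 120,   # documented default
    "tool_restrictions": {"mode": "allowlist", "tools": ["search", "calculator"]},
}
```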

Best Practices

  • Start with Medium sensitivity and adjust based on false positive rates
  • Enable PII detection for any agent that handles customer data
  • Set daily token budgets to prevent unexpected costs during development
  • Use tool restrictions for production agents — only expose the tools they need
  • Monitor the audit trail weekly to catch emerging patterns

Next Steps

  • Agent Memory — Configure per-user memory
  • Crews — Multi-agent orchestration
  • MCP — Connect external tools