Knowledge Base Overview

Learn how the ThinkFleet knowledge base works — RAG-powered document search for your AI agents.

The Knowledge Base is ThinkFleet's built-in Retrieval-Augmented Generation (RAG) system. It lets you upload documents that your AI agents can search and reference when answering questions, ensuring responses are grounded in your actual data rather than general knowledge.

How It Works

The RAG Pipeline

Upload Document
    │
    ▼
Parse & Extract Text
    │
    ▼
Split into Chunks
    │
    ▼
Generate Embeddings (vectors)
    │
    ▼
Store in pgvector
    │
    ▼
Ready for Search
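The ingestion steps above can be sketched in a few lines. This is a minimal illustration, not ThinkFleet's implementation: `toy_embed`, `split_into_chunks`, `InMemoryStore`, and `ingest` are hypothetical names, the embedding is a hashed bag-of-words stand-in for a real embedding model, whitespace-separated words stand in for tokens, and an in-memory list stands in for pgvector.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # toy embedding width (assumption); real embedding models use far more dimensions


def toy_embed(text: str) -> list[float]:
    """Hash each word into a fixed-size bag-of-words vector, then L2-normalize."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def split_into_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Sliding window over whitespace-separated words (a stand-in for tokens)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


@dataclass
class InMemoryStore:
    """Stand-in for the vector store: rows of (document, chunk index, text, embedding)."""
    rows: list[tuple[str, int, str, list[float]]] = field(default_factory=list)

    def add(self, doc: str, index: int, text: str, embedding: list[float]) -> None:
        self.rows.append((doc, index, text, embedding))


def ingest(doc_name: str, raw_text: str, store: InMemoryStore,
           size: int = 500, overlap: int = 50) -> int:
    text = " ".join(raw_text.split())                # "parse": normalize whitespace
    chunks = split_into_chunks(text, size, overlap)  # split into chunks
    for i, chunk in enumerate(chunks):
        store.add(doc_name, i, chunk, toy_embed(chunk))  # embed and store
    return len(chunks)
```

With the default settings, a 1,000-word document yields three chunks: two full 500-word windows and a shorter final one.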

When an agent needs information:

User Question
    │
    ▼
Generate Query Embedding
    │
    ▼
Vector Similarity Search
    │
    ▼
Return Top-K Relevant Chunks
    │
    ▼
Include in Agent Context
    │
    ▼
LLM Generates Grounded Response
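The "Include in Agent Context" step amounts to placing the retrieved chunks into the prompt ahead of the user's question. The sketch below shows one plausible layout; the function name `build_prompt` and the prompt wording are assumptions for illustration, not the format ThinkFleet actually sends to the LLM.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt.

    chunks: (document_name, chunk_text) pairs, already ranked by relevance.
    """
    if chunks:
        context = "\n\n".join(
            f"[{i}] {doc}: {text}"
            for i, (doc, text) in enumerate(chunks, start=1)
        )
    else:
        context = "(no relevant passages found)"
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering each passage and carrying the document name alongside it gives the model something concrete to cite in its response.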

Key Components

| Component | Technology | Purpose |
|---|---|---|
| Document Parser | Built-in parsers | Extract text from PDF, DOCX, TXT, HTML, Markdown |
| Chunking Engine | Recursive text splitter | Break documents into searchable segments |
| Embedding Model | OpenAI or configurable | Convert text to vector representations |
| Vector Store | pgvector (PostgreSQL) | Store and search embeddings efficiently |
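A recursive text splitter tries coarse separators first (paragraph breaks) and falls back to finer ones (lines, sentences, words) only for pieces that are still too long. The following is a simplified sketch of that general technique, not ThinkFleet's chunking engine: the separator list is an assumption, length is measured in characters rather than tokens, and a separator that falls on a chunk boundary is dropped.

```python
def recursive_split(text: str, max_len: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split text into chunks of at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= max_len:
            current = candidate          # keep packing parts into the current chunk
            continue
        if current.strip():
            chunks.append(current)       # current chunk is full, emit it
        if len(part) <= max_len:
            current = part               # start a new chunk with this part
        else:
            # This part alone is too long: split it with the finer separators.
            chunks.extend(recursive_split(part, max_len, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Short paragraphs stay intact as single chunks, while a long paragraph is broken at sentence or word boundaries instead of at an arbitrary character offset.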

Supported Document Types

| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text extraction; scanned PDFs require OCR |
| Word | .docx | Full formatting preserved during extraction |
| Plain Text | .txt | Direct ingestion |
| Markdown | .md | Headers used as natural chunk boundaries |
| HTML | .html | Tags stripped, text extracted |
| CSV | .csv | Each row can become a separate chunk |
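To illustrate the CSV note, one way to turn each row into its own chunk is to pair every value with its column header so the chunk stays meaningful out of context. This is a sketch under that assumption; `csv_rows_to_chunks` is a hypothetical helper, and the "column: value" text format is illustrative rather than what ThinkFleet produces.

```python
import csv
import io


def csv_rows_to_chunks(csv_text: str) -> list[str]:
    """Render each CSV row as one self-describing text chunk."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row in reader:
        # Skip empty cells and any overflow fields that have no header.
        fields = [f"{col}: {val}" for col, val in row.items()
                  if col is not None and val]
        if fields:
            chunks.append("; ".join(fields))
    return chunks
```

Repeating the column names in every chunk costs a few tokens per row but lets a query such as "price of Widget" match the row directly.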

Architecture

ThinkFleet's knowledge base runs entirely on your existing PostgreSQL database using the pgvector extension. This means:

  • No additional infrastructure — No separate vector database to manage
  • Consistent backups — Your knowledge base is backed up with your regular database
  • Transaction safety — Document operations are ACID-compliant
  • Cost-effective — No extra service costs

Database Tables

| Table | Purpose |
|---|---|
| knowledge_base | Stores knowledge base metadata (name, project, settings) |
| knowledge_base_document | Tracks uploaded documents and processing status |
| knowledge_base_chunk | Stores document chunks with vector embeddings |

Creating a Knowledge Base

  1. Navigate to Knowledge Base in the sidebar
  2. Click New Knowledge Base
  3. Enter a name (e.g., "Product Documentation")
  4. Configure settings:
    • Chunk Size: Target size for each text chunk (default: 500 tokens)
    • Chunk Overlap: Overlap between consecutive chunks (default: 50 tokens)
    • Embedding Model: Select the embedding model

Chunking Strategy

Chunking determines how documents are split for search. The right chunk size depends on your content:

| Content Type | Recommended Chunk Size | Overlap |
|---|---|---|
| FAQs | 200-300 tokens | 20 tokens |
| Technical docs | 500-800 tokens | 50 tokens |
| Legal documents | 800-1200 tokens | 100 tokens |
| General articles | 400-600 tokens | 50 tokens |

Smaller chunks = more precise search results, but less context per result.

Larger chunks = more context per result, but may include irrelevant content.
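Chunk size and overlap also determine how many chunks a document produces, and therefore how many embeddings are generated and stored. Assuming a plain sliding window (each chunk starts `chunk_size - overlap` tokens after the previous one), the count can be estimated as below. The function name and the sliding-window assumption are illustrative; the actual splitter may produce a different count because it respects natural boundaries.

```python
def estimate_chunk_count(total_tokens: int, chunk_size: int = 500, overlap: int = 50) -> int:
    """Estimate chunks produced by a sliding window with the given overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    # Integer ceiling of (total_tokens - overlap) / step.
    return (total_tokens - overlap + step - 1) // step
```

For a 10,000-token document, this gives 44 chunks at 250 tokens with 20 overlap, 23 chunks at 500 with 50 overlap, and 11 chunks at 1,000 with 100 overlap.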

Chunk Overlap

Overlap keeps information at chunk boundaries from being lost. The last tokens of each chunk are repeated at the start of the next, so a sentence that straddles a boundary is more likely to appear whole in at least one chunk.
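A small example makes the effect visible. The sketch below uses a simple sliding window over whitespace-separated words as a stand-in for tokens; `chunk_with_overlap` is a hypothetical helper, not part of ThinkFleet.

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding window: each chunk repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


tokens = "The reset button is on the back panel near the power port".split()
for chunk in chunk_with_overlap(tokens, size=6, overlap=2):
    print(" ".join(chunk))
```

Each printed chunk begins with the final two words of the one before it, so the phrase "on the back panel" is not cut off at the first boundary.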

How Search Works

When an agent queries the knowledge base:

  1. The query text is converted to a vector embedding
  2. pgvector performs a cosine similarity search against all chunks
  3. The top-K most similar chunks are returned
  4. Chunks are ranked by relevance score (0.0 to 1.0)
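The search steps above can be sketched in plain Python. In production the comparison runs inside PostgreSQL (pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity); the functions below are hypothetical helpers that reproduce the same ranking in memory for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 5) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best, highest first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because cosine similarity depends only on the angle between vectors, a long chunk and a short chunk about the same topic can score equally well.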

Search Parameters

| Parameter | Description | Default |
|---|---|---|
| Top K | Number of chunks to return | 5 |
| Similarity Threshold | Minimum relevance score | 0.7 |
| Max Tokens | Maximum tokens across all returned chunks | 2000 |
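The three parameters act as successive filters on the ranked results. The sketch below shows one plausible way they could combine; the order of application, the choice to stop at the first chunk that exceeds the token budget, and the word-count token estimate are all assumptions, and `select_chunks` is a hypothetical name.

```python
def select_chunks(scored: list[tuple[float, str]],
                  top_k: int = 5,
                  threshold: float = 0.7,
                  max_tokens: int = 2000) -> list[tuple[float, str]]:
    """Keep the best-scoring chunks that satisfy all three limits.

    scored: (relevance_score, chunk_text) pairs in any order.
    """
    selected: list[tuple[float, str]] = []
    used = 0
    for score, text in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if score < threshold or len(selected) >= top_k:
            break  # everything after this point is weaker or over the count limit
        cost = len(text.split())  # whitespace words as a rough token estimate
        if used + cost > max_tokens:
            break  # adding this chunk would exceed the token budget
        selected.append((score, text))
        used += cost
    return selected
```

Lowering the threshold or raising Top K returns more chunks only while the token budget still has room for them.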

Search Quality Tips

  1. Use descriptive document titles — They're included in chunk metadata
  2. Structure documents with headers — Headers create natural chunk boundaries
  3. Remove boilerplate — Copyright notices, headers/footers reduce search quality
  4. Keep content focused — One topic per document performs better than catch-all documents

Connecting to Agents

To give an agent access to a knowledge base:

  1. Open the agent's settings
  2. Go to the Knowledge Base tab
  3. Select one or more knowledge bases
  4. The agent will automatically search them when answering questions

You can also instruct the agent in its system prompt:

Always search the knowledge base before answering product questions.
If you find relevant information, cite the document name in your response.
If no relevant results are found, say "I don't have information about that
in my documentation" rather than guessing.

Monitoring

Document Status

Track document processing in the Knowledge Base dashboard:

| Status | Description |
|---|---|
| Processing | Document is being parsed and chunked |
| Ready | Document is fully indexed and searchable |
| Error | Processing failed (check error details) |

Search Analytics

Monitor how your knowledge base is being used:

  • Query volume — How many searches per day
  • Average relevance — How well results match queries
  • No-result queries — Queries that returned no relevant chunks (indicates content gaps)
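If you export search activity for your own reporting, the three metrics above are straightforward to compute. The sketch below assumes a hypothetical log record (`SearchEvent`) with the query date, the query text, and the relevance scores of the chunks returned; this is not a ThinkFleet export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchEvent:
    day: date
    query: str
    scores: list[float]  # relevance scores of returned chunks; empty means no results


def summarize(events: list[SearchEvent]) -> dict:
    """Compute query volume per day, average relevance, and no-result queries."""
    volume = Counter(e.day for e in events)
    all_scores = [s for e in events for s in e.scores]
    average = sum(all_scores) / len(all_scores) if all_scores else None
    no_result = [e.query for e in events if not e.scores]
    return {
        "queries_per_day": dict(volume),
        "average_relevance": average,
        "no_result_queries": no_result,
    }
```

Queries that appear repeatedly in the no-result list are good candidates for new documents.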

Next Steps