Knowledge Base Overview
Learn how the ThinkFleet knowledge base works — RAG-powered document search for your AI agents.
The Knowledge Base is ThinkFleet's built-in Retrieval-Augmented Generation (RAG) system. It lets you upload documents that your AI agents can search and reference when answering questions, ensuring responses are grounded in your actual data rather than general knowledge.
How It Works
The RAG Pipeline
Upload Document
│
▼
Parse & Extract Text
│
▼
Split into Chunks
│
▼
Generate Embeddings (vectors)
│
▼
Store in pgvector
│
▼
Ready for Search
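The ingestion steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not ThinkFleet's implementation: `embed` is a toy hashed bag-of-words function standing in for a real embedding model, `split_into_chunks` is a fixed word-count splitter standing in for the chunking engine, and `InMemoryVectorStore` is a hypothetical stand-in for the pgvector-backed chunk table.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # toy vector size; real embedding models use hundreds or thousands of dimensions

def embed(text: str) -> list[float]:
    """Toy embedding (assumption): hash each word into a bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def split_into_chunks(text: str, size: int = 50) -> list[str]:
    """Fixed-size chunks of `size` words (a stand-in for the real chunking engine)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the pgvector-backed chunk table."""
    rows: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def add(self, doc_name: str, chunk: str, vector: list[float]) -> None:
        self.rows.append((doc_name, chunk, vector))

def ingest(doc_name: str, raw_text: str, store: InMemoryVectorStore) -> int:
    text = raw_text.strip()                        # parse & extract (plain text only)
    chunks = split_into_chunks(text)               # split into chunks
    for chunk in chunks:
        store.add(doc_name, chunk, embed(chunk))   # generate embedding, then store
    return len(chunks)                             # chunks are now ready for search
```

Calling `ingest("faq.txt", text, store)` returns the number of chunks stored; in a real deployment each row would instead be written to PostgreSQL together with its vector.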
When an agent needs information:
User Question
│
▼
Generate Query Embedding
│
▼
Vector Similarity Search
│
▼
Return Top-K Relevant Chunks
│
▼
Include in Agent Context
│
▼
LLM Generates Grounded Response
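The last two steps, including retrieved chunks in the agent's context and asking the model for a grounded answer, come down to prompt assembly. The sketch below assumes a hypothetical `RetrievedChunk` record and hypothetical prompt wording; ThinkFleet's actual prompt format is not documented on this page.

```python
from dataclasses import dataclass

# Hypothetical record for one retrieved chunk; field names are illustrative.
@dataclass
class RetrievedChunk:
    document_name: str
    text: str
    score: float

FALLBACK = "I don't have information about that in my documentation."

def build_grounded_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Place retrieved chunks in the prompt so the LLM answers from them."""
    if not chunks:
        # No relevant chunks: tell the model to admit the gap instead of guessing.
        return f"No documentation matched the question.\nReply exactly: {FALLBACK}"
    # Number each chunk and label it with its source document so it can be cited.
    context = "\n\n".join(
        f"[{i}] ({c.document_name}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below and cite the document names you use.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```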
Key Components
| Component | Technology | Purpose |
|---|---|---|
| Document Parser | Built-in parsers | Extract text from PDF, DOCX, TXT, HTML, Markdown |
| Chunking Engine | Recursive text splitter | Break documents into searchable segments |
| Embedding Model | OpenAI or configurable | Convert text to vector representations |
| Vector Store | pgvector (PostgreSQL) | Store and search embeddings efficiently |
Supported Document Types
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text extraction; scanned PDFs require OCR |
| Word | .docx | Full formatting preserved during extraction |
| Plain Text | .txt | Direct ingestion |
| Markdown | .md | Headers used as natural chunk boundaries |
| HTML | .html | Tags stripped, text extracted |
| CSV | .csv | Each row can become a separate chunk |
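The per-format handling in the table can be sketched as a dispatch on file extension using only the Python standard library. `extract_segments` is a hypothetical helper: it returns text segments for TXT, Markdown (split just before each header), HTML (tags stripped, script and style contents dropped), and CSV (one segment per row). PDF and DOCX are deliberately left unimplemented because they need third-party libraries, and scanned PDFs also need OCR.

```python
import csv
import io
import re
from html.parser import HTMLParser
from pathlib import Path

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_segments(filename: str, raw: str) -> list[str]:
    """Hypothetical per-format text extraction keyed on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext == ".txt":
        return [raw]  # direct ingestion
    if ext == ".md":
        # Split just before each Markdown header so headers start new segments.
        parts = re.split(r"(?m)^(?=#{1,6} )", raw)
        return [p.strip() for p in parts if p.strip()]
    if ext == ".html":
        parser = _TextExtractor()
        parser.feed(raw)
        parser.close()
        return [" ".join(parser.parts)]  # tags stripped, text kept
    if ext == ".csv":
        # One segment per row, keeping column names for context.
        rows = csv.DictReader(io.StringIO(raw))
        return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in rows]
    raise NotImplementedError(f"{ext} needs a PDF/DOCX library (and OCR for scans)")
```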
Architecture
ThinkFleet's knowledge base runs entirely on your existing PostgreSQL database using the pgvector extension. This means:
- No additional infrastructure — No separate vector database to manage
- Consistent backups — Your knowledge base is backed up with your regular database
- Transaction safety — Document operations are ACID-compliant
- Cost-effective — No extra service costs
Database Tables
| Table | Purpose |
|---|---|
| knowledge_base | Stores knowledge base metadata (name, project, settings) |
| knowledge_base_document | Tracks uploaded documents and processing status |
| knowledge_base_chunk | Stores document chunks with vector embeddings |
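The relationship between the three tables (a knowledge base owns documents, and a document owns chunks) can be sketched as Python dataclasses. Column names beyond what the Purpose column implies are assumptions, not ThinkFleet's actual schema, and `chunks_in_knowledge_base` is a hypothetical helper that follows the links from a knowledge base down to its chunks.

```python
from dataclasses import dataclass, field

# Hypothetical mirrors of the three tables; real column names may differ.
@dataclass
class KnowledgeBase:              # knowledge_base
    id: str
    project_id: str
    name: str
    settings: dict = field(default_factory=dict)

@dataclass
class KnowledgeBaseDocument:      # knowledge_base_document
    id: str
    knowledge_base_id: str        # references KnowledgeBase.id
    file_name: str
    status: str = "Processing"    # Processing, Ready, or Error

@dataclass
class KnowledgeBaseChunk:         # knowledge_base_chunk
    id: str
    document_id: str              # references KnowledgeBaseDocument.id
    content: str
    embedding: list[float]        # held in a pgvector column in PostgreSQL

def chunks_in_knowledge_base(kb_id, documents, chunks):
    """Follow the references: knowledge base -> documents -> chunks."""
    doc_ids = {d.id for d in documents if d.knowledge_base_id == kb_id}
    return [c for c in chunks if c.document_id in doc_ids]
```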
Creating a Knowledge Base
- Navigate to Knowledge Base in the sidebar
- Click New Knowledge Base
- Enter a name (e.g., "Product Documentation")
- Configure settings:
  - Chunk Size: Target size for each text chunk (default: 500 tokens)
  - Chunk Overlap: Overlap between consecutive chunks (default: 50 tokens)
  - Embedding Model: Select the embedding model
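The settings above could be modeled as a small immutable config object. The field names are hypothetical; the defaults (500 and 50 tokens) come from this page. The check that overlap is smaller than chunk size is an assumed constraint, one that any sliding-window chunker needs so that each new chunk actually advances through the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class KnowledgeBaseSettings:
    # Field names are hypothetical; defaults match the values listed above.
    chunk_size: int = 500                   # target tokens per chunk
    chunk_overlap: int = 50                 # tokens shared by consecutive chunks
    embedding_model: Optional[str] = None   # picked in the UI; no default documented here

    def __post_init__(self) -> None:
        # Assumed constraint: overlap must be smaller than the chunk so windows advance.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and < chunk_size")

    @property
    def stride(self) -> int:
        """Tokens the window advances between consecutive chunks."""
        return self.chunk_size - self.chunk_overlap
```

With the defaults, `KnowledgeBaseSettings().stride` is 450 tokens.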
Chunking Strategy
Chunking determines how documents are split for search. The right chunk size depends on your content:
| Content Type | Recommended Chunk Size | Overlap |
|---|---|---|
| FAQs | 200-300 tokens | 20 tokens |
| Technical docs | 500-800 tokens | 50 tokens |
| Legal documents | 800-1200 tokens | 100 tokens |
| General articles | 400-600 tokens | 50 tokens |
- Smaller chunks = more precise search results, but less context per result
- Larger chunks = more context per result, but may include irrelevant content
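The Key Components table lists a recursive text splitter as the chunking engine. A simplified version of that technique is sketched below: it tries coarse separators first (paragraph breaks, then line breaks, then sentence ends, then spaces), merges small pieces up to the size limit, and only falls back to a finer separator when a piece is still too large. Word counts stand in for token counts here, which is an assumption; ThinkFleet's splitter and tokenizer may differ.

```python
def recursive_split(text, max_words, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_words words, preferring coarse separators."""
    if len(text.split()) <= max_words:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-cut by word count.
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    sep, finer = separators[0], separators[1:]
    pieces, current = [], ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate.split()) <= max_words:
            current = candidate  # keep merging small parts up to the limit
            continue
        if current.strip():
            pieces.append(current)
        if len(part.split()) > max_words:
            # This part alone is too big: recurse with the next, finer separator.
            pieces.extend(recursive_split(part, max_words, finer))
            current = ""
        else:
            current = part
    if current.strip():
        pieces.append(current)
    return pieces
```

Splitting on paragraph breaks first keeps related sentences together, which is why header- and paragraph-structured documents tend to chunk cleanly.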
Chunk Overlap
Overlap ensures that information at chunk boundaries isn't lost. If a key sentence spans two chunks, the overlap means it appears in both.
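A fixed-size sliding window shows the overlap mechanism directly: each window starts `size - overlap` tokens after the previous one, so the last `overlap` tokens of one chunk reappear at the start of the next. This is a simplified illustration using whitespace-separated words as tokens, not the recursive splitter ThinkFleet uses.

```python
def sliding_window_chunks(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Fixed-size windows where consecutive chunks share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `sliding_window_chunks("The refund window is 30 days from delivery".split(), 5, 2)` produces `["The", "refund", "window", "is", "30"]` and `["is", "30", "days", "from", "delivery"]`; the phrase "is 30" sits on the boundary and appears in both chunks.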
Search
How Search Works
When an agent queries the knowledge base:
- The query text is converted to a vector embedding
- pgvector performs a cosine similarity search against all chunks
- Matching chunks are ranked by relevance score (0.0 to 1.0)
- The top-K most similar chunks are returned
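The ranking step is cosine similarity between the query vector and each chunk vector. In pgvector this is normally done in SQL with the `<=>` operator, which returns cosine distance (1 minus cosine similarity); the Python sketch below computes the same quantity directly so the arithmetic is visible. The `top_k` helper and its in-memory dictionary of vectors are illustrative assumptions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); pgvector's <=> operator returns 1 minus this value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the k best, highest first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

This brute-force scan corresponds to a pgvector query without a vector index; with an HNSW or IVFFlat index, pgvector performs approximate nearest-neighbor search instead of scoring every row.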
Search Parameters
| Parameter | Description | Default |
|---|---|---|
| Top K | Number of chunks to return | 5 |
| Similarity Threshold | Minimum relevance score | 0.7 |
| Max Tokens | Maximum tokens across all returned chunks | 2000 |
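One way the three parameters might combine is sketched below: chunks are taken in descending score order, and collection stops at the first chunk below the similarity threshold, once Top K chunks are collected, or when the next chunk would exceed the token budget. The order in which ThinkFleet applies these limits is not specified on this page, and the word-count token estimate is an assumption.

```python
def apply_search_params(
    scored: list[tuple[str, float]],
    top_k: int = 5,
    threshold: float = 0.7,
    max_tokens: int = 2000,
) -> list[tuple[str, float]]:
    """Filter (chunk_text, score) pairs that are already sorted by score, highest first."""
    results: list[tuple[str, float]] = []
    used_tokens = 0
    for text, score in scored:
        if score < threshold or len(results) == top_k:
            break  # remaining chunks score lower, or the result list is full
        cost = len(text.split())  # rough token estimate (assumption)
        if used_tokens + cost > max_tokens:
            break  # adding this chunk would exceed the token budget
        results.append((text, score))
        used_tokens += cost
    return results
```

Stopping at the budget, rather than skipping a large chunk to fit a smaller, lower-scoring one, keeps the returned context in strict relevance order.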
Search Quality Tips
- Use descriptive document titles — They're included in chunk metadata
- Structure documents with headers — Headers create natural chunk boundaries
- Remove boilerplate — Copyright notices, headers/footers reduce search quality
- Keep content focused — One topic per document performs better than catch-all documents
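The boilerplate tip can be applied before upload. A simple heuristic, sketched here as a hypothetical helper `strip_repeated_lines`, treats any line that recurs on most pages of a document as a running header, footer, or copyright notice and drops it. The 60% default threshold is an arbitrary assumption to tune for your own documents.

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], min_fraction: float = 0.6) -> list[str]:
    """Drop lines found on at least min_fraction of pages (rounded down, minimum two pages)."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    limit = max(2, int(len(pages) * min_fraction))
    repeated = {line for line, n in counts.items() if n >= limit}
    return [
        "\n".join(line for line in page.splitlines() if line.strip() not in repeated)
        for page in pages
    ]
```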
Connecting to Agents
To give an agent access to a knowledge base:
- Open the agent's settings
- Go to the Knowledge Base tab
- Select one or more knowledge bases
- The agent will automatically search them when answering questions
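When an agent is connected to more than one knowledge base, hits from each have to be combined. The sketch below shows one plausible approach, merging per-knowledge-base results and keeping the k highest scores overall. It assumes all knowledge bases use the same embedding model so their scores are comparable, and it does not describe ThinkFleet's internal behavior; `merge_results` is a hypothetical name.

```python
import heapq

def merge_results(per_kb: dict[str, list[tuple[str, float]]], k: int = 5) -> list[tuple[str, str, float]]:
    """Merge (chunk, score) hits from several knowledge bases; keep the k best overall."""
    candidates = (
        (score, kb_name, chunk)
        for kb_name, hits in per_kb.items()
        for chunk, score in hits
    )
    # Scores are only comparable if every knowledge base uses the same embedding model.
    best = heapq.nlargest(k, candidates, key=lambda item: item[0])
    return [(kb_name, chunk, score) for score, kb_name, chunk in best]
```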
You can also instruct the agent in its system prompt:
Always search the knowledge base before answering product questions.
If you find relevant information, cite the document name in your response.
If no relevant results are found, say "I don't have information about that
in my documentation" rather than guessing.
Monitoring
Document Status
Track document processing in the Knowledge Base dashboard:
| Status | Description |
|---|---|
| Processing | Document is being parsed and chunked |
| Ready | Document is fully indexed and searchable |
| Error | Processing failed (check error details) |
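The status lifecycle in the table can be sketched as a small state update around a processing step. `DocumentStatus` and `process_document` are hypothetical names, and the pipeline is passed in as any callable that raises an exception on failure.

```python
from enum import Enum

class DocumentStatus(Enum):
    PROCESSING = "Processing"   # being parsed and chunked
    READY = "Ready"             # fully indexed and searchable
    ERROR = "Error"             # processing failed

def process_document(doc: dict, pipeline) -> dict:
    """Run a hypothetical ingestion callable and record the resulting status."""
    doc["status"] = DocumentStatus.PROCESSING
    try:
        pipeline(doc["content"])
    except Exception as exc:
        # Keep the message so it can be shown as the error details.
        doc["status"] = DocumentStatus.ERROR
        doc["error"] = str(exc)
    else:
        doc["status"] = DocumentStatus.READY
    return doc
```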
Search Analytics
Monitor how your knowledge base is being used:
- Query volume — How many searches per day
- Average relevance — How well results match queries
- No-result queries — Queries that returned no relevant chunks (indicates content gaps)
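The three metrics can be derived from a simple query log. The log shape below (a day, the query text, and the scores of the returned chunks) is hypothetical, as is the sample data, and "average relevance" is computed here as the mean score across all returned chunks, which is only one reasonable definition.

```python
from collections import Counter
from datetime import date

def summarize(log: list[dict]) -> dict:
    """Entries look like {"day": date, "query": str, "scores": [float, ...]} (hypothetical)."""
    per_day = Counter(entry["day"] for entry in log)
    all_scores = [s for entry in log for s in entry["scores"]]
    average = sum(all_scores) / len(all_scores) if all_scores else 0.0
    # Queries with no returned chunks point at gaps in the knowledge base.
    gaps = [entry["query"] for entry in log if not entry["scores"]]
    return {"queries_per_day": dict(per_day), "average_relevance": average, "no_result_queries": gaps}

if __name__ == "__main__":
    sample_log = [
        {"day": date(2024, 5, 1), "query": "refund policy", "scores": [0.91, 0.78]},
        {"day": date(2024, 5, 1), "query": "warp drive specs", "scores": []},
        {"day": date(2024, 5, 2), "query": "pricing tiers", "scores": [0.84]},
    ]
    print(summarize(sample_log))
```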