Knowledge Base Overview

Learn how the ThinkFleet knowledge base works — RAG-powered document search for your AI agents.

The Knowledge Base is ThinkFleet's built-in Retrieval-Augmented Generation (RAG) system. It lets you upload documents that your AI agents can search and reference when answering questions, ensuring responses are grounded in your actual data rather than general knowledge.

How It Works

The RAG Pipeline

Upload Document
    │
    ▼
Parse & Extract Text
    │
    ▼
Split into Chunks
    │
    ▼
Generate Embeddings (vectors)
    │
    ▼
Store in pgvector
    │
    ▼
Ready for Search

When an agent needs information:

User Question
    │
    ▼
Generate Query Embedding
    │
    ▼
Vector Similarity Search
    │
    ▼
Return Top-K Relevant Chunks
    │
    ▼
Include in Agent Context
    │
    ▼
LLM Generates Grounded Response
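
The two flows above can be sketched end to end in a few lines. This is a minimal illustration, not ThinkFleet's implementation: `toy_embed` is a hypothetical stand-in for a real embedding model (it only counts words), an in-memory list stands in for pgvector, and all function and field names are assumptions.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(store: list[dict], doc_name: str, chunks: list[str]) -> None:
    """Upload path: embed each chunk and store it next to its text."""
    for text in chunks:
        store.append({"document": doc_name, "text": text, "embedding": toy_embed(text)})


def retrieve(store: list[dict], question: str, top_k: int = 5) -> list[dict]:
    """Query path: embed the question, score every chunk, keep the top_k best."""
    q = toy_embed(question)
    scored = [{**row, "score": cosine(q, row["embedding"])} for row in store]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


store: list[dict] = []
ingest(store, "billing.md", [
    "Refunds are issued within 14 days.",
    "Invoices are emailed on the first of each month.",
])
hits = retrieve(store, "When are refunds issued?", top_k=1)
context = "\n".join(h["text"] for h in hits)  # text that would be added to the agent's context
```

A real embedding model captures meaning rather than exact word overlap, but the data flow (embed, store, embed the query, rank, pass the best chunks to the LLM) is the same.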

Key Components

Component	Technology	Purpose
Document Parser	Built-in parsers	Extract text from PDF, DOCX, TXT, HTML, Markdown, and CSV
Chunking Engine	Recursive text splitter	Break documents into searchable segments
Embedding Model	OpenAI or configurable	Convert text to vector representations
Vector Store	pgvector (PostgreSQL)	Store and search embeddings efficiently

Supported Document Types

Format	Extension	Notes
PDF	`.pdf`	Text extraction; scanned PDFs require OCR
Word	`.docx`	Full formatting preserved during extraction
Plain Text	`.txt`	Direct ingestion
Markdown	`.md`	Headers used as natural chunk boundaries
HTML	`.html`	Tags stripped, text extracted
CSV	`.csv`	Each row can become a separate chunk
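
As an illustration of the CSV note, one way to turn each row into its own chunk is to render it as `column: value` pairs so that the column names travel with the data. This sketch uses Python's standard `csv` module; the function name and the output format are assumptions, not ThinkFleet's documented behavior.

```python
import csv
import io


def csv_rows_to_chunks(csv_text: str) -> list[str]:
    """Render each data row as one 'header: value' chunk (hypothetical format)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{key}: {value}" for key, value in row.items()) for row in reader]


sample = "sku,name,price\nA1,Widget,9.99\nB2,Gadget,19.50\n"
chunks = csv_rows_to_chunks(sample)
# chunks[0] is "sku: A1; name: Widget; price: 9.99"
```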

Architecture

ThinkFleet's knowledge base runs entirely on your existing PostgreSQL database using the pgvector extension. This means:

- No additional infrastructure: no separate vector database to manage
- Consistent backups: your knowledge base is backed up with your regular database
- Transaction safety: document operations are ACID-compliant
- Cost-effective: no extra service costs

Database Tables

Table	Purpose
`knowledge_base`	Stores knowledge base metadata (name, project, settings)
`knowledge_base_document`	Tracks uploaded documents and processing status
`knowledge_base_chunk`	Stores document chunks with vector embeddings

Creating a Knowledge Base

1. Navigate to Knowledge Base in the sidebar.
2. Click New Knowledge Base.
3. Enter a name (e.g., "Product Documentation").
4. Configure settings:
   - Chunk Size: Target size for each text chunk (default: 500 tokens)
   - Chunk Overlap: Overlap between consecutive chunks (default: 50 tokens)
   - Embedding Model: Select the embedding model

Chunking Strategy

Chunking determines how documents are split for search. The right chunk size depends on your content:

Content Type	Recommended Chunk Size	Overlap
FAQs	200-300 tokens	20 tokens
Technical docs	500-800 tokens	50 tokens
Legal documents	800-1200 tokens	100 tokens
General articles	400-600 tokens	50 tokens

- Smaller chunks: more precise search results, but less context per result
- Larger chunks: more context per result, but may include irrelevant content

Chunk Overlap

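Overlap reduces the chance that information at chunk boundaries is lost. Each chunk repeats the last few tokens of the previous one, so a key sentence that falls on a boundary still appears intact in at least one chunk, provided the sentence is shorter than the overlap.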
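
The effect of overlap is easiest to see with a plain sliding window. The sketch below is a simplification: it uses fixed-size windows rather than the recursive text splitter listed under Key Components, it treats list items as tokens, and the function name is an assumption.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Fixed-size windows; each chunk starts with the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)  # default settings: 500-token chunks, 50-token overlap
```

With the defaults, a 1,200-token document produces three chunks of 500, 500, and 300 tokens, and the last 50 tokens of each chunk reappear at the start of the next.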

Search

How Search Works

When an agent queries the knowledge base:

1. The query text is converted to a vector embedding.
2. pgvector performs a cosine similarity search against all chunks.
3. The top-K most similar chunks are returned.
4. Chunks are ranked by relevance score (0.0 to 1.0).
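
In pgvector, the `<=>` operator computes cosine distance, so cosine similarity is `1 - distance`. The sketch below shows what the similarity search step might look like as a parameterized SQL query built in Python. Only the table name `knowledge_base_chunk` comes from this page; the column names (`content`, `embedding`, `knowledge_base_id`) and the function itself are hypothetical, and how ThinkFleet maps similarity onto its 0.0 to 1.0 relevance score is not specified here.

```python
def build_search_query(query_embedding: list[float], knowledge_base_id: str, top_k: int = 5):
    """Return (sql, params) for a cosine top-K search. Column names are assumed, not documented."""
    # pgvector accepts a text literal of the form '[0.1,0.2,...]' cast to vector
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    sql = (
        "SELECT content, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM knowledge_base_chunk "
        "WHERE knowledge_base_id = %s "
        "ORDER BY embedding <=> %s::vector "  # smallest distance first = most similar first
        "LIMIT %s"
    )
    return sql, (vector_literal, knowledge_base_id, vector_literal, top_k)


sql, params = build_search_query([0.1, 0.2, 0.3], "kb_123")
```

The `(sql, params)` pair is in the form that a PostgreSQL driver such as psycopg expects for `cursor.execute(sql, params)`.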

Search Parameters

Parameter	Description	Default
Top K	Number of chunks to return	5
Similarity Threshold	Minimum relevance score	0.7
Max Tokens	Maximum tokens across all returned chunks	2000
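
The three parameters interact, so a small sketch may help. It assumes results are filtered in this order: rank by score, drop anything below the threshold, stop at Top K, and skip chunks that would push the total past Max Tokens. That ordering, the skip-rather-than-stop behavior, and the function name are assumptions, not documented behavior.

```python
def select_chunks(
    scored: list[tuple[float, int, str]],
    top_k: int = 5,
    threshold: float = 0.7,
    max_tokens: int = 2000,
) -> list[tuple[float, str]]:
    """scored holds (score, token_count, text); returns (score, text) for chunks passing all limits."""
    ranked = sorted(scored, key=lambda c: c[0], reverse=True)
    selected, used = [], 0
    for score, token_count, text in ranked:
        if score < threshold or len(selected) == top_k:
            break  # ranked is sorted, so nothing later can qualify
        if used + token_count > max_tokens:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        selected.append((score, text))
        used += token_count
    return selected


candidates = [(0.91, 800, "A"), (0.85, 900, "B"), (0.80, 600, "C"), (0.74, 200, "D"), (0.62, 100, "E")]
result = select_chunks(candidates)
# A and B fit (1,700 tokens), C would exceed 2,000, D fits, E is below the 0.7 threshold
```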

Search Quality Tips

- Use descriptive document titles: they are included in chunk metadata
- Structure documents with headers: headers create natural chunk boundaries
- Remove boilerplate: copyright notices and repeated headers or footers reduce search quality
- Keep content focused: one topic per document performs better than catch-all documents
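
To illustrate why headers help, here is a sketch of header-based splitting for Markdown, where each section becomes a candidate chunk and keeps its heading as context. The regular expression and function name are assumptions; ThinkFleet's splitter may behave differently, and this version does not special-case `#` lines inside code blocks.

```python
import re


def split_on_headers(markdown: str) -> list[dict]:
    """Split Markdown at ATX headings ('#' through '######'); each section keeps its heading."""
    sections, heading, body = [], None, []

    def flush() -> None:
        # Emit the section collected so far, unless it is empty preamble
        if heading is not None or any(line.strip() for line in body):
            sections.append({"heading": heading, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


doc = "# Billing\nRefunds take 14 days.\n\n## Invoices\nSent monthly.\n"
sections = split_on_headers(doc)
```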

Connecting to Agents

To give an agent access to a knowledge base:

1. Open the agent's settings.
2. Go to the Knowledge Base tab.
3. Select one or more knowledge bases.
4. The agent will automatically search them when answering questions.

You can also instruct the agent in its system prompt:

Always search the knowledge base before answering product questions.
If you find relevant information, cite the document name in your response.
If no relevant results are found, say "I don't have information about that
in my documentation" rather than guessing.
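
For the citation instruction to work, the agent needs the document name alongside each retrieved chunk. The sketch below shows one hypothetical way to format retrieved chunks before they are added to the agent's context; the field names and layout are assumptions, not ThinkFleet's actual prompt format.

```python
def format_context(chunks: list[dict]) -> str:
    """Label each chunk with its source document so the model can cite it."""
    if not chunks:
        return "No relevant knowledge base results were found."
    blocks = [f"[Source: {c['document']}]\n{c['text']}" for c in chunks]
    return "\n\n".join(blocks)


context = format_context([
    {"document": "Returns Policy.pdf", "text": "Items can be returned within 30 days."},
    {"document": "Shipping FAQ.md", "text": "Standard shipping takes 3 to 5 business days."},
])
```

Returning an explicit "no results" message for an empty list gives the model a clear signal to fall back to the "I don't have information about that" response instead of guessing.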

Monitoring

Document Status

Track document processing in the Knowledge Base dashboard:

Status	Description
Processing	Document is being parsed and chunked
Ready	Document is fully indexed and searchable
Error	Processing failed (check error details)

Search Analytics

Monitor how your knowledge base is being used:

- Query volume: number of searches per day
- Average relevance: how well results match queries
- No-result queries: queries that returned no relevant chunks (indicates content gaps)
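
If you keep your own search log, the same three metrics can be approximated with a few lines of code. This is a sketch under stated assumptions: the log entry shape is hypothetical, and "average relevance" is taken here to mean the mean of each query's best score, which may differ from how the dashboard computes it.

```python
from collections import Counter
from statistics import mean


def summarize_searches(log: list[dict]) -> dict:
    """Each entry: {'day': 'YYYY-MM-DD', 'query': str, 'scores': [float, ...]} (hypothetical shape)."""
    with_results = [entry for entry in log if entry["scores"]]
    return {
        "queries_per_day": dict(Counter(entry["day"] for entry in log)),
        # Mean of the best score per query; None when no query returned anything
        "average_relevance": mean(max(e["scores"]) for e in with_results) if with_results else None,
        "no_result_queries": [entry["query"] for entry in log if not entry["scores"]],
    }


log = [
    {"day": "2024-05-01", "query": "refund policy", "scores": [0.9, 0.8]},
    {"day": "2024-05-01", "query": "api rate limits", "scores": []},
    {"day": "2024-05-02", "query": "invoice dates", "scores": [0.7]},
]
summary = summarize_searches(log)
```

In this sample, "api rate limits" lands in `no_result_queries`, which points to a topic the knowledge base does not yet cover.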