Knowledge bases & RAG
A knowledge base is a set of indexed documents your agents can search at runtime. Build one when your agent needs facts that aren't in the model's training data — internal policies, product manuals, customer history, support transcripts.
What you'll learn
- What retrieval-augmented generation (RAG) is and when to use it
- How datasets, documents, and vector stores relate
- How an agent retrieves from a dataset at runtime
- When to pick a knowledge base over a tool or fine-tune
What is RAG
Retrieval-augmented generation lets an agent look up information at the moment it answers. Documents are split into chunks, embedded into vectors, and stored in a vector store. At runtime the agent embeds the user query, fetches the chunks whose vectors are most similar to it, and passes them to the LLM as context. The result: grounded answers that cite source documents, without retraining the model.
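The chunk, embed, store, and retrieve steps can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the `embed` function is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector store, and every name here (`Chunk`, `build_index`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str       # source document, kept so answers can cite it
    text: str
    vector: Counter   # toy "embedding" of the chunk text


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(documents: dict[str, str]) -> list[Chunk]:
    """Indexing time: chunk every document and store each chunk with its vector."""
    return [
        Chunk(doc_id, piece, embed(piece))
        for doc_id, text in documents.items()
        for piece in split_into_chunks(text)
    ]


def retrieve(index: list[Chunk], query: str, top_k: int = 2) -> list[Chunk]:
    """Runtime: embed the query and return the top_k most similar chunks."""
    query_vector = embed(query)
    return sorted(index, key=lambda c: cosine(query_vector, c.vector), reverse=True)[:top_k]


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    """Pass the retrieved chunks to the LLM as context, tagged with document IDs."""
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in chunks)
    return f"Answer using only the context below and cite document IDs.\n\n{context}\n\nQuestion: {query}"


documents = {
    "refund-policy": "Refunds are issued within 14 days of purchase when the item is unused.",
    "shipping-guide": "Standard shipping takes 5 business days. Express shipping takes 2 business days.",
}
index = build_index(documents)
hits = retrieve(index, "How long do refunds take?", top_k=1)
prompt = build_prompt("How long do refunds take?", hits)
```

Each chunk keeps its `doc_id`, which is what lets the final answer cite a source. Swapping the toy `embed` for a real embedding model changes the quality of the matches but not the shape of the flow.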
When to use a knowledge base
- Internal documents the model has never seen: policies, runbooks, contracts, product specs, onboarding guides, and anything else proprietary to your company.
- High-volume, slow-changing reference data: support FAQs, troubleshooting trees, compliance text. Cheaper than calling a tool on every turn.
- Citation requirements: the agent must point to a source. RAG returns the chunk and document ID with every retrieval.
- Not the right fit: live operational data (use a tool), structured records you query by key (use SQL), or behavior you want the model to learn (fine-tune).
What is a dataset
A dataset is the unit of organization inside a knowledge base. One dataset holds the documents on a single topic, indexed with one embedding model and stored in one vector store. Agents attach to a dataset to query it. You can split content across multiple datasets when topics do not overlap, then attach more than one to the same agent.
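The relationship between datasets and agents can be pictured as a small data model. The sketch below is illustrative only: the class names, field names, and the keyword-matching `search` method are assumptions made for this example and do not reflect the platform's actual API. A real dataset would run a vector search against its store.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    embedding_model: str   # one embedding model per dataset (hypothetical field)
    vector_store: str      # one vector store per dataset (hypothetical field)
    documents: dict[str, str] = field(default_factory=dict)

    def search(self, query: str) -> list[tuple[str, str]]:
        """Placeholder keyword match standing in for a vector search."""
        terms = set(query.lower().split())
        return [
            (self.name, doc_id)
            for doc_id, text in self.documents.items()
            if terms & set(text.lower().split())
        ]


@dataclass
class Agent:
    name: str
    datasets: list[Dataset] = field(default_factory=list)

    def attach(self, dataset: Dataset) -> None:
        """Attach a dataset so the agent can query it."""
        self.datasets.append(dataset)

    def retrieve(self, query: str) -> list[tuple[str, str]]:
        """Query every attached dataset and merge the results."""
        results = []
        for dataset in self.datasets:
            results.extend(dataset.search(query))
        return results


# Two datasets on non-overlapping topics, both attached to one agent.
hr = Dataset("hr-policies", "example-embedding-model", "example-vector-store",
             {"leave": "parental leave lasts 16 weeks"})
manuals = Dataset("product-manuals", "example-embedding-model", "example-vector-store",
                  {"router": "reset the router by holding the button"})

support_agent = Agent("support")
support_agent.attach(hr)
support_agent.attach(manuals)
matches = support_agent.retrieve("router reset")
```

Because the agent only holds references to its datasets, the same `Dataset` object could be attached to a second agent without indexing the documents again.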
Frequently asked questions
- How is a knowledge base different from a tool?
- A tool reads or writes live data through an API — Salesforce, Postgres, Zendesk. A knowledge base is a static, pre-indexed corpus you control. Use tools for operational data, knowledge bases for reference content.
- Do I need a knowledge base for every agent?
- No. Most agents start without one. Add a knowledge base only when the model needs facts it doesn't already know and the data is too large to fit in the system prompt.
- Can multiple agents share one dataset?
- Yes. Datasets are reusable across agents and workflows. Index once, attach anywhere.
- How fresh is the data the agent sees?
- As fresh as your last indexing run. Re-upload or re-sync documents on the cadence your content changes. Live data should come through a tool, not a knowledge base.