Knowledge bases & RAG

A knowledge base is a set of indexed documents your agents can search at runtime. Build one when your agent needs facts that aren't in the model's training data — internal policies, product manuals, customer history, support transcripts.

What you'll learn

What retrieval-augmented generation (RAG) is and when to use it
How datasets, documents, and vector stores relate
How an agent retrieves from a dataset at runtime
When to pick a knowledge base over a tool or fine-tuning

What is RAG

Retrieval-augmented generation lets an agent look up information at the moment it answers. Documents are split into chunks, embedded into vectors, and stored in a vector database. At runtime the agent embeds the user query, fetches the closest chunks, and passes them to the LLM as context. The result: grounded answers that cite source documents, without retraining the model.
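The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the `embed` function is a toy word-count stand-in for a real embedding model, the "vector store" is an in-memory list, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Index time: chunk each document, embed each chunk, keep the document ID."""
    index = []
    for doc_id, text in documents.items():
        for piece in chunk(text):
            index.append((doc_id, piece, embed(piece)))
    return index


def retrieve(index, query, top_k=2):
    """Query time: embed the query and return the closest chunks with their IDs."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vector, entry[2]), reverse=True)
    return [(doc_id, piece) for doc_id, piece, _ in ranked[:top_k]]


def build_prompt(query, hits):
    """Pass the retrieved chunks to the LLM as context, tagged with their source."""
    context = "\n".join(f"[{doc_id}] {piece}" for doc_id, piece in hits)
    return (
        "Answer using only the context below and cite the source IDs.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample documents, for illustration only.
documents = {
    "refund-policy": "Refunds are issued within 14 days of purchase when the item is unused.",
    "shipping-guide": "Standard shipping takes five business days. Express shipping takes two business days.",
}
index = build_index(documents)
question = "How long does express shipping take?"
hits = retrieve(index, question)
prompt = build_prompt(question, hits)
```

Because each hit carries its document ID alongside the chunk text, the prompt can ask the model to cite sources without any extra lookup.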

When to use a knowledge base

1. Internal documents the model has never seen
Policies, runbooks, contracts, product specs, onboarding guides, and anything else proprietary to your company.
2. High-volume, slow-changing reference data
Support FAQs, troubleshooting trees, compliance text. Cheaper than calling a tool on every turn.
3. Citation requirements
The agent must point to a source. RAG returns the chunk and document ID with every retrieval.
4. Not the right fit
Live operational data (use a tool), structured records you query by key (use SQL), or behavior you want the model to learn (fine-tune).
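As a rough aid, the criteria in this list can be written as a small decision helper. The function name, its parameters, and the order of the checks are assumptions made for illustration; real cases often need more than one approach.

```python
def choose_approach(*, changes_in_real_time, looked_up_by_key, teaches_behavior):
    """Map the criteria above to a suggested approach (hypothetical helper)."""
    if teaches_behavior:
        # Behavior you want the model to learn: fine-tune.
        return "fine-tune"
    if changes_in_real_time:
        # Live operational data: call a tool.
        return "tool"
    if looked_up_by_key:
        # Structured records queried by key: use SQL.
        return "sql"
    # Proprietary or slow-changing reference content: knowledge base.
    return "knowledge base"


# A support FAQ: slow-changing, unstructured, and not about model behavior.
suggestion = choose_approach(
    changes_in_real_time=False,
    looked_up_by_key=False,
    teaches_behavior=False,
)
```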

What is a dataset

A dataset is the unit of organization inside a knowledge base. One dataset holds the documents on a single topic, indexed with one embedding model and stored in one vector store. Agents attach to a dataset to query it. You can split content across multiple datasets when topics do not overlap, then attach more than one to the same agent.
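The relationships described above can be modeled with a few dataclasses. This is a conceptual sketch only: the class names, fields, and placeholder values such as `"example-embedding-model"` are hypothetical and do not reflect the product's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Dataset:
    name: str
    embedding_model: str  # one embedding model per dataset
    vector_store: str     # one vector store per dataset
    documents: list[Document] = field(default_factory=list)

    def add(self, document):
        self.documents.append(document)


@dataclass
class Agent:
    name: str
    datasets: list[Dataset] = field(default_factory=list)

    def attach(self, dataset):
        """Attach a dataset so the agent can query it; attaching twice is a no-op."""
        if not any(existing is dataset for existing in self.datasets):
            self.datasets.append(dataset)

    def searchable_documents(self):
        """List (dataset name, document ID) pairs the agent can retrieve from."""
        return [(ds.name, doc.doc_id) for ds in self.datasets for doc in ds.documents]


# Two non-overlapping topics, each in its own dataset (placeholder values).
policies = Dataset("hr-policies", "example-embedding-model", "example-vector-store")
policies.add(Document("leave-policy", "Employees accrue leave monthly."))

manuals = Dataset("product-manuals", "example-embedding-model", "example-vector-store")
manuals.add(Document("setup-guide", "Plug in the device and hold the power button."))

# One agent can attach several datasets; a second agent reuses one of them.
support_agent = Agent("support")
support_agent.attach(policies)
support_agent.attach(manuals)

hr_agent = Agent("hr")
hr_agent.attach(policies)
```

Keeping the embedding model and vector store on the dataset, rather than on the agent, is what lets one index serve several agents.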

Frequently asked questions

How is a knowledge base different from a tool?
A tool reads or writes live data through an API (Salesforce, Postgres, Zendesk). A knowledge base is a static, pre-indexed corpus you control. Use tools for operational data, knowledge bases for reference content.

Do I need a knowledge base for every agent?
No. Most agents start without one. Add a knowledge base only when the model needs facts it doesn't already know and the data is too large to fit in the system prompt.

Can multiple agents share one dataset?
Yes. Datasets are reusable across agents and workflows. Index once, attach anywhere.

How fresh is the data the agent sees?
As fresh as your last indexing run. Re-upload or re-sync documents on the cadence your content changes. Live data should come through a tool, not a knowledge base.
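One way to decide what to re-upload before an indexing run is to compare content fingerprints against those recorded at the last run. The sketch below assumes you keep your own record of document hashes; `plan_reindex` and its inputs are hypothetical and not part of the product.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect edits since the last indexing run."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed, current):
    """Compare stored fingerprints with current content.

    indexed: doc_id -> fingerprint recorded at the last indexing run
    current: doc_id -> document text as it exists now
    """
    added = sorted(doc_id for doc_id in current if doc_id not in indexed)
    changed = sorted(
        doc_id
        for doc_id, text in current.items()
        if doc_id in indexed and indexed[doc_id] != fingerprint(text)
    )
    removed = sorted(doc_id for doc_id in indexed if doc_id not in current)
    return {"added": added, "changed": changed, "removed": removed}


# State recorded at the last run (sample data, for illustration only).
indexed = {
    "faq": fingerprint("Reset your password from the login page."),
    "old-pricing": fingerprint("Plans start at the basic tier."),
}
# Content as it exists today.
current = {
    "faq": "Reset your password from the account settings page.",
    "new-pricing": "Plans start at the starter tier.",
}
plan = plan_reindex(indexed, current)
```

Only the documents listed under `added` and `changed` need to be re-uploaded, and those under `removed` can be dropped from the index.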

Create a dataset

Name, scope, and provision a new dataset.

Upload documents

Get files into the index.

Embeddings & providers

Pick the right vector store and model.

Agents overview

How an agent consumes a knowledge base.