Knowledge bases & RAG

A knowledge base is a set of indexed documents your agents can search at runtime. Build one when your agent needs facts that aren't in the model's training data — internal policies, product manuals, customer history, support transcripts.

What you'll learn
  • What retrieval-augmented generation (RAG) is and when to use it
  • How datasets, documents, and vector stores relate
  • How an agent retrieves from a dataset at runtime
  • When to pick a knowledge base over a tool or fine-tune

What is RAG

Retrieval-augmented generation lets an agent look up information at the moment it answers. Documents are split into chunks, embedded into vectors, and stored in a vector store. At runtime the agent embeds the user query, fetches the chunks whose vectors are closest to it, and passes them to the LLM as context. The result: grounded answers that cite source documents, without retraining the model.
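The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector store, and every function name here (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, size=20):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Index time: chunk and embed each document. Returns (doc_id, chunk, vector) entries."""
    return [(doc_id, c, embed(c)) for doc_id, text in documents.items() for c in chunk(text)]


def retrieve(index, query, k=2):
    """Query time: embed the query and return the k closest (doc_id, chunk) pairs."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


def build_prompt(query, hits):
    """Place the retrieved chunks, tagged with their document IDs, ahead of the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (
        "Answer using only the context below and cite the document ID.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical sample documents.
documents = {
    "refund-policy": "Refunds are issued within 14 days of purchase when the item is unused.",
    "shipping-guide": "Standard shipping takes five business days. Express shipping takes two.",
}
index = build_index(documents)
question = "When are refunds issued?"
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
```

The string in `prompt` is what would be sent to the LLM. Because each retrieved chunk carries its document ID, the model can cite the source in its answer.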

When to use a knowledge base

  1. Internal documents the model has never seen

    Policies, runbooks, contracts, product specs, onboarding guides — anything proprietary to your company.
  2. High-volume, slow-changing reference data

    Support FAQs, troubleshooting trees, compliance text. Cheaper than calling a tool on every turn.
  3. Citation requirements

    The agent must point to a source. RAG returns the chunk and document ID with every retrieval.
  4. Not the right fit

    Live operational data (use a tool), structured records you query by key (use SQL), or behavior you want the model to learn (fine-tune).

What is a dataset

A dataset is the unit of organization inside a knowledge base. One dataset holds the documents on a single topic, indexed with one embedding model and stored in one vector store. Agents attach to a dataset to query it. You can split content across multiple datasets when topics do not overlap, then attach more than one to the same agent.
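One way to picture these relationships is as a small data model. The sketch below is illustrative only: the `Dataset` and `Agent` classes, their fields, and the placeholder model and store names are assumptions made for this example, not the platform's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a dataset: one topic, one embedding model, one vector store."""
    name: str
    embedding_model: str
    vector_store: str
    documents: dict = field(default_factory=dict)  # doc_id -> text

    def add_document(self, doc_id, text):
        self.documents[doc_id] = text


@dataclass
class Agent:
    """Hypothetical agent that can attach to any number of datasets."""
    name: str
    datasets: list = field(default_factory=list)

    def attach(self, dataset):
        # Attaching the same dataset twice has no effect.
        if dataset not in self.datasets:
            self.datasets.append(dataset)

    def searchable_documents(self):
        """All (dataset name, doc_id) pairs this agent can search."""
        return [(d.name, doc_id) for d in self.datasets for doc_id in d.documents]


# Placeholder names; substitute whatever model and store you actually use.
hr = Dataset("hr-policies", "example-embedding-model", "example-vector-store")
hr.add_document("leave-policy", "Employees accrue leave monthly.")

product = Dataset("product-manuals", "example-embedding-model", "example-vector-store")
product.add_document("router-setup", "Connect the router to power first.")

support_agent = Agent("support")
support_agent.attach(hr)
support_agent.attach(product)  # one agent, two non-overlapping datasets

hr_agent = Agent("hr-helper")
hr_agent.attach(hr)  # the same dataset attached to a second agent
```

Keeping the embedding model and vector store on the dataset rather than on the agent mirrors the description above: indexing choices are made once per dataset, and agents only hold references to the datasets they query.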

Frequently asked questions

How is a knowledge base different from a tool?
A tool reads or writes live data through an API — Salesforce, Postgres, Zendesk. A knowledge base is a static, pre-indexed corpus you control. Use tools for operational data, knowledge bases for reference content.
Do I need a knowledge base for every agent?
No. Most agents start without one. Add a knowledge base only when the model needs facts it doesn't already know and the data is too large to fit in the system prompt.
Can multiple agents share one dataset?
Yes. Datasets are reusable across agents and workflows. Index once, attach anywhere.
How fresh is the data the agent sees?
As fresh as your last indexing run. Re-upload or re-sync documents on the cadence your content changes. Live data should come through a tool, not a knowledge base.
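If you re-sync on a schedule, one common approach is to compare content fingerprints so that only changed documents are re-indexed. The sketch below assumes you keep a record of the hash of each document at its last indexing run; `fingerprint` and `plan_reindex` are hypothetical helpers, not part of any product API.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect changes between indexing runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed_hashes, current_docs):
    """Compare last-indexed fingerprints with current content.

    indexed_hashes: doc_id -> fingerprint recorded at the last indexing run
    current_docs:   doc_id -> current text
    Returns (to_index, to_delete) as sorted lists of document IDs.
    """
    to_index = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != fingerprint(text)  # new or changed
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current_docs)
    return to_index, to_delete


# Hypothetical state: what was indexed last time versus what exists now.
indexed = {
    "faq": fingerprint("Reset your password from Settings."),
    "old-guide": fingerprint("Deprecated."),
}
current = {
    "faq": "Reset your password from Settings > Security.",
    "pricing": "Plans start at the free tier.",
}
to_index, to_delete = plan_reindex(indexed, current)
```

Here `to_index` contains the edited `faq` and the new `pricing` document, and `to_delete` contains `old-guide`, which no longer exists in the source content.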