Create a dataset

A dataset is the container your documents live in. Get the metadata and provider choice right up front: they shape what the agent can retrieve and how fast.

What you'll learn
  • What each dataset field controls
  • How to choose Public vs Private visibility
  • Which RAG provider to select for your use case
  • When to split content into multiple datasets

Open the dataset builder

Go to Knowledge Base in the sidebar, then click + New Dataset. The form has four fields. Each is required.

Fill the dataset fields

  1. Name

    Short, descriptive, lowercase-friendly. Examples: "support-faqs", "hr-policies-2026", "product-spec-v3". The name appears in agent attach pickers and in logs.
  2. Description

    One sentence on what's inside and who maintains it. This is what teammates see when they attach the dataset to an agent.
  3. Visibility

    Private: only the creator and admins can attach or edit. Public: any workspace member can attach to their agents. Start Private; promote to Public once the content is reviewed.
  4. RAG provider

    Pick the vector store and embedding model. Each provider is a paired choice: where vectors are stored plus which model embeds them. Defaults are fine for most teams; see Embeddings & providers for the tradeoffs.
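
It can help to picture the four fields as a small record with a few checks. The following is a minimal sketch in Python, not Dezifi's API: the `DatasetSpec` class, its field names, the name pattern, and the `"default"` provider string are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"  # only the creator and admins can attach or edit
    PUBLIC = "public"    # any workspace member can attach to their agents


# Assumed convention: lowercase letters, digits, and hyphens, e.g. "support-faqs".
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical stand-in for the four required form fields."""
    name: str
    description: str
    rag_provider: str
    visibility: Visibility = Visibility.PRIVATE  # start Private, promote later

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the spec looks fine."""
        found = []
        if not NAME_PATTERN.fullmatch(self.name):
            found.append("name should be short, lowercase, and hyphen-separated")
        if not self.description.strip():
            found.append("description is required")
        if not self.rag_provider.strip():
            found.append("RAG provider is required")
        return found


spec = DatasetSpec(
    name="support-faqs",
    description="Customer support FAQs, maintained by the support team.",
    rag_provider="default",  # placeholder, not a real provider identifier
)
print(spec.problems())
```

Defaulting `visibility` to Private mirrors the advice above: review the content first, then promote the dataset to Public.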

Save and provision

Click Create. Dezifi provisions the underlying index against your chosen provider. This is instant for managed providers and may take a minute for self-hosted ones. You land on the dataset detail page, which is empty and ready for uploads.

When to split into multiple datasets

  1. Different topics

    Support FAQs and HR policies don't belong together; separate datasets keep retrieval focused.
  2. Different freshness

    A daily-rotated dataset and an annual one should be separate so re-indexing one doesn't touch the other.
  3. Different access

    If only some agents should see certain content, isolate it in its own dataset and control attachment.
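
The three criteria above amount to grouping documents by topic, refresh cadence, and audience. Here is a rough sketch of that grouping in Python. The `Doc` record, its field values, and the generated dataset names are hypothetical; they only show how a split plan falls out of the criteria.

```python
from collections import defaultdict
from typing import NamedTuple


class Doc(NamedTuple):
    """Hypothetical document record; the fields mirror the three split criteria."""
    title: str
    topic: str      # e.g. "support", "hr"
    refresh: str    # e.g. "daily", "annual"
    audience: str   # which agents or teams may see it


def plan_datasets(docs: list[Doc]) -> dict[str, list[str]]:
    """Group documents so each dataset holds one topic, cadence, and audience."""
    plan = defaultdict(list)
    for doc in docs:
        dataset_name = f"{doc.topic}-{doc.refresh}-{doc.audience}"
        plan[dataset_name].append(doc.title)
    return dict(plan)


docs = [
    Doc("Refund policy FAQ", "support", "daily", "all"),
    Doc("Shipping FAQ", "support", "daily", "all"),
    Doc("Parental leave policy", "hr", "annual", "all"),
    Doc("Salary bands", "hr", "annual", "hr-only"),
]
for name, titles in plan_datasets(docs).items():
    print(name, titles)
```

In this example the two HR documents land in different datasets because their audiences differ, which is the "different access" case.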

Frequently asked questions

Can I rename a dataset later?
Yes. The name, description, and visibility are editable from the dataset detail page. The slug behind the scenes stays stable, so references from agents don't break.
Can I change the RAG provider after creating the dataset?
No. Provider choice is locked at creation because it determines how documents are embedded and stored. To switch, create a new dataset with the desired provider and re-upload.
How many datasets can a workspace have?
There is no hard limit. Most teams end up with five to twenty — one per topic or content owner. Splitting is cheap; over-merging hurts retrieval quality.
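
The rename and provider answers can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how Dezifi stores datasets: the `Dataset` class, the slug values, the provider names, and `switch_provider` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory model of a dataset record."""
    slug: str            # stable identifier that agents reference
    name: str            # display name, editable
    description: str     # editable
    provider: str        # fixed at creation
    documents: list[str] = field(default_factory=list)

    def rename(self, new_name: str) -> None:
        # Only the display name changes; the slug stays the same.
        self.name = new_name


def switch_provider(old: Dataset, new_slug: str, new_provider: str) -> Dataset:
    """Model the documented workaround: create a new dataset and re-upload."""
    return Dataset(
        slug=new_slug,
        name=old.name,
        description=old.description,
        provider=new_provider,
        documents=list(old.documents),  # stands in for re-uploading each file
    )


faqs = Dataset("ds-001", "support-faqs", "Support FAQs", "provider-a", ["refunds.pdf"])
agent_attachments = {"helpdesk-agent": faqs.slug}

faqs.rename("support-faqs-2026")
assert agent_attachments["helpdesk-agent"] == faqs.slug  # reference still resolves

migrated = switch_provider(faqs, "ds-002", "provider-b")
print(migrated.slug, migrated.provider, migrated.documents)
```

In this model the migrated dataset has a different slug, so any agent that should use the new provider would need the new dataset attached.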