Knowledge base

Embeddings and vector stores

The embedding model decides how Dezifi understands meaning. The vector store decides where indices live and how fast retrieval is. Together they determine retrieval quality, cost, and data residency.

What you'll learn
  • How to choose an embedding model
  • When to use Pinecone, Weaviate, or the local store
  • Which knobs change retrieval quality
  • How to evaluate retrieval before going to production

Choosing an embedding model

Embedding models convert text into vectors. Bigger models capture more nuance but cost more per token and produce larger indices. Start with a balanced default — text-embedding-3-small from OpenAI or an equivalent open-source model. Move to a larger model only if retrieval evaluation shows consistently weak matches. The model is locked at dataset creation; switching means re-indexing.
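
Dezifi embeds your documents at indexing time, so you rarely call the model directly, but seeing its output makes the trade-off concrete. Here is a minimal sketch using OpenAI's Python SDK (assumes the openai package is installed and OPENAI_API_KEY is set; this is not Dezifi's indexing pipeline):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=["How do I reset a user's password?"],
    )
    vector = resp.data[0].embedding
    print(len(vector))  # 1536 dimensions; text-embedding-3-large produces 3072

Larger vectors mean a larger index and higher storage cost at the vector store, which is why a bigger model should earn its keep in evaluation first.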

Vector store options

  1. Pinecone
     Managed, serverless, high throughput. Best for large datasets and multi-region deployments. Lowest operational overhead.
  2. Weaviate
     Self-hostable or managed. Hybrid search (vector + keyword) is first-class. Pick this when you need to combine semantic and lexical retrieval, or when data residency requires self-hosting.
  3. Local store
     Embedded vector store that ships with Dezifi. Zero external dependencies. Best for small datasets, development, and air-gapped on-prem deployments.
  4. Bring your own
     Custom providers can be added through the SDK if your team already runs pgvector, Qdrant, or Milvus. Talk to your account team for the integration path; a sketch of the general shape follows this list.
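
The provider contract itself is not documented in this article, so the following is a hypothetical sketch of the shape such an adapter usually takes: an upsert path and a top-k query path. VectorStoreProvider and InMemoryProvider are illustrative names, not the Dezifi SDK's actual interface; get the real contract from your account team.

    from typing import Protocol, Sequence

    import numpy as np

    class VectorStoreProvider(Protocol):
        # Hypothetical contract: the two operations any adapter must support.
        def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...
        def query(self, vector: Sequence[float], top_k: int) -> list[tuple[str, float]]: ...

    class InMemoryProvider:
        """Toy implementation: brute-force cosine similarity in memory."""

        def __init__(self) -> None:
            self._ids: list[str] = []
            self._vecs: list[np.ndarray] = []

        def upsert(self, ids, vectors):
            for id_, vec in zip(ids, vectors):
                self._ids.append(id_)
                self._vecs.append(np.asarray(vec, dtype=np.float32))

        def query(self, vector, top_k=5):
            q = np.asarray(vector, dtype=np.float32)
            mat = np.stack(self._vecs)
            sims = (mat @ q) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
            best = np.argsort(-sims)[:top_k]
            return [(self._ids[i], float(sims[i])) for i in best]

A real pgvector, Qdrant, or Milvus adapter would implement the same two methods against its client library.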

Retrieval quality knobs

  1. Chunk size
     Default is 500 tokens with 50-token overlap. Smaller chunks improve precision on short fact lookups; larger chunks help when the agent needs surrounding context. Adjust per dataset, not per query.
  2. Top-k
     How many chunks the agent pulls per query. Default is 5. Raise it when answers feel incomplete; lower it when the agent gets distracted by irrelevant context.
  3. Similarity threshold
     Minimum similarity score for a chunk to be returned. A higher threshold yields fewer but more relevant chunks, with a higher chance of returning nothing. Tune in tandem with top-k; a sketch of the interaction follows this list.
  4. Hybrid search
     Combines vector similarity with keyword match. Turn it on for domains with proper nouns, codes, or acronyms the embedding model may not recognize semantically.
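
Because the threshold filters candidates and top-k only caps the count, a high threshold can return fewer than top-k chunks, sometimes none. A minimal sketch of that interaction, assuming similarity scores in [0, 1] (select_chunks is a hypothetical helper, not a Dezifi API):

    def select_chunks(scored_chunks, top_k=5, threshold=0.75):
        """Drop chunks below the threshold, then keep the top_k best survivors."""
        kept = [(cid, score) for cid, score in scored_chunks if score >= threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        return kept[:top_k]

    # Raising the threshold shrinks the pool before top_k ever applies:
    scores = [("a", 0.91), ("b", 0.80), ("c", 0.74), ("d", 0.60)]
    print(select_chunks(scores, top_k=3, threshold=0.75))  # [('a', 0.91), ('b', 0.80)]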

Evaluate before going live

Build a small set of representative questions with expected source documents and run them through Eval. Track retrieval precision and the agent's end-to-end answer quality across provider and knob changes. Do not ship a tuning change without measuring it.
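
A minimal harness for the retrieval half of that measurement; search stands in for whatever retrieval call your Eval setup exposes (Dezifi's actual API is not shown here), and the scoring is plain precision over source-document IDs:

    def retrieval_precision(retrieved_doc_ids, expected_doc_ids):
        """Fraction of retrieved chunks drawn from the expected source documents."""
        if not retrieved_doc_ids:
            return 0.0
        expected = set(expected_doc_ids)
        return sum(doc in expected for doc in retrieved_doc_ids) / len(retrieved_doc_ids)

    def run_suite(cases, search, top_k=5):
        """Average precision across a suite of question/expected-docs cases."""
        scores = [
            retrieval_precision(search(case["question"], top_k=top_k), case["expected"])
            for case in cases
        ]
        return sum(scores) / len(scores)

    # Stubbed example; replace `stub` with your real retrieval call.
    suite = [{"question": "How do I rotate an API key?", "expected": {"security-guide"}}]
    stub = lambda q, top_k: ["security-guide", "faq", "security-guide", "intro", "faq"]
    print(run_suite(suite, stub))  # 0.4, well short of the 0.8 target in the FAQ

Rerun the same suite after every provider or knob change so the numbers stay comparable.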

Frequently asked questions

Which provider should I pick if I am unsure?
Pinecone for managed production. Local store for development, prototypes, and on-prem. You can prove out the dataset with the local store, then promote to Pinecone by creating a new dataset and re-uploading.
Can I mix embedding models across datasets?
Yes. Each dataset has its own model. An agent that attaches to two datasets queries each one with the model it was indexed with — Dezifi handles that automatically.
What does it cost?
Embedding is a one-time cost per document chunk, billed per token by your model provider. Vector store cost depends on the provider — local is free, managed providers bill on storage and query volume.
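
As a back-of-envelope example (the per-token rate below is illustrative, not a quoted price; check your model provider's pricing page):

    docs = 2_000
    chunks_per_doc = 40            # depends on document length and chunk size
    tokens_per_chunk = 500         # the default chunk size
    rate_per_million = 0.02        # example $/1M tokens, not a real quote

    total_tokens = docs * chunks_per_doc * tokens_per_chunk  # 40,000,000
    cost = total_tokens / 1_000_000 * rate_per_million       # $0.80 one-time
    print(f"{total_tokens:,} tokens -> ${cost:.2f}")
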
How do I know retrieval is good enough?
Run an Eval suite of held-out questions. Aim for top-k precision above 0.8 on representative queries before exposing the agent to end users.