Knowledge base

Embeddings and vector stores

The embedding model decides how Dezifi understands meaning. The vector store decides where indices live and how fast retrieval is. Together they determine retrieval quality, cost, and data residency.

What you'll learn
  • How to choose an embedding model
  • When to use Pinecone, Weaviate, or the local store
  • Which knobs change retrieval quality
  • How to evaluate retrieval before going to production

Choosing an embedding model

Embedding models convert text into vectors. Bigger models capture more nuance but cost more per token and produce larger indices. Start with a balanced default — text-embedding-3-small from OpenAI or an equivalent open-source model. Move to a larger model only if retrieval evaluation shows consistently weak matches. The model is locked at dataset creation; switching means re-indexing.
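
Dezifi embeds your documents at indexing time, so you rarely call the model directly, but seeing its output makes the trade-off concrete. Here is a minimal sketch using OpenAI's Python SDK (assumes the openai package is installed and OPENAI_API_KEY is set; this is not Dezifi's indexing pipeline):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=["How do I reset a user's password?"],
    )
    vector = resp.data[0].embedding
    print(len(vector))  # 1536 dimensions; text-embedding-3-large produces 3072

Larger vectors mean a larger index and higher storage cost at the vector store, which is why a bigger model should earn its keep in evaluation first.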

Vector store options

  1. Pinecone
     Managed, serverless, high throughput. Best for large datasets and multi-region deployments. Lowest operational overhead.
  2. Weaviate
     Self-hostable or managed. Hybrid search (vector + keyword) is first-class. Pick this when you need to combine semantic and lexical retrieval, or when data residency requires self-hosting.
  3. Local store
     Embedded vector store that ships with Dezifi. Zero external dependencies. Best for small datasets, development, and air-gapped on-prem deployments.
  4. Bring your own
     Custom providers can be added through the SDK if your team already runs pgvector, Qdrant, or Milvus. Talk to your account team for the integration path; a sketch of the general shape follows this list.
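
The provider contract itself is not documented in this article, so the following is a hypothetical sketch of the shape such an adapter usually takes: an upsert path and a top-k query path. VectorStoreProvider and InMemoryProvider are illustrative names, not the Dezifi SDK's actual interface; get the real contract from your account team.

    from typing import Protocol, Sequence

    import numpy as np

    class VectorStoreProvider(Protocol):
        # Hypothetical contract: the two operations any adapter must support.
        def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...
        def query(self, vector: Sequence[float], top_k: int) -> list[tuple[str, float]]: ...

    class InMemoryProvider:
        """Toy implementation: brute-force cosine similarity in memory."""

        def __init__(self) -> None:
            self._ids: list[str] = []
            self._vecs: list[np.ndarray] = []

        def upsert(self, ids, vectors):
            for id_, vec in zip(ids, vectors):
                self._ids.append(id_)
                self._vecs.append(np.asarray(vec, dtype=np.float32))

        def query(self, vector, top_k=5):
            q = np.asarray(vector, dtype=np.float32)
            mat = np.stack(self._vecs)
            sims = (mat @ q) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
            best = np.argsort(-sims)[:top_k]
            return [(self._ids[i], float(sims[i])) for i in best]

A real pgvector, Qdrant, or Milvus adapter would implement the same two methods against its client library.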

Retrieval quality knobs

  1. Chunk size
     Default is 500 tokens with 50-token overlap. Smaller chunks improve precision on short fact lookups; larger chunks help when the agent needs surrounding context. Adjust per dataset, not per query.
  2. Top-k
     How many chunks the agent pulls per query. Default is 5. Raise it when answers feel incomplete; lower it when the agent gets distracted by irrelevant context.
  3. Similarity threshold
     Minimum similarity score for a chunk to be returned. A higher threshold yields fewer but more relevant chunks, with a higher chance of returning nothing. Tune in tandem with top-k; a sketch of the interaction follows this list.
  4. Hybrid search
     Combines vector similarity with keyword match. Turn it on for domains with proper nouns, codes, or acronyms the embedding model may not recognize semantically.
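
Because the threshold filters candidates and top-k only caps the count, a high threshold can return fewer than top-k chunks, sometimes none. A minimal sketch of that interaction, assuming similarity scores in [0, 1] (select_chunks is a hypothetical helper, not a Dezifi API):

    def select_chunks(scored_chunks, top_k=5, threshold=0.75):
        """Drop chunks below the threshold, then keep the top_k best survivors."""
        kept = [(cid, score) for cid, score in scored_chunks if score >= threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        return kept[:top_k]

    # Raising the threshold shrinks the pool before top_k ever applies:
    scores = [("a", 0.91), ("b", 0.80), ("c", 0.74), ("d", 0.60)]
    print(select_chunks(scores, top_k=3, threshold=0.75))  # [('a', 0.91), ('b', 0.80)]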

Evaluate before going live

Build a small set of representative questions with expected source documents and run them through Eval. Track retrieval precision and the agent's end-to-end answer quality across provider and knob changes. Do not ship a tuning change without measuring it.
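
A minimal harness for the retrieval half of that measurement; search stands in for whatever retrieval call your Eval setup exposes (Dezifi's actual API is not shown here), and the scoring is plain precision over source-document IDs:

    def retrieval_precision(retrieved_doc_ids, expected_doc_ids):
        """Fraction of retrieved chunks drawn from the expected source documents."""
        if not retrieved_doc_ids:
            return 0.0
        expected = set(expected_doc_ids)
        return sum(doc in expected for doc in retrieved_doc_ids) / len(retrieved_doc_ids)

    def run_suite(cases, search, top_k=5):
        """Average precision across a suite of question/expected-docs cases."""
        scores = [
            retrieval_precision(search(case["question"], top_k=top_k), case["expected"])
            for case in cases
        ]
        return sum(scores) / len(scores)

    # Stubbed example; replace `stub` with your real retrieval call.
    suite = [{"question": "How do I rotate an API key?", "expected": {"security-guide"}}]
    stub = lambda q, top_k: ["security-guide", "faq", "security-guide", "intro", "faq"]
    print(run_suite(suite, stub))  # 0.4, well short of the 0.8 target in the FAQ

Rerun the same suite after every provider or knob change so the numbers stay comparable.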

Frequently asked questions

Which provider should I pick if I am unsure?
Pinecone for managed production. Local store for development, prototypes, and on-prem. You can prove out the dataset with the local store, then promote to Pinecone by creating a new dataset and re-uploading.
Can I mix embedding models across datasets?
Yes. Each dataset has its own model. An agent that attaches to two datasets queries each one with the model it was indexed with — Dezifi handles that automatically.
What does it cost?
Embedding is a one-time cost per document chunk, billed per token by your model provider. Vector store cost depends on the provider — local is free, managed providers bill on storage and query volume.
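
As a back-of-envelope example (the per-token rate below is illustrative, not a quoted price; check your model provider's pricing page):

    docs = 2_000
    chunks_per_doc = 40            # depends on document length and chunk size
    tokens_per_chunk = 500         # the default chunk size
    rate_per_million = 0.02        # example $/1M tokens, not a real quote

    total_tokens = docs * chunks_per_doc * tokens_per_chunk  # 40,000,000
    cost = total_tokens / 1_000_000 * rate_per_million       # $0.80 one-time
    print(f"{total_tokens:,} tokens -> ${cost:.2f}")
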
How do I know retrieval is good enough?
Run an Eval suite of held-out questions. Aim for top-k precision above 0.8 on representative queries before exposing the agent to end users.