Understanding Chunks & Retrieval

What Are Chunks?

A “chunk” is a piece of a document that has been processed and indexed for search. When you ask a question, the system searches these chunks to find relevant information, then passes the most relevant chunks to the language model to generate a response.

Why Is Content Split?

Language models have context limits—they can only process a certain amount of text at once. Additionally, searching through entire documents would be slow and imprecise. Splitting content into chunks provides several benefits:

  • Precision: Smaller chunks mean more targeted results
  • Context limits: Fits within LLM token limits
  • Speed: Faster vector search on smaller units
  • Relevance: Each chunk can be scored independently

Chunk Size Trade-offs

Smaller Chunks

✓ More precise retrieval
✓ Better for specific facts
✗ May lose context
✗ More chunks to search

Larger Chunks

✓ More context preserved
✓ Better for complex topics
✗ May include irrelevant content
✗ Uses more context window

The system uses adaptive chunking based on document type, typically targeting 500-1500 characters per chunk with semantic boundaries.
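The idea can be illustrated with a minimal sketch. The helper below splits on paragraph boundaries (one simple kind of semantic boundary) while keeping chunks under a size cap; the actual adaptive, document-type-aware logic is more involved, and `chunk_text` and its parameters are illustrative names, not the system's real API.

```python
def chunk_text(text, max_size=1500):
    """Split text into chunks of at most max_size characters,
    preferring paragraph boundaries so chunks stay semantically coherent.
    (The system targets roughly 500-1500 characters per chunk.)"""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate) <= max_size:
            # Paragraph still fits: grow the current chunk.
            current = candidate
        else:
            # Close the current chunk and start a new one at this boundary.
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Because splits only happen between paragraphs, no sentence is ever cut in half, which helps each chunk remain independently understandable at retrieval time.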

Semantic vs. Keyword Matching

Semantic Search (Vector)

Converts your query and chunks into mathematical vectors (embeddings) and finds chunks whose meaning is similar, even if they use different words.

Example: “Bitcoin ETF cost” matches “BITB expense ratio is 0.20%”
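Under the hood, "similar meaning" is measured geometrically, typically with cosine similarity between embedding vectors. The sketch below uses tiny hand-made 3-dimensional vectors purely to show the mechanism; real embeddings come from a learned model (2048 dimensions in this system).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings" (hypothetical values, hand-made for illustration).
query_vec = [0.9, 0.1, 0.0]  # "Bitcoin ETF cost"
chunk_a   = [0.8, 0.2, 0.1]  # "BITB expense ratio is 0.20%"
chunk_b   = [0.0, 0.1, 0.9]  # an unrelated chunk

# The matching chunk scores higher even though it shares no words with the query.
assert cosine_similarity(query_vec, chunk_a) > cosine_similarity(query_vec, chunk_b)
```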

Keyword Search (BM25)

Traditional text matching that finds chunks containing the exact words in your query. Excellent for specific terms, names, and tickers.

Example: “BITB” finds chunks containing exactly “BITB”

The system can use both approaches together (hybrid search), merging the two result lists so that semantic recall and exact-term precision complement each other. See Experiment Variants to learn about enabling hybrid search.
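One common way to merge the two ranked lists is reciprocal rank fusion (RRF): a chunk scores well if it ranks highly in either list, and best if it ranks highly in both. This is a sketch of the general technique; the merge strategy this system actually uses is not specified here.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of chunk IDs into one.
    Each appearance at rank r contributes 1 / (k + r + 1) to the chunk's score."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c1", "c3", "c2"]  # hypothetical vector-search ranking
keyword  = ["c3", "c4", "c1"]  # hypothetical BM25 ranking
merged = reciprocal_rank_fusion([semantic, keyword])
# "c3" wins: it ranks near the top of both lists.
```

The constant `k` damps the influence of top ranks so that one list cannot completely dominate the fused ordering.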

Why Some Content Isn't Found

If the system doesn't find content you expect, there are several possible reasons:

Common Retrieval Issues

  • Relevance threshold: Content exists but scored below the threshold
  • Semantic gap: Query wording too different from document language
  • Not ingested: Document hasn't been added to the knowledge base
  • Chunking split: Information split across multiple chunks
  • Entity mismatch: Query mentions a product not in the chunk
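The first issue, the relevance threshold, is worth seeing concretely: a chunk can exist in the index, match your query reasonably well, and still be dropped because its score falls just under the cutoff. A minimal sketch (function name and scores are illustrative):

```python
def filter_by_relevance(scored_chunks, threshold=0.7):
    """Keep only (chunk, score) pairs at or above the threshold.
    Content scoring just below it is silently excluded from results."""
    return [(chunk, score) for chunk, score in scored_chunks if score >= threshold]

results = [("chunk A", 0.82), ("chunk B", 0.68), ("chunk C", 0.91)]

filter_by_relevance(results)                 # drops chunk B (0.68 < 0.7)
filter_by_relevance(results, threshold=0.6)  # keeps all three
```

This is why lowering the threshold (step 3 of the debugging checklist below) can surface content that "wasn't found" before.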

Debugging Retrieval Issues

To troubleshoot why content isn't being retrieved:

  1. Check the sources panel: After each response, review the source documents shown. Are the expected sources appearing at all?
  2. Try different wording: Rephrase your query using terms that appear in the document. Specific product names and tickers often help.
  3. Lower the relevance threshold: In retrieval settings, try reducing the minimum relevance score to see if content appears.
  4. Enable hybrid search: Use the “Hybrid” experiment variant to add keyword matching alongside semantic search.
  5. Check entity filters: If asking about a specific product, ensure the entity overlap filter isn't being too aggressive.

The Retrieval Pipeline

Query → Embed → Vector Search → Filter → Rerank → Generate
  1. Query: User's question is analyzed for complexity and intent
  2. Embed: Query converted to 2048-dimensional vector
  3. Vector Search: Find similar chunks in Vertex AI Vector Search
  4. Filter: Apply length, relevance, and entity filters
  5. Rerank: Optionally reorder results for better relevance
  6. Generate: Pass top chunks to LLM for response generation
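The six stages above can be sketched as one function that chains them together. Everything here is hypothetical scaffolding — `index`, `embed`, `rerank`, and `generate` stand in for the real components (e.g. the embedding model and Vertex AI Vector Search), whose actual interfaces are not shown in this article.

```python
def retrieve_and_generate(query, index, embed, rerank, generate,
                          top_k=20, threshold=0.7, final_k=5):
    """End-to-end sketch of the retrieval pipeline stages.
    index/embed/rerank/generate are hypothetical callables."""
    query_vec = embed(query)                              # 2. Embed the query
    hits = index.search(query_vec, top_k=top_k)           # 3. Vector search
    hits = [(c, s) for c, s in hits if s >= threshold]    # 4. Filter by relevance
    hits = rerank(query, hits)[:final_k]                  # 5. Rerank, keep top chunks
    context = "\n\n".join(chunk for chunk, _ in hits)
    return generate(query, context)                       # 6. Generate the response
```

Each stage narrows the candidate set: the vector search casts a wide net (`top_k`), the filter and reranker prune it, and only the final handful of chunks is spent against the LLM's context window.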
