Understanding Experiment Variants

What are Experiment Variants?

Experiment variants allow you to control how the Knowledge Assistant retrieves and processes information before generating a response. Different variants enable different combinations of retrieval techniques, each with its own trade-offs between speed, accuracy, and relevance.

Available Variants

Turbo

The fastest possible retrieval path. Turbo skips HyDE, hybrid retrieval, and reranking to deliver pure vector search results with minimum latency.

Hybrid: off | Reranking: off | HyDE: disabled

Default

Uses the system's default configuration: hybrid retrieval and Vertex reranking are disabled, so retrieval relies on pure vector search. HyDE is enabled.

Hybrid: off | Reranking: off | HyDE: enabled

Hybrid

Forces hybrid retrieval, which combines semantic vector search with keyword-based BM25 search. Results are merged using Reciprocal Rank Fusion (RRF). This is particularly effective for queries that contain specific terms or product names.

Hybrid: forced ON | Reranking: off | HyDE: enabled

Rerank

Forces Vertex AI reranking on retrieved results. After initial retrieval, Google's Discovery Engine analyzes and reorders results based on semantic relevance to your query. This can significantly improve result quality for complex questions.

Hybrid: off | Reranking: forced ON | HyDE: enabled

Hybrid + Rerank

Combines both techniques: hybrid retrieval followed by Vertex reranking. This is the most comprehensive retrieval pipeline, casting a wider net with hybrid search and then refining the results with reranking. Best for important queries where quality matters most.

Hybrid: forced ON | Reranking: forced ON | HyDE: enabled

Cross-Encoder

Enables cross-encoder reranking as a second-stage filter after Vertex AI reranking. Cross-encoders analyze query-document pairs jointly rather than independently, providing more precise relevance scoring. Best for critical queries where precision matters more than speed.

Hybrid: off | Vertex Rerank: forced ON | Cross-Encoder: enabled | HyDE: enabled

No HyDE

Disables Hypothetical Document Embedding. HyDE generates a hypothetical answer to improve retrieval for vague queries like "tell me about Bitcoin." Disabling it makes retrieval faster but may reduce quality for exploratory questions.

Hybrid: off | Reranking: off | HyDE: disabled
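
Taken together, the variants above differ only in four retrieval flags. The sketch below summarizes the mapping in Python; the key and variant names are illustrative, not the system's actual configuration schema.

    # Illustrative variant-to-flags mapping (hypothetical key names).
    VARIANT_FLAGS = {
        "turbo":          {"hybrid": False, "vertex_rerank": False, "cross_encoder": False, "hyde": False},
        "default":        {"hybrid": False, "vertex_rerank": False, "cross_encoder": False, "hyde": True},
        "hybrid":         {"hybrid": True,  "vertex_rerank": False, "cross_encoder": False, "hyde": True},
        "rerank":         {"hybrid": False, "vertex_rerank": True,  "cross_encoder": False, "hyde": True},
        "hybrid_rerank":  {"hybrid": True,  "vertex_rerank": True,  "cross_encoder": False, "hyde": True},
        "cross_encoder":  {"hybrid": False, "vertex_rerank": True,  "cross_encoder": True,  "hyde": True},
        "no_hyde":        {"hybrid": False, "vertex_rerank": False, "cross_encoder": False, "hyde": False},
    }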

Technical Deep Dive

Hybrid Retrieval (Vector + BM25)

Traditional vector search uses semantic embeddings to find conceptually similar content. However, it can miss documents with exact keyword matches. BM25 is a classic keyword-based algorithm that excels at finding documents with specific terms.

Hybrid retrieval runs both searches in parallel and merges results using Reciprocal Rank Fusion (RRF). RRF assigns scores based on rank position in each list:

RRF_score(d) = Σᵢ 1/(k + rankᵢ(d))

where the sum runs over the result lists, k = 60 (a smoothing constant), and rankᵢ(d) is document d's position in list i

Documents appearing high in both lists get boosted, while documents appearing in only one list still contribute. This creates a balanced result set.
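
A minimal sketch of RRF in Python, assuming the two retrievers return ranked lists of document IDs:

    # Reciprocal Rank Fusion with k = 60, as described above.
    def rrf_merge(vector_ids, bm25_ids, k=60):
        scores = {}
        for ranked_list in (vector_ids, bm25_ids):
            for rank, doc_id in enumerate(ranked_list, start=1):
                # Each list a document appears in contributes 1 / (k + rank).
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        # Sort document IDs by fused score, highest first.
        return sorted(scores, key=scores.get, reverse=True)

    # Example: "d2" ranks high in both lists, so it is boosted to the top.
    print(rrf_merge(["d1", "d2", "d3"], ["d2", "d4", "d1"]))
    # -> ['d2', 'd1', 'd4', 'd3']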

Vertex AI Reranking

After initial retrieval, Vertex AI's Discovery Engine reanalyzes the top candidates. It uses a more sophisticated model to assess relevance, considering:

  • Semantic similarity between query and document content
  • Query-document term overlap
  • Document quality signals

Reranking adds latency (~200-500ms) but often significantly improves the quality of the top results.
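
For reference, a call to the Discovery Engine ranking API via the google-cloud-discoveryengine client looks roughly like the sketch below. The project ID, ranking-config name, and query are placeholders; check the current Vertex AI documentation for exact parameters.

    # Hedged sketch of a Vertex AI ranking call (placeholder identifiers).
    from google.cloud import discoveryengine_v1 as discoveryengine

    client = discoveryengine.RankServiceClient()
    config = client.ranking_config_path(
        project="my-project", location="global",
        ranking_config="default_ranking_config",
    )
    response = client.rank(
        discoveryengine.RankRequest(
            ranking_config=config,
            query="What is BITB's expense ratio?",
            top_n=10,
            records=[
                discoveryengine.RankingRecord(
                    id="1", title="BITB overview", content="..."),
            ],
        )
    )
    # Records come back ordered by relevance score.
    for record in response.records:
        print(record.id, record.score)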

Cross-Encoder Reranking

Unlike bi-encoders (used for initial retrieval), which embed queries and documents separately, cross-encoders process the query-document pair together. This allows the model to:

  • Capture fine-grained interactions between query and document terms
  • Understand nuanced relevance relationships
  • Provide more accurate relevance scores

Cross-encoders are slower but more accurate: they require one model forward pass per (query, document) pair, i.e. O(n) passes for n candidates, whereas a bi-encoder needs only a single query embedding because document embeddings are precomputed. We therefore use them as a second-stage reranker on only the top ~10 candidates after Vertex AI reranking.
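
As an illustration, a second-stage pass with an off-the-shelf cross-encoder might look like the following; it uses the sentence-transformers library with a public MS MARCO model, which is not necessarily the model used in production.

    # Second-stage cross-encoder reranking over the top candidates.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def cross_encoder_rerank(query, candidates):
        # One joint forward pass per (query, document) pair.
        scores = model.predict([(query, doc) for doc in candidates])
        ranked = sorted(zip(candidates, scores),
                        key=lambda pair: pair[1], reverse=True)
        return [doc for doc, _ in ranked]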

HyDE (Hypothetical Document Embedding)

For underspecified queries like "What's happening with crypto?", the query itself may not embed closely to relevant documents. HyDE addresses this by:

  1. Using an LLM to generate a short hypothetical answer
  2. Embedding the hypothetical (which is closer to actual document language)
  3. Retrieving using the hypothetical embedding
  4. Merging results with the original query results via RRF

HyDE only activates for vague queries and when initial results have low confidence scores.
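
Putting the steps together, a hedged sketch of the flow; the llm, embed, and vector_search helpers are hypothetical stand-ins for the system's own components, and rrf_merge is the sketch from the hybrid retrieval section above.

    # Sketch of the HyDE flow described above (all helpers hypothetical).
    def hyde_retrieve(query, llm, embed, vector_search, rrf_merge):
        # 1. LLM drafts a short hypothetical answer to the vague query.
        hypothetical = llm(f"Write a short passage answering: {query}")
        # 2-3. Embed the hypothetical and retrieve with that embedding,
        #      since it sits closer to real document language.
        hyde_ids = vector_search(embed(hypothetical))
        # 4. Merge with the original query's results via RRF.
        query_ids = vector_search(embed(query))
        return rrf_merge(query_ids, hyde_ids)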

When to Use Each Variant

Scenario                                                      | Recommended Variant
--------------------------------------------------------------|--------------------------
General use                                                   | Default
Searching for a specific product (e.g., "BITB expense ratio") | Hybrid
Complex analytical questions                                  | Rerank or Hybrid + Rerank
Critical research queries requiring maximum precision         | Cross-Encoder
Fastest possible response time                                | Turbo
Faster responses without hybrid or reranking                  | No HyDE
Important queries where quality matters most                  | Hybrid + Rerank

Analytics Integration

All queries are tagged with the experiment variant used, enabling comparison of performance metrics across variants in the Analytics Dashboard. The Variant Breakdown section shows queries, response times, and engagement metrics per variant.