Experiment variants allow you to control how the Knowledge Assistant retrieves and processes information before generating a response. Different variants enable different combinations of retrieval techniques, each with its own trade-offs between speed, accuracy, and relevance.
The fastest possible retrieval path. Turbo skips HyDE, hybrid retrieval, and reranking to deliver pure vector search results with minimum latency.
Uses the system's default configuration: HyDE is enabled, while hybrid retrieval and Vertex reranking are both disabled, relying on pure vector search.
Forces hybrid retrieval, which combines semantic vector search with keyword-based BM25 search. Results are merged using Reciprocal Rank Fusion (RRF). This is particularly effective for queries that contain specific terms or product names.
Forces Vertex AI reranking on retrieved results. After initial retrieval, Google's Discovery Engine analyzes and reorders results based on semantic relevance to your query. This can significantly improve result quality for complex questions.
Combines both techniques: hybrid retrieval followed by Vertex reranking. This is the most comprehensive retrieval pipeline, casting a wider net with hybrid search then refining results with reranking. Best for important queries where quality matters most.
Enables cross-encoder reranking as a second-stage filter after Vertex AI reranking. Cross-encoders analyze query-document pairs jointly rather than independently, providing more precise relevance scoring. Best for critical queries where precision matters more than speed.
Disables Hypothetical Document Embedding. HyDE generates a hypothetical answer to improve retrieval for vague queries like "tell me about Bitcoin." Disabling it makes retrieval faster but may reduce quality for exploratory questions.
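The variants above reduce to combinations of a few feature flags. A minimal sketch, assuming a flag-based configuration (the class, field, and variant names here are illustrative, not the product's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantConfig:
    """Feature flags controlling the retrieval pipeline (illustrative names)."""
    hyde: bool = True            # Hypothetical Document Embedding for vague queries
    hybrid: bool = False         # vector search + BM25, merged with RRF
    vertex_rerank: bool = False  # Vertex AI (Discovery Engine) reranking
    cross_encoder: bool = False  # second-stage cross-encoder reranking

# One entry per experiment variant described above.
VARIANTS = {
    "turbo":         VariantConfig(hyde=False),
    "default":       VariantConfig(),
    "hybrid":        VariantConfig(hybrid=True),
    "rerank":        VariantConfig(vertex_rerank=True),
    "hybrid_rerank": VariantConfig(hybrid=True, vertex_rerank=True),
    "cross_encoder": VariantConfig(vertex_rerank=True, cross_encoder=True),
    "no_hyde":       VariantConfig(hyde=False),
}
```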
Traditional vector search uses semantic embeddings to find conceptually similar content. However, it can miss documents with exact keyword matches. BM25 is a classic keyword-based algorithm that excels at finding documents with specific terms.
Hybrid retrieval runs both searches in parallel and merges results using Reciprocal Rank Fusion (RRF). RRF assigns scores based on rank position in each list:
RRF_score(d) = Σ 1/(k + rank_i(d)), where k = 60 is a smoothing constant and rank_i(d) is the document's position in ranked list i
Documents appearing high in both lists get boosted, while documents appearing in only one list still contribute. This creates a balanced result set.
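The merge step can be sketched in a few lines. This is a generic RRF implementation under the k = 60 convention above, not the system's actual code; the list and document names are illustrative:

```python
def rrf_merge(vector_results, bm25_results, k=60):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1; k=60 dampens the dominance of top positions.
    """
    scores = {}
    for results in (vector_results, bm25_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Note how "b", ranked high in both lists, beats "a", which leads only the vector list: `rrf_merge(["a", "b", "c"], ["b", "c", "d"])` returns `["b", "c", "a", "d"]`.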
After initial retrieval, Vertex AI's Discovery Engine reanalyzes the top candidates, using a more sophisticated model to assess each one's semantic relevance to the query and reorder the results accordingly.
Reranking adds latency (~200-500ms) but often significantly improves the quality of the top results.
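The shape of this stage is retrieve, then rescore the head of the list. A sketch of that control flow, where `score_fn` stands in for the Vertex AI call (in this toy usage it is any `(query, document) -> float` callable; the real service call is not shown):

```python
def rerank(query, candidates, score_fn, top_n=20):
    """Second-stage rerank: rescore the top candidates against the query
    and reorder them by relevance.  Only the head of the list is sent to
    the (slow) reranker; the tail keeps its original order.
    """
    head = candidates[:top_n]
    ranked = sorted(head, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked + candidates[top_n:]
```

Limiting `top_n` is what keeps the added latency bounded: the reranker's cost grows with the number of candidates it scores, not with the index size.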
Unlike bi-encoders (used for initial retrieval), which embed queries and documents separately, cross-encoders process the query-document pair together, allowing the model to score relevance far more precisely.
Cross-encoders are slower (one model forward pass per candidate, i.e. O(n) for n candidates, versus a single O(1) query encoding plus an index lookup for bi-encoders) but more accurate. We use them as a second-stage reranker on the top ~10 candidates after Vertex AI reranking.
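The bi- vs cross-encoder distinction can be illustrated with toy scoring functions. These are purely pedagogical stand-ins (the real scorers are neural models), but they show why joint scoring is both more expressive and more expensive:

```python
def bi_encoder_score(query_vec, doc_vec):
    """Bi-encoder: query and document are embedded independently, so
    relevance reduces to a similarity between two fixed vectors."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query, doc):
    """Cross-encoder: the model sees the (query, document) pair jointly.
    Toy stand-in: query-token coverage penalized by document length,
    a pairwise signal independent embeddings cannot capture exactly."""
    q_tokens, d_tokens = set(query.split()), set(doc.split())
    return len(q_tokens & d_tokens) / (1 + len(d_tokens))

def cross_encoder_rerank(query, docs, top_k=10):
    """Apply the expensive pairwise scorer only to the top ~10
    candidates that survive Vertex AI reranking."""
    return sorted(docs[:top_k],
                  key=lambda d: cross_encoder_score(query, d),
                  reverse=True)
```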
For underspecified queries like "What's happening with crypto?", the query itself may not embed closely to relevant documents. HyDE addresses this by generating a hypothetical answer and retrieving with that answer's embedding, which tends to land closer to the relevant documents than the raw query does.
HyDE only activates for vague queries and when initial results have low confidence scores.
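The conditional activation above can be sketched as follows. The `generate_answer`, `embed`, and `search` parameters are hypothetical stand-ins for the LLM, embedding model, and vector index (and `confidence_threshold` is an assumed knob, not a documented setting); vagueness detection is folded into the confidence check for brevity:

```python
def hyde_retrieve(query, generate_answer, embed, search,
                  confidence_threshold=0.5):
    """HyDE sketch: if direct retrieval is low-confidence, retry with
    the embedding of a hypothetical answer instead of the raw query."""
    results, confidence = search(embed(query))
    if confidence >= confidence_threshold:
        return results                       # direct retrieval was good enough
    hypothetical = generate_answer(query)    # short plausible answer from the LLM
    results, _ = search(embed(hypothetical))
    return results
```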
| Scenario | Recommended Variant |
|---|---|
| General use | Default |
| Searching for specific product (e.g., "BITB expense ratio") | Hybrid |
| Complex analytical questions | Rerank or Hybrid+Rerank |
| Critical research queries requiring maximum precision | Cross-Encoder |
| Fastest possible response time | Turbo |
| Faster response without hybrid or reranking | No HyDE |
All queries are tagged with the experiment variant used, enabling comparison of performance metrics across variants in the Analytics Dashboard. The Variant Breakdown section shows queries, response times, and engagement metrics per variant.