Hybrid Search: Combine Semantic and Keyword Search
Boost retrieval accuracy by 20-30%: combine vector search with BM25 keyword matching for superior RAG performance.
Why Hybrid Search?
Vector search can miss exact matches; BM25 misses semantic similarity. Combining both can yield 20-30% better recall, though the gain depends on your corpus and query mix.
Vector search fails on:
- Product IDs: "SKU-12345"
- Proper nouns: "Marie Curie"
- Technical terms: "RAG-Fusion"
BM25 fails on:
- Synonyms: "car" vs "automobile"
- Paraphrasing: "how to cook pasta" vs "pasta cooking instructions"
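To make the BM25 side of these failure modes concrete, here is a minimal sketch of Okapi BM25 in plain Python. The function name `bm25_scores`, the toy documents, and the `k1`/`b` defaults are illustrative assumptions, and the IDF uses the always-positive Lucene-style variant rather than any particular library's formula.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap, so no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (hypothetical)
docs = [
    "used car for sale in good condition",
    "replacement filter sku-12345 ships today",
    "pasta cooking instructions for beginners",
]

exact = bm25_scores("SKU-12345", docs)     # exact token match: only the second doc scores
synonym = bm25_scores("automobile", docs)  # synonym of "car": no doc shares the token
```

The exact identifier is found because it is a literal token in the document, while "automobile" scores zero everywhere even though the first document is clearly relevant. That gap is what the vector side of a hybrid system is meant to cover.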
Implementation (November 2025)
With Weaviate
Weaviate has built-in hybrid search (alpha parameter):
```python
import weaviate

# Uses the v3 Python client API (weaviate.Client); the v4 client exposes a different interface
client = weaviate.Client("http://localhost:8080")

results = (
    client.query.get("Document", ["content"])
    .with_hybrid(
        query="Marie Curie radioactivity",
        alpha=0.7,  # 0 = pure BM25, 1 = pure vector
    )
    .with_limit(10)
    .do()
)
```
With Qdrant
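```python
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

# Assumes the "documents" collection defines a named dense vector ("dense")
# and a named sparse vector ("sparse"), and that the query is already encoded:
#   dense_vector: list[float] from your embedding model
#   sparse_indices / sparse_values: from your sparse encoder (e.g. BM25 or SPLADE)
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Candidate list 1: dense (semantic) search
        models.Prefetch(query=dense_vector, using="dense", limit=20),
        # Candidate list 2: sparse (keyword) search
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    # Fuse both candidate lists with Reciprocal Rank Fusion
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
```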
Manual Hybrid (Any Vector DB)
```python
from rank_bm25 import BM25Okapi

# Assumes `documents` is a list of strings, `embed_model` exposes .encode(),
# and vector_db.search() returns dicts whose 'id' is the index into `documents`.
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

def hybrid_search(query, vector_db, alpha=0.7, k=10):
    # 1. Vector search (over-fetch so fusion has candidates to reorder)
    query_vector = embed_model.encode(query)
    vector_results = vector_db.search(query_vector, k=k * 2)

    # 2. BM25 search: one score per document, in corpus order
    bm25_scores = bm25.get_scores(query.lower().split())

    # 3. Scale each score set by its maximum (guarding against division by zero)
    vector_scores = {r['id']: r['score'] for r in vector_results}
    max_v = max(vector_scores.values(), default=0) or 1.0
    vector_scores = {doc_id: s / max_v for doc_id, s in vector_scores.items()}

    max_b = max(bm25_scores) or 1.0
    bm25_scores_norm = {i: s / max_b for i, s in enumerate(bm25_scores)}

    # 4. Combine with alpha weighting
    combined = {}
    for doc_id in set(vector_scores) | set(bm25_scores_norm):
        combined[doc_id] = (
            alpha * vector_scores.get(doc_id, 0)
            + (1 - alpha) * bm25_scores_norm.get(doc_id, 0)
        )

    # 5. Sort and return top k
    top_results = sorted(combined.items(), key=lambda x: x[1], reverse=True)[:k]
    return [documents[doc_id] for doc_id, _ in top_results]
```
Reciprocal Rank Fusion (RRF)
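RRF is often more robust than score fusion because it combines rankings rather than raw scores, so the two score scales never need to be normalized against each other: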
```python
def reciprocal_rank_fusion(rankings, k=60):
    """
    rankings: list of ranked lists of document IDs, one list per retrieval method
    k: smoothing constant (typically 60)
    """
    rrf_scores = {}
    for rank_list in rankings:
        for rank, doc_id in enumerate(rank_list, start=1):
            if doc_id not in rrf_scores:
                rrf_scores[doc_id] = 0
            rrf_scores[doc_id] += 1 / (k + rank)
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

# Use it (vector_search is a placeholder for your retriever;
# bm25 is the BM25Okapi index from the manual hybrid example)
vector_results = vector_search(query)
bm25_scores = bm25.get_scores(query.split())
bm25_ranking = [
    i for i, _ in sorted(enumerate(bm25_scores), key=lambda x: x[1], reverse=True)
]
final = reciprocal_rank_fusion([
    [r['id'] for r in vector_results],
    bm25_ranking,
])
```
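As a worked example of the fusion step, the sketch below runs RRF on two small, made-up rankings so the arithmetic can be checked by hand. The function is a compact restatement named `rrf` for this example only, and the document IDs are hypothetical.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_ranking = ["d1", "d2", "d3"]  # hypothetical vector-search order
bm25_ranking = ["d3", "d1", "d4"]    # hypothetical BM25 order

fused = rrf([vector_ranking, bm25_ranking])
fused_ids = [doc_id for doc_id, _ in fused]

# Per-document sums with k=60:
#   d1: 1/61 + 1/62    d3: 1/63 + 1/61    d2: 1/62    d4: 1/63
# so the fused order is d1, d3, d2, d4
```

Documents that appear in both lists (d1 and d3) outrank documents that appear in only one, regardless of how the underlying scores were scaled.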
Tuning Alpha
Test on your queries:
```python
test_queries = ["Marie Curie", "SKU-12345", "how does photosynthesis work"]
ground_truth = {...}  # Known relevant docs

alphas = [0.3, 0.5, 0.7, 0.9]
for alpha in alphas:
    recall = evaluate_hybrid(test_queries, ground_truth, alpha)
    print(f"Alpha {alpha}: Recall@10 = {recall}")
```
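The loop above relies on an `evaluate_hybrid` helper that the article does not define. One possible sketch, assuming `ground_truth` maps each query to a set of relevant document IDs, is shown here. It adds a hypothetical `search_fn(query, alpha, k)` parameter so the retriever can be swapped in, and `toy_search` is only a stand-in to make the example runnable.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def evaluate_hybrid(test_queries, ground_truth, alpha, search_fn, k=10):
    """Mean Recall@k over the test queries.

    search_fn(query, alpha, k) is a hypothetical hook returning ranked doc IDs.
    """
    recalls = [
        recall_at_k(search_fn(q, alpha, k), ground_truth[q], k)
        for q in test_queries
    ]
    return sum(recalls) / len(recalls)

# Toy stand-in for a real hybrid retriever (not a real search)
def toy_search(query, alpha, k):
    return ["doc_a", "doc_b"] if alpha >= 0.5 else ["doc_b", "doc_c"]

truth = {"Marie Curie": {"doc_a", "doc_b"}, "SKU-12345": {"doc_c"}}
queries = list(truth)

high = evaluate_hybrid(queries, truth, alpha=0.7, search_fn=toy_search)
low = evaluate_hybrid(queries, truth, alpha=0.3, search_fn=toy_search)
```

Averaging per-query recall keeps one query with many relevant documents from dominating the result; with a real retriever, replace `toy_search` with a function that calls your hybrid search.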
Typical optimal values:
- Technical docs with IDs/codes: alpha = 0.3-0.5 (favor BM25)
- Natural language QA: alpha = 0.7-0.8 (favor vector)
- Mixed content: alpha = 0.5-0.6
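To see why the best alpha shifts with content type, the following sketch applies the weighting alpha * vector + (1 - alpha) * BM25 to two hypothetical documents. The scores are invented and assumed to be already normalized to [0, 1]; `weighted_fusion` is an illustrative name.

```python
def weighted_fusion(vector_scores, bm25_scores, alpha):
    """Rank doc IDs by alpha * vector score + (1 - alpha) * BM25 score."""
    doc_ids = set(vector_scores) | set(bm25_scores)
    combined = {
        d: alpha * vector_scores.get(d, 0.0) + (1 - alpha) * bm25_scores.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical normalized scores for a query that contains a product ID
vector_scores = {"exact_sku_page": 0.4, "generic_filter_guide": 1.0}
bm25_scores = {"exact_sku_page": 1.0, "generic_filter_guide": 0.2}

keyword_heavy = weighted_fusion(vector_scores, bm25_scores, alpha=0.3)
vector_heavy = weighted_fusion(vector_scores, bm25_scores, alpha=0.8)

# alpha=0.3: exact_sku_page 0.82 vs generic_filter_guide 0.44
# alpha=0.8: exact_sku_page 0.52 vs generic_filter_guide 0.84
```

With a low alpha the page containing the exact ID wins; with a high alpha the semantically similar but less specific guide overtakes it, which is why ID-heavy corpora tend to favor lower values.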
Sparse-Dense Encoders
A single model can produce both sparse and dense representations:
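```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# SPLADE checkpoint; BGE-M3 is an alternative trained to emit sparse and dense together
model = AutoModelForMaskedLM.from_pretrained('naver/splade-v3')
tokenizer = AutoTokenizer.from_pretrained('naver/splade-v3')

query = "Marie Curie radioactivity"
tokens = tokenizer(query, return_tensors='pt')
with torch.no_grad():
    # Hidden states are not returned by default, so request them explicitly
    output = model(**tokens, output_hidden_states=True)

mask = tokens['attention_mask'].unsqueeze(-1)  # (batch, seq_len, 1)

# Sparse: SPLADE pooling, max over tokens of log(1 + ReLU(logits)) -> (batch, vocab_size)
sparse_vector = (torch.log1p(torch.relu(output.logits)) * mask).max(dim=1).values

# Dense: masked mean of the last hidden layer -> (batch, hidden_size).
# SPLADE is trained for sparse retrieval, so treat this pooled vector as
# illustrative; a model such as BGE-M3 is trained to produce both outputs.
hidden = output.hidden_states[-1]
dense_vector = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```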
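A sparse encoder's output is a vocabulary-sized vector that is mostly zeros, so it is usually stored as index/value pairs rather than as a full array. The sketch below shows that conversion and the matching dot-product score in plain Python. The helper names and the six-entry toy "vocabulary" are assumptions for illustration, not part of any library.

```python
def to_sparse_pairs(weights, threshold=0.0):
    """Keep only entries above threshold, returned as (indices, values)."""
    indices = [i for i, w in enumerate(weights) if w > threshold]
    values = [weights[i] for i in indices]
    return indices, values

def sparse_dot(a, b):
    """Dot product of two sparse vectors given as (indices, values) pairs."""
    b_lookup = dict(zip(*b))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(*a))

# Toy term weights over a six-token vocabulary (hypothetical values)
query_weights = [0.0, 1.2, 0.0, 0.0, 0.5, 0.0]
doc_weights = [0.0, 0.8, 0.3, 0.0, 0.0, 0.0]

q = to_sparse_pairs(query_weights)  # indices [1, 4], values [1.2, 0.5]
d = to_sparse_pairs(doc_weights)    # indices [1, 2], values [0.8, 0.3]
score = sparse_dot(q, d)            # only index 1 overlaps: 1.2 * 0.8
```

Only dimensions that are non-zero in both vectors contribute, which is what lets sparse scores behave like weighted keyword matching while dense vectors capture overall meaning.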
Hybrid search is the secret weapon of production RAG systems. Implement it and watch your recall soar.