5. Retrieval · Intermediate

Hybrid Search: Combine Semantic and Keyword Search

November 14, 2025
10 min read
Ailog Research Team

Combine vector search with BM25 keyword matching to improve RAG retrieval. The gain cited in this article is 20-30% better recall, though the exact figure depends on your corpus and queries.

Why Hybrid Search?

Vector search misses exact matches. BM25 misses semantics. Combine both for 20-30% better recall.

Vector search fails on:

  • Product IDs: "SKU-12345"
  • Proper nouns: "Marie Curie"
  • Technical terms: "RAG-Fusion"

BM25 fails on:

  • Synonyms: "car" vs "automobile"
  • Paraphrasing: "how to cook pasta" vs "pasta cooking instructions"
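The keyword side of this trade-off is easy to reproduce. The sketch below is a simplified BM25 scorer written for illustration only (it is not the `rank_bm25` implementation, and the two documents are made up). It shows a synonym query scoring zero while an exact identifier still matches:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with a plain Okapi-style BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the automobile was parked outside",
    "order SKU-12345 shipped yesterday",
]
car_scores = bm25_scores("car", docs)        # synonym only: no shared token
sku_scores = bm25_scores("SKU-12345", docs)  # exact identifier match
print(car_scores)  # [0.0, 0.0]
print(sku_scores)  # first score is 0.0, second is positive
```

A dense retriever would be expected to behave the other way around on these two queries, which is the motivation for combining them.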

Implementation (November 2025)

With Weaviate

Weaviate has built-in hybrid search (alpha parameter):

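```python
import weaviate

# v4 Python client; the older weaviate.Client(...) v3 interface is deprecated
client = weaviate.connect_to_local()  # defaults to localhost:8080

documents = client.collections.get("Document")
results = documents.query.hybrid(
    query="Marie Curie radioactivity",
    alpha=0.7,  # 0 = pure BM25, 1 = pure vector
    limit=10,
)
for obj in results.objects:
    print(obj.properties["content"])

client.close()
```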

With Qdrant

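```python
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

# Assumptions: the collection has named vectors "dense" and "sparse", and
# dense_vector / sparse_indices / sparse_values come from your own encoders
# applied to the query text (for example "Marie Curie radiation discovery").
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Semantic candidates from the dense vector
        models.Prefetch(query=dense_vector, using="dense", limit=20),
        # Keyword-style candidates from the sparse vector
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    # Fuse both candidate lists with Reciprocal Rank Fusion
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
```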

Manual Hybrid (Any Vector DB)

```python
from rank_bm25 import BM25Okapi

# Assumptions: `documents` is a list of strings, `embed_model` exposes
# encode(), and vector_db.search() returns dicts whose 'id' is the
# document's index in `documents`, with a non-negative, higher-is-better 'score'.
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

def hybrid_search(query, vector_db, alpha=0.7, k=10):
    # 1. Vector search (over-fetch so fusion has candidates to reorder)
    query_vector = embed_model.encode(query)
    vector_results = vector_db.search(query_vector, k=k * 2)

    # 2. BM25 search: one score per document, indexed by position
    bm25_scores = bm25.get_scores(query.lower().split())

    # 3. Normalize scores to [0, 1] by dividing by the max
    #    (fall back to 1.0 so an all-zero result set does not divide by zero)
    vector_scores = {r['id']: r['score'] for r in vector_results}
    max_v = max(vector_scores.values(), default=0.0)
    max_v = max_v if max_v > 0 else 1.0
    vector_scores = {doc_id: s / max_v for doc_id, s in vector_scores.items()}

    max_b = max(bm25_scores, default=0.0)
    max_b = max_b if max_b > 0 else 1.0
    bm25_scores_norm = {i: s / max_b for i, s in enumerate(bm25_scores)}

    # 4. Combine with alpha weighting (a doc missing from one list scores 0 there)
    combined = {}
    for doc_id in set(vector_scores) | set(bm25_scores_norm):
        combined[doc_id] = (
            alpha * vector_scores.get(doc_id, 0.0)
            + (1 - alpha) * bm25_scores_norm.get(doc_id, 0.0)
        )

    # 5. Sort and return top k
    top_results = sorted(combined.items(), key=lambda x: x[1], reverse=True)[:k]
    return [documents[doc_id] for doc_id, _ in top_results]
```
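The listing above normalizes by dividing each score by the maximum. Min-max scaling is a common alternative when the two score ranges look very different (cosine scores are often bunched together, BM25 scores are unbounded). The helper below is a sketch, not part of any library; the name `min_max_normalize`, the sample scores, and the choice to map constant inputs to 0.0 are all assumptions made for illustration:

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to [0, 1]; constant inputs map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No spread to rescale; avoid dividing by zero
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

cosine_scores = {"a": 0.82, "b": 0.80, "c": 0.78}  # narrow range
bm25_raw = {"a": 0.0, "b": 7.5, "c": 2.5}          # wide, unbounded range

cosine_norm = min_max_normalize(cosine_scores)
bm25_norm = min_max_normalize(bm25_raw)
print(cosine_norm)
print(bm25_norm)
```

After scaling, both lists span the full [0, 1] interval, so alpha weights the two retrievers on comparable terms. The flip side is that min-max exaggerates tiny differences in a tightly clustered list, which is one reason rank-based fusion is often preferred.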

Reciprocal Rank Fusion (RRF)

RRF combines rankings rather than raw scores, so it needs no score normalization and is often more robust than weighted score fusion:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """
    rankings: List of ranked lists of document IDs, one per retrieval method
    k: Constant (typically 60)
    """
    rrf_scores = {}
    for rank_list in rankings:
        for rank, doc_id in enumerate(rank_list, start=1):
            if doc_id not in rrf_scores:
                rrf_scores[doc_id] = 0
            rrf_scores[doc_id] += 1 / (k + rank)

    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

# Use it. vector_search and bm25_search stand in for your own retrievers;
# each is assumed to return a best-first list of dicts with an 'id' key.
vector_results = vector_search(query)
bm25_results = bm25_search(query)

final = reciprocal_rank_fusion([
    [r['id'] for r in vector_results],
    [r['id'] for r in bm25_results],
])
```
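To make the arithmetic concrete, here is a small worked example with two made-up rankings. The function is a compact restatement of the same formula (named `rrf` here to keep the block self-contained); the document IDs are hypothetical:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse best-first ID lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_ranking = ["d1", "d2", "d3"]
bm25_ranking = ["d3", "d1", "d4"]

fused = rrf([vector_ranking, bm25_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)  # ['d1', 'd3', 'd2', 'd4']
```

Here `d1` scores 1/61 + 1/62 and `d3` scores 1/63 + 1/61, so documents found by both retrievers rise above `d2` and `d4`, which each appear in only one list.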

Tuning Alpha

Test on your queries:

```python
test_queries = ["Marie Curie", "SKU-12345", "how does photosynthesis work"]
ground_truth = {...}  # Known relevant docs

alphas = [0.3, 0.5, 0.7, 0.9]
for alpha in alphas:
    # evaluate_hybrid is your own evaluation helper (not defined in this article)
    recall = evaluate_hybrid(test_queries, ground_truth, alpha)
    print(f"Alpha {alpha}: Recall@10 = {recall}")
```
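The loop above calls `evaluate_hybrid` without defining it. One way such a helper could look is sketched below, assuming `ground_truth` maps each query to a set of relevant document IDs. The extra `search_fn` parameter and the `fake_search` stand-in are assumptions added so the example runs on its own; in practice you would pass your real hybrid retriever:

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & set(relevant_ids)
    return len(hits) / len(relevant_ids)

def evaluate_hybrid(queries, ground_truth, alpha, search_fn, k=10):
    """Mean Recall@k over queries; search_fn(query, alpha, k) returns ranked doc IDs."""
    recalls = [
        recall_at_k(search_fn(q, alpha, k), ground_truth[q], k)
        for q in queries
    ]
    return sum(recalls) / len(recalls)

# Toy stand-in for a real retriever: a low alpha surfaces the exact-match doc.
def fake_search(query, alpha, k):
    if alpha < 0.5:
        return ["doc_sku", "doc_misc"]
    return ["doc_misc", "doc_other"]

truth = {"SKU-12345": {"doc_sku"}}
low = evaluate_hybrid(["SKU-12345"], truth, alpha=0.3, search_fn=fake_search)
high = evaluate_hybrid(["SKU-12345"], truth, alpha=0.9, search_fn=fake_search)
print(low, high)  # 1.0 0.0
```

Averaging recall per query, rather than pooling all hits, keeps a few queries with many relevant documents from dominating the result.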

Typical optimal values:

  • Technical docs with IDs/codes: alpha = 0.3-0.5 (favor BM25)
  • Natural language QA: alpha = 0.7-0.8 (favor vector)
  • Mixed content: alpha = 0.5-0.6
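A toy calculation shows why these ranges differ. The scores below are invented and assumed to be already normalized to [0, 1]; the only thing that changes between the two calls is alpha:

```python
def blend(vector_score, bm25_score, alpha):
    """Alpha-weighted fusion of two scores already normalized to [0, 1]."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# Hypothetical normalized scores for the query "SKU-12345 replacement part"
candidates = {
    "exact_sku_page": {"vector": 0.40, "bm25": 1.00},
    "generic_parts_guide": {"vector": 0.90, "bm25": 0.20},
}

def top_doc(alpha):
    """Return the candidate with the highest blended score for this alpha."""
    return max(
        candidates,
        key=lambda d: blend(candidates[d]["vector"], candidates[d]["bm25"], alpha),
    )

print(top_doc(0.3))  # exact_sku_page
print(top_doc(0.8))  # generic_parts_guide
```

With these numbers the winner flips at an alpha of roughly 0.62, so a corpus full of identifier lookups would favor values below that point.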

Sparse-Dense Encoders (2025 Innovation)

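A single model can produce both sparse and dense representations. BGE-M3 is designed to emit both; SPLADE is trained for sparse retrieval, so the dense vector below is illustrative only:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# SPLADE or BGE-M3 for sparse+dense
model = AutoModelForMaskedLM.from_pretrained('naver/splade-v3')
tokenizer = AutoTokenizer.from_pretrained('naver/splade-v3')
model.eval()

query = "Marie Curie radioactivity"
tokens = tokenizer(query, return_tensors='pt')
with torch.no_grad():
    # Get both sparse and dense in one pass
    output = model(**tokens, output_hidden_states=True)
mask = tokens['attention_mask'].unsqueeze(-1)

# Sparse: SPLADE pooling, max over tokens of log(1 + ReLU(logits))
sparse_vector = (torch.log1p(torch.relu(output.logits)) * mask).max(dim=1).values

# Dense: mean-pooled final hidden states (masked-LM outputs expose
# hidden_states[-1], not last_hidden_state)
hidden = output.hidden_states[-1]
dense_vector = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```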
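Once both representations exist, they are scored differently at query time. The sketch below uses hand-written vectors rather than real model output: a sparse vector is treated as a `{term: weight}` mapping scored by dot product, and a dense vector as a list of floats scored by cosine similarity. The weights and function names are hypothetical:

```python
import math

def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse vectors stored as {term_or_token_id: weight}."""
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec  # iterate over the smaller one
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical weights, as a learned sparse encoder might assign them
query_sparse = {"curie": 2.0, "radioactivity": 1.5}
doc_sparse = {"curie": 1.0, "polonium": 0.5, "radioactivity": 2.0}

sparse_score = sparse_dot(query_sparse, doc_sparse)
dense_score = cosine([1.0, 0.0], [1.0, 0.0])
print(sparse_score)  # 5.0
print(dense_score)   # 1.0
```

Only terms present in both sparse vectors contribute to the sparse score, which is what preserves exact-match behavior; the two scores can then be fused with alpha weighting or RRF as described earlier.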

Hybrid search is the secret weapon of production RAG systems. Implement it and watch your recall soar.

Tags

hybrid search, bm25, retrieval, semantic search
