
Why Hybrid Search?

Vector search misses exact matches; BM25 misses semantics. Combining the two can improve recall by 20-30%, though the actual gain depends on your corpus and query mix.

Vector search often struggles with:

Product IDs: "SKU-12345"
Proper nouns: "Marie Curie"
Technical terms: "RAG-Fusion"

BM25 fails on:

Synonyms: "car" vs "automobile"
Paraphrasing: "how to cook pasta" vs "pasta cooking instructions"
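To see the lexical gap concretely, here is a minimal from-scratch Okapi BM25 scorer. It is a sketch that assumes lowercase whitespace tokenization, k1=1.5, b=0.75, and a Lucene-style smoothed IDF; the toy documents are made up. A query for "car" scores zero against a document that only says "automobile". An exact product ID matches immediately.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each doc in `docs`."""
    tokenized = [doc.lower().split() for doc in docs]  # lowercase whitespace tokens
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many docs contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # BM25 only rewards exact token matches
            # Smoothed IDF (Lucene-style "+1" variant) is always positive
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "automobile maintenance checklist",
    "order status for sku-12345",
    "pasta cooking instructions",
]
print(bm25_scores("car", docs))        # no doc contains the token "car"
print(bm25_scores("SKU-12345", docs))  # exact ID match on the second doc
```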

Implementation (November 2025)

With Weaviate

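Weaviate supports hybrid search natively. The alpha parameter sets the balance between BM25 and vector scores. This example uses the v4 Python client (weaviate-client 4.x):

```python
import weaviate

# v4 client; connects to http://localhost:8080 by default
client = weaviate.connect_to_local()

try:
    documents = client.collections.get("Document")
    response = documents.query.hybrid(
        query="Marie Curie radioactivity",
        alpha=0.7,  # 0 = pure BM25, 1 = pure vector
        limit=10,
    )
    for obj in response.objects:
        print(obj.properties["content"])
finally:
    client.close()
```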

With Qdrant

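Qdrant's Query API can prefetch candidates from a dense and a sparse named vector and fuse them server-side with RRF. This sketch assumes a collection with named vectors "dense" and "sparse". It also assumes two hypothetical helpers, embed_dense and embed_sparse, that encode the query with your own models:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

query_text = "Marie Curie radioactivity"
dense_query = embed_dense(query_text)    # hypothetical helper: returns list[float]
sparse_query = embed_sparse(query_text)  # hypothetical helper: returns models.SparseVector

# Vector + keyword search: two candidate lists fused with Reciprocal Rank Fusion
results = client.query_points(
    collection_name="documents",
    prefetch=[
        models.Prefetch(query=dense_query, using="dense", limit=20),
        models.Prefetch(query=sparse_query, using="sparse", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
for point in results.points:
    print(point.id, point.score)
```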
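A sparse vector lists only its non-zero dimensions, as parallel lists of vocabulary indices and weights. The sketch below shows that shape using a hypothetical four-word vocabulary and raw term counts as weights. Both are assumptions for illustration; a real setup would use BM25 or SPLADE weights over a full vocabulary.

```python
from collections import Counter

def toy_sparse_vector(text, vocab):
    """Return (indices, values): vocab IDs of known tokens and their raw counts."""
    counts = Counter(tok for tok in text.lower().split() if tok in vocab)
    # Sparse vectors store only non-zero dimensions; keep indices sorted and unique
    pairs = sorted((vocab[tok], float(n)) for tok, n in counts.items())
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return indices, values

# Hypothetical vocabulary; unknown tokens such as "and" or "the" are dropped
vocab = {"marie": 0, "curie": 1, "radioactivity": 2, "pasta": 3}
indices, values = toy_sparse_vector("Marie Curie and the radioactivity of Curie", vocab)
print(indices, values)
```

In qdrant_client, these two lists are what models.SparseVector(indices=..., values=...) holds.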

Manual Hybrid (Any Vector DB)

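```python
from rank_bm25 import BM25Okapi
import numpy as np

# Assumes `documents` is a list of strings whose list index is the doc ID used
# by your vector store, `embed_model` has an .encode() method, and
# vector_db.search() returns dicts with 'id' and 'score' keys.
def tokenize(text):
    return text.lower().split()

# BM25 setup
bm25 = BM25Okapi([tokenize(doc) for doc in documents])

def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: (1.0 if hi > 0 else 0.0) for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_search(query, vector_db, alpha=0.7, k=10):
    # 1. Vector search: top 2k semantic candidates
    query_vector = embed_model.encode(query)
    vector_results = vector_db.search(query_vector, k=k * 2)
    vector_scores = {r['id']: r['score'] for r in vector_results}

    # 2. BM25 search: top 2k lexical candidates
    all_bm25 = bm25.get_scores(tokenize(query))
    top_ids = np.argsort(all_bm25)[::-1][:k * 2]
    bm25_scores = {int(i): float(all_bm25[i]) for i in top_ids}

    # 3. Normalize each score set to [0, 1] (cosine and BM25 use different scales)
    vector_scores = min_max(vector_scores)
    bm25_scores = min_max(bm25_scores)

    # 4. Combine with alpha weighting; a doc missing from one list gets 0 there
    combined = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * bm25_scores.get(doc_id, 0.0)
        for doc_id in vector_scores.keys() | bm25_scores.keys()
    }

    # 5. Sort and return the top k documents
    top_results = sorted(combined.items(), key=lambda x: x[1], reverse=True)[:k]
    return [documents[doc_id] for doc_id, _ in top_results]
```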
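hybrid_search needs a live vector store, so here is a self-contained toy version of steps 3 to 5. The doc IDs and scores are made up for illustration. The code min-max normalizes each score set, blends the two with alpha, and shows how the ranking shifts as alpha moves from BM25-heavy to pure vector.

```python
def min_max(scores):
    """Rescale {doc_id: score} to [0, 1]; a constant set maps to 1.0 (0.0 if all <= 0)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: (1.0 if hi > 0 else 0.0) for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def fuse(vector_scores, bm25_scores, alpha):
    """Rank doc IDs by alpha * vector + (1 - alpha) * BM25 after min-max normalization."""
    v, b = min_max(vector_scores), min_max(bm25_scores)
    combined = {
        d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0)
        for d in v.keys() | b.keys()
    }
    # Highest combined score first; ties broken by doc ID for a stable order
    return sorted(combined, key=lambda d: (-combined[d], d))

# Made-up scores: cosine similarities vs. raw BM25 scores on very different scales
vector_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
bm25_scores = {"doc_b": 12.5, "doc_c": 3.1, "doc_d": 9.0}

for alpha in (0.2, 0.5, 1.0):
    print(alpha, fuse(vector_scores, bm25_scores, alpha))
```

At alpha=0.2, the BM25-only hit doc_d rises to second place. At alpha=1.0, the order follows vector scores alone.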

Reciprocal Rank Fusion (RRF)

RRF fuses rankings rather than raw scores, so it needs no score normalization and is unaffected by the different scales of BM25 and cosine scores:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """
    rankings: list of ranked lists of document IDs, one per retriever (best first)
    k: smoothing constant (typically 60)
    Returns (doc_id, rrf_score) pairs sorted by score, highest first.
    """
    rrf_scores = {}

    for rank_list in rankings:
        for rank, doc_id in enumerate(rank_list, start=1):
            # Each list contributes 1 / (k + rank); top ranks contribute most
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0.0) + 1 / (k + rank)

    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

# Use it: vector_search() and bm25_search() stand in for your own retrievers
# and are assumed to return document IDs ordered best-first
vector_ids = vector_search(query)
bm25_ids = bm25_search(query)

final = reciprocal_rank_fusion([vector_ids, bm25_ids])
```
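The following self-contained check shows why rank fusion sidesteps score scales. It re-implements RRF compactly as rrf and turns each made-up score dict into a best-first ranking. doc_b, ranked 2nd by vector search and 1st by BM25, scores 1/62 + 1/61 ≈ 0.0325. doc_c, ranked 3rd by both, scores 2/63 ≈ 0.0317 and beats doc_a (1/61 ≈ 0.0164), which only one retriever found. Multiplying every BM25 score by 100 leaves the fused order unchanged.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def to_ranking(score_dict):
    """Order doc IDs best-first; only the order survives, not the score scale."""
    return sorted(score_dict, key=lambda d: (-score_dict[d], d))

vector_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
bm25_scores = {"doc_b": 12.5, "doc_d": 9.0, "doc_c": 3.1}

fused = rrf([to_ranking(vector_scores), to_ranking(bm25_scores)])
# Rescaling BM25 scores changes nothing, because RRF only sees ranks
scaled_bm25 = {d: s * 100 for d, s in bm25_scores.items()}
rescaled = rrf([to_ranking(vector_scores), to_ranking(scaled_bm25)])
print(fused, rescaled == fused)
```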

Tuning Alpha

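Sweep alpha over a labeled set of your own queries and keep the value with the best recall:

```python
test_queries = ["Marie Curie", "SKU-12345", "how does photosynthesis work"]
ground_truth = {...}  # query -> set of known relevant documents (fill in)

def evaluate_hybrid(queries, ground_truth, alpha, k=10):
    """Mean Recall@k of hybrid_search (above) over the queries, using your vector_db."""
    recalls = []
    for query in queries:
        retrieved = set(hybrid_search(query, vector_db, alpha=alpha, k=k))
        relevant = ground_truth[query]
        recalls.append(len(retrieved & relevant) / len(relevant))
    return sum(recalls) / len(recalls)

for alpha in [0.3, 0.5, 0.7, 0.9]:
    recall = evaluate_hybrid(test_queries, ground_truth, alpha)
    print(f"Alpha {alpha}: Recall@10 = {recall:.3f}")
```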

Typical starting points (confirm on your own evaluation set):

Technical docs with IDs/codes: alpha = 0.3-0.5 (favor BM25)
Natural language QA: alpha = 0.7-0.8 (favor vector)
Mixed content: alpha = 0.5-0.6

Sparse-Dense Encoders

A single model can produce both sparse (lexical) and dense vectors. BGE-M3 returns both from one forward pass. SPLADE produces only sparse vectors and is usually paired with a separate dense embedding model. Here is an example using the FlagEmbedding library's BGE-M3 wrapper:

```python
from FlagEmbedding import BGEM3FlagModel

# BGE-M3 returns dense and sparse (lexical) representations in one pass
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)  # set use_fp16=False on CPU

query = "Marie Curie radioactivity"
output = model.encode([query], return_dense=True, return_sparse=True)

dense_vector = output["dense_vecs"][0]        # dense embedding (1024 dims)
sparse_vector = output["lexical_weights"][0]  # {token_id: weight} for the query's tokens
```
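Once both representations exist, one common way to score a document is a weighted sum. It combines dense cosine similarity with the sparse dot product over shared tokens. This is a minimal pure-Python sketch. The toy vectors, the string token IDs, and the 0.7/0.3 weights are made-up assumptions to tune on your own data, just like alpha.

```python
import math

def dense_score(q, d):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(q, d))
    return dot / (math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in d)))

def sparse_score(q, d):
    """Dot product of sparse vectors stored as {token_id: weight}; only shared tokens count."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, w_dense=0.7, w_sparse=0.3):
    # Assumed weights, not library defaults; tune them on an evaluation set
    return w_dense * dense_score(q_dense, d_dense) + w_sparse * sparse_score(q_sparse, d_sparse)

# Toy query/document vectors (made up); string keys mimic token IDs
q_dense, q_sparse = [0.1, 0.9, 0.2], {"101": 0.8, "2042": 0.5}
d_dense, d_sparse = [0.2, 0.8, 0.1], {"101": 0.6, "7": 0.3}
print(hybrid_score(q_dense, q_sparse, d_dense, d_sparse))
```

Only token "101" appears in both sparse vectors, so the sparse term here is 0.8 × 0.6 = 0.48.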

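Hybrid search is a standard building block of production RAG systems. Adding BM25 to vector search recovers exact-match queries (IDs, names, codes) that embeddings miss, without giving up semantic recall.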

Hybrid Search for RAG: BM25 + Vector Search Tutorial (2025)
