Name: Ailog - RAG as a Service Platform

Why hybrid search?

Vector search misses exact matches. BM25 lacks semantic understanding. Combine the two for roughly 20-30% better recall.

Vector search fails on:

Product IDs: "SKU-12345"
Proper names: "Marie Curie"
Technical terms: "RAG-Fusion"

BM25 fails on:

Synonyms: "car" vs "automobile"
Paraphrases: "how to cook pasta" vs "instructions for cooking pasta"
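
To make the lexical gap concrete, here is a minimal sketch that counts shared query terms, which is the raw signal BM25 builds on. The `term_overlap` helper is hypothetical and deliberately ignores BM25's term weighting and length normalization.

```python
def term_overlap(query: str, doc: str) -> int:
    """Count distinct lowercase tokens shared by query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

# A synonym shares no token with the query, so a purely lexical scorer sees nothing.
synonym_hits = term_overlap("car repair", "automobile maintenance guide")

# An exact identifier matches regardless of what it means.
id_hits = term_overlap("SKU-12345", "Spare part SKU-12345 in stock")

print(synonym_hits, id_hits)
```

A dense embedding model would likely place "car" and "automobile" close together, which is the gap the vector side of a hybrid search is meant to cover.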

Implementation (November 2025)

With Weaviate

Weaviate has hybrid search built in (controlled by the alpha parameter):

DEVELOPERpython
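import weaviate

# Python client v4 API; the older v3 `weaviate.Client` interface is deprecated
client = weaviate.connect_to_local()  # defaults to localhost:8080

collection = client.collections.get("Document")
response = collection.query.hybrid(
    query="Marie Curie radioactivity",
    alpha=0.7,  # 0 = pure BM25, 1 = pure vector
    limit=10,
)
for obj in response.objects:
    print(obj.properties["content"])

client.close()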
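
As a rough mental model for alpha, the sketch below blends two scores that are assumed to be already normalized to [0, 1]. The `blend` function is a hypothetical illustration of the weighting idea only; Weaviate's own fusion algorithms may compute the final score differently.

```python
def blend(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Weighted sum of two scores that are already normalized to [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * vector_score + (1 - alpha) * keyword_score

# A document that is semantically close (0.9) but a weak keyword match (0.2)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, blend(0.9, 0.2, alpha))
```

At alpha = 0 only the keyword score counts, at alpha = 1 only the vector score, and values in between shift the balance linearly.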

With Qdrant

DEVELOPERpython
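from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

# Placeholders, computed beforehand by your own encoders:
#   dense_vector:  list[float] from the dense embedding model
#   sparse_indices / sparse_values: output of a sparse encoder (BM25, SPLADE, ...)
# The collection is assumed to define named vectors "dense" and "sparse".

# Vector + keyword search: fetch candidates from both indexes, then fuse with RRF
results = client.query_points(
    collection_name="documents",
    prefetch=[
        models.Prefetch(query=dense_vector, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)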

Manual hybrid (any vector database)

DEVELOPERpython
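from rank_bm25 import BM25Okapi

# Assumed to exist: `documents` (list of str) and `embed_model` (any encoder
# with an .encode() method). The IDs returned by vector_db.search() are assumed
# to be positions in `documents`, so they line up with the BM25 indices.

def tokenize(text):
    # Lowercased whitespace tokenization; swap in a real tokenizer for production
    return text.lower().split()

# BM25 setup
bm25 = BM25Okapi([tokenize(doc) for doc in documents])

def hybrid_search(query, vector_db, alpha=0.7, k=10):
    # 1. Vector search (over-fetch so the fusion has more candidates)
    query_vector = embed_model.encode(query)
    vector_results = vector_db.search(query_vector, k=k * 2)

    # 2. BM25 search: one score per document, indexed by position
    bm25_scores = bm25.get_scores(tokenize(query))

    # 3. Normalize by the maximum score (assumes non-negative scores);
    #    `or 1.0` avoids a division by zero when nothing matches
    vector_scores = {r['id']: r['score'] for r in vector_results}
    max_v = max(vector_scores.values(), default=0) or 1.0
    vector_scores = {doc_id: s / max_v for doc_id, s in vector_scores.items()}

    max_b = max(bm25_scores, default=0) or 1.0
    bm25_scores_norm = {i: s / max_b for i, s in enumerate(bm25_scores)}

    # 4. Combine with alpha weighting (1 = pure vector, 0 = pure BM25)
    combined = {}
    for doc_id in set(vector_scores) | set(bm25_scores_norm):
        combined[doc_id] = (
            alpha * vector_scores.get(doc_id, 0) +
            (1 - alpha) * bm25_scores_norm.get(doc_id, 0)
        )

    # 5. Sort and return top k
    top_results = sorted(combined.items(), key=lambda x: x[1], reverse=True)[:k]
    return [documents[doc_id] for doc_id, _ in top_results]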
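
Dividing by the maximum only maps scores into [0, 1] when they are non-negative. Some similarity measures (cosine similarity or dot product, for instance) can return negative values, so the following is a minimal sketch of min-max normalization as an alternative. The `min_max_normalize` helper is an assumption for illustration, not part of the code above.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale scores to [0, 1]; return all zeros if every score is equal."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No spread to rescale; avoid dividing by zero
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}

normalized = min_max_normalize({0: -0.2, 1: 0.4, 2: 1.0})
print(normalized)
```

The trade-off is that min-max always assigns 0 to the weakest candidate, even when that candidate is still a reasonable match.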

Reciprocal Rank Fusion (RRF)

Often more robust than score fusion: it combines rankings rather than raw scores, so no score normalization is needed:

DEVELOPERpython
def reciprocal_rank_fusion(rankings, k=60):
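    """
    rankings: list of ranked lists of document IDs, one list per retrieval
              method, best match first
    k: smoothing constant (typically 60)
    Returns (doc_id, rrf_score) pairs sorted by descending score.
    """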
    rrf_scores = {}

    for rank_list in rankings:
        for rank, doc_id in enumerate(rank_list, start=1):
            if doc_id not in rrf_scores:
                rrf_scores[doc_id] = 0
            rrf_scores[doc_id] += 1 / (k + rank)

    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

# Use it (vector_search and bm25_search are placeholders for your own
# retrievers; each is assumed to return dicts with an 'id' key, best match first)
vector_results = vector_search(query)
bm25_results = bm25_search(query)

final = reciprocal_rank_fusion([
    [r['id'] for r in vector_results],
    [r['id'] for r in bm25_results],
])
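
A small worked example may help show how RRF rewards documents that rank well in both lists. The sketch below restates the fusion compactly so it runs on its own; the document IDs and the two rankings are made-up toy data.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Sum 1 / (k + rank) per document across all ranked lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_ranking = ["a", "b", "c"]  # toy result of a vector search
bm25_ranking = ["b", "c", "d"]    # toy result of a BM25 search

fused = rrf([vector_ranking, bm25_ranking])
print([doc_id for doc_id, _ in fused])  # ['b', 'c', 'a', 'd']
```

Documents "b" and "c" appear in both lists and therefore outrank "a", even though "a" was the top vector hit.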

Tuning alpha

Test with your own queries:

DEVELOPERpython
test_queries = ["Marie Curie", "SKU-12345", "how does photosynthesis work"]
ground_truth = {...}  # Known relevant docs

# evaluate_hybrid is a placeholder for your own evaluation function
alphas = [0.3, 0.5, 0.7, 0.9]
for alpha in alphas:
    recall = evaluate_hybrid(test_queries, ground_truth, alpha)
    print(f"Alpha {alpha}: Recall@10 = {recall:.3f}")

Typical optimal values:

Technical docs with IDs/codes: alpha = 0.3-0.5 (favor BM25)
Natural-language QA: alpha = 0.7-0.8 (favor vector)
Mixed content: alpha = 0.5-0.6
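
The `evaluate_hybrid` function called in the tuning loop is not defined in this article. One possible sketch is shown below, under the assumption that `ground_truth` maps each query to its relevant document IDs. The extra `search_fn` parameter and the `recall_at_k` helper are hypothetical additions, so the signature differs from the call in the loop.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def evaluate_hybrid(queries, ground_truth, alpha, search_fn, k=10):
    """Mean Recall@k; search_fn(query, alpha, k) must return ranked doc IDs."""
    recalls = [
        recall_at_k(search_fn(query, alpha, k), ground_truth[query], k)
        for query in queries
    ]
    return sum(recalls) / len(recalls)

# Toy usage with a fake search function that ignores its inputs
def fake_search(query, alpha, k):
    return ["d1", "d2", "d3"]

toy_truth = {"q1": ["d1", "d9"], "q2": ["d2"]}
mean_recall = evaluate_hybrid(["q1", "q2"], toy_truth, 0.7, fake_search)
print(mean_recall)  # 0.75
```

In practice `search_fn` would wrap the hybrid search under test, and the query set should be large enough that differences between alpha values are not noise.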

Sparse-dense encoders (2025 innovation)

A single model that produces both sparse and dense representations:

DEVELOPERpython
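import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# SPLADE or BGE-M3 for sparse+dense
model = AutoModelForMaskedLM.from_pretrained('naver/splade-v3').eval()
tokenizer = AutoTokenizer.from_pretrained('naver/splade-v3')

query = "Marie Curie radioactivity"  # example query

# Get both sparse and dense in one forward pass
tokens = tokenizer(query, return_tensors='pt')
with torch.no_grad():
    output = model(**tokens, output_hidden_states=True)
mask = tokens['attention_mask'].unsqueeze(-1)  # zero out padding positions

# Sparse: SPLADE term weights, log(1 + ReLU(logits)) max-pooled over tokens
sparse_vector = (torch.log1p(torch.relu(output.logits)) * mask).max(dim=1).values

# Dense: mean-pooled last hidden layer. SPLADE is only trained for the sparse
# output, so treat this as illustrative; BGE-M3 is trained to emit both.
dense_vector = (output.hidden_states[-1] * mask).sum(dim=1) / mask.sum(dim=1)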
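
A vocabulary-sized sparse vector is mostly zeros, and sparse indexes typically store only the non-zero positions as index/value pairs. The following sketch shows that conversion on a plain Python list; the `to_sparse` helper and the toy weights are assumptions for illustration.

```python
def to_sparse(weights, threshold=0.0):
    """Return (indices, values) for entries strictly above the threshold."""
    pairs = [(i, w) for i, w in enumerate(weights) if w > threshold]
    indices = [i for i, _ in pairs]
    values = [w for _, w in pairs]
    return indices, values

# Toy term weights over a five-token vocabulary
indices, values = to_sparse([0.0, 1.3, 0.0, 0.0, 0.4])
print(indices, values)  # [1, 4] [1.3, 0.4]
```

Raising `threshold` prunes low-weight terms, trading a little recall for a smaller index.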

Hybrid search is the secret weapon of production RAG systems. Implement it and measure how much your recall improves.

Hybrid RAG Search: BM25 + Vector Search (2025)

