6. Advanced Reranking

Cross-Encoder Reranking for RAG Precision

November 16, 2025
11 min read
Ailog Research Team

Achieve 95%+ precision: use cross-encoders to rerank retrieved documents and eliminate false positives.

Why Cross-Encoders?

Bi-encoders (standard embeddings) encode the query and the document separately. Cross-encoders process them together, which is much more accurate but slower.

Bi-encoder:    sim(encode(query), encode(doc))
Cross-encoder: score(query + doc together)
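A minimal sketch of the difference using sentence-transformers (the query, document, and model names here are illustrative, not from the article):

python

from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "how do cross-encoders improve RAG precision?"
doc = "Cross-encoders score a query and a document jointly instead of comparing embeddings."

# Bi-encoder: encode query and document separately, then compare vectors
bi_encoder = SentenceTransformer('all-MiniLM-L6-v2')
q_emb = bi_encoder.encode(query, convert_to_tensor=True)
d_emb = bi_encoder.encode(doc, convert_to_tensor=True)
bi_score = util.cos_sim(q_emb, d_emb).item()

# Cross-encoder: feed the (query, document) pair through a single model
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
cross_score = cross_encoder.predict([(query, doc)])[0]

print(f"bi-encoder cosine: {bi_score:.3f}, cross-encoder score: {cross_score:.3f}")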

Implementation

python

from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank(query, documents, top_k=5):
    # Create query-document pairs
    pairs = [[query, doc] for doc in documents]
    # Score all pairs
    scores = model.predict(pairs)
    # Sort by score
    ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, score in ranked[:top_k]]

# Use it
initial_results = vector_search(query, k=100)
final_results = rerank(query, initial_results, top_k=10)

Best Models (November 2025)

1. ms-marco-MiniLM-L-12-v2

  • Fast, accurate
  • Best for general purpose

2. bge-reranker-v2-m3

  • Multilingual
  • SOTA accuracy

3. jina-reranker-v2-base-multilingual

  • 89 languages
  • Production-ready
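All three can be loaded through the same CrossEncoder interface, so swapping models is a one-line change. A sketch, assuming the Hugging Face model IDs below (the Jina model ships custom code, hence trust_remote_code):

python

from sentence_transformers import CrossEncoder

# General purpose, fast (MS MARCO)
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')

# Multilingual, higher accuracy (larger and slower)
# reranker = CrossEncoder('BAAI/bge-reranker-v2-m3')

# 89 languages; requires trusting the model's custom code
# reranker = CrossEncoder('jinaai/jina-reranker-v2-base-multilingual', trust_remote_code=True)

scores = reranker.predict([["what is RAG?", "Retrieval-augmented generation combines search with LLMs."]])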

Two-Stage Retrieval

python

def two_stage_rag(query, vector_db):
    # Stage 1: Fast bi-encoder retrieval (100 candidates)
    candidates = vector_db.search(
        query_embedding=embed(query),
        k=100
    )

    # Stage 2: Slow but accurate cross-encoder reranking
    cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')
    pairs = [[query, doc['content']] for doc in candidates]
    scores = cross_encoder.predict(pairs)

    # Return top 10
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:10]]

Performance Optimization

Cross-encoders are slow, so optimize how you call them:

python

# Batch processing
def batch_rerank(query, documents, batch_size=32):
    pairs = [[query, doc] for doc in documents]
    all_scores = []
    for i in range(0, len(pairs), batch_size):
        batch = pairs[i:i+batch_size]
        scores = model.predict(batch)
        all_scores.extend(scores)
    return sorted(zip(documents, all_scores), key=lambda x: x[1], reverse=True)
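Two more levers worth knowing: CrossEncoder.predict already batches internally via its batch_size argument, and capping max_length bounds tokenization and attention cost for long documents. A sketch, assuming truncating documents to their first tokens is acceptable for your corpus:

python

from sentence_transformers import CrossEncoder

# Cap input length so long documents don't dominate compute
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', max_length=256)

def fast_rerank(query, documents, top_k=5):
    pairs = [[query, doc] for doc in documents]
    # predict() batches internally; tune batch_size to your GPU/CPU
    scores = model.predict(pairs, batch_size=64)
    ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]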

When to Rerank

Always rerank when:

  • Precision is critical
  • Cost of false positives is high
  • You have compute budget

Skip reranking when:

  • Latency < 100ms required
  • High QPS (> 1000/sec)
  • Budget constrained
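One way to encode these rules is a thin wrapper that only reranks when the request can afford it. A hypothetical sketch: rerank() is the function defined above, embed() and vector_db.search() follow the two-stage example, and the thresholds are illustrative:

python

def retrieve(query, vector_db, latency_budget_ms=500, precision_critical=True, k=10):
    # Stage 1: always do the cheap bi-encoder vector search
    candidates = vector_db.search(query_embedding=embed(query), k=100)
    docs = [c['content'] for c in candidates]

    # Skip reranking for tight latency budgets or non-critical queries
    if latency_budget_ms < 100 or not precision_critical:
        return docs[:k]

    # Otherwise spend the compute on cross-encoder reranking
    return rerank(query, docs, top_k=k)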

Tags

reranking, cross-encoder, precision, accuracy
