6. Reranking · Expert

LLM Reranking: Using LLMs to Reorder Your Results

27. Dezember 2025
10 Minuten Lesezeit
Ailog Research Team

LLMs can help rerank search results because they understand context deeply. Learn when and how to use this costly but powerful technique.

TL;DR

  • LLM reranking = using GPT-4/Claude to score the relevance of results
  • Advantage: semantic understanding, better than cross-encoders
  • Drawback: 10–100x slower and more expensive
  • Use cases: complex queries, specialized domains, high-value requests
  • Test different reranking strategies on Ailog

Why Use an LLM for Reranking?

Cross-encoders (BERT, Cohere Rerank) are fast but limited:

  • Trained on general-purpose data
  • Poor grasp of domain-specific nuances
  • A single opaque relevance score (relevant/not relevant), with no explanation

LLMs bring:

  • Reasoning: they can explain why a document is relevant
  • Context awareness: they understand the nuances of the query
  • Flexibility: they adapt to any domain without fine-tuning

Basic Implementation

LLM Scoring

```python
from openai import OpenAI

client = OpenAI()

def llm_rerank(query: str, documents: list, top_k: int = 3) -> list:
    """Rerank the documents with an LLM."""
    scored_docs = []

    for doc in documents:
        prompt = f"""Rate the relevance of this document to the query.

Query: {query}

Document: {doc['content'][:1500]}

Rate from 0-10 where:
- 0: Completely irrelevant
- 5: Partially relevant
- 10: Highly relevant and directly answers the query

Output ONLY a number between 0 and 10."""

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=5,
            temperature=0
        )

        try:
            score = float(response.choices[0].message.content.strip())
        except ValueError:
            score = 5.0  # Fallback score when the model returns non-numeric output

        scored_docs.append({**doc, "relevance_score": score})

    # Sort by descending score
    scored_docs.sort(key=lambda x: x["relevance_score"], reverse=True)

    return scored_docs[:top_k]
```

With Explanations

```python
def llm_rerank_with_reasoning(query: str, documents: list, top_k: int = 3) -> list:
    """Rerank with an explanation for each score."""
    prompt = f"""You are a relevance judge. Rate each document's relevance to the query.

Query: {query}

Documents:
"""
    for i, doc in enumerate(documents):
        prompt += f"\n[Doc {i+1}]: {doc['content'][:500]}...\n"

    prompt += """
For each document, output:
- Document number
- Relevance score (0-10)
- One sentence explaining why

Format:
Doc 1: 8/10 - Directly addresses the main question about...
Doc 2: 3/10 - Only tangentially related to...
"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    # Parse the response (the candidate list is needed to map numbers back to docs)
    result = parse_ranking_response(response.choices[0].message.content, documents)

    return result[:top_k]
```
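The function above relies on a `parse_ranking_response` helper that the article never defines. A minimal sketch, assuming the `Doc N: score/10 - reason` format requested in the prompt (the helper also takes the candidate list so it can map document numbers back to documents):

```python
import re

def parse_ranking_response(text: str, documents: list) -> list:
    """Parse lines like 'Doc 1: 8/10 - reason' into scored documents."""
    pattern = re.compile(r"Doc\s*(\d+):\s*(\d+(?:\.\d+)?)/10\s*-\s*(.*)")
    scored = []
    for line in text.splitlines():
        match = pattern.search(line)
        if not match:
            continue  # Skip lines that don't follow the requested format
        idx = int(match.group(1)) - 1  # Document numbers are 1-based in the prompt
        if 0 <= idx < len(documents):
            scored.append({
                **documents[idx],
                "relevance_score": float(match.group(2)),
                "reasoning": match.group(3).strip(),
            })
    # Highest score first
    scored.sort(key=lambda d: d["relevance_score"], reverse=True)
    return scored
```

Since the model's output format can drift, anything that does not match the pattern is simply skipped rather than crashing the pipeline.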

Pairwise Comparison Approach

More robust than absolute scoring:

```python
def pairwise_llm_rerank(query: str, documents: list, top_k: int = 3) -> list:
    """Compare documents pairwise for a more precise ranking."""
    n = len(documents)
    wins = {i: 0 for i in range(n)}

    # Compare every pair; credit the win to the global document index
    for i in range(n):
        for j in range(i + 1, n):
            winner = compare_pair(query, documents[i], documents[j])
            wins[i if winner == 0 else j] += 1

    # Sort by number of wins
    ranked_indices = sorted(wins.keys(), key=lambda x: wins[x], reverse=True)

    return [documents[i] for i in ranked_indices[:top_k]]

def compare_pair(query: str, doc_a: dict, doc_b: dict) -> int:
    """Compare two documents; return 0 if the first is more relevant, else 1."""
    prompt = f"""Which document is more relevant to this query?

Query: {query}

Document A: {doc_a['content'][:800]}

Document B: {doc_b['content'][:800]}

Answer with only "A" or "B"."""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        temperature=0
    )

    answer = response.choices[0].message.content.strip().upper()
    return 0 if answer == "A" else 1
```
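Pairwise comparison issues one LLM call per document pair, so cost grows quadratically with the candidate count. A quick sketch of the arithmetic:

```python
def pairwise_call_count(n_docs: int) -> int:
    """Number of LLM calls needed to compare every document pair: n * (n - 1) / 2."""
    return n_docs * (n_docs - 1) // 2

# Cost grows quadratically: 10 docs already need 45 calls, 20 docs need 190
for n in (5, 10, 20):
    print(n, "docs ->", pairwise_call_count(n), "comparisons")
```

This is why pairwise reranking is best reserved for small, pre-filtered candidate sets.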

Cost Optimization

Batch Processing with Claude

```python
import anthropic

client = anthropic.Anthropic()

def batch_llm_rerank(query: str, documents: list, top_k: int = 5) -> list:
    """Rerank all documents in a single LLM call."""
    docs_text = "\n\n".join([
        f"[{i+1}] {doc['content'][:600]}"
        for i, doc in enumerate(documents)
    ])

    prompt = f"""Rank these documents by relevance to the query.

Query: {query}

Documents:
{docs_text}

Return ONLY the document numbers in order of relevance, comma-separated.
Example: 3,1,5,2,4"""

    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}]
    )

    # Parse the returned order (numbers are 1-based in the prompt)
    order_str = response.content[0].text.strip()
    order = [int(x.strip()) - 1 for x in order_str.split(",")]

    # Reorder the documents, ignoring any out-of-range indices
    reranked = [documents[i] for i in order if i < len(documents)]

    return reranked[:top_k]
```

Hybrid Strategy: Cross-Encoder + LLM

```python
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_rerank(query: str, documents: list, top_k: int = 3) -> list:
    """
    1. Fast cross-encoder to filter (top 10)
    2. LLM for the final ranking (top 3)
    """
    # Step 1: cross-encoder (fast)
    pairs = [(query, doc['content']) for doc in documents]
    scores = cross_encoder.predict(pairs)

    # Keep the cross-encoder's top 10
    top_indices = scores.argsort()[-10:][::-1]
    candidates = [documents[i] for i in top_indices]

    # Step 2: LLM to refine (expensive, but over far fewer docs)
    final_ranking = batch_llm_rerank(query, candidates, top_k)

    return final_ranking
```

LLM Reranking with Domain Context

Specialized Prompts

```python
def domain_specific_rerank(
    query: str,
    documents: list,
    domain: str,
    top_k: int = 3
) -> list:
    """Reranking with domain-specific context."""
    domain_context = {
        "legal": """You are a legal research expert. Prioritize:
- Exact legal citations and case law
- Jurisdictional relevance
- Recency of legal precedents""",
        "medical": """You are a medical research expert. Prioritize:
- Clinical evidence and study quality
- Patient safety considerations
- Guideline compliance""",
        "ecommerce": """You are an e-commerce product expert. Prioritize:
- Product specification matches
- Price and availability relevance
- User intent (browse vs. buy)"""
    }

    context = domain_context.get(domain, "You are a relevance expert.")

    prompt = f"""{context}

Query: {query}

Rank these documents by relevance:
{format_documents(documents)}

Return document numbers in order of relevance."""

    # ... LLM call
```
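`format_documents` is not defined in the article; a minimal sketch that matches the numbered `[N] content` layout used by the batch prompt above:

```python
def format_documents(documents: list, max_chars: int = 600) -> str:
    """Render documents as a numbered list for the ranking prompt."""
    return "\n\n".join(
        f"[{i + 1}] {doc['content'][:max_chars]}"
        for i, doc in enumerate(documents)
    )
```

Truncating each document to `max_chars` keeps the prompt within budget when many candidates are ranked at once.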

Comparing the Approaches

| Method | Latency | Cost | Quality | Use case |
|---|---|---|---|---|
| Cross-encoder | ~50 ms | Free | Good | General use |
| Cohere Rerank | ~100 ms | $1/1K req | Very good | Production |
| GPT-4o-mini | ~500 ms | $0.15/1K | Excellent | Specialized domains |
| GPT-4o | ~1 s | $2.50/1K | Best | High-value queries |
| Claude Haiku | ~300 ms | $0.25/1K | Very good | Good price/quality ratio |

Approximate costs for reranking 10 documents
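The per-1K figures in the table translate directly into monthly budgets. A back-of-envelope sketch (the prices are the table's rough values, not official pricing):

```python
def monthly_rerank_cost(requests_per_day: int, cost_per_1k: float) -> float:
    """Approximate monthly cost of reranking at a given daily volume."""
    return requests_per_day * 30 * cost_per_1k / 1000

# Rough table values, not official pricing
gpt4o_mini = monthly_rerank_cost(1000, 0.15)  # ~$4.50/month
gpt4o = monthly_rerank_cost(1000, 2.50)       # ~$75/month
```

At modest volumes even GPT-4o stays affordable; the bill only becomes a concern once volume climbs well past the ~1,000 requests/day threshold discussed below.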

When to Use LLM Reranking

Use it when:

  • Queries are complex or multi-hop
  • The domain is highly specialized, with no training data available
  • Value per query is high (legal, medical, finance)
  • You need explanations for the ranking
  • Cross-encoders fall short

Avoid it when:

  • Volume is high (> 1,000 requests/day)
  • Latency is critical (< 200 ms required)
  • Budget is tight
  • Queries are simple and direct
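The criteria above can be wired into a simple routing check. A hedged sketch — the thresholds are the article's rough guidelines, not hard limits, and the function name is illustrative:

```python
def should_use_llm_rerank(
    daily_volume: int,
    latency_budget_ms: float,
    is_specialized_domain: bool,
    needs_explanations: bool,
) -> bool:
    """Route to LLM reranking only when the article's criteria hold."""
    if daily_volume > 1000:      # high volume -> too expensive
        return False
    if latency_budget_ms < 200:  # LLM calls take roughly 300 ms to 1 s
        return False
    # Worth the cost only for specialized domains or when reasoning is needed
    return is_specialized_domain or needs_explanations

# A legal search app with low volume and a loose latency budget qualifies
should_use_llm_rerank(200, 1500, True, True)  # -> True
```

Everything that fails the check can fall back to a cross-encoder, as in the hybrid strategy above.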

Metrics and Evaluation

```python
def evaluate_reranking(
    queries: list,
    ground_truth: dict,
    rerank_fn: callable
) -> dict:
    """Evaluate reranking quality."""
    metrics = {
        "mrr": [],          # Mean Reciprocal Rank
        "ndcg@3": [],       # Normalized DCG
        "precision@1": []
    }

    for query in queries:
        # Retrieve candidates
        candidates = retrieve(query, k=20)

        # Rerank
        reranked = rerank_fn(query, candidates)

        # Compute metrics
        relevant_docs = ground_truth[query]

        # MRR
        for i, doc in enumerate(reranked):
            if doc['id'] in relevant_docs:
                metrics["mrr"].append(1 / (i + 1))
                break
        else:
            metrics["mrr"].append(0)

        # Precision@1
        if reranked and reranked[0]['id'] in relevant_docs:
            metrics["precision@1"].append(1)
        else:
            metrics["precision@1"].append(0)

    # Average each metric, skipping any that collected no samples
    return {k: sum(v) / len(v) for k, v in metrics.items() if v}
```
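The metrics dict above declares `ndcg@3` but the loop never fills it. A minimal NDCG@k implementation for binary relevance labels, in case you want to complete it:

```python
import math

def ndcg_at_k(reranked_ids: list, relevant_ids: set, k: int = 3) -> float:
    """NDCG@k with binary relevance (1 if the doc id is in the ground truth)."""
    # DCG: each relevant doc contributes 1 / log2(rank + 1)
    dcg = sum(
        1 / math.log2(i + 2)
        for i, doc_id in enumerate(reranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal DCG: all relevant docs ranked first
    ideal = sum(
        1 / math.log2(i + 2)
        for i in range(min(k, len(relevant_ids)))
    )
    return dcg / ideal if ideal > 0 else 0.0
```

Inside the loop you would call it as `metrics["ndcg@3"].append(ndcg_at_k([d['id'] for d in reranked], relevant_docs))`.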

Complete Example

```python
class LLMReranker:
    def __init__(self, model: str = "gpt-4o-mini", domain: str = None):
        self.model = model
        self.domain = domain
        self.client = OpenAI()

    def rerank(
        self,
        query: str,
        documents: list,
        top_k: int = 5,
        use_hybrid: bool = True
    ) -> list:
        """Full reranking pipeline."""
        # Step 1: pre-filter when there are many documents
        if use_hybrid and len(documents) > 15:
            documents = self._cross_encoder_filter(query, documents, k=15)

        # Step 2: LLM reranking
        reranked = self._llm_rank(query, documents)

        return reranked[:top_k]

    def _cross_encoder_filter(self, query, docs, k):
        encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        pairs = [(query, d['content']) for d in docs]
        scores = encoder.predict(pairs)
        top_idx = scores.argsort()[-k:][::-1]
        return [docs[i] for i in top_idx]

    def _llm_rank(self, query, docs):
        # Delegate to the batch reranking shown above
        return batch_llm_rerank(query, docs, top_k=len(docs))

# Usage
reranker = LLMReranker(model="gpt-4o-mini", domain="legal")
results = reranker.rerank(query, candidates, top_k=5)
```


Is LLM reranking a fit for your use case? Let's evaluate your pipeline together →

Tags

reranking · llm · gpt · claude · retrieval
