LLM Reranking: Using LLMs to Reorder Your Results
LLMs can help rerank search results through deep contextual understanding. Learn when and how to use this costly but powerful technique.
TL;DR
- LLM reranking = using GPT-4/Claude to score the relevance of results
- Advantage: semantic understanding beyond cross-encoders
- Drawback: 10-100x slower and more expensive
- Use cases: complex queries, specialized domains, high-value requests
- Test different reranking strategies on Ailog
Why Use an LLM for Reranking?
Cross-encoders (BERT, Cohere Rerank) are fast but limited:
- Trained on general-purpose data
- Struggle with domain-specific nuances
- Produce a bare relevance score (relevant/not relevant) with no reasoning
LLMs bring:
- Reasoning: they can explain why a document is relevant
- Context awareness: they understand the nuances of the query
- Flexibility: they adapt to any domain without fine-tuning
Basic Implementation
LLM Scoring
```python
from openai import OpenAI

client = OpenAI()

def llm_rerank(query: str, documents: list, top_k: int = 3) -> list:
    """Rerank documents using an LLM."""
    scored_docs = []

    for doc in documents:
        prompt = f"""Rate the relevance of this document to the query.

Query: {query}

Document: {doc['content'][:1500]}

Rate from 0-10 where:
- 0: Completely irrelevant
- 5: Partially relevant
- 10: Highly relevant and directly answers the query

Output ONLY a number between 0 and 10."""

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=5,
            temperature=0
        )

        try:
            score = float(response.choices[0].message.content.strip())
        except ValueError:
            score = 5.0  # Fallback score if the model returns non-numeric output

        scored_docs.append({**doc, "relevance_score": score})

    # Sort by descending score
    scored_docs.sort(key=lambda x: x["relevance_score"], reverse=True)
    return scored_docs[:top_k]
```
With Explanations
```python
def llm_rerank_with_reasoning(query: str, documents: list, top_k: int = 3) -> list:
    """Rerank with an explanation for each score."""
    prompt = f"""You are a relevance judge. Rate each document's relevance to the query.

Query: {query}

Documents:
"""
    for i, doc in enumerate(documents):
        prompt += f"\n[Doc {i+1}]: {doc['content'][:500]}...\n"

    prompt += """
For each document, output:
- Document number
- Relevance score (0-10)
- One sentence explaining why

Format:
Doc 1: 8/10 - Directly addresses the main question about...
Doc 2: 3/10 - Only tangentially related to...
"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    # Parse the model's ranking output
    result = parse_ranking_response(response.choices[0].message.content)
    return result[:top_k]
```
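The `parse_ranking_response` helper used above is not shown. One possible sketch, assuming the model follows the `Doc N: S/10 - reason` format requested in the prompt:

```python
import re

def parse_ranking_response(text: str) -> list:
    """Parse lines like 'Doc 1: 8/10 - reason' into ranked entries."""
    entries = []
    for match in re.finditer(r"Doc\s+(\d+):\s*(\d+)/10\s*-\s*(.+)", text):
        entries.append({
            "doc_index": int(match.group(1)) - 1,   # zero-based document index
            "relevance_score": int(match.group(2)),
            "reasoning": match.group(3).strip(),
        })
    # Highest-scored documents first
    entries.sort(key=lambda e: e["relevance_score"], reverse=True)
    return entries
```

Since LLM output is not guaranteed to match the requested format, production code should handle missing or malformed lines (for example by falling back to the original order).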
Pairwise Comparison Approach
More robust than absolute scoring:
```python
def pairwise_llm_rerank(query: str, documents: list, top_k: int = 3) -> list:
    """Compare documents pairwise for a more precise ranking."""
    n = len(documents)
    wins = {i: 0 for i in range(n)}

    # Compare every pair and credit the winner's document index
    for i in range(n):
        for j in range(i + 1, n):
            winner = i if compare_pair(query, documents[i], documents[j]) == 0 else j
            wins[winner] += 1

    # Sort by number of wins
    ranked_indices = sorted(wins.keys(), key=lambda x: wins[x], reverse=True)
    return [documents[i] for i in ranked_indices[:top_k]]

def compare_pair(query: str, doc_a: dict, doc_b: dict) -> int:
    """Compare two documents; return 0 if A is more relevant, 1 for B."""
    prompt = f"""Which document is more relevant to this query?

Query: {query}

Document A: {doc_a['content'][:800]}

Document B: {doc_b['content'][:800]}

Answer with only "A" or "B"."""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        temperature=0
    )

    answer = response.choices[0].message.content.strip().upper()
    return 0 if answer == "A" else 1
```
Cost Optimization
Batch Processing with Claude
```python
import anthropic

client = anthropic.Anthropic()

def batch_llm_rerank(query: str, documents: list, top_k: int = 5) -> list:
    """Rerank all documents in a single LLM call."""
    docs_text = "\n\n".join(
        f"[{i+1}] {doc['content'][:600]}" for i, doc in enumerate(documents)
    )

    prompt = f"""Rank these documents by relevance to the query.

Query: {query}

Documents:
{docs_text}

Return ONLY the document numbers in order of relevance, comma-separated.
Example: 3,1,5,2,4"""

    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}]
    )

    # Parse the ordering
    order_str = response.content[0].text.strip()
    order = [int(x.strip()) - 1 for x in order_str.split(",")]

    # Reorder the documents, skipping any out-of-range indices
    reranked = [documents[i] for i in order if 0 <= i < len(documents)]
    return reranked[:top_k]
```
Hybrid Strategy: Cross-Encoder + LLM
```python
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_rerank(query: str, documents: list, top_k: int = 3) -> list:
    """
    1. Fast cross-encoder to filter (top 10)
    2. LLM for the final ranking (top 3)
    """
    # Step 1: cross-encoder (fast)
    pairs = [(query, doc['content']) for doc in documents]
    scores = cross_encoder.predict(pairs)

    # Keep the top 10 by cross-encoder score
    top_indices = scores.argsort()[-10:][::-1]
    candidates = [documents[i] for i in top_indices]

    # Step 2: LLM to refine (expensive, but on fewer documents)
    final_ranking = batch_llm_rerank(query, candidates, top_k)
    return final_ranking
```
LLM Reranking with Domain Context
Specialized Prompt
```python
def domain_specific_rerank(
    query: str,
    documents: list,
    domain: str,
    top_k: int = 3
) -> list:
    """Reranking with domain-specific context."""
    domain_context = {
        "legal": """You are a legal research expert. Prioritize:
- Exact legal citations and case law
- Jurisdictional relevance
- Recency of legal precedents""",
        "medical": """You are a medical research expert. Prioritize:
- Clinical evidence and study quality
- Patient safety considerations
- Guideline compliance""",
        "ecommerce": """You are an e-commerce product expert. Prioritize:
- Product specification matches
- Price and availability relevance
- User intent (browse vs. buy)"""
    }

    context = domain_context.get(domain, "You are a relevance expert.")

    prompt = f"""{context}

Query: {query}

Rank these documents by relevance:
{format_documents(documents)}

Return document numbers in order of relevance."""

    # ... LLM call (same pattern as batch_llm_rerank)
```
Comparing the Approaches
| Method | Latency | Cost | Quality | Use case |
|---|---|---|---|---|
| Cross-encoder | ~50ms | Free | Good | General purpose |
| Cohere Rerank | ~100ms | $1/1K req | Very good | Production |
| GPT-4o-mini | ~500ms | $0.15/1K | Excellent | Specialized domains |
| GPT-4o | ~1s | $2.50/1K | Best | High-value queries |
| Claude Haiku | ~300ms | $0.25/1K | Very good | Good price/quality ratio |
Approximate costs for reranking 10 documents
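To see where these per-query figures come from, a rough back-of-the-envelope estimate based on input tokens can be sketched. The prices and token counts below are illustrative assumptions; check your provider's current pricing:

```python
# Illustrative input-token prices in USD per 1M tokens (assumed, not current rates)
PRICE_PER_1M_INPUT = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}

def estimate_cost(model: str, n_docs: int = 10, tokens_per_doc: int = 400,
                  prompt_overhead: int = 100) -> float:
    """Approximate input-token cost of one pointwise reranking pass.

    Each document costs its own tokens plus the repeated prompt template.
    """
    total_tokens = n_docs * (tokens_per_doc + prompt_overhead)
    return total_tokens / 1_000_000 * PRICE_PER_1M_INPUT[model]
```

Under these assumptions, pointwise reranking of 10 documents with `gpt-4o-mini` costs well under a tenth of a cent, but the cost scales linearly with document count and truncation length, which is why batching and hybrid filtering matter at volume.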
When to Use LLM Reranking
Use it when:
- Queries are complex or multi-hop
- The domain is highly specialized with no training data available
- Each query carries high value (legal, medical, finance)
- You need explanations for the ranking
- Cross-encoders fall short
Avoid it when:
- Volume is high (> 1,000 req/day)
- Latency is critical (< 200ms required)
- Budget is limited
- Queries are simple and direct
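The rules of thumb above can be folded into a small routing function. This is a sketch with illustrative thresholds, and the query-complexity heuristic is a placeholder assumption rather than a tested classifier:

```python
def choose_reranker(query: str, daily_volume: int, latency_budget_ms: int) -> str:
    """Pick a reranking strategy based on the guidelines above.

    Thresholds are illustrative, not benchmarked.
    """
    # Hard constraints: latency or volume rule out LLM reranking
    if latency_budget_ms < 200 or daily_volume > 1000:
        return "cross-encoder"
    # Crude complexity heuristic: long or multi-clause queries go to the LLM
    is_complex = len(query.split()) > 15 or " and " in query
    return "llm" if is_complex else "cross-encoder"
```

In practice the routing signal could also come from query classification, user tier, or the cross-encoder's own score distribution (e.g. escalate to the LLM only when top scores are close together).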
Metrics and Evaluation
```python
def evaluate_reranking(
    queries: list,
    ground_truth: dict,
    rerank_fn: callable
) -> dict:
    """Evaluate reranking quality."""
    metrics = {
        "mrr": [],          # Mean Reciprocal Rank
        "precision@1": []
    }

    for query in queries:
        # Retrieve candidates (retrieve() is your retrieval function)
        candidates = retrieve(query, k=20)

        # Rerank
        reranked = rerank_fn(query, candidates)

        # Compute metrics
        relevant_docs = ground_truth[query]

        # MRR: reciprocal rank of the first relevant document
        for i, doc in enumerate(reranked):
            if doc['id'] in relevant_docs:
                metrics["mrr"].append(1 / (i + 1))
                break
        else:
            metrics["mrr"].append(0)

        # Precision@1
        if reranked and reranked[0]['id'] in relevant_docs:
            metrics["precision@1"].append(1)
        else:
            metrics["precision@1"].append(0)

    return {k: sum(v) / len(v) for k, v in metrics.items()}
```
Complete Example
```python
class LLMReranker:
    def __init__(self, model: str = "gpt-4o-mini", domain: str = None):
        self.model = model
        self.domain = domain
        self.client = OpenAI()

    def rerank(
        self,
        query: str,
        documents: list,
        top_k: int = 5,
        use_hybrid: bool = True
    ) -> list:
        """Full reranking pipeline."""
        # Step 1: pre-filter when there are many documents
        if use_hybrid and len(documents) > 15:
            documents = self._cross_encoder_filter(query, documents, k=15)

        # Step 2: LLM reranking
        reranked = self._llm_rank(query, documents)
        return reranked[:top_k]

    def _cross_encoder_filter(self, query, docs, k):
        encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        pairs = [(query, d['content']) for d in docs]
        scores = encoder.predict(pairs)
        top_idx = scores.argsort()[-k:][::-1]
        return [docs[i] for i in top_idx]

    def _llm_rank(self, query, docs):
        # ... implementation of batch_llm_rerank
        pass

# Usage
reranker = LLMReranker(model="gpt-4o-mini", domain="legal")
results = reranker.rerank(query, candidates, top_k=5)
```
Related Guides
Reranking:
- Reranking for RAG - Overview of reranking
- Cross-Encoder Reranking - The fast approach
- Cohere Rerank API - Managed solution
Retrieval:
- Retrieval Strategies - Advanced techniques
- RAG Cost Optimization - Reducing costs
Is LLM reranking right for your use case? Let's evaluate your pipeline together →
Related Articles
Cross-Encoder Reranking for Higher RAG Accuracy
Reach over 95% precision: use cross-encoders to rescore retrieved documents and eliminate false positives.
Cohere Rerank API for Production RAG
Boost RAG accuracy by 40% with the Cohere Rerank API: simple integration, multilingual support, production-ready.
RAG Reranking: +40% Accuracy with Cross-Encoders (2025 Guide)
+40% RAG accuracy through reranking. A comprehensive guide to cross-encoders, the Cohere Rerank API, and ColBERT for your production retrieval systems.