6. RerankingIntermediate

Cohere Rerank API for Production RAG

November 17, 2025
8 min read
Ailog Research Team

Boost RAG accuracy by 40% with Cohere's Rerank API: simple integration, multilingual support, production-ready.

Why Cohere Rerank?

  • ✅ 40% accuracy improvement over bi-encoders
  • ✅ 100+ languages supported
  • ✅ Hosted API (no model hosting)
  • ✅ Fast (< 200ms for 100 docs)

Quick Start

DEVELOPERpython
import cohere co = cohere.Client('YOUR_API_KEY') def rerank_with_cohere(query, documents): results = co.rerank( model='rerank-english-v3.0', # or 'rerank-multilingual-v3.0' query=query, documents=documents, top_n=10 ) return [doc['text'] for doc in results.results] # Use it retrieved_docs = vector_search(query, k=100) reranked = rerank_with_cohere(query, retrieved_docs)

Models (November 2025)

rerank-english-v3.0

  • English only
  • $1 per 1000 searches
  • Best accuracy

rerank-multilingual-v3.0

  • 100+ languages
  • $1 per 1000 searches
  • Excellent for global apps

With Metadata

DEVELOPERpython
results = co.rerank( query=query, documents=[ {"text": doc, "metadata": {"source": "wiki", "date": "2025"}} for doc in documents ], top_n=10, return_documents=True ) for r in results.results: print(f"Score: {r.relevance_score}") print(f"Text: {r.document['text']}") print(f"Metadata: {r.document['metadata']}")

Cost Optimization

DEVELOPERpython
# Only rerank if initial score is low def smart_rerank(query, initial_results, threshold=0.7): # If top result has high confidence, skip reranking if initial_results[0]['score'] > threshold: return initial_results[:10] # Otherwise, rerank return rerank_with_cohere(query, [r['text'] for r in initial_results])

Cohere Rerank is the easiest way to dramatically improve RAG accuracy. Just plug it in after retrieval.

Tags

coherererankapiproduction

Related Guides