Cohere Rerank API for Production RAG
Boost RAG accuracy by 40% with Cohere's Rerank API: simple integration, multilingual support, production-ready.
- Author
- Ailog Research Team
- Published
- Reading time
- 8 min read
- Level
- intermediate
- RAG Pipeline Step
- Reranking
Why Cohere Rerank? • ✅ 40% accuracy improvement over bi-encoders • ✅ 100+ languages supported • ✅ Hosted API (no model hosting) • ✅ Fast (< 200ms for 100 docs)
Quick Start
``python import cohere
co = cohere.Client('YOUR_API_KEY')
def rerank_with_cohere(query, documents): results = co.rerank( model='rerank-english-v3.0', or 'rerank-multilingual-v3.0' query=query, documents=documents, top_n=10 ) return [doc['text'] for doc in results.results]
Use it retrieved_docs = vector_search(query, k=100) reranked = rerank_with_cohere(query, retrieved_docs) `
Models (November 2025)
rerank-english-v3.0 • English only • $1 per 1000 searches • Best accuracy
rerank-multilingual-v3.0 • 100+ languages • $1 per 1000 searches • Excellent for global apps
With Metadata
`python results = co.rerank( query=query, documents=[ {"text": doc, "metadata": {"source": "wiki", "date": "2025"}} for doc in documents ], top_n=10, return_documents=True )
for r in results.results: print(f"Score: {r.relevance_score}") print(f"Text: {r.document['text']}") print(f"Metadata: {r.document['metadata']}") `
Cost Optimization
`python Only rerank if initial score is low def smart_rerank(query, initial_results, threshold=0.7): If top result has high confidence, skip reranking if initial_results[0]['score'] > threshold: return initial_results[:10] Otherwise, rerank return rerank_with_cohere(query, [r['text'] for r in initial_results]) ``
Cohere Rerank is the easiest way to dramatically improve RAG accuracy. Just plug it in after retrieval.