MTEB 2026: State of the Embeddings Benchmark
Analysis of the MTEB benchmark in 2026: new leaders, leaderboard evolution, and implications for RAG pipelines.
MTEB in 2026: The Landscape Has Changed
The Massive Text Embedding Benchmark (MTEB), the global reference for evaluating embedding models, saw its rankings upended in 2025-2026. Alibaba's open-source Qwen3 took the lead, Google made a splash with Gemini Embedding, and Cohere shook up the market with the first production multimodal embedding.
"The MTEB leaderboard constantly evolves with new submissions," explains Dr. Niklas Muennighoff, researcher at Hugging Face and creator of MTEB. "In 2026, we observe a convergence of scores between open source and proprietary APIs."
MTEB Benchmark Structure
Task Categories
MTEB evaluates embeddings across 8 main categories:
| Category | # Datasets | Description |
|---|---|---|
| Retrieval | 15 | Document search (MS MARCO, BEIR) |
| STS | 10 | Semantic textual similarity |
| Classification | 12 | Text classification |
| Clustering | 11 | Semantic grouping |
| Reranking | 4 | Result re-ordering |
| Pair Classification | 3 | Binary decisions on sentence pairs (e.g. duplicates) |
| Summarization | 1 | Summary evaluation |
| Bitext Mining | 4 | Multilingual alignment |
The framework covers over 1,000 languages, with 58 datasets for English alone.
Evaluation Metrics
| Metric | Description | RAG Usage |
|---|---|---|
| nDCG@10 | Normalized Discounted Cumulative Gain | Ranking quality |
| MRR | Mean Reciprocal Rank | First good result position |
| MAP | Mean Average Precision | Overall precision |
| Recall@k | Recall rate at k results | Coverage |
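As a sketch, the two metrics most relevant to RAG ranking can be computed from a ranked relevance list. The helper functions below are illustrative, not MTEB's internal code:

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k over a ranked list of graded relevance scores."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mrr(ranked_relevant_flags):
    """Reciprocal rank for one query: 1 / position of first relevant hit."""
    for i, is_relevant in enumerate(ranked_relevant_flags, start=1):
        if is_relevant:
            return 1.0 / i
    return 0.0

# A ranking that puts the only relevant document in second position
print(round(mrr([False, True, False]), 2))   # 0.5
print(round(ndcg_at_k([0, 1, 0]), 3))        # 0.631
```

A perfect ranking gives both metrics a value of 1.0; pushing the relevant document down the list lowers them, with nDCG discounting logarithmically by position.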
The ranking uses Borda Count by default, aggregating performance across all tasks.
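Borda aggregation can be illustrated with a simplified sketch (the leaderboard's actual implementation differs; model names and scores here are toy values):

```python
from collections import defaultdict

def borda_rank(scores_by_task):
    """scores_by_task: {task: {model: score}}. Each task awards
    (n_models - 1 - rank) points per model; totals decide the ranking."""
    points = defaultdict(int)
    for task_scores in scores_by_task.values():
        ranked = sorted(task_scores, key=task_scores.get, reverse=True)
        n = len(ranked)
        for rank, model in enumerate(ranked):
            points[model] += n - 1 - rank
    return sorted(points, key=points.get, reverse=True)

tasks = {
    "Retrieval":  {"A": 57.8, "B": 56.2, "C": 55.4},
    "Clustering": {"A": 51.8, "B": 49.0, "C": 50.2},
}
print(borda_rank(tasks))  # ['A', 'B', 'C']
```

Because Borda counts per-task ranks rather than raw scores, a model that is consistently second everywhere can beat one that wins a single task by a large margin.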
MTEB Ranking January 2026
Global Top 10
| Rank | Model | MTEB Score | Type | Specificity |
|---|---|---|---|---|
| 1 | Qwen3-Embedding-8B | 70.6 | Open source | Apache 2.0, multilingual |
| 2 | Google Gemini Embedding | 68.3 | API | Ultra-low price ($0.008/1M) |
| 3 | gte-Qwen3-8B | 68.1 | Open source | Apache 2.0 |
| 4 | NVIDIA NV-Embed | 67.5 | Open source | Based on Llama-3.1-8B |
| 5 | Cohere Embed v4 | 65.2 | API | Multimodal (text + images) |
| 6 | OpenAI text-embedding-3-large | 64.6 | API | Complete ecosystem |
| 7 | Voyage-3 | 63.8 | API | Domain specialization |
| 8 | BGE-M3 | 63.2 | Open source | MIT, 568M params |
| 9 | Jina Embeddings v3 | 62.8 | API/Open | 8192-token context |
| 10 | Nomic-embed-v2 | 61.4 | Open source | Compact (137M params) |
Evolution from 2024
| Model | 2024 Score | 2026 Score | Evolution |
|---|---|---|---|
| OpenAI text-embedding-3-large | 64.6 | 64.6 | = (no update) |
| BGE-M3 | 63.2 | 63.2 | = |
| Qwen3-Embedding-8B | N/A | 70.6 | New leader |
| Google Gemini Embedding | N/A | 68.3 | New entrant |
| Cohere Embed v4 | N/A | 65.2 | New (multimodal) |
OpenAI's lack of embedding updates (text-embedding-3 has not been refreshed since its early-2024 release) cost it the top spot.
Best Models by Category
Retrieval (document search)
| Rank | Model | Retrieval Score |
|---|---|---|
| 1 | Qwen3-Embedding-8B | 57.8 |
| 2 | Voyage-3 | 56.2 |
| 3 | OpenAI text-embedding-3-large | 55.4 |
Clustering
| Rank | Model | Clustering Score |
|---|---|---|
| 1 | Qwen3-Embedding-8B | 51.8 |
| 2 | NVIDIA NV-Embed | 50.9 |
| 3 | gte-Qwen3-8B | 50.2 |
Multilingual (non-English)
| Rank | Model | Multilingual Score |
|---|---|---|
| 1 | BGE-M3 | 62.4 |
| 2 | Qwen3-Embedding-8B | 61.8 |
| 3 | Cohere Embed v4 | 59.5 |
To choose the right model, check our guide on choosing embeddings.
Focus: The Rise of Open Source
Qwen3 Takes the Lead
For the first time, an open source model dominates the MTEB leaderboard. Alibaba's Qwen3-Embedding-8B:
- Overall score: 70.6 (surpasses all APIs)
- License: Apache 2.0 (free commercial use)
- Size: 8B parameters
- Multilingual: Excellent on Chinese, good on European
```python
from sentence_transformers import SentenceTransformer

# Load the Qwen3 embedding model
model = SentenceTransformer('Alibaba-NLP/gte-Qwen3-8B-embedding')

embeddings = model.encode(
    ["Your text to encode"],
    normalize_embeddings=True
)
```
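Because the embeddings above are normalized, cosine similarity reduces to a dot product. A minimal retrieval step looks like this (toy 2-D unit vectors stand in for real model output):

```python
import numpy as np

# Toy stand-ins for normalized embeddings
query = np.array([0.6, 0.8])
docs = np.array([
    [0.6, 0.8],    # same direction as the query -> similarity 1.0
    [0.8, 0.6],    # close direction
    [-0.6, -0.8],  # opposite direction -> similarity -1.0
])

# With unit-norm vectors, cosine similarity is just a dot product
similarities = docs @ query
best = int(np.argmax(similarities))
print(best)  # 0
```

The same dot-product ranking applies regardless of which embedding model produced the vectors, as long as they are normalized.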
Implications for Businesses
This evolution changes the game:
| Aspect | Before (2024) | Now (2026) |
|---|---|---|
| Best model | Proprietary API | Open source |
| Optimal cost | API ($0.13/1M) | Self-hosted (no per-token cost) |
| Sovereignty | Cloud dependency | Self-hosting possible |
| Performance | APIs leading | Open source leading |
Focus: Cohere Embed v4 and Multimodal
A Unique Innovation
Cohere Embed v4 is the only production model capable of vectorizing:
- Text
- Images
- Interleaved documents (PDFs, slides)
Its MTEB score (65.2) is lower than leaders on pure text, but it has no equivalent for visual documents.
```python
import cohere

co = cohere.ClientV2('your-api-key')

# Image embedding (unique to Cohere)
response = co.embed(
    images=["data:image/jpeg;base64,..."],
    model="embed-v4",
    input_type="image",
    embedding_types=["float"]
)
```
For more details, see our article on Cohere Embed v4 Multimodal.
Implications for RAG Pipelines
Model Selection by Use Case
| Use Case | Recommended Model | Reason |
|---|---|---|
| General (budget) | Google Gemini Embedding | Unbeatable price ($0.008/1M) |
| General (performance) | Qwen3-Embedding-8B | Best MTEB score |
| Visual documents | Cohere Embed v4 | Only multimodal |
| Code / Tech | Voyage-code-3 | Code specialized |
| Legal | Voyage-3-legal | Legal specialized |
| Sovereignty | Qwen3 or BGE-M3 | Self-host, open source |
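Encoded as configuration, the table above can drive model selection in a pipeline. This routing helper is a hypothetical sketch; the model identifiers simply mirror the recommendations:

```python
# Hypothetical routing table mirroring the recommendations above
MODEL_BY_USE_CASE = {
    "general_budget": "gemini-embedding",
    "general_performance": "Qwen3-Embedding-8B",
    "visual_documents": "cohere-embed-v4",
    "code": "voyage-code-3",
    "legal": "voyage-3-legal",
    "sovereignty": "BGE-M3",
}

def pick_model(use_case: str) -> str:
    """Fall back to the best-scoring general model for unknown use cases."""
    return MODEL_BY_USE_CASE.get(use_case, "Qwen3-Embedding-8B")

print(pick_model("legal"))    # voyage-3-legal
print(pick_model("medical"))  # Qwen3-Embedding-8B
```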
Trade-offs to Consider
| Criterion | APIs | Open source |
|---|---|---|
| Setup | Immediate | GPU configuration |
| Variable cost | Yes | No (fixed) |
| 2026 Performance | Lower | Higher |
| Sovereignty | No | Yes |
| Maintenance | Zero | MLOps team |
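The variable-vs-fixed cost trade-off can be made concrete with a back-of-the-envelope comparison. The GPU price below is an illustrative assumption; the $0.13/1M API rate is the one quoted above:

```python
def compare_costs(api_price_per_1m: float, gpu_cost_per_month: float,
                  tokens_per_month: int) -> dict:
    """Compare monthly API spend against a flat self-hosted GPU cost."""
    api_cost = tokens_per_month / 1_000_000 * api_price_per_1m
    return {
        "api_monthly_usd": round(api_cost, 2),
        "self_host_monthly_usd": gpu_cost_per_month,
        "self_host_cheaper": api_cost > gpu_cost_per_month,
    }

# Assumption: a mid-range GPU instance at roughly $600/month
result = compare_costs(0.13, 600, 1_000_000_000)
print(result)  # at 1B tokens/month the API costs $130, so it still wins
```

Under these assumptions the break-even point sits around 4.6B tokens per month ($600 / $0.13 per 1M); below that volume, the API is cheaper despite its per-token pricing.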
Check our guide on RAG cost optimization.
Methodology and Reproducibility
How to Run the Benchmark
```python
from mteb import MTEB, get_model

# Load a model
model = get_model("Alibaba-NLP/gte-Qwen3-8B-embedding")

# Run the evaluation on Retrieval tasks
evaluation = MTEB(task_types=["Retrieval"])
results = evaluation.run(model)

# Display results
print(results)
```
Interactive Leaderboard
The official interactive leaderboard is hosted on Hugging Face Spaces. Rankings are dynamic: new submissions can change the order at any time.
Trends Observed in 2026
1. Open Source Dominates
The gap between open source and APIs has reversed. Qwen3 surpasses OpenAI by +6 MTEB points.
2. Multimodal Emerges
Cohere paved the way. Google and OpenAI should follow in 2026-2027.
3. Domain Specialization
Specialized models (Voyage legal/finance/code) outperform generic models by 10-15% in their domains.
4. Prices Plummeting
Google Gemini Embedding at $0.008/1M tokens changes RAG economics.
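The headline price gap follows directly from the per-million-token rates quoted in this article ($0.008 for Gemini, $0.13 for text-embedding-3-large):

```python
gemini_per_1m = 0.008  # $/1M tokens, quoted above
openai_per_1m = 0.13   # $/1M tokens, quoted above

ratio = openai_per_1m / gemini_per_1m
print(f"Gemini is {ratio:.1f}x cheaper")  # Gemini is 16.2x cheaper

# Cost of embedding a 10M-token corpus at each rate
print(f"Gemini: ${10 * gemini_per_1m:.2f} vs OpenAI: ${10 * openai_per_1m:.2f}")
```

At these rates, even a 100M-token corpus costs under a dollar to embed with Gemini, which is what makes re-embedding entire knowledge bases economically trivial.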
Our Take
The 2026 MTEB landscape represents a turning point:
Key points:
- Open source (Qwen3) surpasses proprietary APIs
- Multimodal (Cohere v4) opens new use cases
- Prices are falling (Gemini 16x cheaper than OpenAI)
Recommendations:
- New projects: evaluate Qwen3 (performance) or Gemini (cost)
- Visual documents: Cohere Embed v4 is essential
- Existing OpenAI projects: consider migration if performance is critical
Platforms like Ailog integrate these benchmarks to automatically select the best models for your use case.
Check our detailed 2026 embedding comparison for more details.
Related Posts
Embedding Models 2026: Benchmark and Comparison
Comprehensive comparison of the best embedding models in 2026. MTEB benchmarks, multilingual performance, and recommendations for your RAG applications.
Cohere Embed v4: The First Production Multimodal Embedding
Cohere launches Embed v4 Multimodal, the first embedding model capable of vectorizing text, images, and interleaved documents. A revolution for multimodal RAG.
Hugging Face: New Open-Source RAG Models
Hugging Face releases a new family of models optimized for RAG: embeddings, rerankers, and specialized LLMs. Complete overview.