News

MTEB 2026: State of the Embeddings Benchmark

May 7, 2026
7 min read
Ailog Team

Analysis of the MTEB benchmark in 2026: new leaders, leaderboard evolution, and implications for RAG pipelines.

MTEB in 2026: The Landscape Has Changed

The Massive Text Embedding Benchmark (MTEB), the global reference for evaluating embedding models, saw its rankings upended in 2025-2026. Alibaba's open-source Qwen3 has taken the lead, Google made a splash with Gemini Embedding, and Cohere shook up the market with the first production-grade multimodal embedding model.

"The MTEB leaderboard constantly evolves with new submissions," explains Dr. Niklas Muennighoff, researcher at Hugging Face and creator of MTEB. "In 2026, we observe a convergence of scores between open source and proprietary APIs."

MTEB Benchmark Structure

Task Categories

MTEB evaluates embeddings across 8 main categories:

| Category | # Datasets | Description |
|---|---|---|
| Retrieval | 15 | Document search (MS MARCO, BEIR) |
| STS | 10 | Semantic textual similarity |
| Classification | 12 | Text classification |
| Clustering | 11 | Semantic grouping |
| Reranking | 4 | Result re-ordering |
| Pair Classification | 3 | Binary classification of sentence pairs |
| Summarization | 1 | Summary evaluation |
| Bitext Mining | 4 | Multilingual sentence alignment |

The framework spans hundreds of languages in its multilingual extension, with 58 datasets for English alone.

Evaluation Metrics

| Metric | Description | RAG Usage |
|---|---|---|
| nDCG@10 | Normalized Discounted Cumulative Gain at rank 10 | Ranking quality |
| MRR | Mean Reciprocal Rank | Position of the first relevant result |
| MAP | Mean Average Precision | Overall precision |
| Recall@k | Recall within the top k results | Coverage |
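These metrics are straightforward to compute by hand. Here is a minimal sketch assuming binary relevance judgments (this is an illustration, not MTEB's actual implementation, which uses graded relevance where available):

```python
import math

def recall_at_k(relevant: set, ranked: list, k: int) -> float:
    """Fraction of relevant documents found in the top-k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(relevant: set, ranked: list) -> float:
    """Reciprocal rank of the first relevant result (0 if none)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0

def ndcg_at_k(relevant: set, ranked: list, k: int) -> float:
    """Binary-relevance nDCG@k: DCG divided by the ideal DCG."""
    dcg = sum(1 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

ranked = ["d3", "d1", "d7", "d2"]   # retrieval output, best first
relevant = {"d1", "d2"}             # ground-truth relevant documents
print(recall_at_k(relevant, ranked, 2))  # 0.5 — one of two found in top-2
print(mrr(relevant, ranked))             # 0.5 — first hit at rank 2
```

Note how MRR only rewards the first hit while nDCG rewards every relevant document, discounted by rank; that is why nDCG@10 is the headline retrieval metric.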

The ranking uses Borda Count by default, aggregating performance across all tasks.
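To illustrate how Borda aggregation works, here is a minimal sketch with hypothetical per-task scores (MTEB's own implementation additionally handles ties and missing submissions):

```python
def borda_rank(task_scores: dict) -> list:
    """Aggregate per-task scores into a global ranking via Borda count.

    task_scores maps task name -> {model: score}. On each task a model
    earns (n_models - 1 - rank) points; point totals decide the order.
    """
    points = {}
    for scores in task_scores.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        n = len(ordered)
        for rank, model in enumerate(ordered):  # rank 0 = best on this task
            points[model] = points.get(model, 0) + (n - 1 - rank)
    return sorted(points, key=points.get, reverse=True)

# Hypothetical per-task scores for three models
tasks = {
    "Retrieval":  {"A": 57.8, "B": 56.2, "C": 55.4},
    "Clustering": {"A": 51.8, "B": 49.0, "C": 50.2},
    "STS":        {"A": 80.1, "B": 82.3, "C": 79.5},
}
print(borda_rank(tasks))  # ['A', 'B', 'C'] — A wins two of three tasks
```

The design rationale: rank-based aggregation prevents a single task with a wide score range from dominating the average.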

MTEB Ranking January 2026

Global Top 10

| Rank | Model | MTEB Score | Type | Specificity |
|---|---|---|---|---|
| 1 | Qwen3-Embedding-8B | 70.6 | Open source | Apache 2.0, multilingual |
| 2 | Google Gemini Embedding | 68.3 | API | Ultra-low price ($0.008/1M tokens) |
| 3 | gte-Qwen3-8B | 68.1 | Open source | Apache 2.0 |
| 4 | NVIDIA NV-Embed | 67.5 | Open source | Based on Llama-3.1-8B |
| 5 | Cohere Embed v4 | 65.2 | API | Multimodal (text + images) |
| 6 | OpenAI text-embedding-3-large | 64.6 | API | Complete ecosystem |
| 7 | Voyage-3 | 63.8 | API | Domain specialization |
| 8 | BGE-M3 | 63.2 | Open source | MIT license, 568M params |
| 9 | Jina Embeddings v3 | 62.8 | API/Open | 8192-token context window |
| 10 | Nomic-embed-v2 | 61.4 | Open source | Compact (137M params) |

Evolution from 2024

| Model | 2024 Score | 2026 Score | Evolution |
|---|---|---|---|
| OpenAI text-embedding-3-large | 64.6 | 64.6 | = (no update) |
| BGE-M3 | 63.2 | 63.2 | = |
| Qwen3-Embedding-8B | N/A | 70.6 | New leader |
| Google Gemini Embedding | N/A | 68.3 | New entrant |
| Cohere Embed v4 | N/A | 65.2 | New (multimodal) |

OpenAI's lack of embedding updates (still the text-embedding-3 family released in early 2024) cost it the top spot.

Best Models by Category

Retrieval (document search)

| Rank | Model | Retrieval Score |
|---|---|---|
| 1 | Qwen3-Embedding-8B | 57.8 |
| 2 | Voyage-3 | 56.2 |
| 3 | OpenAI text-embedding-3-large | 55.4 |

Clustering

| Rank | Model | Clustering Score |
|---|---|---|
| 1 | Qwen3-Embedding-8B | 51.8 |
| 2 | NVIDIA NV-Embed | 50.9 |
| 3 | gte-Qwen3-8B | 50.2 |

Multilingual (non-English)

| Rank | Model | Multilingual Score |
|---|---|---|
| 1 | BGE-M3 | 62.4 |
| 2 | Qwen3-Embedding-8B | 61.8 |
| 3 | Cohere Embed v4 | 59.5 |

To choose the right model, check our guide on choosing embeddings.

Focus: The Rise of Open Source

Qwen3 Takes the Lead

For the first time, an open source model dominates the MTEB leaderboard. Alibaba's Qwen3-Embedding-8B:

  • Overall score: 70.6 (surpasses all APIs)
  • License: Apache 2.0 (free commercial use)
  • Size: 8B parameters
  • Multilingual: Excellent on Chinese, good on European

```python
from sentence_transformers import SentenceTransformer

# Load the Qwen3 embedding model from the Hugging Face Hub
model = SentenceTransformer('Alibaba-NLP/gte-Qwen3-8B-embedding')

embeddings = model.encode(
    ["Your text to encode"],
    normalize_embeddings=True,
)
```

Implications for Businesses

This evolution changes the game:

| Aspect | Before (2024) | Now (2026) |
|---|---|---|
| Best model | Proprietary API | Open source |
| Optimal cost | API ($0.13/1M tokens) | Self-hosting (infrastructure cost only) |
| Sovereignty | Cloud dependency | Self-hosting possible |
| Performance | APIs leading | Open source leading |
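Whether self-hosting actually beats an API depends on volume. Here is a back-of-the-envelope break-even sketch; the $0.13/1M figure comes from the table above, while the monthly GPU cost is an illustrative assumption, not a quoted price:

```python
def breakeven_tokens(api_price_per_m: float, gpu_monthly_cost: float) -> float:
    """Monthly token volume at which self-hosting matches the API bill."""
    return gpu_monthly_cost / api_price_per_m * 1_000_000

# Illustrative assumptions: an API at $0.13/1M tokens vs. a rented
# GPU instance at ~$600/month capable of serving an 8B embedding model.
print(f"{breakeven_tokens(0.13, 600):,.0f} tokens/month")  # ≈ 4.6B tokens
```

At Gemini's $0.008/1M price the same formula gives a break-even around 75B tokens/month, so self-hosting tends to pay off only at very high volume, or when sovereignty (not cost) is the driver.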

Focus: Cohere Embed v4 and Multimodal

A Unique Innovation

Cohere Embed v4 is the only production model capable of vectorizing:

  • Text
  • Images
  • Interleaved documents (PDFs, slides)

Its MTEB score (65.2) is lower than leaders on pure text, but it has no equivalent for visual documents.

```python
import cohere

co = cohere.ClientV2('your-api-key')

# Image embedding (unique to Cohere)
response = co.embed(
    images=["data:image/jpeg;base64,..."],
    model="embed-v4",
    input_type="image",
    embedding_types=["float"],
)
```

For more details, see our article on Cohere Embed v4 Multimodal.

Implications for RAG Pipelines

Model Selection by Use Case

| Use Case | Recommended Model | Reason |
|---|---|---|
| General (budget) | Google Gemini Embedding | Unbeatable price ($0.008/1M tokens) |
| General (performance) | Qwen3-Embedding-8B | Best MTEB score |
| Visual documents | Cohere Embed v4 | Only multimodal option |
| Code / Tech | Voyage-code-3 | Code-specialized |
| Legal | Voyage-3-legal | Legal-specialized |
| Sovereignty | Qwen3 or BGE-M3 | Self-hostable, open source |
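A selection table like this can be encoded directly into a pipeline. Here is a minimal sketch; the model identifiers are illustrative placeholders, not verified catalog names from each provider:

```python
# Map each RAG use case to a recommended embedding model, mirroring
# the table above. Model IDs are illustrative, not provider-verified.
RECOMMENDED = {
    "general_budget": "gemini-embedding",
    "general_performance": "Qwen3-Embedding-8B",
    "visual_documents": "embed-v4",        # Cohere, multimodal
    "code": "voyage-code-3",
    "legal": "voyage-3-legal",
    "sovereignty": "Qwen3-Embedding-8B",   # or BGE-M3, self-hosted
}

def pick_embedding_model(use_case: str) -> str:
    """Return the recommended embedding model for a RAG use case."""
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None

print(pick_embedding_model("visual_documents"))  # embed-v4
```

Centralizing the choice in one place makes it cheap to swap models as the leaderboard shifts, which, as this article shows, it does.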

Trade-offs to Consider

| Criterion | APIs | Open source |
|---|---|---|
| Setup | Immediate | GPU configuration required |
| Variable cost | Yes | No (fixed infrastructure) |
| 2026 performance | Lower | Higher |
| Sovereignty | No | Yes |
| Maintenance | None | MLOps team required |

Check our guide on RAG cost optimization.

Methodology and Reproducibility

How to Run the Benchmark

```python
from mteb import MTEB, get_model

# Load a model
model = get_model("Alibaba-NLP/gte-Qwen3-8B-embedding")

# Run the evaluation on the Retrieval tasks
evaluation = MTEB(task_types=["Retrieval"])
results = evaluation.run(model)

# Display the results
print(results)
```

Interactive Leaderboard

The official leaderboard is hosted on Hugging Face: https://huggingface.co/spaces/mteb/leaderboard

Rankings are dynamic: new submissions can change the order at any time.

Trends Observed in 2026

1. Open Source Dominates

The gap between open source and APIs has reversed. Qwen3 surpasses OpenAI by +6 MTEB points.

2. Multimodal Emerges

Cohere paved the way. Google and OpenAI should follow in 2026-2027.

3. Domain Specialization

Specialized models (Voyage legal/finance/code) outperform generic models by 10-15% in their domains.

4. Prices Plummeting

Google Gemini Embedding at $0.008/1M tokens changes RAG economics.
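The impact is easy to quantify. A quick sketch using the prices quoted in this article (check current pricing before relying on them):

```python
# Per-million-token list prices as quoted in this article.
PRICES_PER_M_TOKENS = {
    "gemini-embedding": 0.008,
    "text-embedding-3-large": 0.13,
}

def embedding_cost_usd(model: str, tokens: int) -> float:
    """Price in USD to embed `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICES_PER_M_TOKENS[model]

# Embedding a 1B-token corpus with each model:
for model in PRICES_PER_M_TOKENS:
    print(f"{model}: ${embedding_cost_usd(model, 1_000_000_000):,.2f}")
```

The $8 vs. $130 gap on a 1B-token corpus is the "16x cheaper" figure cited in the conclusion.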

Our Take

The 2026 MTEB landscape represents a turning point:

Key points:

  • Open source (Qwen3) surpasses proprietary APIs
  • Multimodal (Cohere v4) opens new use cases
  • Prices are falling (Gemini 16x cheaper than OpenAI)

Recommendations:

  • New projects: evaluate Qwen3 (performance) or Gemini (cost)
  • Visual documents: Cohere Embed v4 is essential
  • Existing OpenAI projects: consider migration if performance is critical

Platforms like Ailog integrate these benchmarks to automatically select the best models for your use case.

Check our detailed 2026 embedding comparison for more details.

FAQ

**Why did Qwen3 take the MTEB lead?**

Alibaba invested heavily in multilingual embeddings with Qwen3. The 8B-parameter model combines an optimized architecture with training on massive Chinese and English corpora. The Apache 2.0 license enables wide adoption, accelerating community contributions and optimizations.
**Is OpenAI text-embedding-3 still relevant?**

Yes, but less than before. The model remains stable and well-documented, with a complete ecosystem (GPT-5, Assistants API). However, its MTEB score (64.6) is now lower than Qwen3 (70.6) and Google Gemini (68.3). For new projects, other options offer better value.
**What makes Cohere Embed v4 unique?**

Cohere Embed v4 can vectorize PDFs, slides, and images directly, without complex OCR pipelines. This radically simplifies architectures for visual documents. The model has no equivalent: other embeddings are text-only.
**Should I migrate from OpenAI to Qwen3?**

If performance is critical and you have GPU infrastructure, yes: Qwen3 surpasses OpenAI by 6 MTEB points. But migration requires re-encoding your entire corpus and MLOps expertise. For low to medium volumes without GPU constraints, Google Gemini offers better value without the complexity of self-hosting.
**Are the 2026 rankings final?**

No, the leaderboard constantly evolves with new submissions, and rankings can change. It's recommended to check the Hugging Face leaderboard regularly and to evaluate models on your own dataset before deciding.

---

**Need help choosing your embeddings?** [Ailog](https://ailog.fr) automatically integrates the best models for your use case. Benefit from our expertise without the technical complexity.

Tags

RAG, MTEB, benchmark, embeddings, evaluation
