MTEB 2026: State of the Embeddings Benchmark
Analysis of the MTEB benchmark in 2026: new leaders, leaderboard evolution, and implications for RAG pipelines.
MTEB in 2026: The Landscape Has Changed
The Massive Text Embedding Benchmark (MTEB), the global reference for evaluating embedding models, saw its rankings upended in 2025-2026. Alibaba's open-source Qwen3 took the lead, Google made a splash with Gemini Embedding, and Cohere shook up the market with the first production multimodal embedding.
"The MTEB leaderboard constantly evolves with new submissions," explains Dr. Niklas Muennighoff, researcher at Hugging Face and creator of MTEB. "In 2026, we observe a convergence of scores between open source and proprietary APIs."
MTEB Benchmark Structure
Task Categories
MTEB evaluates embeddings across 8 main categories:
| Category | # Datasets | Description |
|---|---|---|
| Retrieval | 15 | Document search (MS MARCO, BEIR) |
| STS | 10 | Semantic textual similarity |
| Classification | 12 | Text classification |
| Clustering | 11 | Semantic grouping |
| Reranking | 4 | Result re-ordering |
| Pair Classification | 3 | Binary decisions on sentence pairs (e.g. duplicates) |
| Summarization | 1 | Summary evaluation |
| Bitext Mining | 4 | Multilingual alignment |
The framework covers over 1,000 languages, with 58 datasets for English alone.
Evaluation Metrics
| Metric | Description | RAG Usage |
|---|---|---|
| nDCG@10 | Normalized Discounted Cumulative Gain | Ranking quality |
| MRR | Mean Reciprocal Rank | First good result position |
| MAP | Mean Average Precision | Overall precision |
| Recall@k | Recall rate at k results | Coverage |
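As a sketch, the two metrics most relevant to RAG ranking can be computed from a ranked relevance list. The helper functions below are illustrative, not MTEB's internal code:

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k over a ranked list of graded relevance scores."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mrr(ranked_relevant_flags):
    """Reciprocal rank for one query: 1 / position of first relevant hit."""
    for i, is_relevant in enumerate(ranked_relevant_flags, start=1):
        if is_relevant:
            return 1.0 / i
    return 0.0

# A ranking that puts the only relevant document in second position
print(round(mrr([False, True, False]), 2))   # 0.5
print(round(ndcg_at_k([0, 1, 0]), 3))        # 0.631
```

A perfect ranking gives both metrics a value of 1.0; pushing the relevant document down the list lowers them, with nDCG discounting logarithmically by position.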
The ranking uses Borda Count by default, aggregating performance across all tasks.
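Borda aggregation can be illustrated with a simplified sketch (the leaderboard's actual implementation differs; model names and scores here are toy values):

```python
from collections import defaultdict

def borda_rank(scores_by_task):
    """scores_by_task: {task: {model: score}}. Each task awards
    (n_models - 1 - rank) points per model; totals decide the ranking."""
    points = defaultdict(int)
    for task_scores in scores_by_task.values():
        ranked = sorted(task_scores, key=task_scores.get, reverse=True)
        n = len(ranked)
        for rank, model in enumerate(ranked):
            points[model] += n - 1 - rank
    return sorted(points, key=points.get, reverse=True)

tasks = {
    "Retrieval":  {"A": 57.8, "B": 56.2, "C": 55.4},
    "Clustering": {"A": 51.8, "B": 49.0, "C": 50.2},
}
print(borda_rank(tasks))  # ['A', 'B', 'C']
```

Because Borda counts per-task ranks rather than raw scores, a model that is consistently second everywhere can beat one that wins a single task by a large margin.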
MTEB Ranking January 2026
Global Top 10
| Rank | Model | MTEB Score | Type | Specificity |
|---|---|---|---|---|
| 1 | Qwen3-Embedding-8B | 70.6 | Open source | Apache 2.0, multilingual |
| 2 | Google Gemini Embedding | 68.3 | API | Ultra-low price ($0.008/1M) |
| 3 | gte-Qwen3-8B | 68.1 | Open source | Apache 2.0 |
| 4 | NVIDIA NV-Embed | 67.5 | Open source | Based on Llama-3.1-8B |
| 5 | Cohere Embed v4 | 65.2 | API | Multimodal (text + images) |
| 6 | OpenAI text-embedding-3-large | 64.6 | API | Complete ecosystem |
| 7 | Voyage-3 | 63.8 | API | Domain specialization |
| 8 | BGE-M3 | 63.2 | Open source | MIT, 568M params |
| 9 | Jina Embeddings v3 | 62.8 | API/Open | 8192-token context |
| 10 | Nomic-embed-v2 | 61.4 | Open source | Compact (137M params) |
Evolution from 2024
| Model | 2024 Score | 2026 Score | Evolution |
|---|---|---|---|
| OpenAI text-embedding-3-large | 64.6 | 64.6 | = (no update) |
| BGE-M3 | 63.2 | 63.2 | = |
| Qwen3-Embedding-8B | N/A | 70.6 | New leader |
| Google Gemini Embedding | N/A | 68.3 | New entrant |
| Cohere Embed v4 | N/A | 65.2 | New (multimodal) |
OpenAI's lack of embedding updates (text-embedding-3 has not been refreshed since its early-2024 release) cost it the top spot.
Best Models by Category
Retrieval (document search)
| Rank | Model | Retrieval Score |
|---|---|---|
| 1 | Qwen3-Embedding-8B | 57.8 |
| 2 | Voyage-3 | 56.2 |
| 3 | OpenAI text-embedding-3-large | 55.4 |
Clustering
| Rank | Model | Clustering Score |
|---|---|---|
| 1 | Qwen3-Embedding-8B | 51.8 |
| 2 | NVIDIA NV-Embed | 50.9 |
| 3 | gte-Qwen3-8B | 50.2 |
Multilingual (non-English)
| Rank | Model | Multilingual Score |
|---|---|---|
| 1 | BGE-M3 | 62.4 |
| 2 | Qwen3-Embedding-8B | 61.8 |
| 3 | Cohere Embed v4 | 59.5 |
To choose the right model, check our guide on choosing embeddings.
Focus: The Rise of Open Source
Qwen3 Takes the Lead
For the first time, an open source model dominates the MTEB leaderboard. Alibaba's Qwen3-Embedding-8B:
- Overall score: 70.6 (surpasses all APIs)
- License: Apache 2.0 (free commercial use)
- Size: 8B parameters
- Multilingual: Excellent on Chinese, good on European
```python
from sentence_transformers import SentenceTransformer

# Load the Qwen3 embedding model
model = SentenceTransformer('Alibaba-NLP/gte-Qwen3-8B-embedding')

embeddings = model.encode(
    ["Your text to encode"],
    normalize_embeddings=True
)
```
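Because the embeddings above are normalized, cosine similarity reduces to a dot product. A minimal retrieval step looks like this (toy 2-D unit vectors stand in for real model output):

```python
import numpy as np

# Toy stand-ins for normalized embeddings
query = np.array([0.6, 0.8])
docs = np.array([
    [0.6, 0.8],    # same direction as the query -> similarity 1.0
    [0.8, 0.6],    # close direction
    [-0.6, -0.8],  # opposite direction -> similarity -1.0
])

# With unit-norm vectors, cosine similarity is just a dot product
similarities = docs @ query
best = int(np.argmax(similarities))
print(best)  # 0
```

The same dot-product ranking applies regardless of which embedding model produced the vectors, as long as they are normalized.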
Implications for Businesses
This evolution changes the game:
| Aspect | Before (2024) | Now (2026) |
|---|---|---|
| Best model | Proprietary API | Open source |
| Optimal cost | API ($0.13/1M) | Self-hosted (no per-token cost) |
| Sovereignty | Cloud dependency | Self-hosting possible |
| Performance | APIs leading | Open source leading |
Focus: Cohere Embed v4 and Multimodal
A Unique Innovation
Cohere Embed v4 is the only production model capable of vectorizing:
- Text
- Images
- Interleaved documents (PDFs, slides)
Its MTEB score (65.2) is lower than leaders on pure text, but it has no equivalent for visual documents.
```python
import cohere

co = cohere.ClientV2('your-api-key')

# Image embedding (unique to Cohere)
response = co.embed(
    images=["data:image/jpeg;base64,..."],
    model="embed-v4",
    input_type="image",
    embedding_types=["float"]
)
```
For more details, see our article on Cohere Embed v4 Multimodal.
Implications for RAG Pipelines
Model Selection by Use Case
| Use Case | Recommended Model | Reason |
|---|---|---|
| General (budget) | Google Gemini Embedding | Unbeatable price ($0.008/1M) |
| General (performance) | Qwen3-Embedding-8B | Best MTEB score |
| Visual documents | Cohere Embed v4 | Only multimodal |
| Code / Tech | Voyage-code-3 | Code specialized |
| Legal | Voyage-3-legal | Legal specialized |
| Sovereignty | Qwen3 or BGE-M3 | Self-host, open source |
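Encoded as configuration, the table above can drive model selection in a pipeline. This routing helper is a hypothetical sketch; the model identifiers simply mirror the recommendations:

```python
# Hypothetical routing table mirroring the recommendations above
MODEL_BY_USE_CASE = {
    "general_budget": "gemini-embedding",
    "general_performance": "Qwen3-Embedding-8B",
    "visual_documents": "cohere-embed-v4",
    "code": "voyage-code-3",
    "legal": "voyage-3-legal",
    "sovereignty": "BGE-M3",
}

def pick_model(use_case: str) -> str:
    """Fall back to the best-scoring general model for unknown use cases."""
    return MODEL_BY_USE_CASE.get(use_case, "Qwen3-Embedding-8B")

print(pick_model("legal"))    # voyage-3-legal
print(pick_model("medical"))  # Qwen3-Embedding-8B
```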
Trade-offs to Consider
| Criterion | APIs | Open source |
|---|---|---|
| Setup | Immediate | GPU configuration |
| Variable cost | Yes | No (fixed) |
| 2026 Performance | Lower | Higher |
| Sovereignty | No | Yes |
| Maintenance | Zero | MLOps team |
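The variable-vs-fixed cost trade-off can be made concrete with a back-of-the-envelope comparison. The GPU price below is an illustrative assumption; the $0.13/1M API rate is the one quoted above:

```python
def compare_costs(api_price_per_1m: float, gpu_cost_per_month: float,
                  tokens_per_month: int) -> dict:
    """Compare monthly API spend against a flat self-hosted GPU cost."""
    api_cost = tokens_per_month / 1_000_000 * api_price_per_1m
    return {
        "api_monthly_usd": round(api_cost, 2),
        "self_host_monthly_usd": gpu_cost_per_month,
        "self_host_cheaper": api_cost > gpu_cost_per_month,
    }

# Assumption: a mid-range GPU instance at roughly $600/month
result = compare_costs(0.13, 600, 1_000_000_000)
print(result)  # at 1B tokens/month the API costs $130, so it still wins
```

Under these assumptions the break-even point sits around 4.6B tokens per month ($600 / $0.13 per 1M); below that volume, the API is cheaper despite its per-token pricing.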
Check our guide on RAG cost optimization.
Methodology and Reproducibility
How to Run the Benchmark
```python
from mteb import MTEB, get_model

# Load a model
model = get_model("Alibaba-NLP/gte-Qwen3-8B-embedding")

# Run the evaluation on Retrieval tasks
evaluation = MTEB(task_types=["Retrieval"])
results = evaluation.run(model)

# Display results
print(results)
```
Interactive Leaderboard
The official interactive leaderboard is hosted on Hugging Face Spaces. Rankings are dynamic: new submissions can change the order at any time.
Trends Observed in 2026
1. Open Source Dominates
The gap between open source and APIs has reversed. Qwen3 surpasses OpenAI by +6 MTEB points.
2. Multimodal Emerges
Cohere paved the way. Google and OpenAI should follow in 2026-2027.
3. Domain Specialization
Specialized models (Voyage legal/finance/code) outperform generic models by 10-15% in their domains.
4. Prices Plummeting
Google Gemini Embedding at $0.008/1M tokens changes RAG economics.
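The headline price gap follows directly from the per-million-token rates quoted in this article ($0.008 for Gemini, $0.13 for text-embedding-3-large):

```python
gemini_per_1m = 0.008  # $/1M tokens, quoted above
openai_per_1m = 0.13   # $/1M tokens, quoted above

ratio = openai_per_1m / gemini_per_1m
print(f"Gemini is {ratio:.1f}x cheaper")  # Gemini is 16.2x cheaper

# Cost of embedding a 10M-token corpus at each rate
print(f"Gemini: ${10 * gemini_per_1m:.2f} vs OpenAI: ${10 * openai_per_1m:.2f}")
```

At these rates, even a 100M-token corpus costs under a dollar to embed with Gemini, which is what makes re-embedding entire knowledge bases economically trivial.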
Our Take
The 2026 MTEB landscape represents a turning point:
Key points:
- Open source (Qwen3) surpasses proprietary APIs
- Multimodal (Cohere v4) opens new use cases
- Prices are falling (Gemini 16x cheaper than OpenAI)
Recommendations:
- New projects: evaluate Qwen3 (performance) or Gemini (cost)
- Visual documents: Cohere Embed v4 is essential
- Existing OpenAI projects: consider migration if performance is critical
Platforms like Ailog integrate these benchmarks to automatically select the best models for your use case.
Check our detailed 2026 embedding comparison for more details.
Related Posts
Embedding Models 2026: Benchmark and Comparison
Comprehensive comparison of the best embedding models in 2026. MTEB benchmarks, multilingual performance, and recommendations for your RAG applications.
Cohere Embed v4: The First Production Multimodal Embedding
Cohere launches Embed v4 Multimodal, the first embedding model capable of vectorizing text, images, and interleaved documents. A revolution for multimodal RAG.
Hugging Face: New Open-Source RAG Models
Hugging Face releases a new family of models optimized for RAG: embeddings, rerankers, and specialized LLMs. Complete overview.