Best Embedding Models 2025: MTEB Scores & Leaderboard (Cohere, OpenAI, BGE)

Compare MTEB scores for top embedding models: Cohere embed-v4 (65.2), OpenAI text-embedding-3-large (64.6), BGE-M3 (63.0). Full leaderboard with pricing.

Author
Ailog Research Team
Published
January 16, 2026
Reading time
11 min read
Level
intermediate
RAG Pipeline Step
Embedding

MTEB Leaderboard 2025 - Top Embedding Models

Quick reference table with MTEB scores for all major embedding models:

| Rank | Model | MTEB Score | Dimensions | Price/1M tokens | Best For |
|------|-------|------------|------------|-----------------|----------|
| 1 | Gemini-embedding-001 | 68.32 | 3072 | ~$0.004/1K | Overall best, multilingual |
| 2 | Qwen3-Embedding-8B | 70.58* | 4096 | Free | Best open-source |
| 3 | Voyage-3-large | 66.8 | 1536 | $0.12 | Domain tuning |
| 4 | Cohere embed-v4 | 65.2 | 1024 | $0.10 | Enterprise, noisy data |
| 5 | OpenAI text-embedding-3-large | 64.6 | 3072 | $0.13 | General purpose |
| 6 | BGE-M3 | 63.0 | 1024 | Free | Self-hosted budget |
| 7 | Nomic-embed-text-v1.5 | 59.4 | 768 | $0.05 | Budget option |
| 8 | all-MiniLM-L6-v2 | 56.3 | 384 | Free | Fast prototyping |

*Qwen3-Embedding-8B's 70.58 comes from the MTEB multilingual leaderboard, so it is not directly comparable to the other scores. Last updated: January 2026. Source: MTEB Leaderboard

---

Embedding Model Landscape (2025)

The embedding space has evolved dramatically. Here's what's leading:

Top Models by MTEB Score

1. Gemini-embedding-001 (NEW #1)

  • Dimensions: 3072
  • MTEB Score: 68.32 (+5.81 over competitors)
  • Cost: ~$0.004 per 1K tokens
  • Best for: Overall best, multilingual (100+ languages)

2. Qwen3-Embedding-8B (Best Open-Source)

  • Dimensions: 4096
  • MTEB Score: 70.58 (multilingual leaderboard)
  • Cost: Free (Apache 2.0 license)
  • Best for: Self-hosted, privacy-first, multilingual

3. Voyage-3-large

  • Dimensions: 1536
  • MTEB Score: 66.8
  • Cost: $0.12 per 1M tokens
  • Best for: Domain-specific tuning

4. Cohere embed-v4

  • Dimensions: 1024
  • MTEB Score: 65.2
  • Cost: $0.10 per 1M tokens
  • Best for: Enterprise, noisy real-world data

5. OpenAI text-embedding-3-large

  • Dimensions: 3072 (configurable down to 256)
  • MTEB Score: 64.6
  • Cost: $0.13 per 1M tokens
  • Best for: General purpose, existing OpenAI stack

Key Decision Factors

1. Accuracy vs Cost

```python
# High accuracy: OpenAI or Cohere
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here"
)
embedding = response.data[0].embedding

# Budget option: open-source
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-large-en-v1.5')
embedding = model.encode("Your text here")
```

2. Dimension Size

Smaller = faster, cheaper storage, but less accurate
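
To make "cheaper storage" concrete, here is a back-of-envelope sketch (it assumes float32 vectors and ignores ANN index overhead, which adds to the raw figure):

```python
# Rough vector-storage estimate: float32 = 4 bytes per dimension.
def storage_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    return num_vectors * dims * bytes_per_dim / 1e9

print(storage_gb(1_000_000, 3072))  # ~12.3 GB at full 3072 dims
print(storage_gb(1_000_000, 512))   # ~2.0 GB truncated to 512 dims
```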

```python
# OpenAI: configurable dimensions
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="text",
    dimensions=512  # vs. default 3072
)
```

3. Language Support

Multilingual leaders:

  • Cohere embed-v4: 100+ languages
  • BGE-M3: 100+ languages
  • OpenAI text-embedding-3-large: 100+ languages

4. Domain Specialization

  • Code: OpenAI text-embedding-3-small, Voyage code-2
  • Legal: fine-tuned BGE on a legal corpus
  • Medical: BioGPT embeddings, PubMedBERT

Benchmarking Your Use Case

Don't trust generic benchmarks - test on YOUR data:

```python
from sentence_transformers import SentenceTransformer, util

def benchmark_model(model_name, queries, documents):
    model = SentenceTransformer(model_name)

    # Embed
    query_embs = model.encode(queries)
    doc_embs = model.encode(documents)

    # Calculate similarities
    similarities = util.cos_sim(query_embs, doc_embs)

    return similarities

# Test multiple models (SentenceTransformer-compatible; API models
# like text-embedding-3-large need the provider's own client)
models = [
    "BAAI/bge-large-en-v1.5",
    "sentence-transformers/all-MiniLM-L6-v2",
]

# test_queries / test_docs: your own labeled evaluation data
for model in models:
    scores = benchmark_model(model, test_queries, test_docs)
    print(f"{model}: {scores.mean():.3f}")
```
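
Mean cosine similarity is a blunt metric; if you have labeled query-document pairs, hit rate at k is more informative. A minimal sketch (relevant_doc_idx, the gold document index per query, is a hypothetical input you would supply):

```python
import torch

def hit_rate_at_k(similarities, relevant_doc_idx, k=5):
    # similarities: [num_queries, num_docs] tensor from util.cos_sim
    topk = similarities.topk(k, dim=1).indices           # [num_queries, k]
    gold = torch.tensor(relevant_doc_idx).unsqueeze(1)   # [num_queries, 1]
    # A query counts as a hit if its gold document appears in the top k
    return (topk == gold).any(dim=1).float().mean().item()
```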

Matryoshka Embeddings (2025-2026)

New models support variable dimensions from the same embedding:

```python
import numpy as np

# Generate once at full dimension (e.g. 1024)
full_embedding = model.encode(text)

# Truncate later as needed
small_embedding = full_embedding[:256]    # just use the first 256 dims
medium_embedding = full_embedding[:512]

# Truncated vectors are no longer unit-norm; re-normalize them if your
# index assumes normalized vectors (dot-product search)
small_embedding = small_embedding / np.linalg.norm(small_embedding)

# Quality degrades gracefully, not catastrophically
```

Models supporting this:

  • OpenAI text-embedding-3-*
  • Nomic embed-v1.5
  • Jina embeddings v2

Fine-Tuning for Your Domain

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Create training examples: (query, positive_doc, negative_doc) triplets
train_examples = [
    InputExample(texts=['query', 'positive_doc', 'negative_doc'])
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Fine-tune
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)
```
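
After training, the model saves and loads like any other SentenceTransformer checkpoint; re-run the benchmark above to verify the gain (the path below is illustrative):

```python
# Persist and reload the fine-tuned model (path is illustrative)
model.save('models/bge-base-finetuned')
tuned = SentenceTransformer('models/bge-base-finetuned')
```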

Cost Analysis (1M documents, ~1,000 tokens each)

| Model | Embedding Cost | Storage | Inference |
|-------|---------------|---------|-----------|
| Gemini-embedding-001 | ~$40 | $50/month | ~$0.004/1K queries |
| OpenAI-3-large | $130 | $50/month | $0.13/1M queries |
| Cohere v4 | $100 | $50/month | $0.10/1M queries |
| Qwen3-8B (self-hosted) | $0 | $50/month | GPU: $100/month |
| all-MiniLM | $0 | $20/month | CPU: $20/month |
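
A quick sanity check of the embedding-cost column (a sketch assuming ~1,000 tokens per document, which is what the OpenAI and Cohere rows imply):

```python
# One-time embedding cost: docs * tokens/doc * price per 1M tokens
def embedding_cost(num_docs, tokens_per_doc, price_per_1m_tokens):
    return num_docs * tokens_per_doc / 1e6 * price_per_1m_tokens

print(embedding_cost(1_000_000, 1000, 0.13))  # OpenAI-3-large: 130.0
print(embedding_cost(1_000_000, 1000, 0.10))  # Cohere v4: 100.0
```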

Recommendations by Use Case

  • Startup/MVP: all-MiniLM-L6-v2 (free, fast)
  • Production (quality matters): Cohere embed-v4 or OpenAI text-embedding-3-large
  • Production (budget matters): BGE-M3 self-hosted
  • Multilingual: Cohere embed-v4 or BGE-M3
  • Code search: Voyage code-2 or OpenAI text-embedding-3-small
  • Privacy-critical: BGE-M3 (MIT license, self-hosted)
  • Enterprise (noisy data): Cohere embed-v4

Migration Strategy

Changing embeddings requires re-embedding everything:

```python
# Gradual migration
def hybrid_search(query, old_index, new_index, alpha=0.5):
    # Search both indices (old_model/new_model are the previous and
    # replacement embedding models)
    old_results = old_index.search(old_model.encode(query))
    new_results = new_index.search(new_model.encode(query))

    # Blend results
    return blend_rankings(old_results, new_results, alpha)
```
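
blend_rankings is left undefined above; a minimal weighted-score merge might look like this (a sketch assuming each result list is [(doc_id, score), ...] with scores on comparable scales):

```python
from collections import defaultdict

# Hypothetical helper: weighted merge of two (doc_id, score) result lists,
# with alpha weighting the old index and (1 - alpha) the new one.
def blend_rankings(old_results, new_results, alpha=0.5):
    combined = defaultdict(float)
    for doc_id, score in old_results:
        combined[doc_id] += alpha * score
    for doc_id, score in new_results:
        combined[doc_id] += (1 - alpha) * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```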

The embedding model is your RAG foundation. Choose wisely, benchmark thoroughly, and be ready to upgrade as models improve.

---

FAQ

Which embedding model is best for RAG? Gemini-embedding-001 currently tops the leaderboard (68.32). Among other commercial APIs, Cohere embed-v4 (65.2) and OpenAI text-embedding-3-large (64.6) lead. For open-source, BGE-M3 (63.0) offers excellent performance at no cost.

Is OpenAI text-embedding-3-large worth the cost? Yes. With a 64.6 MTEB score, it's among the top performers and integrates seamlessly with the OpenAI ecosystem. Consider Cohere for slightly better multilingual performance.

What's the best free embedding model? BGE-M3 is the top open-source choice with 63.0 on MTEB, supporting 100+ languages. For English-only use cases, all-MiniLM-L6-v2 offers fast, lightweight embeddings.

How do I choose between embedding models? Consider: (1) accuracy requirements, (2) language support, (3) cost constraints, (4) latency needs. Benchmark on YOUR data - generic scores don't always translate to your domain.

Should I fine-tune my embedding model? Fine-tuning shows +10-30% gains for specialized domains (legal, medical, code). Start with a pre-trained model, then fine-tune if generic performance is insufficient.

Tags

  • embeddings
  • models
  • benchmarks
  • mteb
  • openai
  • cohere
  • bge-m3
  • 2025
  • leaderboard