Best Embedding Models 2025: MTEB Scores & Leaderboard (Cohere, OpenAI, BGE)
MTEB Leaderboard 2025 & 2026 - Top Embedding Models
Quick reference table with MTEB scores for all major embedding models (updated April 2026):
| Rank | Model | MTEB Score | Dimensions | Price/1M tokens | Best For |
|---|---|---|---|---|---|
| 1 | Harrier-OSS-v1-27B | 74.3 (v2) | 5376 | Free (MIT) | SOTA multilingual |
| 2 | Gemini Embedding 2 | 68.32 | 3072 | $0.20 | Multimodal, best retrieval |
| 3 | Jina v5-text-small | 71.7 (v2) | 1024 | Free (Apache) | Best quality/size ratio |
| 4 | Qwen3-Embedding-8B | 70.58 | 4096 | Free (Apache) | Best open-source multilingual |
| 5 | Voyage 4 Large | ~66.8 | 2048 | $0.12 | Shared embedding space, MoE |
| 6 | Cohere Embed v4 | 65.2 | 1536 | $0.12 | Enterprise, 128K context |
| 7 | OpenAI text-embedding-3-large | 64.6 | 3072 | $0.13 | General purpose |
| 8 | BGE-M3 | 63.0 | 1024 | Free (MIT) | Self-hosted budget |
| 9 | Nomic-embed-text-v1.5 | 59.4 | 768 | $0.05 | Budget option |
| 10 | all-MiniLM-L6-v2 | 56.3 | 384 | Free | Fast prototyping |
Note: MTEB v2 (2026) scores are not directly comparable to MTEB v1. Models marked (v2) use the new benchmark. Source: MTEB Leaderboard, April 2026.
What Changed in Q1 2026
The embedding landscape shifted dramatically in early 2026 with four major releases:
Gemini Embedding 2 (March 2026) — First Multimodal Embedding
Google's breakthrough: a single model that embeds text, images, video, audio, and PDFs into one shared 3,072-dim vector space. It leads retrieval benchmarks with a 67.71 MTEB retrieval score.
- Cross-lingual retrieval: 0.997 (highest of any model)
- Code retrieval: 84.0 on MTEB Code
- Matryoshka: truncatable to 128/768/1536 dims
- Pricing: $0.20/M text tokens, $0.10/M batch
Microsoft Harrier-OSS-v1 (March 2026) — SOTA Multilingual
Three MIT-licensed models setting new multilingual records:
- 270M (640 dims, MTEB v2: 66.5)
- 0.6B (1024 dims, MTEB v2: 69.0)
- 27B (5376 dims, MTEB v2: 74.3 — SOTA)
94 languages, 32K context. Requires 80GB+ VRAM for 27B.
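If the Harrier checkpoints follow the usual Hugging Face conventions, self-hosting the 0.6B variant could look like the sketch below. The `microsoft/harrier-oss-v1-0.6b` repo id is an assumption, not a confirmed name:

```python
# Hypothetical self-hosting sketch for Harrier-OSS-v1 (0.6B, 1024 dims).
# The repo id is an assumed name, not a confirmed one.
def load_harrier(repo_id: str = "microsoft/harrier-oss-v1-0.6b"):
    # Deferred import keeps the sketch dependency-light;
    # requires `pip install sentence-transformers`.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(repo_id)

def embed_texts(model, texts):
    # Normalized vectors make cosine similarity a plain dot product
    return model.encode(texts, normalize_embeddings=True)
```

Start with the 0.6B checkpoint; only the 27B variant needs 80GB+ VRAM.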
Voyage 4 Family (January 2026) — Shared Embedding Space
An industry first: different models in the family share one vector space, so a lightweight model can embed queries against documents indexed by a larger one. The MoE architecture cuts serving costs by 40%.
- Models: voyage-4-large, voyage-4, voyage-4-lite, voyage-4-nano (Apache 2.0)
- Claims +14% over OpenAI 3-large on RTEB
- 200M free tokens included
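The shared-space design means you could embed documents once with the large model and serve queries with a cheaper sibling. A sketch, where `client` is a `voyageai.Client()` instance; the model ids and the cross-model query/document split are assumptions based on the release claims, not verified API behavior:

```python
# Sketch: exploiting the Voyage 4 shared embedding space.
# Model ids and cross-model compatibility are assumptions.
def embed_documents(client, texts):
    # Heavy model for the one-time offline document pass
    return client.embed(
        texts, model="voyage-4-large", input_type="document"
    ).embeddings

def embed_query(client, text):
    # Cheaper sibling at query time, same vector space as the documents
    return client.embed(
        [text], model="voyage-4-lite", input_type="query"
    ).embeddings[0]
```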
Jina v5-text (February 2026) — Distilled Quality
Sub-1B models matching 8B quality through distillation:
- v5-text-small (677M): MTEB v2 = 71.7, 119+ languages
- v5-text-nano (239M): MTEB v2 = 71.0
- Task-specific model versions (retrieval, text-matching, classification)
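If the task-specific adapters work like jina-embeddings-v3's, selecting them at encode time might look like this. The repo id and task names mirror the v3 API and are assumptions for v5:

```python
# Hypothetical sketch: task-specific encoding with Jina v5,
# modeled on the jina-embeddings-v3 API. Names are assumptions.
def load_jina_v5(repo_id: str = "jinaai/jina-embeddings-v5-text-small"):
    # Requires `pip install sentence-transformers`
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(repo_id, trust_remote_code=True)

def encode_for_retrieval(model, queries, documents):
    # Separate task adapters for queries vs. documents
    q = model.encode(queries, task="retrieval.query")
    d = model.encode(documents, task="retrieval.passage")
    return q, d
```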
Established Models — Current Status
Cohere Embed v4 (late 2025)
- Now 1536 dims (up from 1024 in v3), 128K token context (longest)
- Multimodal: text + images, interleaved
- Matryoshka: 256, 512, 1024, 1536 dims
- Pricing: $0.12/M tokens
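Calling Embed v4 with a Matryoshka output size could look like the sketch below, where `client` is a `cohere` SDK client. The `output_dimension` parameter and the response shape are assumptions to verify against current Cohere docs:

```python
# Sketch: Cohere Embed v4 with a selectable Matryoshka dimension.
# Parameter names and response shape are assumptions.
def embed_with_cohere(client, texts, dims=1024, query=False):
    resp = client.embed(
        texts=texts,
        model="embed-v4.0",
        input_type="search_query" if query else "search_document",
        output_dimension=dims,  # 256 / 512 / 1024 / 1536
        embedding_types=["float"],
    )
    return resp.embeddings.float_
```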
OpenAI text-embedding-3-large (January 2024)
- No update in over 2 years. Now ranks ~7th-9th depending on benchmark.
- Still solid for general use within the OpenAI ecosystem
- Pricing: $0.13/M tokens
Qwen3-Embedding-8B (2025)
- Apache 2.0, 100+ languages, 4096 dims
- Multimodal variants available (Qwen3-VL-Embedding)
- Fully self-hostable
BGE-M3 (2024)
- MIT license, 1024 dims, multi-granularity (dense + sparse + multi-vector)
- Still the go-to budget self-hosted option
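BGE-M3's three output granularities come from one encode call via the FlagEmbedding library (a sketch; `pip install FlagEmbedding` and a model download are required before it runs):

```python
# Sketch: BGE-M3's dense + sparse + multi-vector outputs in one pass.
def encode_m3(sentences):
    # Deferred import; requires `pip install FlagEmbedding`
    from FlagEmbedding import BGEM3FlagModel
    model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
    out = model.encode(
        sentences,
        return_dense=True,         # 1024-dim dense vectors
        return_sparse=True,        # lexical weights (sparse)
        return_colbert_vecs=True,  # per-token multi-vectors
    )
    return out["dense_vecs"], out["lexical_weights"], out["colbert_vecs"]
```

The sparse output can feed a BM25-style index alongside the dense vectors for hybrid retrieval.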
Key Decision Factors
1. Accuracy vs Cost
```python
# Best accuracy: Gemini Embedding 2
import google.generativeai as genai

result = genai.embed_content(
    model="models/gemini-embedding-2",
    content="Your text here"
)
embedding = result['embedding']

# Budget option: open-source BGE-M3
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-m3')
embedding = model.encode("Your text here")
```
2. Dimension Size
Smaller vectors mean faster search and cheaper storage, at some cost in accuracy:
```python
# OpenAI: configurable dimensions
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="text",
    dimensions=512  # vs. default 3072
)
```
3. Language Support
Multilingual leaders (2026):
- Microsoft Harrier-OSS-v1: 94 languages (MIT)
- Cohere embed-v4: 100+ languages
- BGE-M3: 100+ languages
- Jina v5-text: 119+ languages
4. Domain Specialization
- Code: Voyage code-3, Gemini Embedding 2 (MTEB Code: 84.0)
- Legal: fine-tuned BGE or Qwen3 on a legal corpus
- Medical: BioGPT embeddings, PubMedBERT
Matryoshka Embeddings (Standard in 2026)
Matryoshka Representation Learning is now the industry standard. Most new models support variable dimensions from a single embedding:
```python
# Generate once at full dimension
full_embedding = model.encode(text, dimension=3072)

# Truncate later as needed
small_embedding = full_embedding[:256]
medium_embedding = full_embedding[:768]
# Quality degrades gracefully, not catastrophically
```
Models supporting Matryoshka (2026): Gemini Embedding 2, Voyage 4, Cohere v4, OpenAI text-3-*, Jina v5, Microsoft Harrier, Nomic v1.5.
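One detail truncation glosses over: after slicing a unit-length embedding, the prefix no longer has unit norm, so renormalize before cosine comparisons. A minimal numpy sketch:

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dims: int) -> np.ndarray:
    # Keep the leading dims, then renormalize so cosine similarity
    # (dot product on unit vectors) stays meaningful.
    truncated = embedding[:dims]
    return truncated / np.linalg.norm(truncated)

# Example: an 8-dim unit vector truncated to 4 dims
full = np.array([0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
small = truncate_matryoshka(full, 4)
print(np.linalg.norm(small))  # unit length restored
```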
Benchmarking Your Use Case
Don't trust generic benchmarks - test on YOUR data:
```python
from sentence_transformers import SentenceTransformer, util

def benchmark_model(model_name, queries, documents):
    model = SentenceTransformer(model_name)
    # Embed queries and documents
    query_embs = model.encode(queries)
    doc_embs = model.encode(documents)
    # Calculate pairwise cosine similarities
    return util.cos_sim(query_embs, doc_embs)

# Test multiple models
models = [
    "BAAI/bge-m3",
    "Qwen/Qwen3-Embedding-8B",
    "jinaai/jina-embeddings-v5-text-small",
]
for model_name in models:
    scores = benchmark_model(model_name, test_queries, test_docs)
    print(f"{model_name}: {scores.mean()}")
```
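Mean cosine similarity is a blunt instrument for retrieval; a rank-based metric such as recall@k usually tracks real quality better. A model-agnostic sketch, assuming query i's relevant document is document i:

```python
import numpy as np

def recall_at_k(similarities: np.ndarray, k: int = 5) -> float:
    # similarities: (n_queries, n_docs) matrix; ground truth is that
    # query i's relevant document is document i.
    top_k = np.argsort(-similarities, axis=1)[:, :k]
    hits = sum(i in top_k[i] for i in range(similarities.shape[0]))
    return hits / similarities.shape[0]

# Toy example: 3 queries x 4 docs
sims = np.array([
    [0.9, 0.1, 0.2, 0.0],  # correct doc (index 0) ranked 1st
    [0.3, 0.2, 0.8, 0.1],  # correct doc (index 1) ranked 3rd
    [0.1, 0.0, 0.7, 0.6],  # correct doc (index 2) ranked 1st
])
print(recall_at_k(sims, k=1))  # 2 of 3 queries hit at k=1
```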
Fine-Tuning for Your Domain
```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Create training examples: (query, positive, negative) triplets
train_examples = [
    InputExample(texts=['query', 'positive_doc', 'negative_doc']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Fine-tune with in-batch negatives
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)
```
Cost Analysis (1M documents, ~1,000 tokens each, April 2026)
| Model | Embedding Cost | Storage | Inference |
|---|---|---|---|
| Gemini Embedding 2 | ~$200 | $50/month | $0.20/M queries |
| Voyage 4 Large | ~$120 | $50/month | $0.12/M queries |
| Cohere v4 | ~$120 | $50/month | $0.12/M queries |
| OpenAI-3-large | $130 | $50/month | $0.13/M queries |
| Jina v5-small (self-hosted) | $0 | $30/month | GPU: $80/month |
| Qwen3-8B (self-hosted) | $0 | $50/month | GPU: $100/month |
| BGE-M3 (self-hosted) | $0 | $30/month | GPU: $50/month |
| all-MiniLM | $0 | $20/month | CPU: $20/month |
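The embedding-cost column follows directly from price per million tokens; a quick sketch of the arithmetic (document and token counts are illustrative):

```python
def embedding_cost(n_docs: int, tokens_per_doc: int,
                   price_per_m_tokens: float) -> float:
    """One-time cost in dollars to embed a corpus."""
    total_tokens = n_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_m_tokens

# 1M documents at ~1,000 tokens each:
print(embedding_cost(1_000_000, 1_000, 0.20))  # Gemini: ~$200 one-time
print(embedding_cost(1_000_000, 1_000, 0.13))  # OpenAI 3-large: ~$130
```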
Recommendations by Use Case
- Startup/MVP: all-MiniLM-L6-v2 (free, fast) or Jina v5-nano (free, much better quality)
- Production (quality matters): Gemini Embedding 2 or Voyage 4 Large
- Production (budget matters): BGE-M3 or Jina v5-small self-hosted
- Multilingual: Microsoft Harrier-OSS-v1 (MIT, SOTA) or Cohere embed-v4
- Multimodal (text + images): Gemini Embedding 2 or Cohere embed-v4
- Code search: Gemini Embedding 2 (MTEB Code: 84.0) or Voyage code-3
- Privacy-critical: Qwen3-Embedding-8B (Apache 2.0) or BGE-M3 (MIT)
- Enterprise (128K context): Cohere embed-v4
Migration Strategy
Changing embeddings requires re-embedding everything:
```python
# Gradual migration: query both indices and blend the rankings
def hybrid_search(query, old_index, new_index, alpha=0.5):
    # Search both indices with their respective models
    old_results = old_index.search(old_model.encode(query))
    new_results = new_index.search(new_model.encode(query))
    # Blend results (alpha weights the new index)
    return blend_rankings(old_results, new_results, alpha)
```
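`blend_rankings` is left abstract above; one common way to fill it in is reciprocal rank fusion (RRF), which blends by rank rather than raw score, so the two models' incompatible similarity scales don't matter. A sketch, with `alpha` mapped onto a per-index weight:

```python
def blend_rankings(old_results, new_results, alpha=0.5, k=60):
    # Reciprocal rank fusion: score each doc id by weighted 1/(k + rank).
    # Rank-based, so old and new score scales need not match.
    scores = {}
    for rank, doc_id in enumerate(old_results):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    for rank, doc_id in enumerate(new_results):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: "b" is near the top of both lists, so it wins
print(blend_rankings(["a", "b", "c"], ["b", "d", "a"]))
# -> ['b', 'a', 'd', 'c']
```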
The embedding model is your RAG foundation. Choose wisely, benchmark thoroughly, and be ready to upgrade as models improve.