
Best Embedding Models 2025: MTEB Scores & Leaderboard (Cohere, OpenAI, BGE)

April 8, 2026
11 min read
Ailog Research Team

Compare MTEB scores for top embedding models: Cohere embed-v4 (65.2), OpenAI text-3-large (64.6), BGE-M3 (63.0). Full leaderboard with pricing.

MTEB Leaderboard 2025 & 2026 - Top Embedding Models

Quick reference table with MTEB scores for all major embedding models (updated April 2026):

| Rank | Model | MTEB Score | Dimensions | Price/1M tokens | Best For |
|------|-------|------------|------------|-----------------|----------|
| 1 | Harrier-OSS-v1-27B | 74.3 (v2) | 5376 | Free (MIT) | SOTA multilingual |
| 2 | Gemini Embedding 2 | 68.32 | 3072 | $0.20 | Multimodal, best retrieval |
| 3 | Jina v5-text-small | 71.7 (v2) | 1024 | Free (Apache) | Best quality/size ratio |
| 4 | Qwen3-Embedding-8B | 70.58 | 4096 | Free (Apache) | Best open-source multilingual |
| 5 | Voyage 4 Large | ~66.8 | 2048 | $0.12 | Shared embedding space, MoE |
| 6 | Cohere Embed v4 | 65.2 | 1536 | $0.12 | Enterprise, 128K context |
| 7 | OpenAI text-embedding-3-large | 64.6 | 3072 | $0.13 | General purpose |
| 8 | BGE-M3 | 63.0 | 1024 | Free (MIT) | Self-hosted budget |
| 9 | Nomic-embed-text-v1.5 | 59.4 | 768 | $0.05 | Budget option |
| 10 | all-MiniLM-L6-v2 | 56.3 | 384 | Free | Fast prototyping |

Note: MTEB v2 (2026) scores are not directly comparable to MTEB v1. Models marked (v2) use the new benchmark. Source: MTEB Leaderboard, April 2026.


What Changed in Q1 2026

The embedding landscape shifted dramatically in early 2026 with four major releases:

Gemini Embedding 2 (March 2026) — First Multimodal Embedding

Google's breakthrough: a single model that embeds text, images, video, audio, and PDFs into one shared 3,072-dim vector space. It leads retrieval benchmarks with an MTEB retrieval score of 67.71.

  • Cross-lingual retrieval: 0.997 (highest of any model)
  • Code retrieval: 84.0 on MTEB Code
  • Matryoshka: truncatable to 128/768/1536 dims
  • Pricing: $0.20/M text tokens, $0.10/M batch

Microsoft Harrier-OSS-v1 (March 2026) — SOTA Multilingual

Three MIT-licensed models setting new multilingual records:

  • 270M (640 dims, MTEB v2: 66.5)
  • 0.6B (1024 dims, MTEB v2: 69.0)
  • 27B (5376 dims, MTEB v2: 74.3 — SOTA)

94 languages, 32K context. Requires 80GB+ VRAM for 27B.

Voyage 4 Family (January 2026) — Shared Embedding Space

Industry-first: different models for queries vs documents can share the same vector space. MoE architecture cuts serving costs by 40%.

  • Models: voyage-4-large, voyage-4, voyage-4-lite, voyage-4-nano (Apache 2.0)
  • Claims +14% over OpenAI 3-large on RTEB
  • 200M free tokens included

Jina v5-text (February 2026) — Distilled Quality

Sub-1B models matching 8B quality through distillation:

  • v5-text-small (677M): MTEB v2 = 71.7, 119+ languages
  • v5-text-nano (239M): MTEB v2 = 71.0
  • Task-specific model versions (retrieval, text-matching, classification)

Established Models — Current Status

Cohere Embed v4 (late 2025)

  • Now 1536 dims (up from 1024 in v3), 128K token context (longest)
  • Multimodal: text + images, interleaved
  • Matryoshka: 256, 512, 1024, 1536 dims
  • Pricing: $0.12/M tokens

OpenAI text-embedding-3-large (January 2024)

  • No update in over 2 years. Now ranks ~7th-9th depending on benchmark.
  • Still solid for general use within the OpenAI ecosystem
  • Pricing: $0.13/M tokens

Qwen3-Embedding-8B (2025)

  • Apache 2.0, 100+ languages, 4096 dims
  • Multimodal variants available (Qwen3-VL-Embedding)
  • Fully self-hostable

BGE-M3 (2024)

  • MIT license, 1024 dims, multi-granularity (dense + sparse + multi-vector)
  • Still the go-to budget self-hosted option

Key Decision Factors

1. Accuracy vs Cost

```python
# Best accuracy: Gemini Embedding 2
import google.generativeai as genai

result = genai.embed_content(
    model="models/gemini-embedding-2",
    content="Your text here"
)
embedding = result['embedding']

# Budget option: open-source
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-m3')
embedding = model.encode("Your text here")
```

2. Dimension Size

Smaller = faster, cheaper storage, but less accurate

```python
# OpenAI: configurable dimensions
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="text",
    dimensions=512  # vs default 3072
)
```
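To make the storage side of the tradeoff concrete, here is the back-of-the-envelope arithmetic (a hypothetical helper; assumes float32 vectors in a flat index, with 1M documents as an illustrative corpus size):

```python
def index_size_gb(num_docs: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw vector storage for a flat float32 index, in decimal GB."""
    return num_docs * dims * bytes_per_float / 1e9

# 1M documents at full vs reduced dimension
full = index_size_gb(1_000_000, 3072)   # ~12.3 GB
small = index_size_gb(1_000_000, 512)   # ~2.0 GB
print(f"3072 dims: {full:.1f} GB, 512 dims: {small:.1f} GB")
```

Dropping from 3072 to 512 dimensions cuts raw vector storage by 6x, before any quantization or compression the vector database applies on top.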

3. Language Support

Multilingual leaders (2026):

  • Microsoft Harrier-OSS-v1: 94 languages (MIT)
  • Cohere embed-v4: 100+ languages
  • BGE-M3: 100+ languages
  • Jina v5-text: 119+ languages

4. Domain Specialization

  • Code: Voyage code-3, Gemini Embedding 2 (MTEB Code: 84.0)
  • Legal: fine-tuned BGE or Qwen3 on a legal corpus
  • Medical: BioGPT embeddings, PubMedBERT

Matryoshka Embeddings (Standard in 2026)

Matryoshka Representation Learning is now the industry standard. Most new models support variable dimensions from a single embedding:

```python
# Generate once at full dimension
full_embedding = model.encode(text, dimension=3072)

# Truncate later as needed
small_embedding = full_embedding[:256]
medium_embedding = full_embedding[:768]
# Quality degrades gracefully, not catastrophically
```

Models supporting Matryoshka (2026): Gemini Embedding 2, Voyage 4, Cohere v4, OpenAI text-3-*, Jina v5, Microsoft Harrier, Nomic v1.5.
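One detail the truncation snippet glosses over: slicing an L2-normalized embedding leaves a vector with norm below 1, so cosine-similarity pipelines should re-normalize after truncating. A minimal sketch with NumPy (the helper name is ours):

```python
import numpy as np

def truncate_and_normalize(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components, then rescale to unit length."""
    truncated = embedding[:dims]
    return truncated / np.linalg.norm(truncated)

# Simulate a unit-norm 3072-dim Matryoshka embedding
full = np.random.default_rng(0).normal(size=3072)
full /= np.linalg.norm(full)

small = truncate_and_normalize(full, 256)
print(small.shape)  # (256,)
```

Dot-product indices in particular assume unit-norm vectors; skipping this step silently deflates similarity scores for truncated embeddings.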

Benchmarking Your Use Case

Don't trust generic benchmarks - test on YOUR data:

```python
from sentence_transformers import SentenceTransformer, util

def benchmark_model(model_name, queries, documents):
    model = SentenceTransformer(model_name)
    # Embed queries and documents
    query_embs = model.encode(queries)
    doc_embs = model.encode(documents)
    # Calculate pairwise cosine similarities
    similarities = util.cos_sim(query_embs, doc_embs)
    return similarities

# Test multiple models
models = [
    "BAAI/bge-m3",
    "Qwen/Qwen3-Embedding-8B",
    "jinaai/jina-embeddings-v5-text-small"
]
for model in models:
    scores = benchmark_model(model, test_queries, test_docs)
    print(f"{model}: {scores.mean()}")
```
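Mean cosine similarity is a coarse signal; if you have labeled query-to-document pairs, a retrieval metric such as recall@k is usually more informative. A minimal sketch (the `recall_at_k` helper and the toy data are ours, not part of any library):

```python
import numpy as np

def recall_at_k(similarities: np.ndarray, relevant_doc_ids: list, k: int = 5) -> float:
    """similarities: (n_queries, n_docs) matrix; relevant_doc_ids[i] is the
    index of the gold document for query i. Returns the fraction of queries
    whose gold document appears in the top-k results."""
    topk = np.argsort(-similarities, axis=1)[:, :k]
    hits = [relevant_doc_ids[i] in topk[i] for i in range(len(relevant_doc_ids))]
    return sum(hits) / len(hits)

# Toy example: 2 queries over 4 documents
sims = np.array([[0.9, 0.1, 0.3, 0.2],
                 [0.2, 0.1, 0.4, 0.8]])
print(recall_at_k(sims, relevant_doc_ids=[0, 3], k=1))  # 1.0
```

Feed it the similarity matrix from `benchmark_model` above and compare models on recall@5 or recall@10 rather than raw similarity averages.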

Fine-Tuning for Your Domain

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Create training examples: (query, positive, negative) triplets
train_examples = [
    InputExample(texts=['query', 'positive_doc', 'negative_doc'])
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Fine-tune with a contrastive objective
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)
```

Cost Analysis (1M documents, April 2026)

| Model | Embedding Cost | Storage | Inference |
|-------|----------------|---------|-----------|
| Gemini Embedding 2 | ~$200 | $50/month | $0.20/M queries |
| Voyage 4 Large | ~$120 | $50/month | $0.12/M queries |
| Cohere v4 | ~$120 | $50/month | $0.12/M queries |
| OpenAI-3-large | $130 | $50/month | $0.13/M queries |
| Jina v5-small (self-hosted) | $0 | $30/month | GPU: $80/month |
| Qwen3-8B (self-hosted) | $0 | $50/month | GPU: $100/month |
| BGE-M3 (self-hosted) | $0 | $30/month | GPU: $50/month |
| all-MiniLM | $0 | $20/month | CPU: $20/month |
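The embedding-cost column is just tokens times price; a sketch of the arithmetic (the helper is ours, and ~1,000 tokens per document is the assumption the figures above imply, so adjust it to your corpus):

```python
def embedding_cost_usd(num_docs: int, tokens_per_doc: int,
                       price_per_m_tokens: float) -> float:
    """One-time cost to embed a corpus at a given per-million-token price."""
    return num_docs * tokens_per_doc / 1_000_000 * price_per_m_tokens

# 1M docs at ~1,000 tokens each
print(embedding_cost_usd(1_000_000, 1000, 0.13))  # OpenAI 3-large: ~$130
print(embedding_cost_usd(1_000_000, 1000, 0.12))  # Cohere v4 / Voyage: ~$120
```

Note this is the one-time indexing cost; query-time embedding and storage recur monthly, which is why self-hosted options win at high query volumes.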

Recommendations by Use Case

  • Startup/MVP: all-MiniLM-L6-v2 (free, fast) or Jina v5-nano (free, much better quality)
  • Production (quality matters): Gemini Embedding 2 or Voyage 4 Large
  • Production (budget matters): BGE-M3 or Jina v5-small, self-hosted
  • Multilingual: Microsoft Harrier-OSS-v1 (MIT, SOTA) or Cohere embed-v4
  • Multimodal (text + images): Gemini Embedding 2 or Cohere embed-v4
  • Code search: Gemini Embedding 2 (MTEB Code: 84.0) or Voyage code-3
  • Privacy-critical: Qwen3-Embedding-8B (Apache 2.0) or BGE-M3 (MIT)
  • Enterprise (128K context): Cohere embed-v4

Migration Strategy

Changing embeddings requires re-embedding everything:

```python
# Gradual migration: query both indices, blend the rankings
def hybrid_search(query, old_index, new_index, alpha=0.5):
    # Search both indices
    old_results = old_index.search(old_model.encode(query))
    new_results = new_index.search(new_model.encode(query))
    # Blend results
    return blend_rankings(old_results, new_results, alpha)
```
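`blend_rankings` is left undefined in the migration snippet; one common realization is a weighted score blend. A minimal sketch, assuming each index returns `(doc_id, score)` pairs with scores on comparable scales (otherwise use rank-based fusion such as RRF instead):

```python
def blend_rankings(old_results, new_results, alpha=0.5, top_k=10):
    """Weighted blend: alpha weights the new index, (1 - alpha) the old.
    Each input is a list of (doc_id, score) pairs; docs found by both
    indices accumulate both weighted scores."""
    combined = {}
    for doc_id, score in old_results:
        combined[doc_id] = combined.get(doc_id, 0.0) + (1 - alpha) * score
    for doc_id, score in new_results:
        combined[doc_id] = combined.get(doc_id, 0.0) + alpha * score
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# "b" appears in both indices, so it accumulates both scores and ranks first
print(blend_rankings([("a", 0.9), ("b", 0.5)], [("b", 0.8), ("c", 0.7)]))
```

Ramping `alpha` from 0 to 1 over the migration window lets you shift traffic to the new index gradually and roll back if quality drops.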

The embedding model is your RAG foundation. Choose wisely, benchmark thoroughly, and be ready to upgrade as models improve.

FAQ

What is the best embedding model in 2026?

Gemini Embedding 2 leads retrieval benchmarks (67.71 MTEB retrieval) and is the first to handle text, images, video, audio, and PDFs in one model. For self-hosted, Qwen3-Embedding-8B (Apache 2.0) and Jina v5-text-small offer excellent quality at no API cost.

Is OpenAI text-embedding-3-large still worth using?

It remains solid but hasn't been updated since January 2024. Gemini Embedding 2, Voyage 4, and open-source models like Jina v5 and Qwen3 now outperform it on most benchmarks. If you're already in the OpenAI ecosystem, it's still reasonable; otherwise, newer options offer better value.

What is the best open-source embedding model?

Jina v5-text-small (677M params, MTEB v2: 71.7, Apache 2.0) offers the best quality-to-size ratio. For larger-scale needs, Qwen3-Embedding-8B (70.58) and Microsoft Harrier-OSS-v1 (MIT, MTEB v2: 74.3 for the 27B model) are strong options.

How do I choose an embedding model?

Consider: (1) accuracy requirements, (2) language support, (3) cost constraints, (4) latency needs, (5) multimodal requirements (new in 2026). Benchmark on YOUR data - generic scores don't always translate to your domain.

Is fine-tuning an embedding model worth it?

Fine-tuning shows +10-30% gains for specialized domains (legal, medical, code). Start with a pre-trained model, then fine-tune if generic performance is insufficient. Most new models (Jina v5, Qwen3) support efficient fine-tuning.

Tags

embeddings, models, benchmarks, mteb, openai, cohere, bge-m3, 2025, leaderboard
