
Choosing Embedding Models for RAG

November 10, 2025
11 min read
Ailog Research Team

Compare embedding models in 2025: OpenAI, Cohere, open-source alternatives. Find the best fit for your use case.

Embedding Model Landscape (November 2025)

The embedding space has evolved dramatically. Here's what's leading:

Top Models by MTEB Score

1. OpenAI text-embedding-3-large

  • Dimensions: 3072 (configurable down to 256)
  • MTEB Score: 64.6
  • Cost: $0.13 per 1M tokens
  • Best for: General purpose, high accuracy

2. Cohere embed-v4

  • Dimensions: 1024
  • MTEB Score: 65.2 (highest as of Nov 2025)
  • Cost: $0.10 per 1M tokens
  • Best for: Multilingual, search-optimized

3. Voyage AI voyage-2

  • Dimensions: 1536
  • MTEB Score: 63.8
  • Cost: $0.12 per 1M tokens
  • Best for: Domain-specific tuning

4. BGE-M3 (open-source)

  • Dimensions: 1024
  • MTEB Score: 63.0
  • Cost: Free (self-hosted)
  • Best for: Budget-conscious, privacy

5. all-MiniLM-L6-v2

  • Dimensions: 384
  • MTEB Score: 56.3
  • Cost: Free
  • Best for: Fast prototyping, local dev

Key Decision Factors

1. Accuracy vs Cost

```python
# High accuracy: OpenAI or Cohere
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here"
)
embedding = response.data[0].embedding

# Budget option: open-source
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-large-en-v1.5')
embedding = model.encode("Your text here")
```

2. Dimension Size

Smaller dimensions mean faster similarity search and cheaper storage, at the cost of some accuracy.

```python
# OpenAI: configurable dimensions
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="text",
    dimensions=512  # vs. default 3072
)
```
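To put numbers on "cheaper storage", here is the raw float32 index size at different dimensions (the 1M-vector count is an assumption; quantization and index overhead will shift these figures):

```python
# Raw index size for float32 vectors: n_docs * dims * 4 bytes
def index_size_gb(n_docs: int, dims: int) -> float:
    return n_docs * dims * 4 / 1e9

for dims in (3072, 1024, 512, 256):
    print(f"{dims:>4} dims: {index_size_gb(1_000_000, dims):.1f} GB")
# 3072 dims: 12.3 GB ... 256 dims: 1.0 GB
```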

3. Language Support

Multilingual leaders (Nov 2025), with a quick cross-lingual check sketched after the list:

  • Cohere embed-v4: 100+ languages
  • BGE-M3: 100+ languages
  • OpenAI-3-large: Strong multilingual
  • E5-mistral-7b-instruct: Open-source multilingual
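A fast way to vet any of these is to confirm that translations of the same sentence land close together in embedding space. A minimal sketch, assuming BAAI/bge-m3 loads through sentence-transformers (the FlagEmbedding library is its reference implementation):

```python
from sentence_transformers import SentenceTransformer, util

# BGE-M3 in dense mode; FlagEmbedding also exposes its sparse/ColBERT modes
model = SentenceTransformer("BAAI/bge-m3")

sentences = [
    "The cat sits on the mat.",          # English
    "Le chat est assis sur le tapis.",   # French
    "Die Katze sitzt auf der Matte.",    # German
]
embs = model.encode(sentences, normalize_embeddings=True)
print(util.cos_sim(embs, embs))  # off-diagonal similarities should stay high
```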

4. Domain Specialization

  • Code: OpenAI text-embedding-3-small, Voyage code-2
  • Legal: Fine-tuned BGE on a legal corpus
  • Medical: BioGPT embeddings, PubMedBERT

Benchmarking Your Use Case

Don't trust generic benchmarks - test on YOUR data:

```python
from sentence_transformers import SentenceTransformer, util

def benchmark_model(model_name, queries, documents):
    model = SentenceTransformer(model_name)
    # Embed queries and documents
    query_embs = model.encode(queries)
    doc_embs = model.encode(documents)
    # Cosine similarity matrix: queries x documents
    return util.cos_sim(query_embs, doc_embs)

# Test multiple local models (API models like text-embedding-3-large
# need their own client and a separate harness)
models = [
    "BAAI/bge-large-en-v1.5",
    "sentence-transformers/all-MiniLM-L6-v2",
]
for model_name in models:
    scores = benchmark_model(model_name, test_queries, test_docs)
    print(f"{model_name}: {scores.mean().item():.3f}")
```
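Mean cosine similarity is a weak signal on its own; for RAG, what matters is whether the right document surfaces in the top results. If you have labeled query-document pairs, compute recall@k instead (a sketch; `relevant` mapping each query index to its gold document index is an assumed label format):

```python
import numpy as np

def recall_at_k(similarities, relevant, k=5):
    """similarities: (n_queries, n_docs); relevant[i] is the gold doc index."""
    sims = np.asarray(similarities)
    top_k = np.argsort(-sims, axis=1)[:, :k]  # best k doc indices per query
    hits = sum(int(relevant[i] in top_k[i]) for i in range(len(relevant)))
    return hits / len(relevant)

# e.g. recall_at_k(benchmark_model(name, test_queries, test_docs), gold, k=5)
```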

Matryoshka Embeddings (2025 Innovation)

Newer models are trained so that a single full-size embedding can be truncated to smaller dimensions:

```python
import numpy as np

# Generate once at full dimension (from a Matryoshka-trained model, e.g. 1024-d)
full_embedding = model.encode(text)

# Truncate later as needed
small_embedding = full_embedding[:256]    # just the first 256 dims
medium_embedding = full_embedding[:512]

# Re-normalize truncated vectors before cosine/dot-product search
small_embedding = small_embedding / np.linalg.norm(small_embedding)

# Quality degrades gracefully, not catastrophically
```

Models supporting this:

  • OpenAI text-embedding-3-*
  • Nomic embed-v1.5
  • Jina embeddings v2
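To check the graceful-degradation claim on your own data, compare query-document similarity at several truncation points using one of the checkpoints above (a sketch assuming nomic-ai/nomic-embed-text-v1.5, which expects task prefixes and custom modeling code; non-Matryoshka models degrade much faster when truncated):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5",
                            trust_remote_code=True)

query_emb = model.encode(["search_query: how do matryoshka embeddings work"])
doc_emb = model.encode(["search_document: Matryoshka models pack coarse "
                        "meaning into the leading dimensions."])

for dims in (768, 512, 256, 128):
    # cos_sim re-normalizes, so plain truncation is enough here
    score = util.cos_sim(query_emb[:, :dims], doc_emb[:, :dims])
    print(dims, float(score))  # expect a slow, not sudden, decline
```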

Fine-Tuning for Your Domain

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Create training examples: (query, positive doc, hard negative) triplets
train_examples = [
    InputExample(texts=['query', 'positive_doc', 'negative_doc']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Fine-tune with in-batch negatives
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)
```
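Measure retrieval quality on a held-out set before and after fine-tuning; sentence-transformers ships an evaluator for exactly this (the queries/corpus/relevant_docs entries below are illustrative placeholders):

```python
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Held-out evaluation data (placeholder IDs and texts)
queries = {"q1": "What is the notice period?"}
corpus = {
    "d1": "Either party may terminate with 30 days written notice.",
    "d2": "Payment is due within 15 days of invoice.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
print(evaluator(model))  # MRR/NDCG/recall@k (exact output varies by version)
```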

Cost Analysis (1M documents, Nov 2025)

| Model | Embedding Cost | Storage (1024d) | Inference |
|---|---|---|---|
| OpenAI-3-large | $130 | $50/month | $0.13/1M queries |
| Cohere v4 | $100 | $50/month | $0.10/1M queries |
| BGE (self-hosted) | $0 | $50/month | GPU: $100/month |
| all-MiniLM | $0 | $20/month | CPU: $20/month |
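The embedding-cost column falls out of simple arithmetic; a sketch, assuming an average of 1,000 tokens per document (substitute your own corpus statistics):

```python
def embedding_cost(n_docs, avg_tokens_per_doc, price_per_1m_tokens):
    """One-time cost to embed a corpus via a paid API."""
    total_tokens = n_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_1m_tokens

print(embedding_cost(1_000_000, 1000, 0.13))  # OpenAI-3-large -> 130.0
print(embedding_cost(1_000_000, 1000, 0.10))  # Cohere v4      -> 100.0
```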

Recommendations by Use Case

  • Startup/MVP: all-MiniLM-L6-v2 (free, fast)
  • Production (quality matters): Cohere embed-v4
  • Production (budget matters): BGE-large self-hosted
  • Multilingual: Cohere embed-v4 or BGE-M3
  • Code search: Voyage code-2
  • Privacy-critical: Self-hosted BGE

Migration Strategy

Changing embeddings requires re-embedding everything:

```python
# Gradual migration: serve from both indices while re-embedding proceeds
def hybrid_search(query, old_index, new_index, alpha=0.5):
    # Search both indices with their matching embedding models
    old_results = old_index.search(old_model.encode(query))
    new_results = new_index.search(new_model.encode(query))
    # Blend results (a blend_rankings sketch follows below)
    return blend_rankings(old_results, new_results, alpha)
```
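The original leaves `blend_rankings` undefined; one workable choice is weighted reciprocal rank fusion (a sketch, assuming each result list is an ordered, best-first list of document IDs):

```python
def blend_rankings(old_results, new_results, alpha=0.5, k=60):
    """Weighted reciprocal rank fusion. alpha weights the new index;
    k damps the dominance of top ranks (60 is the conventional default)."""
    scores = {}
    for rank, doc_id in enumerate(old_results):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    for rank, doc_id in enumerate(new_results):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```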

The embedding model is your RAG foundation. Choose wisely, benchmark thoroughly, and be ready to upgrade as models improve.

Tags

embeddings, models, benchmarks, mteb
