Choosing Embedding Models for RAG
Compare embedding models in 2025: OpenAI, Cohere, open-source alternatives. Find the best fit for your use case.
Embedding Model Landscape (November 2025)
The embedding model landscape has evolved rapidly. Here's what leads as of November 2025:
Top Models by MTEB Score
1. OpenAI text-embedding-3-large
- Dimensions: 3072 (configurable down to 256)
- MTEB Score: 64.6
- Cost: $0.13 per 1M tokens
- Best for: General purpose, high accuracy
2. Cohere embed-v4
- Dimensions: 1024
- MTEB Score: 65.2 (highest as of Nov 2025)
- Cost: $0.10 per 1M tokens
- Best for: Multilingual, search-optimized
3. Voyage AI voyage-2
- Dimensions: 1536
- MTEB Score: 63.8
- Cost: $0.12 per 1M tokens
- Best for: Domain-specific tuning
4. BGE-M3 (open-source)
- Dimensions: 1024
- MTEB Score: 63.0
- Cost: Free (self-hosted)
- Best for: Budget-conscious, privacy
5. all-MiniLM-L6-v2
- Dimensions: 384
- MTEB Score: 56.3
- Cost: Free
- Best for: Fast prototyping, local dev
Key Decision Factors
1. Accuracy vs Cost
```python
# High accuracy: OpenAI or Cohere
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here"
)
embedding = response.data[0].embedding

# Budget option: open-source via sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-large-en-v1.5')
embedding = model.encode("Your text here")
```
2. Dimension Size
Smaller dimensions mean faster search and cheaper storage, but lower accuracy.
```python
# OpenAI: configurable dimensions
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="text",
    dimensions=512  # vs. the default 3072
)
```
3. Language Support
Multilingual leaders (Nov 2025):
- Cohere embed-v4: 100+ languages
- BGE-M3: 100+ languages
- OpenAI-3-large: Strong multilingual
- E5-mistral-7b-instruct: Open-source multilingual
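A quick way to sanity-check any of these multilingual models is translation retrieval: embed the same sentences in two languages and verify each sentence's nearest neighbor in the other language is its own translation. A minimal sketch that works on precomputed embeddings from any model above (the helper name is ours, not from any library):

```python
import numpy as np

def translation_retrieval_accuracy(src_embs, tgt_embs):
    """Fraction of source sentences whose nearest target embedding
    (by cosine similarity) is their own translation (same row index)."""
    src = np.asarray(src_embs, dtype=float)
    tgt = np.asarray(tgt_embs, dtype=float)
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T                    # full cosine-similarity matrix
    nearest = sims.argmax(axis=1)         # best-matching target per source
    return float((nearest == np.arange(len(src))).mean())
```

Feed it `model.encode(english_sentences)` and `model.encode(translated_sentences)`; a strong multilingual model should score close to 1.0 on parallel text.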
4. Domain Specialization
- Code: OpenAI text-embedding-3-small, Voyage code-2
- Legal: Fine-tuned BGE on a legal corpus
- Medical: BioGPT embeddings, PubMedBERT
Benchmarking Your Use Case
Don't trust generic benchmarks; test on your own data:
```python
from sentence_transformers import SentenceTransformer, util

def benchmark_model(model_name, queries, documents):
    model = SentenceTransformer(model_name)

    # Embed queries and documents
    query_embs = model.encode(queries)
    doc_embs = model.encode(documents)

    # Cosine similarity between every query and every document
    return util.cos_sim(query_embs, doc_embs)

# Test multiple open-source models
# (API models like OpenAI's need their own client, not SentenceTransformer)
models = [
    "BAAI/bge-large-en-v1.5",
    "sentence-transformers/all-MiniLM-L6-v2",
]

for model_name in models:
    scores = benchmark_model(model_name, test_queries, test_docs)
    print(f"{model_name}: {scores.mean().item():.3f}")
```
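Mean cosine similarity is a blunt instrument. If you know which document answers each query, recall@k is a more direct retrieval metric: did the right document make the top k? A minimal sketch (function name and argument names are ours):

```python
import numpy as np

def recall_at_k(similarities, relevant_doc_ids, k=5):
    """similarities: (num_queries, num_docs) matrix.
    relevant_doc_ids: for each query, the index of its known relevant doc.
    Returns the fraction of queries whose relevant doc lands in the top k."""
    sims = np.asarray(similarities, dtype=float)
    # Indices of the k highest-scoring documents per query
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = [rel in row for rel, row in zip(relevant_doc_ids, top_k)]
    return float(np.mean(hits))
```

Plug in the similarity matrix from the benchmark above, e.g. `recall_at_k(scores.numpy(), gold_ids, k=5)`, and compare models on that number instead of raw similarity.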
Matryoshka Embeddings (2025 Innovation)
New models support variable dimensions from the same embedding:
```python
# Generate once at full dimension (model-dependent, e.g. 1024)
full_embedding = model.encode(text)

# Truncate later as needed
small_embedding = full_embedding[:256]    # just the first 256 dims
medium_embedding = full_embedding[:512]

# Quality degrades gracefully, not catastrophically
```
Models supporting this:
- OpenAI text-embedding-3-*
- Nomic embed-v1.5
- Jina embeddings v2
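One practical detail: a truncated Matryoshka vector is no longer unit-length, so renormalize it before computing cosine similarity. A minimal sketch (the helper name is ours, not from any library):

```python
import numpy as np

def truncate_and_renormalize(embedding, dims):
    """Truncate a Matryoshka embedding to its first `dims` values
    and rescale back to unit length for cosine similarity."""
    truncated = np.asarray(embedding, dtype=float)[:dims]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated
```

Usage: `small = truncate_and_renormalize(full_embedding, 256)`, then compare vectors with a plain dot product since they are unit-length.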
Fine-Tuning for Your Domain
```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Create training examples: (query, positive_doc) pairs,
# optionally with a hard negative as a third text
train_examples = [
    InputExample(texts=['query', 'positive_doc', 'negative_doc']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Fine-tune with in-batch negatives
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```
Cost Analysis (1M documents, Nov 2025)
| Model | Embedding Cost | Storage (1024d) | Inference |
|---|---|---|---|
| OpenAI-3-large | $130 | $50/month | $0.13/1M queries |
| Cohere v4 | $100 | $50/month | $0.10/1M queries |
| BGE (self-hosted) | $0 | $50/month | GPU: $100/month |
| all-MiniLM | $0 | $20/month | CPU: $20/month |
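The embedding-cost column above is just tokens times price. A back-of-envelope helper (the ~1,000 tokens per document figure is an assumption; substitute your own corpus statistics):

```python
def embedding_cost_usd(num_docs, avg_tokens_per_doc, price_per_million_tokens):
    """Back-of-envelope cost in USD to embed a corpus once."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens

# 1M documents at ~1,000 tokens each with text-embedding-3-large ($0.13/1M tokens)
print(embedding_cost_usd(1_000_000, 1_000, 0.13))
```

This reproduces the table's $130 figure for OpenAI-3-large; rerun it with your actual document lengths before committing to a provider.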
Recommendations by Use Case
Startup/MVP: all-MiniLM-L6-v2 (free, fast)
Production (quality matters): Cohere embed-v4
Production (budget matters): BGE-large self-hosted
Multilingual: Cohere embed-v4 or BGE-M3
Code search: Voyage code-2
Privacy-critical: Self-hosted BGE
Migration Strategy
Changing embedding models requires re-embedding your entire corpus, since vectors from different models are not comparable:
```python
# Gradual migration: query both indices while re-embedding in the background
def hybrid_search(query, old_index, new_index, alpha=0.5):
    # Search each index with its matching embedding model
    old_results = old_index.search(old_model.encode(query))
    new_results = new_index.search(new_model.encode(query))

    # Blend the two ranked lists (alpha weights the new index)
    return blend_rankings(old_results, new_results, alpha)
```
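`blend_rankings` is left abstract above. One reasonable sketch is a weighted reciprocal rank fusion, assuming each index returns an ordered, best-first list of document IDs (this implementation is illustrative, not from any particular library):

```python
def blend_rankings(old_results, new_results, alpha=0.5, k=60):
    """Weighted reciprocal rank fusion of two best-first ranked lists.
    alpha weights the new index; k=60 is the conventional RRF constant."""
    scores = {}
    for rank, doc_id in enumerate(old_results):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    for rank, doc_id in enumerate(new_results):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Ramping `alpha` from 0 toward 1 as re-embedding progresses lets you shift traffic to the new index without a hard cutover.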
The embedding model is your RAG foundation. Choose wisely, benchmark thoroughly, and be ready to upgrade as models improve.