Cohere Launches Embed v4: 30% Better Accuracy with Smaller Model Size
Cohere's new embedding model delivers state-of-the-art performance on the MTEB benchmark while reducing dimensions from 1024 to 768, cutting costs and improving speed.
Announcement
Cohere has released Embed v4, its latest embedding model, which delivers significant improvements in accuracy, efficiency, and multilingual performance.
Key Improvements
Performance Gains
MTEB (Massive Text Embedding Benchmark) scores:
| Model | Dimensions | Avg Score | Retrieval | Classification |
|---|---|---|---|---|
| Embed v3 | 1024 | 64.2 | 52.3 | 71.8 |
| Embed v4 | 768 | 66.8 | 55.1 | 74.2 |
| OpenAI text-embedding-ada-002 | 1536 | 60.9 | 49.2 | 68.5 |
| OpenAI text-embedding-3-large | 3072 | 64.6 | 54.6 | 70.1 |
Reduced Dimensions
Moving from 1024 to 768 dimensions provides:
- 25% less storage per embedding
- 20% faster similarity search
- 15% lower API costs
- No accuracy loss (actually improved)
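The storage saving follows directly from the dimension count. A quick back-of-envelope sketch, assuming float32 vectors and a hypothetical 10M-document corpus (the corpus size is an example, not from the announcement):

```python
# Raw storage math for the 1024 -> 768 dimension cut.
# Assumes float32 vectors and an illustrative 10M-vector corpus.
BYTES_PER_FLOAT32 = 4
N_VECTORS = 10_000_000

def storage_gb(dims: int, n: int = N_VECTORS) -> float:
    """Raw float32 storage in GB for n embeddings of the given width."""
    return dims * BYTES_PER_FLOAT32 * n / 1e9

v3_gb = storage_gb(1024)
v4_gb = storage_gb(768)
savings = 1 - v4_gb / v3_gb
print(f"v3: {v3_gb:.2f} GB, v4: {v4_gb:.2f} GB, saved: {savings:.0%}")
```

At these sizes the 25% figure is exact: 40.96 GB drops to 30.72 GB for the example corpus, before any index overhead.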
Multilingual Excellence
Embed v4 supports 100+ languages with strong performance:
- English: 68.2 (MTEB)
- Chinese: 65.1
- Spanish: 64.8
- Arabic: 62.3
- Hindi: 61.7
Cross-lingual retrieval (query in one language, retrieve in another) improved by 35%.
Technical Innovations
Matryoshka Embeddings
Embed v4 uses Matryoshka Representation Learning, allowing flexible dimension reduction:
```python
import cohere

co = cohere.Client()  # assumes an API key is configured in the environment

# Generate the full 768-dim embedding
response = co.embed(
    texts=["sample text"],
    model="embed-v4",
    input_type="search_document"
)
full_embedding = response.embeddings[0]

# Truncate to smaller dimensions without recomputing
embedding_256 = full_embedding[:256]  # use first 256 dims
embedding_512 = full_embedding[:512]  # use first 512 dims
# Trade-off: smaller size vs. slight accuracy loss
```
Dimension vs. accuracy:
- 768 dims: 100% accuracy (baseline)
- 512 dims: 98.5% accuracy
- 256 dims: 95.2% accuracy
- 128 dims: 89.1% accuracy
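The truncation step can be sketched in plain NumPy. One practical detail: if you run dot-product search over unit-length vectors, re-normalize after the cut, since truncation shortens the vector. The random vector below stands in for real Embed v4 output (an assumption for the demo):

```python
# Matryoshka-style truncation sketch: keep a prefix of the vector and
# re-normalize so dot-product search still behaves like cosine similarity.
import numpy as np

def truncate(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)  # stand-in for a unit-norm Embed v4 vector

embedding_256 = truncate(full, 256)
embedding_512 = truncate(full, 512)
```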
Instruction-Aware Embeddings
Embed v4 takes optional task instructions for better domain adaptation:
```python
# Standard embedding
embedding = co.embed(
    texts=["Machine learning model"],
    model="embed-v4"
)

# With task instruction for better domain alignment
embedding = co.embed(
    texts=["Machine learning model"],
    model="embed-v4",
    input_type="search_document",
    embedding_types=["float"]
)

# For queries (different from documents)
query_embedding = co.embed(
    texts=["How does ML work?"],
    model="embed-v4",
    input_type="search_query"
)
```
Training Improvements
Trained on:
- 1.2 trillion tokens (3x more than v3)
- Synthetic hard negatives
- Contrastive learning with dynamic batching
- Multi-task training across 50+ tasks
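The contrastive objective with in-batch negatives mentioned above can be sketched as a toy InfoNCE loss: each query should score highest against its own document, with every other document in the batch acting as a negative. Batch size, dimensionality, and temperature here are illustrative assumptions, not Cohere's training values:

```python
# Toy InfoNCE (contrastive) loss with in-batch negatives.
import numpy as np

def info_nce(queries: np.ndarray, docs: np.ndarray, temp: float = 0.05) -> float:
    """queries[i] should match docs[i]; other docs in the batch are negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temp                      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())   # cross-entropy on the diagonal

rng = np.random.default_rng(1)
q = rng.normal(size=(8, 64))
# Near-identical query/document pairs should give a loss close to zero.
loss = info_nce(q, q + 0.01 * rng.normal(size=(8, 64)))
```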
Pricing
Embed v4 pricing (per 1M tokens):
- embed-v4: $0.10
- embed-v4-light: $0.02 (384 dims, slightly lower accuracy)
Compared to competitors:
- OpenAI text-embedding-3-small: $0.02 (1536 dims)
- OpenAI text-embedding-3-large: $0.13 (3072 dims)
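At the listed per-1M-token rates, cost comparison is straightforward arithmetic. The 500M tokens/month workload below is an assumed example, not a quoted figure:

```python
# Monthly embedding cost at the published per-1M-token rates.
# The 500M tokens/month volume is an illustrative workload.
PRICE_PER_1M = {
    "embed-v4": 0.10,
    "embed-v4-light": 0.02,
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def monthly_cost(model: str, tokens: int) -> float:
    return PRICE_PER_1M[model] * tokens / 1_000_000

tokens = 500_000_000  # 500M tokens/month
for model in PRICE_PER_1M:
    print(f"{model}: ${monthly_cost(model, tokens):.2f}")
```

At that volume, embed-v4 runs $50/month against $65/month for text-embedding-3-large, while also producing vectors a quarter of the size.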
Migration Guide
Upgrading from v3 to v4:
```python
# Old (v3)
response = co.embed(
    texts=texts,
    model="embed-english-v3.0"
)

# New (v4)
response = co.embed(
    texts=texts,
    model="embed-v4",
    input_type="search_document"  # New parameter
)
```
Note: v3 and v4 embeddings are not compatible. You must re-embed your entire corpus.
Use Cases
Embed v4 particularly excels at:
- Multilingual search: Better cross-language retrieval
- Code search: Improved semantic code understanding
- Domain-specific RAG: Instruction parameter helps adaptation
- Large-scale systems: Reduced dimensions = lower costs
Benchmarks
Retrieval Tasks
Tested on the BEIR benchmark (zero-shot retrieval; the Improvement column is relative):
| Dataset | Embed v3 | Embed v4 | Improvement |
|---|---|---|---|
| NQ | 52.8 | 56.3 | +6.6% |
| HotpotQA | 63.2 | 67.1 | +6.2% |
| FEVER | 75.3 | 79.8 | +6.0% |
| Climate-FEVER | 23.1 | 28.4 | +22.9% |
| SciFact | 66.2 | 71.8 | +8.5% |
Classification
On standard text classification benchmarks (accuracy, with relative improvement in parentheses):
- Banking77: 86.2% → 89.1% (+3.4%)
- Amazon Reviews: 63.8% → 67.2% (+5.3%)
- TREC: 91.3% → 93.7% (+2.6%)
Availability
- Generally available via Cohere API
- Supported in all SDKs (Python, Node.js, Go, Java)
- Coming soon to AWS Bedrock and Azure
- Self-hosted option via Cohere Private Deployment
Best Practices
Dimension Selection
- 768 dims: Default, best quality
- 512 dims: Good balance for most use cases
- 256 dims: Cost-optimized, still strong performance
Input Types
- search_document: For documents being indexed
- search_query: For search queries
- classification: For classification tasks
- clustering: For clustering tasks
Migration Strategy
- Test v4 on sample queries
- Compare retrieval quality
- Re-embed corpus incrementally
- Use A/B testing during transition
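The incremental re-embedding step can be sketched as a batch loop that tags each stored vector with its model version, so v3 and v4 vectors are never compared against each other mid-migration. The index layout and the `embed_batch` stub are assumptions for the demo; in practice `embed_batch` would wrap a real `co.embed` call:

```python
# Incremental re-embedding sketch: migrate only documents still on the
# old model, in batches, tagging each vector with its model version.
from typing import Callable

def reembed(corpus: dict[str, str],
            index: dict[str, dict],
            embed_batch: Callable[[list[str]], list[list[float]]],
            batch_size: int = 96) -> int:
    """Re-embed documents not yet on embed-v4; returns the count migrated."""
    pending = [doc_id for doc_id, meta in index.items()
               if meta["model"] != "embed-v4"]
    for start in range(0, len(pending), batch_size):
        batch_ids = pending[start:start + batch_size]
        vectors = embed_batch([corpus[i] for i in batch_ids])
        for doc_id, vec in zip(batch_ids, vectors):
            index[doc_id] = {"model": "embed-v4", "vector": vec}
    return len(pending)

# Toy index: one doc already migrated, two still on v3.
corpus = {"a": "doc a", "b": "doc b", "c": "doc c"}
index = {"a": {"model": "embed-v4", "vector": [0.0]},
         "b": {"model": "embed-v3", "vector": [0.0]},
         "c": {"model": "embed-v3", "vector": [0.0]}}
migrated = reembed(corpus, index, lambda texts: [[1.0]] * len(texts))
```

Running the loop a second time migrates nothing, which makes it safe to resume after interruptions.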
Conclusion
Embed v4 sets a new standard for production embedding models, combining state-of-the-art accuracy with practical efficiency improvements. The flexible dimensions via Matryoshka embeddings make it suitable for a wide range of deployment scenarios and budgets.