Cohere Launches Embed v4: 30% Better Accuracy with Smaller Model Size
Cohere's new embedding model delivers state-of-the-art performance on MTEB benchmark while reducing dimensions from 1024 to 768, cutting costs and improving speed.
- Author
- Ailog Research Team
- Published
- Reading time
- 4 min read
Announcement
Cohere has released Embed v4, its latest embedding model, which delivers significant improvements in accuracy, efficiency, and multilingual performance.
Key Improvements
Performance Gains
MTEB (Massive Text Embedding Benchmark) scores:
| Model | Dimensions | Avg Score | Retrieval | Classification |
|-------|------------|-----------|-----------|----------------|
| Embed v3 | 1024 | 64.2 | 52.3 | 71.8 |
| Embed v4 | 768 | 66.8 | 55.1 | 74.2 |
| OpenAI ada-002 | 1536 | 60.9 | 49.2 | 68.5 |
| OpenAI text-embedding-3-large | 3072 | 64.6 | 54.6 | 70.1 |
Reduced Dimensions
Moving from 1024 to 768 dimensions provides:
- 25% less storage per embedding
- 20% faster similarity search
- 15% lower API costs
- No accuracy loss (scores actually improved)
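As a quick sanity check on the storage figure, here is some back-of-envelope arithmetic (our own calculation, assuming float32 vectors; the corpus size below is illustrative, not from Cohere):

```python
# Storage math for the 1024 -> 768 dimension change (float32, 4 bytes/value).
BYTES_PER_FLOAT32 = 4

def embedding_bytes(dims: int) -> int:
    """Raw storage for one float32 embedding vector."""
    return dims * BYTES_PER_FLOAT32

v3_size = embedding_bytes(1024)   # 4096 bytes
v4_size = embedding_bytes(768)    # 3072 bytes
savings = 1 - v4_size / v3_size   # 0.25 -> the 25% storage reduction

# At corpus scale: an assumed 100M-document index
corpus = 100_000_000
saved_gb = corpus * (v3_size - v4_size) / 1e9
print(f"{savings:.0%} less storage, ~{saved_gb:.0f} GB saved per 100M docs")
```

The search-speed and cost gains compound on top of this, since smaller vectors also mean fewer multiply-adds per similarity comparison.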
Multilingual Excellence
Embed v4 supports 100+ languages with strong performance (MTEB scores):
- English: 68.2
- Chinese: 65.1
- Spanish: 64.8
- Arabic: 62.3
- Hindi: 61.7
Cross-lingual retrieval (query in one language, retrieve in another) improved by 35%.
Technical Innovations
Matryoshka Embeddings
Embed v4 uses Matryoshka Representation Learning, allowing flexible dimension reduction:
```python
import cohere

co = cohere.Client("YOUR_API_KEY")

# Generate the full 768-dim embedding
full_embedding = co.embed(texts=["sample text"], model="embed-v4").embeddings[0]

# Truncate to smaller dimensions without recomputing
embedding_256 = full_embedding[:256]  # use first 256 dims
embedding_512 = full_embedding[:512]  # use first 512 dims

# Trade-off: smaller size vs. slight accuracy loss
```
Dimension vs. accuracy:
- 768 dims: 100% accuracy (baseline)
- 512 dims: 98.5% accuracy
- 256 dims: 95.2% accuracy
- 128 dims: 89.1% accuracy
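One practical detail worth noting: a truncated Matryoshka prefix is generally no longer unit-length, so it should be re-normalized before computing cosine similarity. A minimal NumPy sketch, using a random unit vector in place of a real API response (`truncate_embedding` is our own helper, not part of the Cohere SDK):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` values of a Matryoshka embedding and
    re-normalize so cosine similarity still behaves as expected."""
    prefix = vec[:dims]
    return prefix / np.linalg.norm(prefix)

# Random unit vector standing in for a real 768-dim embedding.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

e256 = truncate_embedding(full, 256)
e512 = truncate_embedding(full, 512)
print(e256.shape, e512.shape)  # (256,) (512,)
```

Skipping the re-normalization step skews dot-product scores, since longer prefixes retain more of the original vector's magnitude.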
Instruction-Aware Embeddings
Embed v4 takes optional task instructions for better domain adaptation:
```python
import cohere

co = cohere.Client("YOUR_API_KEY")

# Standard embedding
embedding = co.embed(
    texts=["Machine learning model"],
    model="embed-v4",
)

# With task instruction for better domain alignment
embedding = co.embed(
    texts=["Machine learning model"],
    model="embed-v4",
    input_type="search_document",
    embedding_types=["float"],
)

# For queries (a different input_type than for documents)
query_embedding = co.embed(
    texts=["How does ML work?"],
    model="embed-v4",
    input_type="search_query",
)
```
Training Improvements
Trained on:
- 1.2 trillion tokens (3x more than v3)
- Synthetic hard negatives
- Contrastive learning with dynamic batching
- Multi-task training across 50+ tasks
Pricing
Embed v4 pricing (per 1M tokens):
- embed-v4: $0.10
- embed-v4-light: $0.02 (384 dims, slightly lower accuracy)
Compared to competitors:
- OpenAI text-embedding-3-small: $0.02 (1536 dims)
- OpenAI text-embedding-3-large: $0.13 (3072 dims)
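To put the per-token prices in concrete terms, a quick cost sketch (prices from this post; the monthly token volume is an assumed workload, not a Cohere figure):

```python
# Listed per-1M-token prices, USD.
PRICES = {
    "embed-v4": 0.10,
    "embed-v4-light": 0.02,
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Embedding spend for a given monthly token volume."""
    return PRICES[model] * tokens_per_month / 1_000_000

tokens = 500_000_000  # assumed: 500M tokens/month
for model in sorted(PRICES, key=PRICES.get):
    print(f"{model}: ${monthly_cost(model, tokens):.2f}/month")
```

Remember that embedding spend is usually dominated by the one-time corpus indexing pass; ongoing query traffic is typically a small fraction of it.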
Migration Guide
Upgrading from v3 to v4:
```python
# Old (v3)
response = co.embed(
    texts=texts,
    model="embed-english-v3.0",
)

# New (v4)
response = co.embed(
    texts=texts,
    model="embed-v4",
    input_type="search_document",  # new parameter
)
```
Note: v3 and v4 embeddings are not compatible. You must re-embed your entire corpus.
Use Cases
Embed v4 particularly excels at:
- Multilingual search: better cross-language retrieval
- Code search: improved semantic code understanding
- Domain-specific RAG: the instruction parameter helps adaptation
- Large-scale systems: reduced dimensions mean lower costs
Benchmarks
Retrieval Tasks
Tested on the BEIR benchmark (zero-shot retrieval):
| Dataset | Embed v3 | Embed v4 | Improvement |
|---------|----------|----------|-------------|
| NQ | 52.8 | 56.3 | +6.6% |
| HotpotQA | 63.2 | 67.1 | +6.2% |
| FEVER | 75.3 | 79.8 | +6.0% |
| Climate-FEVER | 23.1 | 28.4 | +22.9% |
| SciFact | 66.2 | 71.8 | +8.5% |
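The "Improvement" column above is the relative gain over v3, not the raw point difference. A two-line check of that arithmetic:

```python
def rel_gain(old: float, new: float) -> float:
    """Relative improvement in percent: (new - old) / old * 100."""
    return (new - old) / old * 100

# Climate-FEVER: a 5.3-point gain on a low baseline is a large relative jump.
print(f"{rel_gain(23.1, 28.4):.1f}%")  # 22.9%
```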
Classification
On standard text classification benchmarks:
- Banking77: 86.2% → 89.1% (+3.4%)
- Amazon Reviews: 63.8% → 67.2% (+5.3%)
- TREC: 91.3% → 93.7% (+2.6%)
Availability
- Generally available via Cohere API
- Supported in all SDKs (Python, Node.js, Go, Java)
- Coming soon to AWS Bedrock and Azure
- Self-hosted option via Cohere Private Deployment
Best Practices
Dimension Selection
- 768 dims: default, best quality
- 512 dims: good balance for most use cases
- 256 dims: cost-optimized, still strong performance
Input Types
- `search_document`: for documents being indexed
- `search_query`: for search queries
- `classification`: for classification tasks
- `clustering`: for clustering tasks
Migration Strategy
1. Test v4 on sample queries
2. Compare retrieval quality
3. Re-embed the corpus incrementally
4. Use A/B testing during the transition
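The incremental re-embedding step can be as simple as tracking which model produced each stored vector and processing the backlog in batches. A sketch assuming a plain in-memory store (`embed_v4` is a placeholder standing in for the real API call, not the Cohere SDK):

```python
def embed_v4(texts):
    """Placeholder for the real co.embed(..., model='embed-v4') call."""
    return [[0.0] * 768 for _ in texts]

def reembed_incrementally(store, batch_size=2):
    """Re-embed only documents still tagged with the old model version."""
    pending = [d for d in store if d["model"] == "embed-english-v3.0"]
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        vecs = embed_v4([d["text"] for d in batch])
        for doc, vec in zip(batch, vecs):
            doc["vector"], doc["model"] = vec, "embed-v4"

store = [{"text": f"doc {i}", "vector": None, "model": "embed-english-v3.0"}
         for i in range(5)]
reembed_incrementally(store)
print(all(d["model"] == "embed-v4" for d in store))  # True
```

Because v3 and v4 vectors are incompatible, queries must hit either the old or the new index during the transition, never a mix; the per-document model tag makes that routing straightforward.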
Conclusion
Embed v4 sets a new standard for production embedding models, combining state-of-the-art accuracy with practical efficiency improvements. The flexible dimensions via Matryoshka embeddings make it suitable for a wide range of deployment scenarios and budgets.