Cohere Launches Embed v4: 30% Better Accuracy with Smaller Model Size
Cohere's new embedding model delivers state-of-the-art performance on the MTEB benchmark while reducing dimensions from 1024 to 768, cutting costs and improving speed.
Announcement
Cohere has released Embed v4, its latest embedding model, which delivers significant improvements in accuracy, efficiency, and multilingual performance.
Key Improvements
Performance Gains
MTEB (Massive Text Embedding Benchmark) scores:
| Model | Dimensions | Avg Score | Retrieval | Classification |
|---|---|---|---|---|
| Embed v3 | 1024 | 64.2 | 52.3 | 71.8 |
| Embed v4 | 768 | 66.8 | 55.1 | 74.2 |
| OpenAI text-embedding-ada-002 | 1536 | 60.9 | 49.2 | 68.5 |
| OpenAI text-embedding-3-large | 3072 | 64.6 | 54.6 | 70.1 |
Reduced Dimensions
Moving from 1024 to 768 dimensions provides:
- 25% less storage per embedding
- 20% faster similarity search
- 15% lower API costs
- No accuracy loss (actually improved)
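The storage saving follows directly from the dimension count. A quick back-of-envelope sketch, assuming float32 vectors and a hypothetical 10M-document corpus (the corpus size is an example, not from the announcement):

```python
# Raw storage math for the 1024 -> 768 dimension cut.
# Assumes float32 vectors and an illustrative 10M-vector corpus.
BYTES_PER_FLOAT32 = 4
N_VECTORS = 10_000_000

def storage_gb(dims: int, n: int = N_VECTORS) -> float:
    """Raw float32 storage in GB for n embeddings of the given width."""
    return dims * BYTES_PER_FLOAT32 * n / 1e9

v3_gb = storage_gb(1024)
v4_gb = storage_gb(768)
savings = 1 - v4_gb / v3_gb
print(f"v3: {v3_gb:.2f} GB, v4: {v4_gb:.2f} GB, saved: {savings:.0%}")
```

At these sizes the 25% figure is exact: 40.96 GB drops to 30.72 GB for the example corpus, before any index overhead.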
Multilingual Excellence
Embed v4 supports 100+ languages with strong performance:
- English: 68.2 (MTEB)
- Chinese: 65.1
- Spanish: 64.8
- Arabic: 62.3
- Hindi: 61.7
Cross-lingual retrieval (query in one language, retrieve in another) improved by 35%.
Technical Innovations
Matryoshka Embeddings
Embed v4 uses Matryoshka Representation Learning, allowing flexible dimension reduction:
```python
import cohere

co = cohere.Client()  # assumes an API key is configured in the environment

# Generate the full 768-dim embedding
response = co.embed(
    texts=["sample text"],
    model="embed-v4",
    input_type="search_document"
)
full_embedding = response.embeddings[0]

# Truncate to smaller dimensions without recomputing
embedding_256 = full_embedding[:256]  # use first 256 dims
embedding_512 = full_embedding[:512]  # use first 512 dims
# Trade-off: smaller size vs. slight accuracy loss
```
Dimension vs. accuracy:
- 768 dims: 100% accuracy (baseline)
- 512 dims: 98.5% accuracy
- 256 dims: 95.2% accuracy
- 128 dims: 89.1% accuracy
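The truncation step can be sketched in plain NumPy. One practical detail: if you run dot-product search over unit-length vectors, re-normalize after the cut, since truncation shortens the vector. The random vector below stands in for real Embed v4 output (an assumption for the demo):

```python
# Matryoshka-style truncation sketch: keep a prefix of the vector and
# re-normalize so dot-product search still behaves like cosine similarity.
import numpy as np

def truncate(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)  # stand-in for a unit-norm Embed v4 vector

embedding_256 = truncate(full, 256)
embedding_512 = truncate(full, 512)
```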
Instruction-Aware Embeddings
Embed v4 takes optional task instructions for better domain adaptation:
```python
# Standard embedding
embedding = co.embed(
    texts=["Machine learning model"],
    model="embed-v4"
)

# With task instruction for better domain alignment
embedding = co.embed(
    texts=["Machine learning model"],
    model="embed-v4",
    input_type="search_document",
    embedding_types=["float"]
)

# For queries (different from documents)
query_embedding = co.embed(
    texts=["How does ML work?"],
    model="embed-v4",
    input_type="search_query"
)
```
Training Improvements
Trained on:
- 1.2 trillion tokens (3x more than v3)
- Synthetic hard negatives
- Contrastive learning with dynamic batching
- Multi-task training across 50+ tasks
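The contrastive objective with in-batch negatives mentioned above can be sketched as a toy InfoNCE loss: each query should score highest against its own document, with every other document in the batch acting as a negative. Batch size, dimensionality, and temperature here are illustrative assumptions, not Cohere's training values:

```python
# Toy InfoNCE (contrastive) loss with in-batch negatives.
import numpy as np

def info_nce(queries: np.ndarray, docs: np.ndarray, temp: float = 0.05) -> float:
    """queries[i] should match docs[i]; other docs in the batch are negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temp                      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())   # cross-entropy on the diagonal

rng = np.random.default_rng(1)
q = rng.normal(size=(8, 64))
# Near-identical query/document pairs should give a loss close to zero.
loss = info_nce(q, q + 0.01 * rng.normal(size=(8, 64)))
```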
Pricing
Embed v4 pricing (per 1M tokens):
- embed-v4: $0.10
- embed-v4-light: $0.02 (384 dims, slightly lower accuracy)
Compared to competitors:
- OpenAI text-embedding-3-small: $0.02 (1536 dims)
- OpenAI text-embedding-3-large: $0.13 (3072 dims)
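At the listed per-1M-token rates, cost comparison is straightforward arithmetic. The 500M tokens/month workload below is an assumed example, not a quoted figure:

```python
# Monthly embedding cost at the published per-1M-token rates.
# The 500M tokens/month volume is an illustrative workload.
PRICE_PER_1M = {
    "embed-v4": 0.10,
    "embed-v4-light": 0.02,
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def monthly_cost(model: str, tokens: int) -> float:
    return PRICE_PER_1M[model] * tokens / 1_000_000

tokens = 500_000_000  # 500M tokens/month
for model in PRICE_PER_1M:
    print(f"{model}: ${monthly_cost(model, tokens):.2f}")
```

At that volume, embed-v4 runs $50/month against $65/month for text-embedding-3-large, while also producing vectors a quarter of the size.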
Migration Guide
Upgrading from v3 to v4:
```python
# Old (v3)
response = co.embed(
    texts=texts,
    model="embed-english-v3.0"
)

# New (v4)
response = co.embed(
    texts=texts,
    model="embed-v4",
    input_type="search_document"  # New parameter
)
```
Note: v3 and v4 embeddings are not compatible. You must re-embed your entire corpus.
Use Cases
Embed v4 particularly excels at:
- Multilingual search: Better cross-language retrieval
- Code search: Improved semantic code understanding
- Domain-specific RAG: Instruction parameter helps adaptation
- Large-scale systems: Reduced dimensions = lower costs
Benchmarks
Retrieval Tasks
Tested on the BEIR benchmark (zero-shot retrieval; the Improvement column is relative):
| Dataset | Embed v3 | Embed v4 | Improvement |
|---|---|---|---|
| NQ | 52.8 | 56.3 | +6.6% |
| HotpotQA | 63.2 | 67.1 | +6.2% |
| FEVER | 75.3 | 79.8 | +6.0% |
| Climate-FEVER | 23.1 | 28.4 | +22.9% |
| SciFact | 66.2 | 71.8 | +8.5% |
Classification
On standard text classification benchmarks (accuracy, with relative improvement in parentheses):
- Banking77: 86.2% → 89.1% (+3.4%)
- Amazon Reviews: 63.8% → 67.2% (+5.3%)
- TREC: 91.3% → 93.7% (+2.6%)
Availability
- Generally available via Cohere API
- Supported in all SDKs (Python, Node.js, Go, Java)
- Coming soon to AWS Bedrock and Azure
- Self-hosted option via Cohere Private Deployment
Best Practices
Dimension Selection
- 768 dims: Default, best quality
- 512 dims: Good balance for most use cases
- 256 dims: Cost-optimized, still strong performance
Input Types
- search_document: For documents being indexed
- search_query: For search queries
- classification: For classification tasks
- clustering: For clustering tasks
Migration Strategy
- Test v4 on sample queries
- Compare retrieval quality
- Re-embed corpus incrementally
- Use A/B testing during transition
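The incremental re-embedding step can be sketched as a batch loop that tags each stored vector with its model version, so v3 and v4 vectors are never compared against each other mid-migration. The index layout and the `embed_batch` stub are assumptions for the demo; in practice `embed_batch` would wrap a real `co.embed` call:

```python
# Incremental re-embedding sketch: migrate only documents still on the
# old model, in batches, tagging each vector with its model version.
from typing import Callable

def reembed(corpus: dict[str, str],
            index: dict[str, dict],
            embed_batch: Callable[[list[str]], list[list[float]]],
            batch_size: int = 96) -> int:
    """Re-embed documents not yet on embed-v4; returns the count migrated."""
    pending = [doc_id for doc_id, meta in index.items()
               if meta["model"] != "embed-v4"]
    for start in range(0, len(pending), batch_size):
        batch_ids = pending[start:start + batch_size]
        vectors = embed_batch([corpus[i] for i in batch_ids])
        for doc_id, vec in zip(batch_ids, vectors):
            index[doc_id] = {"model": "embed-v4", "vector": vec}
    return len(pending)

# Toy index: one doc already migrated, two still on v3.
corpus = {"a": "doc a", "b": "doc b", "c": "doc c"}
index = {"a": {"model": "embed-v4", "vector": [0.0]},
         "b": {"model": "embed-v3", "vector": [0.0]},
         "c": {"model": "embed-v3", "vector": [0.0]}}
migrated = reembed(corpus, index, lambda texts: [[1.0]] * len(texts))
```

Running the loop a second time migrates nothing, which makes it safe to resume after interruptions.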
Conclusion
Embed v4 sets a new standard for production embedding models, combining state-of-the-art accuracy with practical efficiency improvements. The flexible dimensions via Matryoshka embeddings make it suitable for a wide range of deployment scenarios and budgets.