Best Vector Databases for RAG in 2025: Pinecone vs Qdrant vs Weaviate
Complete comparison of vector databases for RAG: Pinecone, Qdrant, Weaviate, Milvus, Chroma. Benchmarks, pricing, and recommendations for your use case.
- Author
- Ailog Research Team
- Published
- Reading time
- 14 min read
- Level
- intermediate
- RAG Pipeline Step
- Storage
TL;DR • For prototyping: ChromaDB (embedded, zero setup) • For production: Pinecone (managed) or Qdrant (self-hosted) • Need hybrid search: Weaviate or Elasticsearch • Key metric: Query latency <100ms for good UX • Test vector DBs on Ailog without infrastructure
What is a Vector Database?
A vector database is a specialized database optimized for storing and searching high-dimensional vectors (embeddings). Unlike traditional databases that search by exact matches or ranges, vector databases find items by semantic similarity.
Core Capabilities
• Vector storage: efficiently store millions of high-dimensional vectors
• Similarity search: find nearest neighbors in vector space
• Metadata filtering: combine semantic search with traditional filters
• Scalability: handle billions of vectors with low latency
• CRUD operations: create, read, update, and delete vectors
Why Not Use a Regular Database?
Traditional databases struggle with vector search:
Problem: Curse of dimensionality • High-dimensional spaces behave counterintuitively • Distance metrics become less meaningful • Exhaustive search is O(n×d) - too slow at scale
Vector DB Solution: Approximate Nearest Neighbor (ANN) • Specialized indexing (HNSW, IVF, etc.) • Sub-linear search time: O(log n) typical • Trade exactness for speed (99%+ recall)
Popular Vector Databases
Pinecone
Type: Managed cloud service
Pros: • Fully managed, no infrastructure • Easy to use, great DX • Auto-scaling • High performance • Good documentation
Cons: • Cost at scale • Vendor lock-in • Limited self-hosting
Pricing: • Starter: Free (1 index, 100K vectors) • Standard: ~$70/month (1M vectors, 1 pod) • Enterprise: Custom
Best for: • Quick prototypes • Production without ops overhead • When budget allows
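For a feel of the developer experience, here is a minimal sketch using the pinecone Python client (v3+ serverless API); the API key, index name, region, and dummy vectors are all placeholders, so check Pinecone's docs before relying on the details:

```python
# Pinecone sketch (pinecone Python client v3+, serverless); values are placeholders.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="rag-demo",
    dimension=768,  # must match your embedding model
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("rag-demo")

# Upsert a few vectors with metadata
index.upsert(vectors=[
    ("doc-1", [0.1] * 768, {"category": "electronics"}),
    ("doc-2", [0.2] * 768, {"category": "books"}),
])

# Query with a metadata filter
results = index.query(
    vector=[0.1] * 768,
    top_k=5,
    filter={"category": {"$eq": "electronics"}},
    include_metadata=True,
)
```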
Weaviate
Type: Open source, self-hostable
Pros: • Open source (Apache 2.0) • Hybrid search (vector + keyword) • GraphQL API • Multi-tenancy support • Active community
Cons: • More complex setup • Self-hosting overhead • Learning curve
Hosting: • Self-hosted: Free (infrastructure costs) • Weaviate Cloud: From $25/month
Best for: • Self-hosting requirement • Hybrid search needs • Complex filtering
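A hedged sketch of Weaviate's hybrid search using the v4 Python client, assuming a local instance and an existing "Docs" collection (both assumptions):

```python
# Hybrid (vector + keyword) search sketch, Weaviate Python client v4.
import weaviate

client = weaviate.connect_to_local()
docs = client.collections.get("Docs")  # assumes this collection exists

response = docs.query.hybrid(
    query="wireless noise-cancelling headphones",
    alpha=0.5,  # 0 = pure keyword (BM25), 1 = pure vector search
    limit=10,
)

for obj in response.objects:
    print(obj.properties)

client.close()
```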
Qdrant
Type: Open source, Rust-based
Pros: • Very fast (Rust performance) • Rich filtering capabilities • Good Python SDK • Easy Docker deployment • Snapshot support
Cons: • Smaller ecosystem than others • Less mature managed offering
Hosting: • Self-hosted: Free • Qdrant Cloud: From $25/month
Best for: • Performance-critical applications • Complex filtering requirements • Self-hosting with ease
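A minimal sketch with the qdrant-client package against a local Docker instance (`docker run -p 6333:6333 qdrant/qdrant`); the collection name and vectors are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1] * 768, payload={"category": "electronics"})],
)

hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 768,
    limit=5,
)
```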
Chroma
Type: Open source, embedded
Pros: • Embedded mode (no server needed) • Simple API • Good for development • Free and open source
Cons: • Limited scale • No multi-user support in embedded mode • Fewer features than others
Best for: • Development and prototyping • Small-scale applications • Embedded use cases
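Embedded mode is a few lines (a sketch; when you pass `query_texts`, Chroma embeds with its default model unless you supply vectors yourself):

```python
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to persist
collection = client.create_collection("docs")

collection.add(
    ids=["doc-1", "doc-2"],
    documents=["Qdrant is written in Rust.", "Chroma runs embedded, in-process."],
    metadatas=[{"topic": "qdrant"}, {"topic": "chroma"}],
)

results = collection.query(query_texts=["embedded vector database"], n_results=2)
print(results["documents"])
```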
Milvus
Type: Open source, cloud-native
Pros: • Highly scalable (billions of vectors) • Multiple index types • Cloud-native architecture • GPU support
Cons: • Complex setup • Resource-intensive • Steeper learning curve
Hosting: • Self-hosted: Free • Zilliz Cloud (managed): Custom pricing
Best for: • Large-scale production • Multi-index requirements • When scale is primary concern
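A hedged sketch using pymilvus' simplified MilvusClient API against a local instance; the collection name and data are placeholders:

```python
# Milvus sketch via pymilvus' MilvusClient (simplified API, pymilvus >= 2.3).
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

client.create_collection(collection_name="docs", dimension=768)

client.insert(
    collection_name="docs",
    data=[{"id": 1, "vector": [0.1] * 768, "category": "electronics"}],
)

hits = client.search(
    collection_name="docs",
    data=[[0.1] * 768],  # one or more query vectors
    limit=5,
)
```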
PostgreSQL + pgvector
Type: Extension for PostgreSQL
Pros: • Use existing PostgreSQL infrastructure • ACID guarantees • Rich SQL ecosystem • Easy integration
Cons: • Not optimized for massive scale • Slower than specialized vector DBs • Limited to millions, not billions
Cost: • Free (extension) • Postgres hosting costs
Best for: • Already using PostgreSQL • Need transactional guarantees • Moderate scale (< 1M vectors)
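A sketch using psycopg 3; the table and data are illustrative, and `<=>` is pgvector's cosine-distance operator (`<->` is L2):

```python
# pgvector sketch via psycopg 3; table and data are illustrative.
import psycopg

query_embedding = [0.1] * 768  # placeholder embedding

with psycopg.connect("dbname=rag") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id bigserial PRIMARY KEY,
            content text,
            embedding vector(768)
        )
    """)

    # <=> is cosine distance; add an hnsw or ivfflat index at scale
    rows = conn.execute(
        "SELECT id, content FROM items ORDER BY embedding <=> %s::vector LIMIT 10",
        (str(query_embedding),),
    ).fetchall()
```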
Comparison Matrix
| Database | Managed | Open Source | Scale | Best Feature |
|----------|---------|-------------|-------|--------------|
| Pinecone | ✅ | ❌ | High | Ease of use |
| Weaviate | ✅ | ✅ | High | Hybrid search |
| Qdrant | ✅ | ✅ | High | Performance |
| Chroma | ❌ | ✅ | Low | Simplicity |
| Milvus | ✅ | ✅ | Very high | Scalability |
| pgvector | ❌ | ✅ | Medium | SQL integration |
Indexing Strategies
HNSW (Hierarchical Navigable Small Worlds)
How it works: • Multi-layer graph structure • Navigable small-world properties • Greedy search from top layer down
Characteristics: • Fast search: O(log n) • High recall (95-99%) • Memory-intensive • Slow index building
Parameters:
```python
index_config = {
    'M': 16,                # Connections per node (tradeoff: recall vs. memory)
    'ef_construction': 64,  # Search width during build (higher = better recall)
}

search_params = {
    'ef': 32,  # Search width at query time (higher = better recall, slower)
}
```
Tuning: • M: 8-64 (16 default). Higher = better recall, more memory • ef_construction: 64-512. Higher = better index quality, slower build • ef: 32-512. Higher = better recall, slower search
Best for: • High recall requirements • Read-heavy workloads • When memory is available
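To see these knobs in action outside a full database, here is a sketch with the standalone hnswlib library (illustrative; managed vector DBs expose the same M / ef_construction / ef parameters in their index configs):

```python
import hnswlib
import numpy as np

dim, num_elements = 768, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build the HNSW graph
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, M=16, ef_construction=64)
index.add_items(data, np.arange(num_elements))

# Query: ef is the search width at query time
index.set_ef(32)
labels, distances = index.knn_query(data[:1], k=5)
```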
IVF (Inverted File Index)
How it works: • Cluster vectors into partitions (Voronoi cells) • Search only nearby partitions • Coarse-to-fine approach
Parameters:
```python
index_config = {
    'nlist': 100,  # Number of clusters (√n to 4×√n typical)
}

search_params = {
    'nprobe': 10,  # Number of clusters to search
}
```
Tuning: • nlist: sqrt(N) typical. More = faster search, slower build • nprobe: 1 to nlist. Higher = better recall, slower search
Best for: • Very large datasets • Acceptable recall tradeoff • When memory is limited
Flat (Brute Force)
How it works: • Compare query to every vector • Exact nearest neighbors • No indexing required
Characteristics: • 100% recall • O(n) search time • No index overhead
Best for: • Small datasets (< 10K vectors) • Exact results required • Ground truth evaluation
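Because a flat index is just a linear scan, it fits in a few lines of NumPy and doubles as a ground-truth baseline when measuring an ANN index's recall (a sketch assuming cosine similarity):

```python
# Exact (flat) search in NumPy: O(n*d) per query, 100% recall.
import numpy as np

def brute_force_search(query, vectors, k=5):
    # Normalize, then rank by cosine similarity against every stored vector
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    top_k = np.argsort(-sims)[:k]
    return top_k, sims[top_k]
```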
HNSW vs IVF
| Aspect | HNSW | IVF |
|--------|------|-----|
| Speed | Very fast | Fast |
| Recall | Higher (98-99%) | Lower (90-95%) |
| Memory | High | Lower |
| Build time | Slow | Medium |
| Updates | Expensive | Cheaper |
| Best scale | Millions | Billions |
Metadata Filtering
Combine vector similarity with traditional filters.
Pre-filtering
Filter first, then search vectors.
```python
# Filter by metadata, then run the vector search within the filtered set
results = db.query(
    vector=query_embedding,
    filter={"category": "electronics", "price": {"$lt": 1000}},
    limit=10,
)
```
Pros: • Exact filter application • No irrelevant results
Cons: • A very selective filter can shrink the candidate set too much for graph-based indexes to work well • Can be slow, since the filtered subset is typically searched without the ANN index
Post-filtering
Search vectors first, then filter results.
```python
# Vector search first, then filter the results
results = db.query(
    vector=query_embedding,
    limit=100,  # Overfetch to survive filtering
)

filtered = [r for r in results if r.metadata.get('category') == 'electronics'][:10]
```
Pros: • The ANN index runs at full speed over the whole collection • Works with any database, no native filter support needed
Cons: • May return fewer than k results even with overfetching • Wastes computation on candidates that are then filtered out
Hybrid (HNSW-IF)
Modern approach: filter-aware indexing.
```python
# Efficient combined search
results = db.query(
    vector=query_embedding,
    filter={"category": "electronics"},
    limit=10,
    filter_strategy="hnsw_if",  # Filter-aware HNSW traversal
)
```
How it works: • HNSW graph traversal respects filters • Skip filtered-out nodes during search • Best of both approaches
Best for: • Production RAG systems • When filtering is common • Supported by Qdrant, Weaviate
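For example, Qdrant expresses this as a `query_filter` that is evaluated during HNSW traversal; a sketch with the qdrant-client package, where the collection name and embedding are placeholders:

```python
# Qdrant filtered search: the filter is applied during graph traversal.
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.1] * 768  # placeholder embedding

hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(must=[
        FieldCondition(key="category", match=MatchValue(value="electronics")),
        FieldCondition(key="price", range=Range(lt=1000)),
    ]),
    limit=10,
)
```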
Distance Metrics
Cosine Similarity
Measures angle between vectors.
```python
import numpy as np

similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```
Range: [-1, 1] (higher = more similar)
Best for: • Normalized embeddings • Most common choice • Text embeddings
Euclidean (L2) Distance
Straight-line distance.
```python
import numpy as np

distance = np.sqrt(np.sum((a - b) ** 2))
```
Range: [0, ∞] (lower = more similar)
Best for: • Unnormalized embeddings • Image embeddings • When magnitude matters
Dot Product
Simple multiplication.
```python
import numpy as np

score = np.dot(a, b)
```
Range: [-∞, ∞] (higher = more similar)
Best for: • Normalized embeddings (equivalent to cosine) • Fastest computation • When vectors are normalized
Note: For unit-normalized vectors: • Cosine similarity equals the dot product (both norms are 1) • Dot product is faster to compute (no division) • Use dot product if your vectors are normalized
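A quick NumPy check of that equivalence:

```python
import numpy as np

a = np.random.rand(768)
b = np.random.rand(768)
a /= np.linalg.norm(a)  # normalize to unit length
b /= np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)
assert np.isclose(cosine, dot)  # identical once the norms are 1
```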
Performance Optimization
Batch Operations
Upload/query in batches for better throughput.
```python
# Bad: one vector at a time
for vector in vectors:
    db.upsert(vector)

# Good: batched
db.upsert_batch(vectors, batch_size=100)
```
Async Operations
Parallelize I/O-bound operations.
```python
import asyncio

async def batch_search(queries):
    tasks = [db.search_async(q) for q in queries]
    return await asyncio.gather(*tasks)  # note: unpack the task list

results = asyncio.run(batch_search(query_batch))
```
Index Update Strategies
Incremental indexing: • Add vectors as they arrive • Good for dynamic data • Index quality can slowly degrade as vectors are added and deleted
Batch reindexing: • Rebuild index periodically • Better index quality • Downtime required
Dual indexing: • Write to two indexes • Switch atomically • Zero downtime • Double storage cost
Sharding
Split data across multiple instances.
```python
NUM_SHARDS = 4

# Route by document ID
def get_shard(doc_id, num_shards=NUM_SHARDS):
    return hash(doc_id) % num_shards

# Parallel search across all shards
async def search_all_shards(query):
    tasks = [search_shard(shard_id, query) for shard_id in range(NUM_SHARDS)]
    results = await asyncio.gather(*tasks)
    return merge_and_rank(results)
```
Caching
Cache frequent queries.
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def search_cached(query_text, k=5):
    embedding = embed(query_text)
    return db.search(embedding, limit=k)
```
Monitoring and Observability
Key Metrics
Performance Metrics: • Query latency (p50, p95, p99) • Indexing throughput • CPU/memory utilization
Quality Metrics: • Recall@k • Precision@k • User feedback (thumbs up/down)
Operational Metrics: • Index size • Number of vectors • Query rate • Error rate
Instrumentation
```python
import time

def search_with_metrics(query_vector):
    start = time.time()
    try:
        results = db.search(query_vector, limit=10)
        latency = time.time() - start

        metrics.record('vector_search_latency', latency)
        metrics.record('vector_search_success', 1)

        return results
    except Exception:
        metrics.record('vector_search_error', 1)
        raise
```
Backup and Recovery
Snapshot Strategy
```python
# Regular snapshots
def backup_database(db, backup_path):
    snapshot = db.create_snapshot()
    snapshot.save(backup_path)

# Restore from a snapshot
def restore_database(db, backup_path):
    db.restore_snapshot(backup_path)
```
Incremental Backups
```python
# Track changes since the last backup
last_backup_time = get_last_backup_time()

changed_vectors = db.get_vectors_since(last_backup_time)
backup_incremental(changed_vectors)
```
Migration Strategies
Zero-Downtime Migration
```python
# 1. Set up the new database
new_db = setup_new_database()

# 2. Backfill existing data
async def migrate():
    vectors = old_db.scan_all()
    await new_db.upsert_batch(vectors)

# 3. Dual-write during the migration window
def write_both(vector):
    old_db.upsert(vector)
    new_db.upsert(vector)

# 4. Validate the new database against the old one
assert validate_migration(old_db, new_db)

# 5. Switch reads to the new database
db = new_db

# 6. Decommission the old database
old_db.shutdown()
```
Cost Optimization
Calculate Costs
```python
# Storage costs
num_vectors = 1_000_000
dimensions = 768
bytes_per_vector = dimensions * 4  # float32 = 4 bytes per dimension

storage_gb = (num_vectors * bytes_per_vector) / (1024 ** 3)
storage_cost_monthly = storage_gb * 0.10  # $0.10/GB-month typical

# Query costs (for managed services)
queries_per_month = 10_000_000
cost_per_1k_queries = 0.05

query_cost_monthly = (queries_per_month / 1000) * cost_per_1k_queries

total_monthly = storage_cost_monthly + query_cost_monthly
```
Optimization Tactics
• Reduce dimensions: use smaller embedding models
• Quantization: store vectors in lower precision (int8 instead of float32); see the sketch below
• Tiered storage: split hot/warm/cold data
• Caching: reduce redundant queries
• Batch operations: lower per-operation overhead
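A minimal NumPy sketch of scalar (int8) quantization, which cuts storage 4x at a small recall cost; production databases such as Qdrant and Milvus offer this natively, so this is only to show the idea:

```python
import numpy as np

def quantize_int8(vectors: np.ndarray):
    # Map float32 values into [-127, 127] with a single global scale factor
    scale = np.abs(vectors).max() / 127.0
    return (vectors / scale).round().astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

vectors = np.random.rand(1000, 768).astype(np.float32)
q, scale = quantize_int8(vectors)
print(vectors.nbytes / q.nbytes)  # 4.0
```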
Choosing a Vector Database
Decision Framework
Prototyping / POC: • Chroma (embedded) or Pinecone (cloud) • Ease of use > performance
Production (Small Scale < 1M vectors): • pgvector (if using Postgres) • Pinecone (managed simplicity) • Qdrant (self-hosted performance)
Production (Medium Scale 1-100M vectors): • Qdrant or Weaviate (self-hosted) • Pinecone (managed)
Production (Large Scale > 100M vectors): • Milvus • Weaviate • Distributed Pinecone
Hybrid Search Required: • Weaviate (best hybrid support) • Elasticsearch with vector plugin
Need SQL: • pgvector
Migration Path
Start simple and scale up as needed:
1. Development: Chroma (embedded)
2. MVP: Pinecone or pgvector
3. Scale: Qdrant or Weaviate (self-hosted)
4. Massive scale: Milvus or a distributed setup
> 💡 Expert Tip from Ailog: Don't prematurely optimize your vector database choice. We've run production RAG systems serving millions of queries on both Pinecone and self-hosted Qdrant. The database is rarely the bottleneck – poor chunking or embedding strategies are. Start with ChromaDB for prototyping, move to Pinecone for simplicity or Qdrant for control. Only consider Milvus/Weaviate when you're serving 10M+ queries/month.
Compare Vector Databases on Ailog
Test different vector databases with your actual data:
Ailog supports: • ChromaDB, Pinecone, Qdrant, Weaviate • Performance benchmarks with your documents • Cost projections based on your scale • One-click migration between databases
Try all vector DBs free →
Next Steps
With embeddings stored and searchable, the next challenge is retrieving the most relevant context. Advanced retrieval strategies including hybrid search, query expansion, and reranking are covered in the next guide.