
Vector Databases: Storing and Searching Embeddings

February 1, 2025
14 min read
Ailog Research Team

Comprehensive guide to vector databases for RAG: comparison of popular options, indexing strategies, and performance optimization.

TL;DR

  • For prototyping: ChromaDB (embedded, zero setup)
  • For production: Pinecone (managed) or Qdrant (self-hosted)
  • Need hybrid search: Weaviate or Elasticsearch
  • Key metric: Query latency <100ms for good UX
  • Test vector DBs on Ailog without infrastructure

What is a Vector Database?

A vector database is a specialized database optimized for storing and searching high-dimensional vectors (embeddings). Unlike traditional databases that search by exact matches or ranges, vector databases find items by semantic similarity.

Core Capabilities

  1. Vector storage: Efficiently store millions of high-dimensional vectors
  2. Similarity search: Find nearest neighbors in vector space
  3. Metadata filtering: Combine semantic search with traditional filters
  4. Scalability: Handle billions of vectors with low latency
  5. CRUD operations: Create, read, update, delete vectors

Why Not Use a Regular Database?

Traditional databases struggle with vector search:

Problem: Curse of dimensionality

  • High-dimensional spaces behave counterintuitively
  • Distance metrics become less meaningful
  • Exhaustive search is O(n×d) - too slow at scale

Vector DB Solution: Approximate Nearest Neighbor (ANN)

  • Specialized indexing (HNSW, IVF, etc.; see the sketch after this list)
  • Sub-linear search time: O(log n) typical
  • Trade exactness for speed (99%+ recall)
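To make the speedup concrete, here's a minimal ANN sketch using the hnswlib library; the corpus size, dimensionality, and parameter values are illustrative rather than recommendations:

```python
import hnswlib
import numpy as np

dim, num_vectors = 768, 100_000
data = np.random.rand(num_vectors, dim).astype(np.float32)

# Build an HNSW index: approximate, sub-linear search
index = hnswlib.Index(space='cosine', dim=dim)
index.init_index(max_elements=num_vectors, ef_construction=200, M=16)
index.add_items(data, np.arange(num_vectors))
index.set_ef(50)  # query-time search width (higher = better recall, slower)

# Returns the approximate 5 nearest neighbors for one query vector
labels, distances = index.knn_query(data[:1], k=5)
```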

Popular Vector Databases

Pinecone

Type: Managed cloud service

Pros:

  • Fully managed, no infrastructure
  • Easy to use, great DX
  • Auto-scaling
  • High performance
  • Good documentation

Cons:

  • Cost at scale
  • Vendor lock-in
  • Limited self-hosting

Pricing:

  • Starter: Free (1 index, 100K vectors)
  • Standard: ~$70/month (1M vectors, 1 pod)
  • Enterprise: Custom

Best for:

  • Quick prototypes
  • Production without ops overhead
  • When budget allows
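As a taste of the developer experience, a minimal round trip might look like this (a sketch assuming the v3 Python SDK, an existing index named "my-index", and placeholder credentials and vectors):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
index = pc.Index("my-index")           # assumes this index already exists

embedding = [0.1] * 1536               # stand-in for a real embedding

# Upsert (id, vector, metadata) tuples
index.upsert(vectors=[("doc-1", embedding, {"category": "electronics"})])

# Query the 5 nearest neighbors, returning stored metadata
results = index.query(vector=embedding, top_k=5, include_metadata=True)
```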

Weaviate

Type: Open source, self-hostable

Pros:

  • Open source (Apache 2.0)
  • Hybrid search (vector + keyword)
  • GraphQL API
  • Multi-tenancy support
  • Active community

Cons:

  • More complex setup
  • Self-hosting overhead
  • Learning curve

Hosting:

  • Self-hosted: Free (infrastructure costs)
  • Weaviate Cloud: From $25/month

Best for:

  • Self-hosting requirement
  • Hybrid search needs
  • Complex filtering

Qdrant

Type: Open source, Rust-based

Pros:

  • Very fast (Rust performance)
  • Rich filtering capabilities
  • Good Python SDK
  • Easy Docker deployment
  • Snapshot support

Cons:

  • Smaller ecosystem than others
  • Less mature managed offering

Hosting:

  • Self-hosted: Free
  • Qdrant Cloud: From $25/month

Best for:

  • Performance-critical applications
  • Complex filtering requirements
  • Self-hosting with ease

Chroma

Type: Open source, embedded

Pros:

  • Embedded mode (no server needed)
  • Simple API
  • Good for development
  • Free and open source

Cons:

  • Limited scale
  • No multi-user support in embedded mode
  • Fewer features than others

Best for:

  • Development and prototyping
  • Small-scale applications
  • Embedded use cases
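Chroma's embedded mode is why it wins for prototyping; this minimal sketch runs entirely in-process, with illustrative documents and IDs:

```python
import chromadb

client = chromadb.Client()  # in-memory, embedded: no server to run
collection = client.create_collection("docs")

# Chroma embeds documents automatically with a default local model
collection.add(
    documents=["Vector DBs store embeddings", "Postgres is relational"],
    ids=["doc-1", "doc-2"],
)

results = collection.query(query_texts=["semantic search"], n_results=1)
```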

Milvus

Type: Open source, cloud-native

Pros:

  • Highly scalable (billions of vectors)
  • Multiple index types
  • Cloud-native architecture
  • GPU support

Cons:

  • Complex setup
  • Resource-intensive
  • Steeper learning curve

Hosting:

  • Self-hosted: Free
  • Zilliz Cloud (managed): Custom pricing

Best for:

  • Large-scale production
  • Multi-index requirements
  • When scale is primary concern

PostgreSQL + pgvector

Type: Extension for PostgreSQL

Pros:

  • Use existing PostgreSQL infrastructure
  • ACID guarantees
  • Rich SQL ecosystem
  • Easy integration

Cons:

  • Not optimized for massive scale
  • Slower than specialized vector DBs
  • Limited to millions, not billions

Cost:

  • Free (extension)
  • Postgres hosting costs

Best for:

  • Already using PostgreSQL
  • Need transactional guarantees
  • Moderate scale (< 1M vectors)
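For a sense of what pgvector looks like in practice, here's a sketch assuming a `documents` table with a `vector`-typed `embedding` column and the psycopg2 driver; `<=>` is pgvector's cosine-distance operator:

```python
import psycopg2

conn = psycopg2.connect("dbname=rag")
cur = conn.cursor()

query_embedding = [0.1] * 768  # stand-in for a real embedding

# Serialize the embedding as a pgvector literal, e.g. '[0.1,0.1,...]'
literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

# <=> is cosine distance: lower = more similar
cur.execute(
    "SELECT id, content FROM documents "
    "ORDER BY embedding <=> %s::vector LIMIT 5",
    (literal,),
)
rows = cur.fetchall()
```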

Comparison Matrix

| Database | Managed | Open Source | Scale | Best Feature |
|----------|---------|-------------|-------|--------------|
| Pinecone | Yes | No | High | Ease of use |
| Weaviate | Optional | Yes | High | Hybrid search |
| Qdrant | Optional | Yes | High | Performance |
| Chroma | No | Yes | Low | Simplicity |
| Milvus | Optional | Yes | Very High | Scalability |
| pgvector | Optional | Yes | Medium | SQL integration |

("Optional" = a managed cloud offering exists alongside self-hosting.)

Indexing Strategies

HNSW (Hierarchical Navigable Small Worlds)

How it works:

  • Multi-layer graph structure
  • Navigable small-world properties
  • Greedy search from top layer down

Characteristics:

  • Fast search: O(log n)
  • High recall (95-99%)
  • Memory-intensive
  • Slow index building

Parameters:

```python
index_config = {
    'M': 16,                # Connections per node (tradeoff: recall vs. memory)
    'ef_construction': 64,  # Search width during build (higher = better recall)
}

search_params = {
    'ef': 32,  # Search width at query time (higher = better recall, slower)
}
```

Tuning:

  • M: 8-64 (16 default). Higher = better recall, more memory
  • ef_construction: 64-512. Higher = better index quality, slower build
  • ef: 32-512. Higher = better recall, slower search

Best for:

  • High recall requirements
  • Read-heavy workloads
  • When memory is available

IVF (Inverted File Index)

How it works:

  • Cluster vectors into partitions (Voronoi cells)
  • Search only nearby partitions
  • Coarse-to-fine approach

Parameters:

```python
index_config = {
    'nlist': 100,  # Number of clusters (√n to 4×√n typical)
}

search_params = {
    'nprobe': 10,  # Number of clusters to search
}
```

Tuning:

  • nlist: sqrt(N) typical. More = faster search, slower build
  • nprobe: 1 to nlist. Higher = better recall, slower search

Best for:

  • Very large datasets
  • Acceptable recall tradeoff
  • When memory is limited
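For reference, here's what the same nlist/nprobe knobs look like in FAISS, one widely used IVF implementation (random data for illustration):

```python
import faiss
import numpy as np

d = 768
xb = np.random.rand(100_000, d).astype(np.float32)  # database vectors
xq = np.random.rand(1, d).astype(np.float32)        # query vector

# IVF over 100 Voronoi cells, with a flat index over the centroids
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, 100)  # nlist=100
index.train(xb)  # k-means clustering of the database vectors
index.add(xb)

index.nprobe = 10  # search only the 10 nearest cells
distances, ids = index.search(xq, 5)
```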

Flat (Brute Force)

How it works:

  • Compare query to every vector
  • Exact nearest neighbors
  • No indexing required

Characteristics:

  • 100% recall
  • O(n) search time
  • No index overhead

Best for:

  • Small datasets (< 10K vectors)
  • Exact results required
  • Ground truth evaluation
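A flat search is simple enough to write by hand; this numpy sketch computes exact cosine similarity against every stored vector:

```python
import numpy as np

def flat_search(query, vectors, k=5):
    # Exact cosine similarity against every vector: O(n*d), 100% recall
    sims = vectors @ query / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    )
    return np.argsort(-sims)[:k]  # indices of the k most similar vectors
```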

HNSW vs IVF

| Aspect | HNSW | IVF |
|--------|------|-----|
| Speed | Very fast | Fast |
| Recall | Higher (98-99%) | Lower (90-95%) |
| Memory | High | Lower |
| Build time | Slow | Medium |
| Updates | Expensive | Cheaper |
| Best scale | Millions | Billions |

Metadata Filtering

Combine vector similarity with traditional filters.

Pre-filtering

Filter first, then search vectors.

```python
# Filter by metadata, then vector search within the filtered set
results = db.query(
    vector=query_embedding,
    filter={"category": "electronics", "price": {"$lt": 1000}},
    limit=10,
)
```

Pros:

  • Exact filter application
  • No irrelevant results

Cons:

  • May reduce candidate set too much
  • Slower if filter is selective

Post-filtering

Search vectors first, then filter results.

```python
# Vector search first, then filter the results
results = db.query(
    vector=query_embedding,
    limit=100,  # Overfetch
)
filtered = [
    r for r in results
    if r.metadata.get('category') == 'electronics'
][:10]
```

Pros:

  • Always get k results (if available)
  • Faster vector search

Cons:

  • May waste computation on filtered-out results
  • Less efficient

Hybrid (HNSW-IF)

Modern approach: filter-aware indexing.

```python
# Efficient combined search
results = db.query(
    vector=query_embedding,
    filter={"category": "electronics"},
    limit=10,
    filter_strategy="hnsw_if",  # Filter-aware HNSW traversal
)
```

How it works:

  • HNSW graph traversal respects filters
  • Skip filtered-out nodes during search
  • Best of both approaches

Best for:

  • Production RAG systems
  • When filtering is common
  • Supported by Qdrant, Weaviate
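As one concrete example, Qdrant exposes filter-aware search through its regular query API; this sketch assumes a locally running instance and a collection named "products":

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

# The filter is applied during HNSW traversal, not before or after it
results = client.search(
    collection_name="products",
    query_vector=[0.1] * 768,  # stand-in for a real embedding
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="electronics"))]
    ),
    limit=10,
)
```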

Distance Metrics

Cosine Similarity

Measures angle between vectors.

```python
similarity = dot(a, b) / (norm(a) * norm(b))
```

Range: [-1, 1] (higher = more similar)

Best for:

  • Normalized embeddings
  • Most common choice
  • Text embeddings

Euclidean (L2) Distance

Straight-line distance.

```python
distance = sqrt(sum((a - b) ** 2))
```

Range: [0, ∞] (lower = more similar)

Best for:

  • Unnormalized embeddings
  • Image embeddings
  • When magnitude matters

Dot Product

Simple multiplication.

```python
score = dot(a, b)
```

Range: [-∞, ∞] (higher = more similar)

Best for:

  • Normalized embeddings (equivalent to cosine)
  • Fastest computation
  • When vectors are normalized

Note: For unit-normalized vectors:

  • Cosine similarity = dot product (both norms are 1, so the division vanishes)
  • Dot product is faster to compute (no division)
  • Use dot product whenever your vectors are normalized
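A quick numpy check makes the equivalence concrete:

```python
import numpy as np

a, b = np.random.rand(768), np.random.rand(768)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # unit-normalize

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)

assert np.isclose(cosine, dot)  # identical once both norms are 1
```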

Performance Optimization

Batch Operations

Upload/query in batches for better throughput.

```python
# Bad: one request per vector
for vector in vectors:
    db.upsert(vector)

# Good: batched upserts amortize network overhead
db.upsert_batch(vectors, batch_size=100)
```

Async Operations

Parallelize I/O-bound operations.

```python
import asyncio

async def batch_search(queries):
    tasks = [db.search_async(q) for q in queries]
    return await asyncio.gather(*tasks)

results = asyncio.run(batch_search(query_batch))
```

Indexing Strategies

Incremental indexing:

  • Add vectors as they arrive
  • Good for dynamic data
  • Maintains index quality

Batch reindexing:

  • Rebuild index periodically
  • Better index quality
  • Downtime required

Dual indexing:

  • Write to two indexes
  • Switch atomically
  • Zero downtime
  • Double storage cost

Sharding

Split data across multiple instances.

```python
# Route writes to a shard by document ID
def get_shard(doc_id, num_shards=4):
    return hash(doc_id) % num_shards

# Fan a query out to every shard in parallel, then merge
async def search_all_shards(query, num_shards=4):
    tasks = [
        search_shard(shard_id, query)
        for shard_id in range(num_shards)
    ]
    results = await asyncio.gather(*tasks)
    return merge_and_rank(results)
```

Caching

Cache frequent queries.

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def search_cached(query_text, k=5):
    embedding = embed(query_text)
    return db.search(embedding, limit=k)

# Note: cached results go stale if the index changes; clear or
# version the cache after large re-ingestions.
```

Monitoring and Observability

Key Metrics

Performance Metrics:

  • Query latency (p50, p95, p99)
  • Indexing throughput
  • CPU/memory utilization

Quality Metrics:

  • Recall@k
  • Precision@k
  • User feedback (thumbs up/down)

Operational Metrics:

  • Index size
  • Number of vectors
  • Query rate
  • Error rate

Instrumentation

```python
import time

def search_with_metrics(query_vector):
    start = time.time()
    try:
        results = db.search(query_vector, limit=10)
        latency = time.time() - start
        metrics.record('vector_search_latency', latency)
        metrics.record('vector_search_success', 1)
        return results
    except Exception:
        metrics.record('vector_search_error', 1)
        raise
```

Backup and Recovery

Snapshot Strategy

```python
# Take a regular snapshot
def backup_database(db, backup_path):
    snapshot = db.create_snapshot()
    snapshot.save(backup_path)

# Restore from a snapshot
def restore_database(db, backup_path):
    db.restore_snapshot(backup_path)
```

Incremental Backups

```python
# Back up only the vectors changed since the last backup
last_backup_time = get_last_backup_time()
changed_vectors = db.get_vectors_since(last_backup_time)
backup_incremental(changed_vectors)
```

Migration Strategies

Zero-Downtime Migration

```python
# 1. Set up the new database
new_db = setup_new_database()

# 2. Backfill data
async def migrate():
    vectors = old_db.scan_all()
    await new_db.upsert_batch(vectors)

# 3. Dual-write during migration
def write_both(vector):
    old_db.upsert(vector)
    new_db.upsert(vector)

# 4. Validate the new database
assert validate_migration(old_db, new_db)

# 5. Switch reads to the new database
db = new_db

# 6. Decommission the old database
old_db.shutdown()
```

Cost Optimization

Calculate Costs

```python
# Storage costs
num_vectors = 1_000_000
dimensions = 768
bytes_per_vector = dimensions * 4  # float32
storage_gb = (num_vectors * bytes_per_vector) / (1024 ** 3)
storage_cost_monthly = storage_gb * 0.10  # $0.10/GB typical

# Query costs (for managed services)
queries_per_month = 10_000_000
cost_per_1k_queries = 0.05
query_cost_monthly = (queries_per_month / 1000) * cost_per_1k_queries

total_monthly = storage_cost_monthly + query_cost_monthly
```

Optimization Tactics

  1. Reduce dimensions: Use smaller embedding models
  2. Quantization: Store vectors in lower precision (int8 instead of float32; see the sketch after this list)
  3. Tiered storage: Hot/warm/cold data
  4. Caching: Reduce redundant queries
  5. Batch operations: Lower per-operation overhead
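As a sketch of tactic 2, symmetric int8 quantization cuts storage 4x versus float32 at a small cost in accuracy (a minimal illustration, not production code):

```python
import numpy as np

def quantize_int8(vectors):
    # Per-vector scale maps the largest component to 127
    scale = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    q = np.round(vectors / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

vectors = np.random.rand(1000, 768).astype(np.float32)
q, scale = quantize_int8(vectors)
print(q.nbytes / vectors.nbytes)  # 0.25: a quarter of the storage
```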

Choosing a Vector Database

Decision Framework

Prototyping / POC:

  • Chroma (embedded) or Pinecone (cloud)
  • Ease of use > performance

Production (Small Scale < 1M vectors):

  • pgvector (if using Postgres)
  • Pinecone (managed simplicity)
  • Qdrant (self-hosted performance)

Production (Medium Scale 1-100M vectors):

  • Qdrant or Weaviate (self-hosted)
  • Pinecone (managed)

Production (Large Scale > 100M vectors):

  • Milvus
  • Weaviate
  • Distributed Pinecone

Hybrid Search Required:

  • Weaviate (best hybrid support)
  • Elasticsearch with vector plugin

Need SQL:

  • pgvector

Migration Path

Start simple, scale up as needed:

  1. Development: Chroma (embedded)
  2. MVP: Pinecone or pgvector
  3. Scale: Qdrant or Weaviate (self-hosted)
  4. Massive scale: Milvus or distributed setup

💡 Expert Tip from Ailog: Don't prematurely optimize your vector database choice. We've run production RAG systems serving millions of queries on both Pinecone and self-hosted Qdrant. The database is rarely the bottleneck – poor chunking or embedding strategies are. Start with ChromaDB for prototyping, move to Pinecone for simplicity or Qdrant for control. Only consider Milvus/Weaviate when you're serving 10M+ queries/month.

Compare Vector Databases on Ailog

Test different vector databases with your actual data:

Ailog supports:

  • ChromaDB, Pinecone, Qdrant, Weaviate
  • Performance benchmarks with your documents
  • Cost projections based on your scale
  • One-click migration between databases

Try all vector DBs free →

Next Steps

With embeddings stored and searchable, the next challenge is retrieving the most relevant context. Advanced retrieval strategies including hybrid search, query expansion, and reranking are covered in the next guide.

Tags

vector database · indexing · similarity search · performance
