MMR: Diversify Search Results with Maximal Marginal Relevance
Reduce redundancy in RAG retrieval: use MMR to balance relevance and diversity for better context quality.
The Redundancy Problem
Standard similarity search returns similar documents - often too similar:
Query: "How does photosynthesis work?"
Top 5 results:
1. "Photosynthesis converts light to energy..."
2. "Photosynthesis is how plants make energy..." ← Redundant
3. "Plants use photosynthesis to create..." ← Redundant
4. "Chlorophyll enables photosynthesis..." ← Different aspect!
5. "Photosynthesis occurs in chloroplasts..." ← Different aspect!
You waste context on repetition.
Maximal Marginal Relevance (MMR)
MMR balances relevance to query and diversity from already-selected docs.
Formula:
MMR = argmax_{Di ∈ R\S} [ λ · Sim(Di, Q) − (1 − λ) · max_{Dj ∈ S} Sim(Di, Dj) ]

where R is the candidate pool and S is the set of already-selected documents: the first term rewards relevance to the query Q, and the second term penalizes similarity to documents already selected.
A typical setting is λ = 0.7 (70% weight on relevance, 30% on diversity).
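To see the trade-off concretely: with λ = 0.7, a candidate with query similarity 0.9 that heavily overlaps an already-selected doc (max similarity 0.8) scores 0.7 × 0.9 − 0.3 × 0.8 = 0.39, while a slightly less relevant but novel candidate (query similarity 0.8, max similarity 0.3) scores 0.7 × 0.8 − 0.3 × 0.3 = 0.47 and gets picked first.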
Implementation
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mmr_search(query_embedding, doc_embeddings, documents, k=10, lambda_param=0.7):
    """
    MMR retrieval.

    Args:
        query_embedding: Query vector
        doc_embeddings: Matrix of document vectors (n_docs x dim)
        documents: Original documents
        k: Number of results
        lambda_param: Relevance vs. diversity trade-off (0-1)
    """
    doc_embeddings = np.asarray(doc_embeddings)
    k = min(k, len(documents))  # guard against k > corpus size

    # Similarity of every document to the query
    query_sim = cosine_similarity([query_embedding], doc_embeddings)[0]

    selected_indices = []
    remaining_indices = list(range(len(documents)))

    # Select the first document (most similar to the query)
    first_idx = int(np.argmax(query_sim))
    selected_indices.append(first_idx)
    remaining_indices.remove(first_idx)

    # Iteratively select k-1 more documents
    for _ in range(k - 1):
        mmr_scores = []
        for idx in remaining_indices:
            # Relevance to the query
            relevance = query_sim[idx]
            # Max similarity to any already-selected doc (redundancy penalty)
            selected_embeddings = doc_embeddings[selected_indices]
            redundancy = cosine_similarity([doc_embeddings[idx]], selected_embeddings)[0].max()
            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * redundancy
            mmr_scores.append((idx, mmr_score))

        # Select the doc with the highest MMR score
        best_idx = max(mmr_scores, key=lambda x: x[1])[0]
        selected_indices.append(best_idx)
        remaining_indices.remove(best_idx)

    return [documents[i] for i in selected_indices]
```
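A quick smoke test of the function above, reusing its imports. The `embed` function is a stand-in for whatever embedding model you use; it is not defined here:

```python
# Hypothetical usage: embed() is a placeholder for your embedding model,
# e.g. a sentence-transformers encoder returning one vector per text.
corpus = [
    "Photosynthesis converts light to energy...",
    "Photosynthesis is how plants make energy...",
    "Chlorophyll enables photosynthesis...",
    "Photosynthesis occurs in chloroplasts...",
]
doc_vecs = np.asarray([embed(text) for text in corpus])
query_vec = embed("How does photosynthesis work?")

top_docs = mmr_search(query_vec, doc_vecs, corpus, k=3, lambda_param=0.7)
print(top_docs)  # 3 docs covering different aspects, not 3 near-duplicates
```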
LangChain Built-in MMR
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# MMR search
results = vectorstore.max_marginal_relevance_search(
    query="photosynthesis",
    k=5,
    fetch_k=20,       # Fetch 20 candidates, return a diverse 5
    lambda_mult=0.7,  # Relevance weight: 1 = pure relevance, 0 = max diversity
)
```
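If you route retrieval through a retriever object instead, the same MMR behavior is available via `as_retriever`. A short sketch, assuming the `vectorstore` defined above:

```python
# MMR through the retriever interface; search_kwargs mirror the call above
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7},
)
docs = retriever.get_relevant_documents("How does photosynthesis work?")
```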
When to Use MMR
Use MMR when:
- Documents have high overlap
- Context window is limited
- You need broad coverage
- Multi-aspect queries ("tell me about X, Y, and Z")
Skip MMR when:
- Documents are already diverse
- Speed is critical (MMR is slower)
- Single-aspect queries
Tuning Lambda
```python
# Test different lambda values
lambdas = [0.3, 0.5, 0.7, 0.9]

for lam in lambdas:
    results = mmr_search(query, embeddings, docs, lambda_param=lam)

    # Measure diversity
    diversity = measure_diversity(results)
    relevance = measure_relevance(results, query)
    print(f"λ={lam}: Relevance={relevance:.2f}, Diversity={diversity:.2f}")
```
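The `measure_diversity` and `measure_relevance` helpers above are not defined by the snippet. One plausible sketch (my naming and signatures, assuming you keep each result's embedding alongside its text): score diversity as one minus the mean pairwise similarity of the selected set, and relevance as the mean similarity to the query:

```python
def measure_diversity(result_embeddings):
    # 1 - mean pairwise similarity among selected docs (higher = more varied)
    embs = np.asarray(result_embeddings)
    n = len(embs)
    if n < 2:
        return 0.0
    sims = cosine_similarity(embs)
    # Average over off-diagonal pairs only (diagonal entries are all 1)
    mean_pairwise = (sims.sum() - n) / (n * (n - 1))
    return 1.0 - mean_pairwise

def measure_relevance(result_embeddings, query_embedding):
    # Mean similarity of the selected set to the query
    embs = np.asarray(result_embeddings)
    return cosine_similarity([query_embedding], embs)[0].mean()
```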
Recommendations:
- High redundancy domain: λ = 0.5-0.6
- General purpose: λ = 0.7
- Precision critical: λ = 0.8-0.9
Performance Optimization
MMR is O(k²) in similarity computations: every selection round compares each remaining candidate against every already-selected document, which gets slow for large k or a large candidate pool. Narrow the pool first:
```python
# Faster: fetch a small candidate pool first, then apply MMR to it
def fast_mmr(query_embedding, vectordb, k=10, fetch_k=100, lambda_param=0.7):
    # 1. Get fetch_k candidates via fast (approximate) vector search
    candidates = vectordb.search(query_embedding, k=fetch_k)

    # 2. Apply MMR on the smaller set
    return mmr_search(
        query_embedding=query_embedding,
        doc_embeddings=[c["embedding"] for c in candidates],
        documents=[c["doc"] for c in candidates],
        k=k,
        lambda_param=lambda_param,
    )
```
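If the selection loop itself becomes the bottleneck, it can also be vectorized. A sketch reusing the imports above (the function name is mine, not a library API): precompute the candidate-candidate similarity matrix once and keep a running max-similarity-to-selected vector, instead of calling cosine_similarity inside the loop:

```python
def mmr_vectorized(query_embedding, doc_embeddings, documents, k=10, lambda_param=0.7):
    doc_embeddings = np.asarray(doc_embeddings)
    query_sim = cosine_similarity([query_embedding], doc_embeddings)[0]
    doc_sim = cosine_similarity(doc_embeddings)  # n x n matrix, computed once

    k = min(k, len(documents))
    selected = [int(np.argmax(query_sim))]
    # Running max similarity of every candidate to the selected set
    max_sim_to_selected = doc_sim[selected[0]].copy()

    for _ in range(k - 1):
        scores = lambda_param * query_sim - (1 - lambda_param) * max_sim_to_selected
        scores[selected] = -np.inf  # never re-pick an already-selected doc
        best = int(np.argmax(scores))
        selected.append(best)
        # Fold the new doc's similarities into the running max
        max_sim_to_selected = np.maximum(max_sim_to_selected, doc_sim[best])

    return [documents[i] for i in selected]
```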
Combining with Reranking
Pipeline: Retrieval → MMR → Reranking
```python
# 1. Initial retrieval
candidates = vector_search(query, k=100)

# 2. Diversify with MMR
diverse_docs = mmr_search(query, candidates, k=20)

# 3. Rerank for precision
final_results = cross_encoder_rerank(query, diverse_docs, k=10)
```
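`vector_search` and `cross_encoder_rerank` above are schematic placeholders for your retriever and reranker. For the rerank step, one common choice is a cross-encoder from sentence-transformers; the model name here is illustrative:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def cross_encoder_rerank(query, docs, k=10):
    # Score each (query, doc) pair jointly - slower than vector search,
    # but much more precise on a small, already-diversified candidate set
    scores = reranker.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:k]]
```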
MMR ensures your LLM sees varied, non-redundant context. Essential for high-quality RAG.