MMR: Diversify Search Results with Maximal Marginal Relevance
Reduce redundancy in RAG retrieval: use MMR to balance relevance and diversity for better context quality.
- Author
- Ailog Research Team
- Published
- Reading time
- 9 min read
- Level
- advanced
- RAG Pipeline Step
- Retrieval
The Redundancy Problem
Standard similarity search returns similar documents - often too similar:
```
Query: "How does photosynthesis work?"

Top 5 results:
1. "Photosynthesis converts light to energy..."
2. "Photosynthesis is how plants make energy..."  ← Redundant
3. "Plants use photosynthesis to create..."       ← Redundant
4. "Chlorophyll enables photosynthesis..."        ← Different aspect!
5. "Photosynthesis occurs in chloroplasts..."     ← Different aspect!
```
You waste context on repetition.
Maximal Marginal Relevance (MMR)
MMR balances relevance to query and diversity from already-selected docs.
Formula:

```
MMR = argmax_{Di ∈ R\S} [ λ · Sim(Di, Q) − (1 − λ) · max_{Dj ∈ S} Sim(Di, Dj) ]
```

where R is the candidate set and S the already-selected documents: the first term rewards relevance to the query, the second penalizes similarity to what is already selected.
A typical default is λ = 0.7 (70% weight on relevance, 30% on diversity).
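To see the trade-off in numbers, consider two hypothetical candidates at λ = 0.7 (the similarity values below are made up for illustration):

```python
lam = 0.7

# Candidate A: very relevant (0.8), but nearly duplicates a selected doc (0.9)
score_a = lam * 0.8 - (1 - lam) * 0.9   # ≈ 0.29

# Candidate B: slightly less relevant (0.7), but covers new ground (0.3)
score_b = lam * 0.7 - (1 - lam) * 0.3   # ≈ 0.40

print(score_a, score_b)
```

Candidate B wins despite its lower raw relevance: its diversity penalty is much smaller.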
Implementation
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mmr_search(query_embedding, doc_embeddings, documents, k=10, lambda_param=0.7):
    """MMR retrieval.

    Args:
        query_embedding: Query vector
        doc_embeddings: All document vectors (2D numpy array)
        documents: Original documents
        k: Number of results
        lambda_param: Relevance vs. diversity trade-off (0-1)
    """
    # Similarity of every document to the query
    query_sim = cosine_similarity([query_embedding], doc_embeddings)[0]

    selected_indices = []
    remaining_indices = list(range(len(documents)))

    # Select the first document (most similar to the query)
    first_idx = int(np.argmax(query_sim))
    selected_indices.append(first_idx)
    remaining_indices.remove(first_idx)

    # Iteratively select k-1 more documents
    for _ in range(k - 1):
        mmr_scores = []
        for idx in remaining_indices:
            # Relevance to the query
            relevance = query_sim[idx]

            # Max similarity to the already-selected docs
            selected_embeddings = doc_embeddings[selected_indices]
            diversity = max(cosine_similarity([doc_embeddings[idx]],
                                              selected_embeddings)[0])

            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * diversity
            mmr_scores.append((idx, mmr_score))

        # Select the doc with the highest MMR score
        best_idx = max(mmr_scores, key=lambda x: x[1])[0]
        selected_indices.append(best_idx)
        remaining_indices.remove(best_idx)

    return [documents[i] for i in selected_indices]
```
LangChain Built-in MMR
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# MMR search
results = vectorstore.max_marginal_relevance_search(
    query="photosynthesis",
    k=5,             # Return 5 diverse results
    fetch_k=20,      # Fetch 20 candidates first
    lambda_mult=0.7  # Relevance weight
)
```
When to Use MMR
Use MMR when:

- Documents have high overlap
- The context window is limited
- You need broad coverage
- Queries are multi-aspect ("tell me about X, Y, and Z")

Skip MMR when:

- Documents are already diverse
- Speed is critical (MMR is slower)
- Queries are single-aspect
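A quick way to decide is to measure how much a retrieved set overlaps with itself. The helper below, `redundancy_ratio`, is a hypothetical sketch (not from any library) that counts the fraction of document pairs whose cosine similarity exceeds a threshold:

```python
import numpy as np

def redundancy_ratio(doc_embeddings, threshold=0.9):
    """Fraction of document pairs with cosine similarity above threshold."""
    X = np.asarray(doc_embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize rows
    sims = X @ X.T                                    # all pairwise cosines
    iu = np.triu_indices(len(X), k=1)                 # each pair counted once
    return float(np.mean(sims[iu] > threshold))

# Three near-identical vectors plus one distinct one
docs = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1], [0.0, 1.0]]
print(redundancy_ratio(docs))  # 0.5 — half the pairs are near-duplicates
```

A high ratio on your typical queries suggests MMR will pay off; near zero, plain similarity search is likely fine.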
Tuning Lambda
```python
# Test different lambda values
lambdas = [0.3, 0.5, 0.7, 0.9]

for lam in lambdas:
    results = mmr_search(query, embeddings, docs, lambda_param=lam)

    # Measure diversity and relevance of the result set
    diversity = measure_diversity(results)
    relevance = measure_relevance(results, query)

    print(f"λ={lam}: Relevance={relevance:.2f}, Diversity={diversity:.2f}")
```
Recommendations:

- High-redundancy domain: λ = 0.5-0.6
- General purpose: λ = 0.7
- Precision-critical: λ = 0.8-0.9
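The tuning loop above leaves `measure_diversity` and `measure_relevance` undefined. One plausible sketch, assuming each result carries its embedding vector, scores relevance as mean cosine similarity to the query and diversity as mean pairwise dissimilarity:

```python
import numpy as np

def _cos(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def measure_relevance(result_embeddings, query_embedding):
    """Mean cosine similarity between the results and the query."""
    return float(np.mean([_cos(e, query_embedding) for e in result_embeddings]))

def measure_diversity(result_embeddings):
    """Mean pairwise dissimilarity (1 - cosine) among the results."""
    n = len(result_embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean([1 - _cos(result_embeddings[i], result_embeddings[j])
                          for i, j in pairs]))

redundant = [[1.0, 0.0], [0.99, 0.1]]   # near-duplicate pair
spread = [[1.0, 0.0], [0.0, 1.0]]       # orthogonal pair
print(measure_diversity(redundant))     # ≈ 0.005
print(measure_diversity(spread))        # 1.0
```

Higher λ should push relevance up and diversity down; these metrics let you see where that trade-off flattens out for your corpus.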
Performance Optimization
MMR is quadratic in k: each of the k selection rounds compares the remaining candidates against every document selected so far, which gets slow for large k. Restrict it to a small candidate set first:
```python
import numpy as np

# Faster: fetch a small candidate set first
def fast_mmr(query_embedding, vectordb, k=10, fetch_k=100, lambda_param=0.7):
    # Get fetch_k candidates via fast vector search
    candidates = vectordb.search(query_embedding, k=fetch_k)

    # Apply MMR only on the smaller set
    return mmr_search(
        query_embedding=query_embedding,
        doc_embeddings=np.array([c['embedding'] for c in candidates]),
        documents=[c['doc'] for c in candidates],
        k=k,
        lambda_param=lambda_param
    )
```
Combining with Reranking
Pipeline: Retrieval → MMR → Reranking
```python
# 1. Initial retrieval
candidates = vector_search(query, k=100)

# 2. Diversify with MMR
diverse_docs = mmr_search(query, candidates, k=20)

# 3. Rerank for precision
final_results = cross_encoder_rerank(query, diverse_docs, k=10)
```
MMR ensures your LLM sees varied, non-redundant context. Essential for high-quality RAG.