Retrieval · Advanced

MMR: Diversify Search Results with Maximal Marginal Relevance

November 15, 2025
9 min read
Ailog Research Team

Reduce redundancy in RAG retrieval: use MMR to balance relevance and diversity for better context quality.

The Redundancy Problem

Standard similarity search returns similar documents - often too similar:

Query: "How does photosynthesis work?"

Top 5 results:
1. "Photosynthesis converts light to energy..."
2. "Photosynthesis is how plants make energy..."  ← Redundant
3. "Plants use photosynthesis to create..."      ← Redundant
4. "Chlorophyll enables photosynthesis..."       ← Different aspect!
5. "Photosynthesis occurs in chloroplasts..."    ← Different aspect!

You waste context on repetition.

Maximal Marginal Relevance (MMR)

MMR balances relevance to the query against similarity to the documents already selected.

Formula:

MMR = argmax_{Di ∈ candidates} [ λ · Sim(Di, Q) − (1 − λ) · max_{Dj ∈ selected} Sim(Di, Dj) ]

The first term rewards relevance to the query; the second penalizes similarity to documents already selected.

λ = 0.7 is a typical setting (70% weight on relevance, 30% on diversity).
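A quick worked example with λ = 0.7 (the numbers are illustrative): candidate A has query similarity 0.80 but similarity 0.90 to a document already selected, while candidate B has query similarity 0.70 and only 0.30 similarity to the selection. A scores 0.7 × 0.80 − 0.3 × 0.90 = 0.29; B scores 0.7 × 0.70 − 0.3 × 0.30 = 0.40. MMR picks B, the less redundant document, even though A is more relevant in isolation.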

Implementation

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mmr_search(query_embedding, doc_embeddings, documents, k=10, lambda_param=0.7):
    """
    MMR retrieval

    Args:
        query_embedding: Query vector
        doc_embeddings: All document vectors
        documents: Original documents
        k: Number of results
        lambda_param: Relevance vs diversity trade-off (0-1)
    """
    doc_embeddings = np.asarray(doc_embeddings)

    # Calculate similarity to query
    query_sim = cosine_similarity([query_embedding], doc_embeddings)[0]

    selected_indices = []
    remaining_indices = list(range(len(documents)))

    # Select first document (most similar to query)
    first_idx = int(np.argmax(query_sim))
    selected_indices.append(first_idx)
    remaining_indices.remove(first_idx)

    # Iteratively select k-1 more documents
    for _ in range(k - 1):
        if not remaining_indices:
            break

        mmr_scores = []
        for idx in remaining_indices:
            # Relevance to query
            relevance = query_sim[idx]

            # Max similarity to already-selected docs
            selected_embeddings = doc_embeddings[selected_indices]
            diversity = max(cosine_similarity([doc_embeddings[idx]], selected_embeddings)[0])

            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * diversity
            mmr_scores.append((idx, mmr_score))

        # Select doc with highest MMR score
        best_idx = max(mmr_scores, key=lambda x: x[1])[0]
        selected_indices.append(best_idx)
        remaining_indices.remove(best_idx)

    return [documents[i] for i in selected_indices]
```
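A minimal usage sketch for the function above, assuming a sentence-transformers encoder (the model name and tiny corpus are illustrative, not part of the original):

```python
from sentence_transformers import SentenceTransformer

# Assumed embedding model; any encoder that returns vectors works
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Photosynthesis converts light to energy...",
    "Photosynthesis is how plants make energy...",
    "Chlorophyll enables photosynthesis...",
    "Photosynthesis occurs in chloroplasts...",
]
doc_embeddings = model.encode(docs)                                # shape: (n_docs, dim)
query_embedding = model.encode("How does photosynthesis work?")

top_docs = mmr_search(query_embedding, doc_embeddings, docs, k=3, lambda_param=0.7)
```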

LangChain Built-in MMR

```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# MMR search
results = vectorstore.max_marginal_relevance_search(
    query="photosynthesis",
    k=5,
    fetch_k=20,       # Fetch 20 candidates, return diverse 5
    lambda_mult=0.7,  # Relevance weight
)
```
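If you consume the vector store through a retriever, LangChain exposes the same behavior via `search_type="mmr"`; a short sketch with the same parameters as above:

```python
# Same MMR behavior through the retriever interface
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7},
)
docs = retriever.get_relevant_documents("photosynthesis")
```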

When to Use MMR

Use MMR when:

  • Documents have high overlap
  • Context window is limited
  • You need broad coverage
  • Multi-aspect queries ("tell me about X, Y, and Z")

Skip MMR when:

  • Documents already diverse
  • Speed is critical (MMR is slower)
  • Single-aspect queries

Tuning Lambda

```python
# Test different lambda values
lambdas = [0.3, 0.5, 0.7, 0.9]

for lam in lambdas:
    results = mmr_search(query, embeddings, docs, lambda_param=lam)

    # Measure diversity and relevance of the selected set
    diversity = measure_diversity(results)
    relevance = measure_relevance(results, query)

    print(f"λ={lam}: Relevance={relevance:.2f}, Diversity={diversity:.2f}")
```
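The loop above assumes `measure_diversity` and `measure_relevance` helpers that the post doesn't define. One plausible sketch, operating on result embeddings rather than raw documents:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def measure_relevance(result_embeddings, query_embedding):
    """Mean cosine similarity between the query and the selected results."""
    return float(cosine_similarity([query_embedding], result_embeddings)[0].mean())

def measure_diversity(result_embeddings):
    """Mean pairwise cosine distance among the results (higher = more diverse)."""
    sims = cosine_similarity(result_embeddings)
    n = len(result_embeddings)
    mean_offdiag_sim = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return float(1.0 - mean_offdiag_sim)
```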

Recommendations:

  • High redundancy domain: λ = 0.5-0.6
  • General purpose: λ = 0.7
  • Precision critical: λ = 0.8-0.9

Performance Optimization

Naive MMR costs roughly O(k² · N) similarity computations: each of the k selection steps compares up to N remaining candidates against everything already selected. It gets slow as k and the candidate pool grow, so prune first:

```python
# Faster: fetch candidates first, then run MMR on the smaller set
def fast_mmr(query, vectordb, k=10, fetch_k=100, lambda_param=0.7):
    # 1. Get fetch_k candidates (fast vector search)
    candidates = vectordb.search(query, k=fetch_k)

    # 2. Apply MMR on the smaller candidate set
    return mmr_search(
        query_embedding=query,
        doc_embeddings=[c['embedding'] for c in candidates],
        documents=[c['doc'] for c in candidates],
        k=k,
        lambda_param=lambda_param,
    )
```
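Another easy win, on top of pruning to `fetch_k` candidates, is to compute the candidate-candidate similarity matrix once and keep a running maximum instead of re-running cosine similarity inside the loop. A sketch under those assumptions (not the implementation from the post):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mmr_precomputed(query_embedding, doc_embeddings, documents, k=10, lambda_param=0.7):
    """MMR with the doc-doc similarity matrix computed once up front."""
    doc_embeddings = np.asarray(doc_embeddings)
    query_sim = cosine_similarity([query_embedding], doc_embeddings)[0]
    doc_sim = cosine_similarity(doc_embeddings)          # (n, n), computed once

    n = len(documents)
    k = min(k, n)
    selected = [int(np.argmax(query_sim))]
    # Running max similarity of every candidate to the selected set
    max_sim_to_selected = doc_sim[selected[0]].copy()

    for _ in range(k - 1):
        scores = lambda_param * query_sim - (1 - lambda_param) * max_sim_to_selected
        scores[selected] = -np.inf                       # never re-pick a selected doc
        best = int(np.argmax(scores))
        selected.append(best)
        # Update the running max with the newly selected document
        max_sim_to_selected = np.maximum(max_sim_to_selected, doc_sim[best])

    return [documents[i] for i in selected]
```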

Combining with Reranking

Pipeline: Retrieval → MMR → Reranking

```python
# 1. Initial retrieval
candidates = vector_search(query, k=100)

# 2. Diversify with MMR
diverse_docs = mmr_search(query, candidates, k=20)

# 3. Rerank for precision
final_results = cross_encoder_rerank(query, diverse_docs, k=10)
```
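`vector_search` and `cross_encoder_rerank` are placeholders in the pipeline above. A minimal sketch of the reranking step using sentence-transformers' CrossEncoder (the model name is an assumption, swap in whichever reranker you use):

```python
from sentence_transformers import CrossEncoder

# Assumed reranking model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def cross_encoder_rerank(query, docs, k=10):
    """Score each (query, doc) pair with the cross-encoder and keep the top k."""
    scores = reranker.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:k]]
```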

MMR ensures your LLM sees varied, non-redundant context. Essential for high-quality RAG.

Tags

mmr · retrieval · diversity · redundancy
