Query Decomposition Breakthrough: DecomposeRAG Handles Complex Questions 50% Better
UC Berkeley researchers introduce DecomposeRAG, an automated query decomposition framework that significantly improves multi-hop question answering.
Research Overview
UC Berkeley's NLP lab published DecomposeRAG, a framework that automatically breaks complex queries into simpler sub-queries, achieving state-of-the-art results on multi-hop QA benchmarks.
The Problem
Complex questions require multi-hop reasoning:
Example: "What is the population of the capital of the country where the Eiffel Tower is located?"
Answering it requires:
- Where is the Eiffel Tower? → France
- What is the capital of France? → Paris
- What is the population of Paris? → 2.1 million
Traditional RAG retrieves context for the full question, often missing intermediate steps.
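For contrast, a minimal single-pass baseline looks roughly like the sketch below (retrieve and llm.generate are placeholders matching the snippets later in this article, not a specific library API); it issues one retrieval for the full question, so documents needed for the intermediate steps may never be fetched:

```python
def standard_rag(question, k=5):
    # Single retrieval pass over the full multi-hop question
    docs = retrieve(question, k=k)
    # Single generation pass conditioned on whatever was retrieved
    return llm.generate(query=question, context=docs)
```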
DecomposeRAG Approach
Automatic Decomposition
Uses GPT-4 to break queries into sub-questions:
```python
def decompose_query(complex_query):
    # Prompt GPT-4 to list the sub-questions in the order they must be answered
    prompt = f"""Break this question into simple sub-questions that must be answered in order.

Question: {complex_query}

Sub-questions (in order):
1."""
    response = gpt4.generate(prompt)

    # Parse the numbered list back into individual sub-questions
    sub_questions = parse_questions(response)
    return sub_questions
```
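For the Eiffel Tower example, the intended output is the three steps listed earlier. Note that in practice the later sub-questions would reference earlier answers (e.g. "the capital of that country") rather than the resolved entities, since those are not known before retrieval. An illustrative call, not taken verbatim from the paper:

```python
sub_questions = decompose_query(
    "What is the population of the capital of the country "
    "where the Eiffel Tower is located?"
)
# e.g. ["Where is the Eiffel Tower located?",
#       "What is the capital of that country?",
#       "What is the population of that capital?"]
```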
Sequential Retrieval
Answer sub-questions in order, using previous answers as context:
```python
def sequential_rag(sub_questions):
    context = ""
    for i, sub_q in enumerate(sub_questions):
        # Retrieve for this sub-question
        docs = retrieve(sub_q + " " + context, k=5)

        # Generate answer
        answer = llm.generate(
            query=sub_q,
            context=docs,
            previous_answers=context
        )

        # Add to cumulative context
        context += f"\nQ{i+1}: {sub_q}\nA{i+1}: {answer}\n"

    return answer  # Answer to final sub-question
```
Answer Validation
Validates each intermediate answer before proceeding:
```python
def validate_answer(question, answer, retrieved_docs):
    prompt = f"""Is this answer supported by the documents?

Question: {question}
Answer: {answer}
Documents: {retrieved_docs}

Supported? (yes/no):"""
    validation = llm.generate(prompt)
    return "yes" in validation.lower()
```
If validation fails, retry with more context or alternative retrieval strategy.
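One way to implement that retry loop, sketched here under the assumption that "more context" means widening the retrieval window (doubling k) on each failed attempt; retrieve and llm.generate are the same placeholders used above:

```python
def answer_with_retries(sub_q, context, max_retries=2):
    k = 5
    answer = None
    for attempt in range(max_retries + 1):
        docs = retrieve(sub_q + " " + context, k=k)
        answer = llm.generate(query=sub_q, context=docs, previous_answers=context)
        if validate_answer(sub_q, answer, docs):
            return answer
        k *= 2  # retry with a larger retrieval window
    return answer  # best-effort answer if validation never passes
```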
Benchmark Results
Tested on four multi-hop QA datasets:
| Dataset | Baseline RAG (F1) | DecomposeRAG (F1) | Relative Improvement |
|---|---|---|---|
| HotpotQA | 45.3% | 68.7% | +51.7% |
| 2WikiMultihopQA | 38.2% | 57.9% | +51.6% |
| MuSiQue | 32.1% | 49.8% | +55.1% |
| IIRC | 41.7% | 62.3% | +49.4% |
Average relative improvement: +52%
Comparison to Other Methods
| Method | Avg F1 | Cost (relative) |
|---|---|---|
| Standard RAG | 39.3% | 1x |
| Chain-of-Thought | 43.8% | 2x |
| ReACT | 48.2% | 3x |
| DecomposeRAG | 59.7% | 2.5x |
DecomposeRAG achieves the best accuracy at a moderate cost.
Key Insights
When Decomposition Helps
Effectiveness varies by query complexity:
| Hops | Baseline RAG | DecomposeRAG | Relative Gain |
|---|---|---|---|
| 1 (simple) | 68.2% | 69.1% | +1.3% |
| 2 (medium) | 51.3% | 67.4% | +31.4% |
| 3 (complex) | 28.7% | 52.3% | +82.2% |
| 4+ (very complex) | 15.2% | 38.9% | +156.3% |
Finding: the more hops a question requires, the larger the gain from decomposition.
Decomposition Quality
Analyzed quality of LLM-generated decompositions:
- Correct decomposition: 87.3%
- Missing steps: 8.2%
- Incorrect order: 3.1%
- Circular logic: 1.4%
Even imperfect decompositions improve results.
Error Analysis
Where does DecomposeRAG fail?
- Decomposition errors (23%): Wrong sub-questions
- Retrieval failures (34%): Can't find relevant docs for sub-question
- Answer errors (28%): Wrong intermediate answer propagates
- Integration failures (15%): Can't combine sub-answers
Most common failure mode: retrieval still misses relevant documents even for the simpler sub-questions.
Implementation
Basic Version
```python
class DecomposeRAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    async def query(self, complex_question):
        # Step 1: Decompose
        sub_questions = await self.decompose(complex_question)

        # Step 2: Sequential RAG
        context = ""
        for sub_q in sub_questions:
            # Retrieve
            docs = await self.retriever.retrieve(
                sub_q + " " + context, k=5
            )

            # Generate
            answer = await self.llm.generate(
                query=sub_q,
                context=docs,
                previous=context
            )

            context += f"\n{sub_q} -> {answer}"

        # Return final answer
        return answer

    async def decompose(self, query):
        # Use LLM to decompose
        return await self.llm.decompose(query)
```
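A hypothetical usage sketch; my_retriever and my_llm stand in for whatever retriever and LLM wrappers you plug in (they are not part of the published code):

```python
import asyncio

rag = DecomposeRAG(retriever=my_retriever, llm=my_llm)

answer = asyncio.run(rag.query(
    "What is the population of the capital of the country "
    "where the Eiffel Tower is located?"
))
print(answer)  # e.g. "About 2.1 million"
```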
Advanced: With Validation
```python
class ValidatedDecomposeRAG(DecomposeRAG):
    async def query(self, complex_question, max_retries=2):
        sub_questions = await self.decompose(complex_question)
        context = ""

        for sub_q in sub_questions:
            for attempt in range(max_retries):
                docs = await self.retriever.retrieve(sub_q + " " + context)
                answer = await self.llm.generate(sub_q, docs, context)

                # Validate (the "supported by the documents?" check shown earlier)
                if await self.validate(sub_q, answer, docs):
                    context += f"\n{sub_q} -> {answer}"
                    break
                elif attempt == max_retries - 1:
                    # Failed validation, use best-effort answer
                    context += f"\n{sub_q} -> {answer} (unverified)"

        return answer
```
Optimizations
Parallel Sub-Queries
When sub-questions are independent:
```python
# Inside an async method of DecomposeRAG (asyncio imported at module level);
# analyze_dependencies / group_by_dependencies are helpers not shown here.

# Identify independent sub-questions
dependencies = analyze_dependencies(sub_questions)

# Group independent questions
independent_groups = group_by_dependencies(sub_questions, dependencies)

# Process groups in parallel
for group in independent_groups:
    # Parallel retrieval for group
    results = await asyncio.gather(*[
        self.retrieve_and_answer(q, context)
        for q in group
    ])

    # Add all to context
    for q, answer in zip(group, results):
        context += f"\n{q} -> {answer}"
```
Caching Intermediate Results
```python
class CachedDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, llm):
        super().__init__(retriever, llm)
        self.cache = {}

    async def retrieve_and_answer(self, sub_q, context):
        # Assumes the base class provides retrieve_and_answer
        # (retrieve + generate for a single sub-question)
        cache_key = hash(sub_q + context)
        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().retrieve_and_answer(sub_q, context)
        self.cache[cache_key] = result
        return result
```
Practical Considerations
Latency
DecomposeRAG is 2-3x slower than standard RAG, with added latency that grows with the number of hops:
- 2-hop query: +2-3 seconds
- 3-hop query: +4-6 seconds
- 4-hop query: +6-10 seconds
Mitigation:
- Parallel sub-queries when possible
- Cache common decompositions (see the sketch after this list)
- Use faster LLMs for intermediate steps
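Decompositions for repeated or popular queries can be cached so the decomposition LLM call is paid only once. A minimal sketch; the normalization and cache structure below are illustrative assumptions, not part of the paper's code:

```python
import hashlib

class DecompositionCache:
    """Reuses LLM-generated decompositions for repeated queries."""

    def __init__(self):
        self._cache = {}

    def _key(self, query: str) -> str:
        # Normalize lightly so trivially different phrasings share an entry
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    async def decompose(self, query, decomposer):
        key = self._key(query)
        if key not in self._cache:
            # One LLM call on a miss; free on every subsequent hit
            self._cache[key] = await decomposer.decompose(query)
        return self._cache[key]
```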
Cost
More LLM calls = higher cost:
- Decomposition: 1 LLM call
- Each sub-question: 1 LLM call
- Validation (optional): 1 call per sub-question
Example:
- 3 sub-questions + validation = 7 LLM calls
- vs. 1 call for standard RAG
Cost multiplier: 2-5x depending on complexity
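As a quick sanity check on these numbers, a back-of-the-envelope helper (illustrative only, not part of the released code):

```python
def estimate_llm_calls(num_sub_questions: int, validate: bool = True) -> int:
    # 1 decomposition call + 1 answer call per sub-question
    calls = 1 + num_sub_questions
    # optionally 1 validation call per sub-question
    if validate:
        calls += num_sub_questions
    return calls

print(estimate_llm_calls(3))  # 7 calls, vs. 1 call for standard RAG
```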
When to Use
Use DecomposeRAG when:
- Questions are complex (multi-hop)
- Accuracy more important than speed
- Budget allows higher costs
Use standard RAG when:
- Simple lookups
- Speed critical
- Cost-sensitive
Future Directions
Planned improvements:
- Better decomposition: Fine-tune smaller models
- Adaptive strategy: Auto-detect when to decompose
- Iterative refinement: Retry failed sub-questions
- Multimodal: Decompose across modalities
Resources
- Paper: "DecomposeRAG: Automatic Query Decomposition for Multi-Hop Question Answering"
- Code: github.com/berkeley-nlp/decomposerag
- Demo: decomposerag.demo.berkeley.edu
Conclusion
DecomposeRAG demonstrates that explicit query decomposition significantly improves multi-hop question answering. While costlier and slower than standard RAG, the accuracy gains justify the overhead for complex queries where correctness is critical.