Query Decomposition Breakthrough: DecomposeRAG Handles Complex Questions 50% Better
UC Berkeley researchers introduce DecomposeRAG, an automated query decomposition framework that significantly improves multi-hop question answering.
- Author: Ailog Research Team
- Reading time: 4 min read
Research Overview
UC Berkeley's NLP lab published DecomposeRAG, a framework that automatically breaks complex queries into simpler sub-queries, achieving state-of-the-art results on multi-hop QA benchmarks.
The Problem
Complex questions require multi-hop reasoning:
Example: "What is the population of the capital of the country where the Eiffel Tower is located?"
Requires:
1. Where is the Eiffel Tower? → France
2. What is the capital of France? → Paris
3. What is the population of Paris? → 2.1 million
Traditional RAG retrieves context for the full question, often missing intermediate steps.
DecomposeRAG Approach
Automatic Decomposition
Uses GPT-4 to break queries into sub-questions:
```python
def decompose_query(complex_query):
    prompt = f"""Break this question into simple sub-questions that must be answered in order.

Question: {complex_query}

Sub-questions (in order):
1."""

    response = gpt4.generate(prompt)
    sub_questions = parse_questions(response)

    return sub_questions
```
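The `parse_questions` helper is not shown in the excerpt. A minimal sketch, assuming the model continues the `1.` primer with a numbered list, could look like this:

```python
import re

def parse_questions(response: str) -> list[str]:
    # Assumes output like "Where is the Eiffel Tower?\n2. What is the capital of France?"
    text = "1." + response  # re-attach the "1." primer from the prompt
    parts = re.split(r"\n?\s*\d+\.\s*", text)
    return [q.strip() for q in parts if q.strip()]
```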
Sequential Retrieval
Answer sub-questions in order, using previous answers as context:
```python
def sequential_rag(sub_questions):
    context = ""

    for i, sub_q in enumerate(sub_questions):
        # Retrieve for this sub-question
        docs = retrieve(sub_q + " " + context, k=5)

        # Generate answer
        answer = llm.generate(
            query=sub_q,
            context=docs,
            previous_answers=context
        )

        # Add to cumulative context
        context += f"\nQ{i+1}: {sub_q}\nA{i+1}: {answer}\n"

    return answer  # Answer to the final sub-question
```
Answer Validation
Validates each intermediate answer before proceeding:
```python
def validate_answer(question, answer, retrieved_docs):
    prompt = f"""Is this answer supported by the documents?

Question: {question}
Answer: {answer}

Documents: {retrieved_docs}

Supported? (yes/no):"""

    validation = llm.generate(prompt)

    return "yes" in validation.lower()
```
If validation fails, the framework retries with more context or an alternative retrieval strategy.
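The excerpt does not spell out that fallback; one plausible sketch, assuming a hypothetical `rewrite_query` helper and a larger retrieval budget on retry, is:

```python
def answer_with_fallback(sub_q, context, max_retries=2):
    # First attempt: standard retrieval
    docs = retrieve(sub_q + " " + context, k=5)
    answer = llm.generate(query=sub_q, context=docs, previous_answers=context)

    for attempt in range(max_retries):
        if validate_answer(sub_q, answer, docs):
            return answer
        # Fallback: rephrase the sub-question and widen retrieval
        # (rewrite_query is a hypothetical helper, not from the paper)
        rewritten = rewrite_query(sub_q, context)
        docs = retrieve(rewritten, k=10)
        answer = llm.generate(query=sub_q, context=docs, previous_answers=context)

    return answer  # best-effort answer if validation never passes
```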
Benchmark Results
Tested on four multi-hop QA datasets:
| Dataset | Baseline RAG (F1) | DecomposeRAG (F1) | Relative improvement |
|---------|-------------------|-------------------|----------------------|
| HotpotQA | 45.3% | 68.7% | +51.7% |
| 2WikiMultihopQA | 38.2% | 57.9% | +51.6% |
| MuSiQue | 32.1% | 49.8% | +55.1% |
| IIRC | 41.7% | 62.3% | +49.4% |
Average improvement: +52%
Comparison to Other Methods
| Method | Avg F1 | Cost (relative) |
|--------|--------|-----------------|
| Standard RAG | 39.3% | 1x |
| Chain-of-Thought | 43.8% | 2x |
| ReAct | 48.2% | 3x |
| DecomposeRAG | 59.7% | 2.5x |
DecomposeRAG achieves best accuracy at moderate cost.
Key Insights
When Decomposition Helps
Effectiveness varies by query complexity:
| Hops | Baseline | DecomposeRAG | Gain |
|------|----------|--------------|------|
| 1 (simple) | 68.2% | 69.1% | +1.3% |
| 2 (medium) | 51.3% | 67.4% | +31.4% |
| 3 (complex) | 28.7% | 52.3% | +82.2% |
| 4+ (very complex) | 15.2% | 38.9% | +156.3% |
Finding: More hops = bigger gains from decomposition.
Decomposition Quality
Analyzed quality of LLM-generated decompositions:
- Correct decomposition: 87.3%
- Missing steps: 8.2%
- Incorrect order: 3.1%
- Circular logic: 1.4%
Even imperfect decompositions improve results.
Error Analysis
Where does DecomposeRAG fail?
- Decomposition errors (23%): wrong sub-questions
- Retrieval failures (34%): can't find relevant docs for a sub-question
- Answer errors (28%): a wrong intermediate answer propagates
- Integration failures (15%): can't combine sub-answers
The most common failure mode is retrieval: even for decomposed sub-questions, the retriever sometimes misses the relevant documents.
Implementation
Basic Version
```python
class DecomposeRAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    async def query(self, complex_question):
        # Step 1: Decompose
        sub_questions = await self.decompose(complex_question)

        # Step 2: Sequential RAG
        context = ""
        for sub_q in sub_questions:
            # Retrieve
            docs = await self.retriever.retrieve(
                sub_q + " " + context, k=5
            )

            # Generate
            answer = await self.llm.generate(
                query=sub_q,
                context=docs,
                previous=context
            )

            context += f"\n{sub_q} -> {answer}"

        # Return final answer
        return answer

    async def decompose(self, query):
        # Use LLM to decompose
        return await self.llm.decompose(query)
```
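A minimal usage sketch (the retriever and LLM here are placeholders, not objects from the released code):

```python
import asyncio

async def main():
    # my_retriever / my_llm are placeholders for your own components
    # exposing .retrieve(), .generate(), and .decompose()
    rag = DecomposeRAG(retriever=my_retriever, llm=my_llm)

    answer = await rag.query(
        "What is the population of the capital of the country "
        "where the Eiffel Tower is located?"
    )
    print(answer)

asyncio.run(main())
```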
Advanced: With Validation
```python
class ValidatedDecomposeRAG(DecomposeRAG):
    async def query(self, complex_question, max_retries=2):
        sub_questions = await self.decompose(complex_question)

        context = ""
        for sub_q in sub_questions:
            for attempt in range(max_retries):
                docs = await self.retriever.retrieve(sub_q + " " + context)
                answer = await self.llm.generate(sub_q, docs, context)

                # Validate the intermediate answer
                if await self.validate(sub_q, answer, docs):
                    context += f"\n{sub_q} -> {answer}"
                    break
                elif attempt == max_retries - 1:
                    # Failed validation; keep the best-effort answer
                    context += f"\n{sub_q} -> {answer} (unverified)"

        return answer
```
Optimizations
Parallel Sub-Queries
When sub-questions are independent:
```python
# Identify independent sub-questions
dependencies = analyze_dependencies(sub_questions)

# Group independent questions
independent_groups = group_by_dependencies(sub_questions, dependencies)

# Process groups in parallel
for group in independent_groups:
    # Parallel retrieval for the group
    results = await asyncio.gather(*[
        self.retrieve_and_answer(q, context)
        for q in group
    ])

    # Add all answers to context
    for q, answer in zip(group, results):
        context += f"\n{q} -> {answer}"
```
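The helpers `analyze_dependencies` and `group_by_dependencies` are not defined in the excerpt. A simple sketch, assuming the decomposer marks references to earlier answers with placeholders like `#1`, might be:

```python
import re

def analyze_dependencies(sub_questions):
    # Map each sub-question index to the earlier questions it references,
    # assuming references are written as "#1", "#2", ... (hypothetical convention)
    deps = {}
    for i, q in enumerate(sub_questions):
        refs = {int(m) - 1 for m in re.findall(r"#(\d+)", q)}
        deps[i] = {r for r in refs if r < i}
    return deps

def group_by_dependencies(sub_questions, deps):
    # Greedy layering: a question joins a layer once all of its
    # dependencies have been placed in earlier layers.
    layers, placed = [], set()
    while len(placed) < len(sub_questions):
        layer = [i for i in range(len(sub_questions))
                 if i not in placed and deps[i] <= placed]
        layers.append([sub_questions[i] for i in layer])
        placed.update(layer)
    return layers
```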
Caching Intermediate Results
```python
class CachedDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, llm):
        super().__init__(retriever, llm)
        self.cache = {}

    async def retrieve_and_answer(self, sub_q, context):
        cache_key = hash(sub_q + context)

        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().retrieve_and_answer(sub_q, context)
        self.cache[cache_key] = result

        return result
```
Practical Considerations
Latency
DecomposeRAG is 2-3x slower:
- 2-hop query: +2-3 seconds
- 3-hop query: +4-6 seconds
- 4-hop query: +6-10 seconds
Mitigation:
- Run sub-queries in parallel when possible
- Cache common decompositions
- Use faster LLMs for intermediate steps (sketched below)
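One way to apply that last idea, shown here with hypothetical `fast_llm`/`strong_llm` clients rather than anything from the paper:

```python
class TieredDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, fast_llm, strong_llm):
        # Strong model decomposes and answers the final hop;
        # the cheaper model handles intermediate hops.
        super().__init__(retriever, strong_llm)
        self.fast_llm = fast_llm

    async def query(self, complex_question):
        sub_questions = await self.decompose(complex_question)

        context = ""
        for i, sub_q in enumerate(sub_questions):
            is_final = (i == len(sub_questions) - 1)
            model = self.llm if is_final else self.fast_llm

            docs = await self.retriever.retrieve(sub_q + " " + context, k=5)
            answer = await model.generate(
                query=sub_q, context=docs, previous=context
            )
            context += f"\n{sub_q} -> {answer}"

        return answer
```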
Cost
More LLM calls = higher cost:
- Decomposition: 1 LLM call
- Each sub-question: 1 LLM call
- Validation (optional): 1 call per sub-question
Example:
- 3 sub-questions + validation = 7 LLM calls
- vs. 1 call for standard RAG
Cost multiplier: 2-5x depending on complexity
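As a back-of-the-envelope check (a sketch, not code from the paper):

```python
def llm_calls(num_sub_questions: int, validate: bool = True) -> int:
    # 1 decomposition call + 1 answer call per sub-question,
    # plus 1 optional validation call per sub-question
    return 1 + num_sub_questions + (num_sub_questions if validate else 0)

print(llm_calls(3))                  # 7 calls, vs. 1 for standard RAG
print(llm_calls(3, validate=False))  # 4 calls
```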
When to Use
Use DecomposeRAG when:
- Questions are complex (multi-hop)
- Accuracy is more important than speed
- Budget allows higher costs
Use standard RAG when:
- Queries are simple lookups
- Speed is critical
- Cost is a primary concern
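A simple way to combine the two is to route per query. This sketch assumes a hypothetical hop-count classifier prompt; the paper lists adaptive routing only as future work:

```python
async def route_query(question, standard_rag, decompose_rag, llm):
    # Ask a cheap model to estimate how many reasoning hops are needed
    estimate = await llm.generate(
        "How many reasoning steps are needed to answer this question? "
        f"Reply with a single number.\n\nQuestion: {question}"
    )
    try:
        hops = int(estimate.strip().split()[0])
    except ValueError:
        hops = 1  # fall back to simple retrieval if the estimate is unparseable

    if hops >= 2:
        return await decompose_rag.query(question)
    return await standard_rag.query(question)
```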
Future Directions
Planned improvements:
- Better decomposition: fine-tune smaller models
- Adaptive strategy: auto-detect when to decompose
- Iterative refinement: retry failed sub-questions
- Multimodal: decompose across modalities
Resources
- Paper: "DecomposeRAG: Automatic Query Decomposition for Multi-Hop Question Answering"
- Code: github.com/berkeley-nlp/decomposerag
- Demo: decomposerag.demo.berkeley.edu
Conclusion
DecomposeRAG demonstrates that explicit query decomposition significantly improves multi-hop question answering. While costlier and slower than standard RAG, the accuracy gains justify the overhead for complex queries where correctness is critical.