
Query Decomposition Breakthrough: DecomposeRAG Handles Complex Questions 50% Better

November 5, 2025
4 min read
Ailog Research Team

UC Berkeley researchers introduce DecomposeRAG, an automated query decomposition framework that significantly improves multi-hop question answering.

Research Overview

UC Berkeley's NLP lab published DecomposeRAG, a framework that automatically breaks complex queries into simpler sub-queries, achieving state-of-the-art results on multi-hop QA benchmarks.

The Problem

Complex questions require multi-hop reasoning:

Example: "What is the population of the capital of the country where the Eiffel Tower is located?"

Answering it requires three hops:

  1. Where is the Eiffel Tower? → France
  2. What is the capital of France? → Paris
  3. What is the population of Paris? → 2.1 million

Traditional RAG retrieves context for the full question, often missing intermediate steps.
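
For contrast, here is a minimal single-pass baseline, sketched with the same hypothetical `retrieve` and `llm.generate` helpers used in the snippets below. Because retrieval happens once over the full question, documents about France's capital or Paris's population may never be fetched:

```python
def standard_rag(question, k=5):
    # One retrieval pass over the full multi-hop question
    docs = retrieve(question, k=k)
    # One generation pass; no intermediate answers are ever produced
    return llm.generate(query=question, context=docs)
```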

DecomposeRAG Approach

Automatic Decomposition

The framework uses GPT-4 to break queries into sub-questions:

```python
def decompose_query(complex_query):
    prompt = f"""Break this question into simple sub-questions that must be answered in order.

Question: {complex_query}

Sub-questions (in order):
1."""
    response = gpt4.generate(prompt)
    sub_questions = parse_questions(response)
    return sub_questions
```
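
For the Eiffel Tower question above, a decomposition might look like the following (illustrative usage, not output taken from the paper):

```python
sub_questions = decompose_query(
    "What is the population of the capital of the country "
    "where the Eiffel Tower is located?"
)
# Expected shape of the result, mirroring the three hops above:
# ["Where is the Eiffel Tower located?",
#  "What is the capital of France?",
#  "What is the population of Paris?"]
```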

Sequential Retrieval

Answer sub-questions in order, using previous answers as context:

```python
def sequential_rag(sub_questions):
    context = ""
    for i, sub_q in enumerate(sub_questions):
        # Retrieve for this sub-question
        docs = retrieve(sub_q + " " + context, k=5)

        # Generate answer
        answer = llm.generate(
            query=sub_q,
            context=docs,
            previous_answers=context
        )

        # Add to cumulative context
        context += f"\nQ{i+1}: {sub_q}\nA{i+1}: {answer}\n"

    return answer  # Answer to final sub-question
```

Answer Validation

Validates each intermediate answer before proceeding:

```python
def validate_answer(question, answer, retrieved_docs):
    prompt = f"""Is this answer supported by the documents?

Question: {question}
Answer: {answer}
Documents: {retrieved_docs}

Supported? (yes/no):"""
    validation = llm.generate(prompt)
    return "yes" in validation.lower()
```

If validation fails, retry with more context or alternative retrieval strategy.
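
The paper does not spell out the fallback; one simple sketch of "retry with more context" is to widen retrieval on each failed validation before accepting a best-effort answer (the helper name `answer_with_retry` is hypothetical):

```python
def answer_with_retry(sub_q, context, max_retries=2):
    # Widen retrieval (larger k) each time validation fails
    for attempt in range(max_retries):
        k = 5 * (attempt + 1)
        docs = retrieve(sub_q + " " + context, k=k)
        answer = llm.generate(query=sub_q, context=docs,
                              previous_answers=context)
        if validate_answer(sub_q, answer, docs):
            return answer, True
    return answer, False  # best effort, unverified
```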

Benchmark Results

DecomposeRAG was tested on four multi-hop QA datasets (scores are F1):

| Dataset | Baseline RAG | DecomposeRAG | Improvement |
|---|---|---|---|
| HotpotQA | 45.3% | 68.7% | +51.7% |
| 2WikiMultihopQA | 38.2% | 57.9% | +51.6% |
| MuSiQue | 32.1% | 49.8% | +55.1% |
| IIRC | 41.7% | 62.3% | +49.4% |

Average improvement: +52%

Comparison to Other Methods

| Method | Avg F1 | Cost (relative) |
|---|---|---|
| Standard RAG | 39.3% | 1x |
| Chain-of-Thought | 43.8% | 2x |
| ReAct | 48.2% | 3x |
| DecomposeRAG | 59.7% | 2.5x |

DecomposeRAG achieves the best accuracy at moderate cost.

Key Insights

When Decomposition Helps

Effectiveness varies by query complexity:

| Hops | Baseline | DecomposeRAG | Gain |
|---|---|---|---|
| 1 (simple) | 68.2% | 69.1% | +1.3% |
| 2 (medium) | 51.3% | 67.4% | +31.4% |
| 3 (complex) | 28.7% | 52.3% | +82.2% |
| 4+ (very complex) | 15.2% | 38.9% | +156.3% |

Finding: More hops = bigger gains from decomposition.

Decomposition Quality

The authors analyzed the quality of the LLM-generated decompositions:

  • Correct decomposition: 87.3%
  • Missing steps: 8.2%
  • Incorrect order: 3.1%
  • Circular logic: 1.4%

Even imperfect decompositions improve results.

Error Analysis

Where does DecomposeRAG fail?

  1. Decomposition errors (23%): Wrong sub-questions
  2. Retrieval failures (34%): Can't find relevant docs for sub-question
  3. Answer errors (28%): Wrong intermediate answer propagates
  4. Integration failures (15%): Can't combine sub-answers

The most common failure mode: retrieval still fails even for the simpler sub-questions.

Implementation

Basic Version

```python
class DecomposeRAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    async def query(self, complex_question):
        # Step 1: Decompose
        sub_questions = await self.decompose(complex_question)

        # Step 2: Sequential RAG
        context = ""
        for sub_q in sub_questions:
            # Retrieve
            docs = await self.retriever.retrieve(
                sub_q + " " + context, k=5
            )
            # Generate
            answer = await self.llm.generate(
                query=sub_q,
                context=docs,
                previous=context
            )
            context += f"\n{sub_q} -> {answer}"

        # Return final answer
        return answer

    async def decompose(self, query):
        # Use LLM to decompose
        return await self.llm.decompose(query)
```
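
Usage is a single awaited call. Here `my_retriever` and `my_llm` are placeholders for whatever retrieval and generation clients you already have; their exact interfaces are assumptions, not part of the paper:

```python
import asyncio

async def main():
    # my_retriever / my_llm: your own clients exposing retrieve() and generate()
    rag = DecomposeRAG(retriever=my_retriever, llm=my_llm)
    answer = await rag.query(
        "What is the population of the capital of the country "
        "where the Eiffel Tower is located?"
    )
    print(answer)

asyncio.run(main())
```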

Advanced: With Validation

```python
class ValidatedDecomposeRAG(DecomposeRAG):
    async def query(self, complex_question, max_retries=2):
        sub_questions = await self.decompose(complex_question)

        context = ""
        for sub_q in sub_questions:
            for attempt in range(max_retries):
                docs = await self.retriever.retrieve(sub_q + " " + context)
                answer = await self.llm.generate(sub_q, docs, context)

                # Validate
                if await self.validate(sub_q, answer, docs):
                    context += f"\n{sub_q} -> {answer}"
                    break
                elif attempt == max_retries - 1:
                    # Failed validation, use best-effort answer
                    context += f"\n{sub_q} -> {answer} (unverified)"

        return answer
```

Optimizations

Parallel Sub-Queries

When sub-questions are independent:

```python
# Identify independent sub-questions
dependencies = analyze_dependencies(sub_questions)

# Group independent questions
independent_groups = group_by_dependencies(sub_questions, dependencies)

# Process groups in parallel
for group in independent_groups:
    # Parallel retrieval for group
    results = await asyncio.gather(*[
        self.retrieve_and_answer(q, context)
        for q in group
    ])

    # Add all to context
    for q, answer in zip(group, results):
        context += f"\n{q} -> {answer}"
```
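
`analyze_dependencies` and `group_by_dependencies` are left abstract above. One hedged way to implement them is a textual heuristic: a sub-question depends on an earlier one only if it refers back to a previous answer via a placeholder such as `#1` (an assumed convention, not the paper's), and independent questions are layered so each group depends only on earlier groups:

```python
import re

def analyze_dependencies(sub_questions):
    # Map each sub-question index to the earlier indices it references
    # via placeholders like "#1", "#2".
    deps = {}
    for i, q in enumerate(sub_questions):
        deps[i] = [int(m) - 1 for m in re.findall(r"#(\d+)", q) if int(m) - 1 < i]
    return deps

def group_by_dependencies(sub_questions, deps):
    # Greedy topological layering: each group only depends on earlier groups.
    groups, placed = [], set()
    while len(placed) < len(sub_questions):
        layer = [i for i in range(len(sub_questions))
                 if i not in placed and all(d in placed for d in deps[i])]
        groups.append([sub_questions[i] for i in layer])
        placed.update(layer)
    return groups
```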

Caching Intermediate Results

```python
class CachedDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, llm):
        super().__init__(retriever, llm)
        self.cache = {}

    async def retrieve_and_answer(self, sub_q, context):
        cache_key = hash(sub_q + context)
        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().retrieve_and_answer(sub_q, context)
        self.cache[cache_key] = result
        return result
```

Practical Considerations

Latency

DecomposeRAG is 2-3x slower than standard RAG:

  • 2-hop query: +2-3 seconds
  • 3-hop query: +4-6 seconds
  • 4-hop query: +6-10 seconds

Mitigation:

  • Parallel sub-queries when possible
  • Cache common decompositions
  • Use faster LLMs for intermediate steps (see the sketch below)
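
A minimal sketch of the last point, assuming you have two generation clients (`fast_llm` and `strong_llm` are hypothetical names): route intermediate sub-questions to the cheaper model and reserve the stronger one for the final answer.

```python
class RoutedDecomposeRAG(DecomposeRAG):
    # Builds on the DecomposeRAG base class from the Implementation section.
    def __init__(self, retriever, fast_llm, strong_llm):
        super().__init__(retriever, strong_llm)
        self.fast_llm = fast_llm      # cheap/fast model for intermediate hops
        self.strong_llm = strong_llm  # stronger model for the final answer

    async def query(self, complex_question):
        sub_questions = await self.decompose(complex_question)
        context = ""
        for i, sub_q in enumerate(sub_questions):
            docs = await self.retriever.retrieve(sub_q + " " + context, k=5)
            is_last = (i == len(sub_questions) - 1)
            model = self.strong_llm if is_last else self.fast_llm
            answer = await model.generate(query=sub_q, context=docs,
                                          previous=context)
            context += f"\n{sub_q} -> {answer}"
        return answer
```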

Cost

More LLM calls = higher cost:

  • Decomposition: 1 LLM call
  • Each sub-question: 1 LLM call
  • Validation (optional): 1 call per sub-question

Example:

  • 3 sub-questions + validation = 7 LLM calls
  • vs. 1 call for standard RAG

Cost multiplier: 2-5x, depending on query complexity.
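
The arithmetic behind these numbers is simple enough to keep as a small helper when budgeting (a minimal sketch; per-call prices are whatever your provider charges):

```python
def llm_call_count(num_sub_questions, validate=True):
    # 1 decomposition call, plus 1 answer call per sub-question,
    # plus 1 optional validation call per sub-question.
    per_sub_q = 2 if validate else 1
    return 1 + num_sub_questions * per_sub_q

# 3 sub-questions with validation -> 7 calls, vs. 1 call for standard RAG
assert llm_call_count(3, validate=True) == 7
assert llm_call_count(3, validate=False) == 4
```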

When to Use

Use DecomposeRAG when:

  • Questions are complex (multi-hop)
  • Accuracy more important than speed
  • Budget allows higher costs

Use standard RAG when:

  • Simple lookups
  • Speed critical
  • Cost-sensitive

Future Directions

Planned improvements:

  1. Better decomposition: Fine-tune smaller models
  2. Adaptive strategy: Auto-detect when to decompose
  3. Iterative refinement: Retry failed sub-questions
  4. Multimodal: Decompose across modalities

Resources

  • Paper: "DecomposeRAG: Automatic Query Decomposition for Multi-Hop Question Answering"
  • Code: github.com/berkeley-nlp/decomposerag
  • Demo: decomposerag.demo.berkeley.edu

Conclusion

DecomposeRAG demonstrates that explicit query decomposition significantly improves multi-hop question answering. While costlier and slower than standard RAG, the accuracy gains justify the overhead for complex queries where correctness is critical.

Tags

query optimization, multi-hop, research, decomposition
