Query Decomposition Breakthrough: DecomposeRAG Handles Complex Questions 50% Better

UC Berkeley researchers introduce DecomposeRAG, an automated query decomposition framework that significantly improves multi-hop question answering.

Author: Ailog Research Team
Published: November 5, 2025
Reading time: 4 min read

Research Overview

UC Berkeley's NLP lab published DecomposeRAG, a framework that automatically breaks complex queries into simpler sub-queries, achieving state-of-the-art results on multi-hop QA benchmarks.

The Problem

Complex questions require multi-hop reasoning:

Example: "What is the population of the capital of the country where the Eiffel Tower is located?"

Requires:

  1. Where is the Eiffel Tower? → France
  2. What is the capital of France? → Paris
  3. What is the population of Paris? → 2.1 million

Traditional RAG retrieves context for the full question, often missing intermediate steps.
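For intuition, here is a minimal sketch of that single-pass baseline, using the same assumed `retrieve` and `llm` interfaces as the snippets later in this post:

```python
def standard_rag(question):
    # One retrieval pass over the full, compound question.
    # Nothing in the query mentions "France" or "Paris", so documents
    # about Paris's population are unlikely to rank highly.
    docs = retrieve(question, k=5)
    return llm.generate(query=question, context=docs)
```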

DecomposeRAG Approach

Automatic Decomposition

Uses GPT-4 to break queries into sub-questions:

```python
def decompose_query(complex_query):
    prompt = f"""Break this question into simple sub-questions that must be answered in order.

Question: {complex_query}

Sub-questions (in order):
1."""

    response = gpt4.generate(prompt)
    sub_questions = parse_questions(response)

    return sub_questions
```
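Applied to the example above, the decomposition might look like this (illustrative output, not taken from the paper):

```python
sub_questions = decompose_query(
    "What is the population of the capital of the country "
    "where the Eiffel Tower is located?"
)
# Plausible result:
# ["In which country is the Eiffel Tower located?",
#  "What is the capital of that country?",
#  "What is the population of that city?"]
```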

Sequential Retrieval

Answer sub-questions in order, using previous answers as context:

```python
def sequential_rag(sub_questions):
    context = ""

    for i, sub_q in enumerate(sub_questions):
        # Retrieve for this sub-question
        docs = retrieve(sub_q + " " + context, k=5)

        # Generate answer
        answer = llm.generate(
            query=sub_q,
            context=docs,
            previous_answers=context
        )

        # Add to cumulative context
        context += f"\nQ{i+1}: {sub_q}\nA{i+1}: {answer}\n"

    return answer  # Answer to final sub-question
```

Answer Validation

Validates each intermediate answer before proceeding:

```python
def validate_answer(question, answer, retrieved_docs):
    prompt = f"""Is this answer supported by the documents?

Question: {question}
Answer: {answer}

Documents: {retrieved_docs}

Supported? (yes/no):"""

    validation = llm.generate(prompt)

    return "yes" in validation.lower()
```

If validation fails, the system retries with more context or an alternative retrieval strategy.
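The paper does not spell out the retry logic; one plausible sketch widens retrieval and then falls back to a reformulated sub-question (`rewrite_query` is a hypothetical helper):

```python
def answer_with_retry(sub_q, context, max_k=20):
    k = 5
    query = sub_q + " " + context
    while k <= max_k:
        docs = retrieve(query, k=k)
        answer = llm.generate(query=sub_q, context=docs, previous_answers=context)
        if validate_answer(sub_q, answer, docs):
            return answer
        # Widen retrieval, then try a reformulated sub-question as a fallback
        k *= 2
        query = rewrite_query(sub_q, context)  # hypothetical helper
    return answer  # best-effort, unverified
```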

Benchmark Results

Tested on four multi-hop QA datasets:

| Dataset | Baseline RAG | DecomposeRAG | Improvement |
|---------|--------------|--------------|-------------|
| HotpotQA | 45.3% | 68.7% | +51.7% |
| 2WikiMultihopQA | 38.2% | 57.9% | +51.6% |
| MuSiQue | 32.1% | 49.8% | +55.1% |
| IIRC | 41.7% | 62.3% | +49.4% |

Average improvement: +52%

Comparison to Other Methods

| Method | Avg F1 | Cost (relative) |
|--------|--------|-----------------|
| Standard RAG | 39.3% | 1x |
| Chain-of-Thought | 43.8% | 2x |
| ReACT | 48.2% | 3x |
| DecomposeRAG | 59.7% | 2.5x |

DecomposeRAG achieves best accuracy at moderate cost.

Key Insights

When Decomposition Helps

Effectiveness varies by query complexity:

| Hops | Baseline | DecomposeRAG | Gain |
|------|----------|--------------|------|
| 1 (simple) | 68.2% | 69.1% | +1.3% |
| 2 (medium) | 51.3% | 67.4% | +31.4% |
| 3 (complex) | 28.7% | 52.3% | +82.2% |
| 4+ (very complex) | 15.2% | 38.9% | +156.3% |

Finding: More hops = bigger gains from decomposition.
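Since 1-hop queries barely benefit, a natural optimization (foreshadowing the "adaptive strategy" listed under Future Directions) is to decompose only when a cheap check judges the query to be multi-hop. A hedged sketch of a method that could sit on the DecomposeRAG class shown later; `estimate_hops` is a hypothetical helper:

```python
async def adaptive_query(self, question):
    # Hypothetical helper: ask a small/cheap model to estimate hop count.
    hops = await self.llm.estimate_hops(question)
    if hops <= 1:
        # Simple lookup: single-pass RAG is nearly as accurate and much cheaper.
        docs = await self.retriever.retrieve(question, k=5)
        return await self.llm.generate(query=question, context=docs)
    return await self.query(question)  # full DecomposeRAG pipeline
```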

Decomposition Quality

Analyzed quality of LLM-generated decompositions:

  • Correct decomposition: 87.3%
  • Missing steps: 8.2%
  • Incorrect order: 3.1%
  • Circular logic: 1.4%

Even imperfect decompositions improve results.

Error Analysis

Where does DecomposeRAG fail?

  1. Decomposition errors (23%): wrong sub-questions
  2. Retrieval failures (34%): can't find relevant documents for a sub-question
  3. Answer errors (28%): a wrong intermediate answer propagates
  4. Integration failures (15%): can't combine sub-answers

The most common failure mode: retrieval still misses relevant documents even for the simpler sub-questions.

Implementation

Basic Version

```python
class DecomposeRAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    async def query(self, complex_question):
        # Step 1: Decompose
        sub_questions = await self.decompose(complex_question)

        # Step 2: Sequential RAG
        context = ""
        for sub_q in sub_questions:
            # Retrieve
            docs = await self.retriever.retrieve(
                sub_q + " " + context, k=5
            )

            # Generate
            answer = await self.llm.generate(
                query=sub_q,
                context=docs,
                previous=context
            )

            context += f"\n{sub_q} -> {answer}"

        # Return final answer
        return answer

    async def decompose(self, query):
        # Use LLM to decompose
        return await self.llm.decompose(query)
```

Advanced: With Validation

```python
class ValidatedDecomposeRAG(DecomposeRAG):
    async def query(self, complex_question, max_retries=2):
        sub_questions = await self.decompose(complex_question)

        context = ""
        for sub_q in sub_questions:
            for attempt in range(max_retries):
                docs = await self.retriever.retrieve(sub_q + " " + context)
                answer = await self.llm.generate(sub_q, docs, context)

                # Validate
                if await self.validate(sub_q, answer, docs):
                    context += f"\n{sub_q} -> {answer}"
                    break
                elif attempt == max_retries - 1:
                    # Failed validation, use best-effort answer
                    context += f"\n{sub_q} -> {answer} (unverified)"

        return answer
```

Optimizations

Parallel Sub-Queries

When sub-questions are independent:

```python
# Identify independent sub-questions
dependencies = analyze_dependencies(sub_questions)

# Group independent questions
independent_groups = group_by_dependencies(sub_questions, dependencies)

# Process groups in parallel
for group in independent_groups:
    # Parallel retrieval for group
    results = await asyncio.gather(*[
        self.retrieve_and_answer(q, context)
        for q in group
    ])

    # Add all to context
    for q, answer in zip(group, results):
        context += f"\n{q} -> {answer}"
```
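`analyze_dependencies` and `group_by_dependencies` are not defined in the snippet above. One simple assumption is that a sub-question depends on an earlier one when it refers back to its answer with a placeholder like `#1`; under that assumption, a sketch of the two helpers might look like this:

```python
import re

def analyze_dependencies(sub_questions):
    # Map each question index to the indices it depends on,
    # based on "#N" back-references in the question text.
    deps = {}
    for i, q in enumerate(sub_questions):
        deps[i] = {int(m) - 1 for m in re.findall(r"#(\d+)", q)}
    return deps

def group_by_dependencies(sub_questions, deps):
    # Level-by-level grouping: a question joins a group once all of
    # its dependencies have been answered in earlier groups.
    groups, placed = [], set()
    while len(placed) < len(sub_questions):
        level = [i for i in range(len(sub_questions))
                 if i not in placed and deps[i] <= placed]
        if not level:
            # Cycle guard: fall back to sequential order
            level = [next(i for i in range(len(sub_questions)) if i not in placed)]
        groups.append([sub_questions[i] for i in level])
        placed.update(level)
    return groups
```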

Caching Intermediate Results

```python
class CachedDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, llm):
        super().__init__(retriever, llm)
        self.cache = {}

    async def retrieve_and_answer(self, sub_q, context):
        # Assumes the base DecomposeRAG class exposes a
        # retrieve_and_answer helper for a single sub-question.
        cache_key = hash(sub_q + context)

        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().retrieve_and_answer(sub_q, context)
        self.cache[cache_key] = result

        return result
```
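One caveat with the sketch above: Python's built-in `hash()` on strings is randomized per process, so this cache cannot be persisted or shared across workers. If persistence matters, a stable digest is safer:

```python
import hashlib

def stable_cache_key(sub_q, context):
    # Deterministic key, safe to store in Redis or on disk across processes.
    return hashlib.sha256(f"{sub_q}||{context}".encode("utf-8")).hexdigest()
```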

Practical Considerations

Latency

DecomposeRAG is 2-3x slower:

  • 2-hop query: +2-3 seconds
  • 3-hop query: +4-6 seconds
  • 4-hop query: +6-10 seconds

Mitigation:

  • Parallel sub-queries when possible
  • Cache common decompositions
  • Use faster LLMs for intermediate steps (see the sketch below)
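For the last point, a minimal sketch of model routing; the `fast_llm` / `strong_llm` split is an assumption, not part of the paper:

```python
class TieredDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, fast_llm, strong_llm):
        # Cheap model handles decomposition and intermediate answers;
        # the stronger model is reserved for the final answer.
        super().__init__(retriever, fast_llm)
        self.strong_llm = strong_llm

    async def query(self, complex_question):
        sub_questions = await self.decompose(complex_question)
        context = ""
        for sub_q in sub_questions[:-1]:
            docs = await self.retriever.retrieve(sub_q + " " + context, k=5)
            answer = await self.llm.generate(query=sub_q, context=docs, previous=context)
            context += f"\n{sub_q} -> {answer}"

        final_q = sub_questions[-1]
        docs = await self.retriever.retrieve(final_q + " " + context, k=5)
        return await self.strong_llm.generate(query=final_q, context=docs, previous=context)
```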

Cost

More LLM calls = higher cost:

  • Decomposition: 1 LLM call
  • Each sub-question: 1 LLM call
  • Validation (optional): 1 call per sub-question

Example:

  • 3 sub-questions + validation = 7 LLM calls (1 decomposition + 3 answers + 3 validations)
  • vs. 1 call for standard RAG

Cost multiplier: 2-5x depending on complexity

When to Use

Use DecomposeRAG when:

  • Questions are complex (multi-hop)
  • Accuracy is more important than speed
  • Budget allows higher costs

Use standard RAG when:

  • Simple lookups
  • Speed is critical
  • Cost-sensitive

Future Directions

Planned improvements:

  1. Better decomposition: fine-tune smaller models
  2. Adaptive strategy: auto-detect when to decompose
  3. Iterative refinement: retry failed sub-questions
  4. Multimodal: decompose across modalities

Resources

  • Paper: "DecomposeRAG: Automatic Query Decomposition for Multi-Hop Question Answering"
  • Code: github.com/berkeley-nlp/decomposerag
  • Demo: decomposerag.demo.berkeley.edu

Conclusion

DecomposeRAG demonstrates that explicit query decomposition significantly improves multi-hop question answering. While costlier and slower than standard RAG, the accuracy gains justify the overhead for complex queries where correctness is critical.

Tags

  • query optimization
  • multi-hop
  • research
  • decomposition