Query Decomposition Breakthrough: DecomposeRAG Handles Complex Questions 50% Better
UC Berkeley researchers introduce DecomposeRAG, an automated query decomposition framework that significantly improves multi-hop question answering.
Research Overview
UC Berkeley's NLP lab published DecomposeRAG, a framework that automatically breaks complex queries into simpler sub-queries, achieving state-of-the-art results on multi-hop QA benchmarks.
The Problem
Complex questions require multi-hop reasoning:
Example: "What is the population of the capital of the country where the Eiffel Tower is located?"
Requires:
- Where is the Eiffel Tower? → France
- What is the capital of France? → Paris
- What is the population of Paris? → 2.1 million
Traditional RAG retrieves context for the full question, often missing intermediate steps.
DecomposeRAG Approach
Automatic Decomposition
Uses GPT-4 to break queries into sub-questions:
```python
def decompose_query(complex_query):
    prompt = f"""Break this question into simple sub-questions that must be answered in order.

Question: {complex_query}

Sub-questions (in order):
1."""
    response = gpt4.generate(prompt)
    sub_questions = parse_questions(response)
    return sub_questions
```
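The `parse_questions` helper is not spelled out. A minimal sketch, assuming the model continues the numbered list started by the prompt:

```python
import re

def parse_questions(response: str) -> list[str]:
    """Extract ordered sub-questions from the LLM completion.

    The prompt above already ends with "1.", so the first line of the
    completion may arrive without its number; later lines look like
    "2. What is the capital of France?".
    """
    questions = []
    for line in response.splitlines():
        line = re.sub(r"^\s*\d+[\.\)]\s*", "", line).strip()
        if line.endswith("?"):
            questions.append(line)
    return questions
```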
Sequential Retrieval
Answer sub-questions in order, using previous answers as context:
```python
def sequential_rag(sub_questions):
    context = ""
    for i, sub_q in enumerate(sub_questions):
        # Retrieve for this sub-question
        docs = retrieve(sub_q + " " + context, k=5)

        # Generate answer
        answer = llm.generate(
            query=sub_q,
            context=docs,
            previous_answers=context
        )

        # Add to cumulative context
        context += f"\nQ{i+1}: {sub_q}\nA{i+1}: {answer}\n"

    return answer  # Answer to the final sub-question
```
Answer Validation
Validates each intermediate answer before proceeding:
```python
def validate_answer(question, answer, retrieved_docs):
    prompt = f"""Is this answer supported by the documents?

Question: {question}
Answer: {answer}
Documents: {retrieved_docs}

Supported? (yes/no):"""
    validation = llm.generate(prompt)
    return "yes" in validation.lower()
```
If validation fails, the system retries with more context or an alternative retrieval strategy.
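The paper's exact fallback is not reproduced here; one plausible sketch, reusing the `retrieve`, `llm`, and `validate_answer` pieces above and widening the retrieval window on each retry:

```python
def answer_with_fallback(sub_q, context, max_retries=2):
    """Answer a sub-question, retrying with more retrieved context on failure."""
    k = 5
    answer = None
    for attempt in range(max_retries + 1):
        docs = retrieve(sub_q + " " + context, k=k)
        answer = llm.generate(query=sub_q, context=docs, previous_answers=context)
        if validate_answer(sub_q, answer, docs):
            return answer
        k *= 2  # alternative strategy: widen the retrieval window and retry
    return answer  # best-effort answer after exhausting retries
```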
Benchmark Results
Tested on four multi-hop QA datasets:
| Dataset | Baseline RAG | DecomposeRAG | Improvement |
|---|---|---|---|
| HotpotQA | 45.3% | 68.7% | +51.7% |
| 2WikiMultihopQA | 38.2% | 57.9% | +51.6% |
| MuSiQue | 32.1% | 49.8% | +55.1% |
| IIRC | 41.7% | 62.3% | +49.4% |
Average improvement: +52% (relative)
Comparison to Other Methods
| Method | Avg F1 | Cost (relative) |
|---|---|---|
| Standard RAG | 39.3% | 1x |
| Chain-of-Thought | 43.8% | 2x |
| ReAct | 48.2% | 3x |
| DecomposeRAG | 59.7% | 2.5x |
DecomposeRAG achieves best accuracy at moderate cost.
Key Insights
When Decomposition Helps
Effectiveness varies by query complexity:
| Hops | Baseline | DecomposeRAG | Gain |
|---|---|---|---|
| 1 (simple) | 68.2% | 69.1% | +1.3% |
| 2 (medium) | 51.3% | 67.4% | +31.4% |
| 3 (complex) | 28.7% | 52.3% | +82.2% |
| 4+ (very complex) | 15.2% | 38.9% | +156.3% |
Finding: the more hops a question requires, the larger the relative gain from decomposition.
Decomposition Quality
The authors analyzed the quality of the LLM-generated decompositions:
- Correct decomposition: 87.3%
- Missing steps: 8.2%
- Incorrect order: 3.1%
- Circular logic: 1.4%
Even imperfect decompositions improve results.
Error Analysis
Where does DecomposeRAG fail?
- Decomposition errors (23%): Wrong sub-questions
- Retrieval failures (34%): Can't find relevant docs for sub-question
- Answer errors (28%): Wrong intermediate answer propagates
- Integration failures (15%): Can't combine sub-answers
Most common failure mode: even after decomposition, retrieval still misses relevant documents for some sub-questions.
Implementation
Basic Version
```python
class DecomposeRAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    async def query(self, complex_question):
        # Step 1: Decompose
        sub_questions = await self.decompose(complex_question)

        # Step 2: Sequential RAG
        context = ""
        for sub_q in sub_questions:
            # Retrieve
            docs = await self.retriever.retrieve(
                sub_q + " " + context, k=5
            )

            # Generate
            answer = await self.llm.generate(
                query=sub_q,
                context=docs,
                previous=context
            )

            context += f"\n{sub_q} -> {answer}"

        # Return final answer
        return answer

    async def decompose(self, query):
        # Use LLM to decompose
        return await self.llm.decompose(query)
```
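A usage sketch; `my_retriever` and `my_llm` are placeholders for whatever objects expose the `retrieve()`, `generate()`, and `decompose()` interfaces assumed above:

```python
import asyncio

async def main():
    rag = DecomposeRAG(retriever=my_retriever, llm=my_llm)
    answer = await rag.query(
        "What is the population of the capital of the country "
        "where the Eiffel Tower is located?"
    )
    print(answer)  # expected: ~2.1 million

asyncio.run(main())
```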
Advanced: With Validation
```python
class ValidatedDecomposeRAG(DecomposeRAG):
    async def query(self, complex_question, max_retries=2):
        sub_questions = await self.decompose(complex_question)

        context = ""
        for sub_q in sub_questions:
            for attempt in range(max_retries):
                docs = await self.retriever.retrieve(sub_q + " " + context)
                answer = await self.llm.generate(sub_q, docs, context)

                # Validate
                if await self.validate(sub_q, answer, docs):
                    context += f"\n{sub_q} -> {answer}"
                    break
                elif attempt == max_retries - 1:
                    # Failed validation, use best-effort answer
                    context += f"\n{sub_q} -> {answer} (unverified)"

        return answer
```
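The class calls a `validate` method that the base class does not define. A minimal sketch, mirroring the yes/no support check from `validate_answer` above and assuming the LLM wrapper also accepts a plain prompt string:

```python
class ValidatedDecomposeRAG(DecomposeRAG):
    # query() unchanged from the sketch above.

    async def validate(self, question, answer, docs):
        """Ask the LLM whether the intermediate answer is supported by the docs."""
        prompt = (
            "Is this answer supported by the documents?\n"
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            f"Documents: {docs}\n\n"
            "Supported? (yes/no):"
        )
        verdict = await self.llm.generate(prompt)
        return "yes" in verdict.lower()
```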
Optimizations
Parallel Sub-Queries
When sub-questions are independent:
```python
# Identify independent sub-questions
dependencies = analyze_dependencies(sub_questions)

# Group independent questions
independent_groups = group_by_dependencies(sub_questions, dependencies)

# Process groups in parallel
for group in independent_groups:
    # Parallel retrieval for group
    results = await asyncio.gather(*[
        self.retrieve_and_answer(q, context)
        for q in group
    ])

    # Add all to context
    for q, answer in zip(group, results):
        context += f"\n{q} -> {answer}"
```
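`analyze_dependencies` and `group_by_dependencies` are left abstract above. A crude heuristic sketch (not from the paper): treat a sub-question as dependent on its predecessor whenever it refers back to an earlier answer, then batch anything whose dependencies are already resolved:

```python
import re

BACK_REF = re.compile(r"\b(that|this|those|its|the answer)\b", re.IGNORECASE)

def analyze_dependencies(sub_questions):
    """For each sub-question, list the indices it depends on (heuristic)."""
    return [
        [i - 1] if i > 0 and BACK_REF.search(q) else []
        for i, q in enumerate(sub_questions)
    ]

def group_by_dependencies(sub_questions, dependencies):
    """Greedy batching: questions whose dependencies are resolved run together."""
    groups, done = [], set()
    remaining = list(range(len(sub_questions)))
    while remaining:
        batch = [i for i in remaining if all(d in done for d in dependencies[i])]
        if not batch:            # defensive: break dependency cycles
            batch = [remaining[0]]
        groups.append([sub_questions[i] for i in batch])
        done.update(batch)
        remaining = [i for i in remaining if i not in batch]
    return groups
```

A real implementation could instead ask the LLM to emit an explicit dependency graph during decomposition.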
Caching Intermediate Results
```python
class CachedDecomposeRAG(DecomposeRAG):
    def __init__(self, retriever, llm):
        super().__init__(retriever, llm)
        self.cache = {}

    async def retrieve_and_answer(self, sub_q, context):
        # Note: assumes the base class also exposes a retrieve_and_answer()
        # helper (the one used by the parallel snippet above); shown here to
        # illustrate caching of intermediate results.
        cache_key = hash(sub_q + context)
        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().retrieve_and_answer(sub_q, context)
        self.cache[cache_key] = result
        return result
```
Practical Considerations
Latency
DecomposeRAG is 2-3x slower:
- 2-hop query: +2-3 seconds
- 3-hop query: +4-6 seconds
- 4-hop query: +6-10 seconds
Mitigation:
- Parallel sub-queries when possible
- Cache common decompositions
- Use faster LLMs for intermediate steps (see the sketch below)
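One way to act on the last mitigation, as a sketch: route decomposition (and optionally validation) to a cheaper, faster model and keep the stronger model for answer generation. The two-model split is an assumption, not part of the released code:

```python
class TieredDecomposeRAG(DecomposeRAG):
    """Fast model for decomposition/validation, strong model for answers."""

    def __init__(self, retriever, fast_llm, strong_llm):
        super().__init__(retriever, strong_llm)  # strong model answers sub-questions
        self.fast_llm = fast_llm

    async def decompose(self, query):
        # Decomposition output is short and structured, so a smaller,
        # faster model is usually sufficient here.
        return await self.fast_llm.decompose(query)
```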
Cost
More LLM calls = higher cost:
- Decomposition: 1 LLM call
- Each sub-question: 1 LLM call
- Validation (optional): 1 call per sub-question
Example:
- 3 sub-questions + validation = 7 LLM calls
- vs. 1 call for standard RAG
Cost multiplier: 2-5x depending on complexity
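The call-count arithmetic above is easy to sanity-check with a small helper (retries excluded):

```python
def llm_calls(num_sub_questions: int, validate: bool = True) -> int:
    """Estimate LLM calls per DecomposeRAG query: 1 decomposition call, plus
    one answer call (and optionally one validation call) per sub-question."""
    per_sub_question = 2 if validate else 1
    return 1 + num_sub_questions * per_sub_question

print(llm_calls(3))                  # 7 calls, vs. 1 for standard RAG
print(llm_calls(3, validate=False))  # 4 calls
```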
When to Use
Use DecomposeRAG when:
- Questions are complex (multi-hop)
- Accuracy more important than speed
- Budget allows higher costs
Use standard RAG when:
- Simple lookups
- Speed critical
- Cost-sensitive
Future Directions
Planned improvements:
- Better decomposition: Fine-tune smaller models
- Adaptive strategy: Auto-detect when to decompose
- Iterative refinement: Retry failed sub-questions
- Multimodal: Decompose across modalities
Resources
- Paper: "DecomposeRAG: Automatic Query Decomposition for Multi-Hop Question Answering"
- Code: github.com/berkeley-nlp/decomposerag
- Demo: decomposerag.demo.berkeley.edu
Conclusion
DecomposeRAG demonstrates that explicit query decomposition significantly improves multi-hop question answering. While costlier and slower than standard RAG, the accuracy gains justify the overhead for complex queries where correctness is critical.