AutoGen: Multi-Agent Systems for RAG
Complete guide to building multi-agent RAG systems with Microsoft AutoGen. Agent conversations, orchestration, and advanced use cases.
AutoGen is Microsoft's multi-agent framework that enables conversations between multiple AI agents. Unlike single-agent approaches, AutoGen orchestrates teams of specialized agents that collaborate to solve complex problems.
Prerequisites: Review the RAG fundamentals and our guide on RAG agent orchestration.
Why AutoGen for RAG?
Benefits of Multi-Agent Systems
| Approach | Strengths | Limitations |
|---|---|---|
| Classic RAG | Simple, fast | No complex reasoning |
| Single agent | Iterative reasoning | Cognitive overload |
| Multi-agent | Specialization, collaboration | More complex configuration |
Multi-Agent RAG Use Cases
- Deep research: One agent searches, another validates, a third synthesizes
- Document analysis: Specialized agents by document type
- Complex Q&A: Decomposition and recomposition of answers
- Cross-validation: Mutual verification between agents
Multi-Agent ROI
- +40% accuracy on complex questions
- -60% hallucinations thanks to cross-validation
- Traceability: Each agent documents its reasoning
AutoGen Architecture
Fundamental Concepts
```
                   AUTOGEN ARCHITECTURE
┌──────────────────────────────────────────────────────────┐
│                        GroupChat                         │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐         │
│   │ Agent 1  │     │ Agent 2  │     │ Agent 3  │         │
│   │ Research │     │ Analysis │     │Synthesis │         │
│   └────┬─────┘     └────┬─────┘     └────┬─────┘         │
│        │                │                │               │
│        └────────────────┼────────────────┘               │
│                         │                                │
│                  ┌──────┴──────┐                         │
│                  │   Manager   │                         │
│                  │  (Router)   │                         │
│                  └─────────────┘                         │
└──────────────────────────────────────────────────────────┘
```
Flow:
1. User → Manager
2. Manager → Specialized agent
3. Agents communicate with each other
4. Manager → User (final response)
Installation and Configuration
```python
# Installation:
#   pip install pyautogen

import os

from autogen import ConversableAgent, AssistantAgent, UserProxyAgent
from autogen import GroupChat, GroupChatManager

# LLM configuration
config_list = [
    {
        "model": "gpt-4o",
        "api_key": os.environ["OPENAI_API_KEY"]
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
    "timeout": 120
}
```
Multi-Agent RAG with AutoGen
Retrieval Agent (Retriever)
```python
import os
from typing import List, Dict

import chromadb
from chromadb.utils import embedding_functions


class RAGRetriever:
    """Retrieval manager for agents."""

    def __init__(self, collection_name: str = "documents"):
        self.client = chromadb.Client()
        self.embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
            api_key=os.environ["OPENAI_API_KEY"],
            model_name="text-embedding-3-small"
        )
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_fn
        )

    def search(self, query: str, n_results: int = 5) -> List[Dict]:
        """Search for relevant documents."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        documents = []
        for i, doc in enumerate(results["documents"][0]):
            documents.append({
                "content": doc,
                "metadata": results["metadatas"][0][i] if results["metadatas"] else {},
                "distance": results["distances"][0][i] if results["distances"] else 0
            })
        return documents

    def add_documents(self, documents: List[str], metadatas: List[Dict] = None):
        """Add documents to the collection."""
        ids = [f"doc_{i}" for i in range(len(documents))]
        self.collection.add(
            documents=documents,
            metadatas=metadatas or [{}] * len(documents),
            ids=ids
        )


# Initialization
retriever = RAGRetriever()


# Search function for agents
def search_documents(query: str) -> str:
    """Search documents and return context."""
    results = retriever.search(query, n_results=5)
    if not results:
        return "No relevant documents found."
    context = "Documents found:\n\n"
    for i, doc in enumerate(results, 1):
        context += f"[{i}] {doc['content'][:500]}...\n"
        context += f"    Score: {1 - doc['distance']:.2f}\n\n"
    return context
```
Creating Specialized Agents
```python
# Research agent
research_agent = AssistantAgent(
    name="Researcher",
    system_message="""You are a specialized research agent.

Your role:
1. Analyze the user's question
2. Formulate relevant search queries
3. Use the search_documents function to find information
4. Evaluate result relevance
5. Reformulate the search if necessary

You must always justify your search choices.
If results are not relevant, try other formulations.""",
    llm_config=llm_config
)

# Analysis agent
analyst_agent = AssistantAgent(
    name="Analyst",
    system_message="""You are a critical analysis agent.

Your role:
1. Examine documents provided by the researcher
2. Identify key information and facts
3. Detect contradictions or inconsistencies
4. Evaluate source reliability
5. Organize information logically

You must be rigorous and flag uncertainties.""",
    llm_config=llm_config
)

# Synthesis agent
writer_agent = AssistantAgent(
    name="Writer",
    system_message="""You are a writing and synthesis agent.

Your role:
1. Take the provided analyses
2. Write a clear and structured response
3. Cite sources used
4. Adapt detail level to context
5. Ensure the response is complete

You must produce professional and well-formatted responses.""",
    llm_config=llm_config
)

# User proxy agent
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False,
    llm_config=llm_config
)
```
GroupChat Configuration
```python
from autogen import register_function

# Register the search function
register_function(
    search_documents,
    caller=research_agent,
    executor=user_proxy,
    name="search_documents",
    description="Search documents in the knowledge base."
)

# Create the GroupChat
groupchat = GroupChat(
    agents=[user_proxy, research_agent, analyst_agent, writer_agent],
    messages=[],
    max_round=15,
    speaker_selection_method="auto"
)

# GroupChat Manager
manager = GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)


def query_rag_team(question: str) -> str:
    """Query the RAG agent team."""
    groupchat.messages = []
    result = user_proxy.initiate_chat(
        manager,
        message=f"""User question: {question}

Process:
1. Researcher: search for relevant documents
2. Analyst: analyze and verify information
3. Writer: write the final response

Start with the search."""
    )
    # Return the Writer's last message as the final answer
    for msg in reversed(groupchat.messages):
        if msg.get("name") == "Writer":
            return msg.get("content", "")
    return "No response generated."
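The `"auto"` selection method asks the manager's LLM to pick the next speaker, which can wander off the intended pipeline. Recent pyautogen versions also accept a callable for `speaker_selection_method`; a deterministic round-robin for the fixed Researcher → Analyst → Writer flow could delegate to a pure helper like the sketch below (the agent names match those defined above; the wiring in the comment is an assumption about your pyautogen version):

```python
# Fixed speaking order for the pipeline; "User" is the proxy agent.
ROUND_ROBIN = ["User", "Researcher", "Analyst", "Writer"]


def next_speaker_name(last_name: str) -> str:
    """Return the name of the agent that should speak after `last_name`.

    Unknown senders fall back to the Researcher so the pipeline restarts
    at the search step.
    """
    if last_name in ROUND_ROBIN:
        i = ROUND_ROBIN.index(last_name)
        return ROUND_ROBIN[(i + 1) % len(ROUND_ROBIN)]
    return "Researcher"


# Hypothetical wiring (if your pyautogen version accepts a callable):
# groupchat = GroupChat(
#     agents=[user_proxy, research_agent, analyst_agent, writer_agent],
#     messages=[],
#     max_round=15,
#     speaker_selection_method=lambda last, gc: next(
#         a for a in gc.agents if a.name == next_speaker_name(last.name)
#     ),
# )
```

Deterministic selection trades flexibility for predictability: it removes one LLM call per turn and makes transcripts reproducible, at the cost of the manager no longer skipping steps when they are unnecessary.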
Advanced Patterns
Pattern 1: Cross-Validation
```python
class CrossValidationRAG:
    """RAG with cross-validation between agents."""

    def __init__(self, llm_config: dict):
        self.searcher_1 = AssistantAgent(
            name="Searcher_Primary",
            system_message="You search for documents exhaustively.",
            llm_config=llm_config
        )
        self.searcher_2 = AssistantAgent(
            name="Searcher_Secondary",
            system_message="You search with alternative formulations.",
            llm_config=llm_config
        )
        self.validator = AssistantAgent(
            name="Validator",
            system_message="""You compare results from both searchers.
1. Identify common information (high confidence)
2. Flag contradictions
3. Merge unique results
4. Assign a confidence score""",
            llm_config=llm_config
        )
        self.synthesizer = AssistantAgent(
            name="Synthesizer",
            system_message="You produce the final response based on validated results.",
            llm_config=llm_config
        )

    def query(self, question: str) -> dict:
        """Execute a query with cross-validation."""
        results_1 = self._search_with_agent(self.searcher_1, question)
        results_2 = self._search_with_agent(self.searcher_2, question)
        validated = self._validate(results_1, results_2)
        response = self._synthesize(question, validated)
        return {
            "response": response,
            "confidence": validated["confidence"],
            "sources_primary": len(results_1),
            "sources_secondary": len(results_2)
        }

    def _search_with_agent(self, agent: AssistantAgent, question: str) -> list:
        """Sketch: ask one searcher for results, one per line.
        A real implementation would have the agent call the retriever tool."""
        reply = agent.generate_reply(
            messages=[{"role": "user", "content": f"Search documents for: {question}"}]
        )
        return [line.strip() for line in str(reply).splitlines() if line.strip()]

    def _validate(self, results_1: list, results_2: list) -> dict:
        """Validate and merge results (exact-match baseline; the Validator
        agent could perform a semantic comparison instead)."""
        common = [r for r in results_1 if r in results_2]
        unique_1 = [r for r in results_1 if r not in results_2]
        unique_2 = [r for r in results_2 if r not in results_1]
        confidence = len(common) / max(len(results_1), len(results_2), 1)
        return {
            "common": common,
            "unique_1": unique_1,
            "unique_2": unique_2,
            "confidence": confidence
        }

    def _synthesize(self, question: str, validated: dict) -> str:
        """Sketch: have the synthesizer write the final answer."""
        return str(self.synthesizer.generate_reply(
            messages=[{
                "role": "user",
                "content": f"Validated results: {validated}\n\nAnswer the question: {question}"
            }]
        ))
```
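Exact-equality matching between the two searchers' results misses passages that are near-duplicates with minor wording differences, which deflates the confidence score. A hedged alternative using the stdlib `difflib` module (the 0.8 threshold is an arbitrary starting point, not a recommendation):

```python
import difflib


def fuzzy_overlap(results_1, results_2, threshold=0.8):
    """Partition two result lists into (common, unique_1, unique_2)
    using fuzzy string similarity instead of exact equality."""
    common, unique_1 = [], []
    matched_2 = set()  # indices in results_2 already paired up
    for a in results_1:
        best_j, best_score = None, 0.0
        for j, b in enumerate(results_2):
            if j in matched_2:
                continue
            score = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            common.append(a)
            matched_2.add(best_j)
        else:
            unique_1.append(a)
    unique_2 = [b for j, b in enumerate(results_2) if j not in matched_2]
    return common, unique_1, unique_2
```

A drop-in for `_validate` would call `fuzzy_overlap(results_1, results_2)` instead of the three list comprehensions; for longer passages, embedding-based cosine similarity would scale better than character-level matching.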
Pattern 2: Domain-Specialized Agents
```python
class DomainSpecialistRAG:
    """RAG with domain-specialized agents."""

    def __init__(self, llm_config: dict):
        self.llm_config = llm_config
        self.specialists = {
            "technical": self._create_specialist(
                "Technical_Expert",
                "You are an expert in technical documentation, code, and architecture."
            ),
            "legal": self._create_specialist(
                "Legal_Expert",
                "You are an expert in legal documents and compliance."
            ),
            "financial": self._create_specialist(
                "Financial_Expert",
                "You are an expert in financial and accounting documents."
            ),
            "general": self._create_specialist(
                "General_Expert",
                "You handle general and cross-functional questions."
            )
        }
        self.router = AssistantAgent(
            name="Router",
            system_message="""Analyze the question and determine the domain:
- technical: code, architecture, APIs, infrastructure
- legal: contracts, GDPR, compliance, licenses
- financial: budgets, invoices, accounting
- general: other

Respond only with the domain (one word).""",
            llm_config=llm_config
        )

    def _create_specialist(self, name: str, expertise: str) -> AssistantAgent:
        return AssistantAgent(
            name=name,
            system_message=f"""{expertise}

You must:
1. Search in your specialized knowledge base
2. Provide precise answers with sources
3. Use appropriate technical vocabulary
4. Flag if the question is outside your domain""",
            llm_config=self.llm_config
        )

    def route_and_query(self, question: str) -> dict:
        """Route the question to the right specialist."""
        routing_result = self.router.generate_reply(
            messages=[{"role": "user", "content": question}]
        )
        # generate_reply may return a dict depending on the version; coerce to str
        domain = str(routing_result).strip().lower()
        specialist = self.specialists.get(domain, self.specialists["general"])
        response = specialist.generate_reply(
            messages=[{"role": "user", "content": question}]
        )
        return {
            "domain": domain,
            "specialist": specialist.name,
            "response": response
        }
```
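Calling an LLM just to pick a domain adds a round trip per request. A keyword lookup can serve as a cheap first pass or as a fallback when the Router is unavailable; the keyword lists below are illustrative, not exhaustive:

```python
# Hypothetical keyword tables mirroring the Router's domains.
DOMAIN_KEYWORDS = {
    "technical": ["code", "api", "architecture", "deploy", "infrastructure"],
    "legal": ["contract", "gdpr", "compliance", "license"],
    "financial": ["budget", "invoice", "accounting", "cost"],
}


def keyword_route(question: str) -> str:
    """Route by counting keyword hits; fall back to 'general' on no match."""
    q = question.lower()
    scores = {
        domain: sum(1 for kw in kws if kw in q)
        for domain, kws in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

In `route_and_query`, this could replace the LLM call entirely for obvious questions, or run first and only invoke the Router when no keyword matches.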
Pattern 3: Agent Debate
```python
class DebateRAG:
    """RAG with structured debate between agents."""

    def __init__(self, llm_config: dict):
        self.proposer = AssistantAgent(
            name="Proposer",
            system_message="""You propose an initial response based on documents.
You must be confident but open to criticism.""",
            llm_config=llm_config
        )
        self.critic = AssistantAgent(
            name="Critic",
            system_message="""You constructively criticize the proposed response.
1. Identify weaknesses or gaps
2. Question sources
3. Propose alternatives
4. Don't criticize for the sake of criticizing""",
            llm_config=llm_config
        )
        self.arbiter = AssistantAgent(
            name="Arbiter",
            system_message="""You arbitrate the debate between Proposer and Critic.
1. Evaluate arguments from both sides
2. Settle disagreements
3. Produce the final consensual response
4. Integrate the best contributions from each""",
            llm_config=llm_config
        )

    def debate_and_answer(self, question: str, context: str, rounds: int = 2) -> dict:
        """Launch a structured debate and produce a response."""
        debate_history = []

        proposal = self.proposer.generate_reply(
            messages=[{
                "role": "user",
                "content": f"Context: {context}\n\nQuestion: {question}\n\nPropose a response."
            }]
        )
        debate_history.append({"agent": "Proposer", "content": proposal})

        for round_num in range(rounds):
            critique = self.critic.generate_reply(
                messages=[{
                    "role": "user",
                    "content": f"Current proposal: {proposal}\n\nCriticize this response."
                }]
            )
            debate_history.append({"agent": "Critic", "content": critique})

            proposal = self.proposer.generate_reply(
                messages=[{
                    "role": "user",
                    "content": f"Critique received: {critique}\n\nImprove your proposal."
                }]
            )
            debate_history.append({"agent": "Proposer", "content": proposal})

        final_answer = self.arbiter.generate_reply(
            messages=[{
                "role": "user",
                "content": f"""Debate history:
{self._format_history(debate_history)}

Produce the final answer to the question: {question}"""
            }]
        )

        return {
            "answer": final_answer,
            "debate_rounds": len(debate_history),
            "history": debate_history
        }

    def _format_history(self, history: list) -> str:
        return "\n\n".join([f"[{h['agent']}]: {h['content']}" for h in history])
```
Memory Management
Shared Memory Between Agents
```python
import json
from typing import Dict, Any


class SharedMemory:
    """Shared memory between agents."""

    def __init__(self):
        self.facts = {}
        self.decisions = []
        self.sources = {}
        self.confidence_scores = {}

    def add_fact(self, key: str, value: Any, source: str, confidence: float):
        """Add a fact to memory."""
        self.facts[key] = value
        self.sources[key] = source
        self.confidence_scores[key] = confidence

    def get_context(self) -> str:
        """Return context for agents."""
        context = "Established facts:\n"
        for key, value in self.facts.items():
            conf = self.confidence_scores.get(key, 0)
            context += f"- {key}: {value} (confidence: {conf:.0%})\n"
        return context

    def add_decision(self, decision: str, reasoning: str):
        """Record a decision."""
        self.decisions.append({
            "decision": decision,
            "reasoning": reasoning
        })

    def to_json(self) -> str:
        return json.dumps({
            "facts": self.facts,
            "decisions": self.decisions,
            "sources": self.sources
        }, indent=2)


memory = SharedMemory()


def research_with_memory(agent: AssistantAgent, query: str) -> str:
    """Research with shared memory."""
    context = memory.get_context()
    response = agent.generate_reply(
        messages=[{
            "role": "user",
            "content": f"""Known context:
{context}

New question: {query}

Search and add new facts to memory."""
        }]
    )
    return response
```
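Note that `add_fact` overwrites blindly, so a late low-confidence agent can clobber a fact an earlier agent established with high confidence. One possible guard, sketched as a hypothetical standalone helper operating on the same dicts `SharedMemory` keeps (`facts` and `confidence_scores`):

```python
def merge_fact(facts: dict, scores: dict, key: str, value, confidence: float) -> bool:
    """Accept a new fact only if it is unknown or more confident
    than the stored value. Returns True if the value was accepted."""
    if key not in facts or confidence > scores.get(key, 0.0):
        facts[key] = value
        scores[key] = confidence
        return True
    return False
```

`add_fact` could call this instead of assigning directly; logging rejected writes would also surface disagreements between agents for the Validator to inspect.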
Monitoring and Observability
Conversation Tracing
```python
import logging
from datetime import datetime
from typing import List, Dict


class ConversationTracer:
    """Trace multi-agent conversations."""

    def __init__(self):
        self.traces = []
        self.logger = logging.getLogger("autogen_tracer")

    def trace_message(self, sender: str, receiver: str, content: str,
                      metadata: dict = None):
        """Record a message."""
        trace = {
            "timestamp": datetime.now().isoformat(),
            "sender": sender,
            "receiver": receiver,
            "content_length": len(content),
            "content_preview": content[:200],
            "metadata": metadata or {}
        }
        self.traces.append(trace)
        self.logger.info(f"{sender} -> {receiver}: {content[:100]}...")

    def get_summary(self) -> dict:
        """Conversation summary."""
        agents = set()
        for trace in self.traces:
            agents.add(trace["sender"])
            agents.add(trace["receiver"])
        return {
            "total_messages": len(self.traces),
            "agents_involved": list(agents),
            "duration_seconds": self._calculate_duration(),
            "average_message_length": self._avg_message_length()
        }

    def _calculate_duration(self) -> float:
        if len(self.traces) < 2:
            return 0
        start = datetime.fromisoformat(self.traces[0]["timestamp"])
        end = datetime.fromisoformat(self.traces[-1]["timestamp"])
        return (end - start).total_seconds()

    def _avg_message_length(self) -> float:
        if not self.traces:
            return 0
        return sum(t["content_length"] for t in self.traces) / len(self.traces)
```
Costs and Performance
Cost Estimation
| Configuration | Messages/request | Estimated cost | Latency |
|---|---|---|---|
| 2 simple agents | 4-6 | $0.02-0.04 | 5-10s |
| 3 agents + validation | 8-12 | $0.05-0.10 | 10-20s |
| Debate (2 rounds) | 10-15 | $0.08-0.15 | 15-30s |
| Full team (5 agents) | 15-25 | $0.15-0.30 | 25-45s |
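The figures above can be sanity-checked with a back-of-the-envelope model: each message in the GroupChat is roughly one LLM call billed on input and output tokens. The per-1K-token prices and token averages below are placeholders for illustration; check your provider's current pricing before relying on them:

```python
# Hypothetical (input, output) prices per 1K tokens; verify against
# your provider's current price list.
PRICE_PER_1K = {
    "gpt-4o": (0.0025, 0.01),
    "gpt-4o-mini": (0.00015, 0.0006),
}


def estimate_cost(model: str, n_messages: int,
                  avg_in_tokens: int = 800, avg_out_tokens: int = 300) -> float:
    """Rough cost in dollars of one multi-agent request,
    treating each message as one LLM call."""
    p_in, p_out = PRICE_PER_1K[model]
    per_call = (avg_in_tokens / 1000) * p_in + (avg_out_tokens / 1000) * p_out
    return n_messages * per_call
```

With these placeholder numbers, a 10-message request on gpt-4o comes out around $0.05, in line with the "3 agents + validation" row; input token counts grow with conversation length, so real costs skew higher in later rounds.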
Optimizations
```python
# 1. Limit rounds
groupchat = GroupChat(
    agents=agents,
    max_round=10
)

# 2. Search caching (note: lru_cache only hits on exact query strings)
from functools import lru_cache

@lru_cache(maxsize=100)
def cached_search(query: str) -> str:
    return search_documents(query)

# 3. Different models per agent: a cheap model for retrieval,
#    a stronger model for the final writing step
cheap_config = {"model": "gpt-4o-mini", "temperature": 0}
expensive_config = {"model": "gpt-4o", "temperature": 0.7}

research_agent = AssistantAgent(
    "Researcher",
    llm_config={"config_list": [cheap_config]}
)
writer_agent = AssistantAgent(
    "Writer",
    llm_config={"config_list": [expensive_config]}
)
```
Implementation Checklist
- Well-defined agents with clear roles
- Registered search function
- GroupChat configured with round limit
- Error and timeout handling
- Shared memory if needed
- Tracing for debugging
- Tests with different question types
- Cost monitoring
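The error-handling item in the checklist can be covered with a generic retry wrapper around the entry point (for example `query_rag_team`); the attempt count and backoff values below are arbitrary defaults, not recommendations:

```python
import time


def with_retries(fn, attempts: int = 3, backoff: float = 2.0,
                 exceptions=(Exception,)):
    """Call `fn`, retrying with exponential backoff on failure.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))


# Hypothetical usage:
# answer = with_retries(lambda: query_rag_team("What is our refund policy?"))
```

Catching a narrower exception tuple (e.g. the client library's timeout and rate-limit errors) is preferable in production, so that genuine bugs fail fast instead of being retried.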
Conclusion
AutoGen enables building sophisticated RAG systems with multi-agent collaboration. The key is to properly define each agent's roles and responsibilities, and to efficiently manage communication between them.
Further Reading
Need multi-agent RAG? Ailog offers RAG solutions with intelligent orchestration of specialized agents. Robust and scalable architecture.