AutoGen: Multi-Agent Systems for RAG
Complete guide to building multi-agent RAG systems with Microsoft AutoGen. Agent conversations, orchestration, and advanced use cases.
AutoGen is Microsoft's multi-agent framework that enables conversations between multiple AI agents. Unlike single-agent approaches, AutoGen orchestrates teams of specialized agents that collaborate to solve complex problems.
Prerequisites: Review the RAG fundamentals and our guide on RAG agent orchestration.
Why AutoGen for RAG?
Benefits of Multi-Agent Systems
| Approach | Strengths | Limitations |
|---|---|---|
| Classic RAG | Simple, fast | No complex reasoning |
| Single agent | Iterative reasoning | Cognitive overload |
| Multi-agent | Specialization, collaboration | More complex configuration |
Multi-Agent RAG Use Cases
- Deep research: One agent searches, another validates, a third synthesizes
- Document analysis: Specialized agents by document type
- Complex Q&A: Decomposition and recomposition of answers
- Cross-validation: Mutual verification between agents
Multi-Agent ROI
- +40% accuracy on complex questions
- -60% hallucinations thanks to cross-validation
- Traceability: Each agent documents its reasoning
AutoGen Architecture
Fundamental Concepts
```
                   AUTOGEN ARCHITECTURE
┌──────────────────────────────────────────────────────────┐
│                        GroupChat                         │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐         │
│   │ Agent 1  │     │ Agent 2  │     │ Agent 3  │         │
│   │ Research │     │ Analysis │     │Synthesis │         │
│   └────┬─────┘     └────┬─────┘     └────┬─────┘         │
│        │                │                │               │
│        └────────────────┼────────────────┘               │
│                         │                                │
│                  ┌──────┴──────┐                         │
│                  │   Manager   │                         │
│                  │  (Router)   │                         │
│                  └─────────────┘                         │
└──────────────────────────────────────────────────────────┘
```
Flow:
1. User → Manager
2. Manager → Specialized agent
3. Agents communicate with each other
4. Manager → User (final response)
Installation and Configuration
```python
# Installation:
#   pip install pyautogen

import os

from autogen import ConversableAgent, AssistantAgent, UserProxyAgent
from autogen import GroupChat, GroupChatManager

# LLM configuration
config_list = [
    {
        "model": "gpt-4o",
        "api_key": os.environ["OPENAI_API_KEY"]
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
    "timeout": 120
}
```
Multi-Agent RAG with AutoGen
Retrieval Agent (Retriever)
```python
import os
from typing import List, Dict

import chromadb
from chromadb.utils import embedding_functions


class RAGRetriever:
    """Retrieval manager for agents."""

    def __init__(self, collection_name: str = "documents"):
        self.client = chromadb.Client()
        self.embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
            api_key=os.environ["OPENAI_API_KEY"],
            model_name="text-embedding-3-small"
        )
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_fn
        )

    def search(self, query: str, n_results: int = 5) -> List[Dict]:
        """Search for relevant documents."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        documents = []
        for i, doc in enumerate(results["documents"][0]):
            documents.append({
                "content": doc,
                "metadata": results["metadatas"][0][i] if results["metadatas"] else {},
                "distance": results["distances"][0][i] if results["distances"] else 0
            })
        return documents

    def add_documents(self, documents: List[str], metadatas: List[Dict] = None):
        """Add documents to the collection."""
        ids = [f"doc_{i}" for i in range(len(documents))]
        self.collection.add(
            documents=documents,
            metadatas=metadatas or [{}] * len(documents),
            ids=ids
        )


# Initialization
retriever = RAGRetriever()


# Search function for agents
def search_documents(query: str) -> str:
    """Search documents and return context."""
    results = retriever.search(query, n_results=5)
    if not results:
        return "No relevant documents found."
    context = "Documents found:\n\n"
    for i, doc in enumerate(results, 1):
        context += f"[{i}] {doc['content'][:500]}...\n"
        context += f"    Score: {1 - doc['distance']:.2f}\n\n"
    return context
```
Creating Specialized Agents
```python
# Research agent
research_agent = AssistantAgent(
    name="Researcher",
    system_message="""You are a specialized research agent.

Your role:
1. Analyze the user's question
2. Formulate relevant search queries
3. Use the search_documents function to find information
4. Evaluate result relevance
5. Reformulate the search if necessary

You must always justify your search choices.
If results are not relevant, try other formulations.""",
    llm_config=llm_config
)

# Analysis agent
analyst_agent = AssistantAgent(
    name="Analyst",
    system_message="""You are a critical analysis agent.

Your role:
1. Examine documents provided by the researcher
2. Identify key information and facts
3. Detect contradictions or inconsistencies
4. Evaluate source reliability
5. Organize information logically

You must be rigorous and flag uncertainties.""",
    llm_config=llm_config
)

# Synthesis agent
writer_agent = AssistantAgent(
    name="Writer",
    system_message="""You are a writing and synthesis agent.

Your role:
1. Take the provided analyses
2. Write a clear and structured response
3. Cite sources used
4. Adapt detail level to context
5. Ensure the response is complete

You must produce professional and well-formatted responses.""",
    llm_config=llm_config
)

# User proxy agent
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False,
    llm_config=llm_config
)
```
GroupChat Configuration
```python
from autogen import register_function

# Register the search function
register_function(
    search_documents,
    caller=research_agent,
    executor=user_proxy,
    name="search_documents",
    description="Search documents in the knowledge base."
)

# Create the GroupChat
groupchat = GroupChat(
    agents=[user_proxy, research_agent, analyst_agent, writer_agent],
    messages=[],
    max_round=15,
    speaker_selection_method="auto"
)

# GroupChat Manager
manager = GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)


def query_rag_team(question: str) -> str:
    """Query the RAG agent team."""
    groupchat.messages = []
    result = user_proxy.initiate_chat(
        manager,
        message=f"""User question: {question}

Process:
1. Researcher: search for relevant documents
2. Analyst: analyze and verify information
3. Writer: write the final response

Start with the search."""
    )
    # Return the Writer's last message as the final answer
    for msg in reversed(groupchat.messages):
        if msg.get("name") == "Writer":
            return msg.get("content", "")
    return "No response generated."
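The `"auto"` selection method asks the manager's LLM to pick the next speaker, which can wander off the intended pipeline. Recent pyautogen versions also accept a callable for `speaker_selection_method`; a deterministic round-robin for the fixed Researcher → Analyst → Writer flow could delegate to a pure helper like the sketch below (the agent names match those defined above; the wiring in the comment is an assumption about your pyautogen version):

```python
# Fixed speaking order for the pipeline; "User" is the proxy agent.
ROUND_ROBIN = ["User", "Researcher", "Analyst", "Writer"]


def next_speaker_name(last_name: str) -> str:
    """Return the name of the agent that should speak after `last_name`.

    Unknown senders fall back to the Researcher so the pipeline restarts
    at the search step.
    """
    if last_name in ROUND_ROBIN:
        i = ROUND_ROBIN.index(last_name)
        return ROUND_ROBIN[(i + 1) % len(ROUND_ROBIN)]
    return "Researcher"


# Hypothetical wiring (if your pyautogen version accepts a callable):
# groupchat = GroupChat(
#     agents=[user_proxy, research_agent, analyst_agent, writer_agent],
#     messages=[],
#     max_round=15,
#     speaker_selection_method=lambda last, gc: next(
#         a for a in gc.agents if a.name == next_speaker_name(last.name)
#     ),
# )
```

Deterministic selection trades flexibility for predictability: it removes one LLM call per turn and makes transcripts reproducible, at the cost of the manager no longer skipping steps when they are unnecessary.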
Advanced Patterns
Pattern 1: Cross-Validation
```python
class CrossValidationRAG:
    """RAG with cross-validation between agents."""

    def __init__(self, llm_config: dict):
        self.searcher_1 = AssistantAgent(
            name="Searcher_Primary",
            system_message="You search for documents exhaustively.",
            llm_config=llm_config
        )
        self.searcher_2 = AssistantAgent(
            name="Searcher_Secondary",
            system_message="You search with alternative formulations.",
            llm_config=llm_config
        )
        self.validator = AssistantAgent(
            name="Validator",
            system_message="""You compare results from both searchers.
1. Identify common information (high confidence)
2. Flag contradictions
3. Merge unique results
4. Assign a confidence score""",
            llm_config=llm_config
        )
        self.synthesizer = AssistantAgent(
            name="Synthesizer",
            system_message="You produce the final response based on validated results.",
            llm_config=llm_config
        )

    def query(self, question: str) -> dict:
        """Execute a query with cross-validation."""
        results_1 = self._search_with_agent(self.searcher_1, question)
        results_2 = self._search_with_agent(self.searcher_2, question)
        validated = self._validate(results_1, results_2)
        response = self._synthesize(question, validated)
        return {
            "response": response,
            "confidence": validated["confidence"],
            "sources_primary": len(results_1),
            "sources_secondary": len(results_2)
        }

    def _search_with_agent(self, agent: AssistantAgent, question: str) -> list:
        """Sketch: ask one searcher for results, one per line.
        A real implementation would have the agent call the retriever tool."""
        reply = agent.generate_reply(
            messages=[{"role": "user", "content": f"Search documents for: {question}"}]
        )
        return [line.strip() for line in str(reply).splitlines() if line.strip()]

    def _validate(self, results_1: list, results_2: list) -> dict:
        """Validate and merge results (exact-match baseline; the Validator
        agent could perform a semantic comparison instead)."""
        common = [r for r in results_1 if r in results_2]
        unique_1 = [r for r in results_1 if r not in results_2]
        unique_2 = [r for r in results_2 if r not in results_1]
        confidence = len(common) / max(len(results_1), len(results_2), 1)
        return {
            "common": common,
            "unique_1": unique_1,
            "unique_2": unique_2,
            "confidence": confidence
        }

    def _synthesize(self, question: str, validated: dict) -> str:
        """Sketch: have the synthesizer write the final answer."""
        return str(self.synthesizer.generate_reply(
            messages=[{
                "role": "user",
                "content": f"Validated results: {validated}\n\nAnswer the question: {question}"
            }]
        ))
```
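Exact-equality matching between the two searchers' results misses passages that are near-duplicates with minor wording differences, which deflates the confidence score. A hedged alternative using the stdlib `difflib` module (the 0.8 threshold is an arbitrary starting point, not a recommendation):

```python
import difflib


def fuzzy_overlap(results_1, results_2, threshold=0.8):
    """Partition two result lists into (common, unique_1, unique_2)
    using fuzzy string similarity instead of exact equality."""
    common, unique_1 = [], []
    matched_2 = set()  # indices in results_2 already paired up
    for a in results_1:
        best_j, best_score = None, 0.0
        for j, b in enumerate(results_2):
            if j in matched_2:
                continue
            score = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            common.append(a)
            matched_2.add(best_j)
        else:
            unique_1.append(a)
    unique_2 = [b for j, b in enumerate(results_2) if j not in matched_2]
    return common, unique_1, unique_2
```

A drop-in for `_validate` would call `fuzzy_overlap(results_1, results_2)` instead of the three list comprehensions; for longer passages, embedding-based cosine similarity would scale better than character-level matching.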
Pattern 2: Domain-Specialized Agents
```python
class DomainSpecialistRAG:
    """RAG with domain-specialized agents."""

    def __init__(self, llm_config: dict):
        self.llm_config = llm_config
        self.specialists = {
            "technical": self._create_specialist(
                "Technical_Expert",
                "You are an expert in technical documentation, code, and architecture."
            ),
            "legal": self._create_specialist(
                "Legal_Expert",
                "You are an expert in legal documents and compliance."
            ),
            "financial": self._create_specialist(
                "Financial_Expert",
                "You are an expert in financial and accounting documents."
            ),
            "general": self._create_specialist(
                "General_Expert",
                "You handle general and cross-functional questions."
            )
        }
        self.router = AssistantAgent(
            name="Router",
            system_message="""Analyze the question and determine the domain:
- technical: code, architecture, APIs, infrastructure
- legal: contracts, GDPR, compliance, licenses
- financial: budgets, invoices, accounting
- general: other

Respond only with the domain (one word).""",
            llm_config=llm_config
        )

    def _create_specialist(self, name: str, expertise: str) -> AssistantAgent:
        return AssistantAgent(
            name=name,
            system_message=f"""{expertise}

You must:
1. Search in your specialized knowledge base
2. Provide precise answers with sources
3. Use appropriate technical vocabulary
4. Flag if the question is outside your domain""",
            llm_config=self.llm_config
        )

    def route_and_query(self, question: str) -> dict:
        """Route the question to the right specialist."""
        routing_result = self.router.generate_reply(
            messages=[{"role": "user", "content": question}]
        )
        # generate_reply may return a dict depending on the version; coerce to str
        domain = str(routing_result).strip().lower()
        specialist = self.specialists.get(domain, self.specialists["general"])
        response = specialist.generate_reply(
            messages=[{"role": "user", "content": question}]
        )
        return {
            "domain": domain,
            "specialist": specialist.name,
            "response": response
        }
```
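Calling an LLM just to pick a domain adds a round trip per request. A keyword lookup can serve as a cheap first pass or as a fallback when the Router is unavailable; the keyword lists below are illustrative, not exhaustive:

```python
# Hypothetical keyword tables mirroring the Router's domains.
DOMAIN_KEYWORDS = {
    "technical": ["code", "api", "architecture", "deploy", "infrastructure"],
    "legal": ["contract", "gdpr", "compliance", "license"],
    "financial": ["budget", "invoice", "accounting", "cost"],
}


def keyword_route(question: str) -> str:
    """Route by counting keyword hits; fall back to 'general' on no match."""
    q = question.lower()
    scores = {
        domain: sum(1 for kw in kws if kw in q)
        for domain, kws in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

In `route_and_query`, this could replace the LLM call entirely for obvious questions, or run first and only invoke the Router when no keyword matches.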
Pattern 3: Agent Debate
```python
class DebateRAG:
    """RAG with structured debate between agents."""

    def __init__(self, llm_config: dict):
        self.proposer = AssistantAgent(
            name="Proposer",
            system_message="""You propose an initial response based on documents.
You must be confident but open to criticism.""",
            llm_config=llm_config
        )
        self.critic = AssistantAgent(
            name="Critic",
            system_message="""You constructively criticize the proposed response.
1. Identify weaknesses or gaps
2. Question sources
3. Propose alternatives
4. Don't criticize for the sake of criticizing""",
            llm_config=llm_config
        )
        self.arbiter = AssistantAgent(
            name="Arbiter",
            system_message="""You arbitrate the debate between Proposer and Critic.
1. Evaluate arguments from both sides
2. Settle disagreements
3. Produce the final consensual response
4. Integrate the best contributions from each""",
            llm_config=llm_config
        )

    def debate_and_answer(self, question: str, context: str, rounds: int = 2) -> dict:
        """Launch a structured debate and produce a response."""
        debate_history = []

        proposal = self.proposer.generate_reply(
            messages=[{
                "role": "user",
                "content": f"Context: {context}\n\nQuestion: {question}\n\nPropose a response."
            }]
        )
        debate_history.append({"agent": "Proposer", "content": proposal})

        for round_num in range(rounds):
            critique = self.critic.generate_reply(
                messages=[{
                    "role": "user",
                    "content": f"Current proposal: {proposal}\n\nCriticize this response."
                }]
            )
            debate_history.append({"agent": "Critic", "content": critique})

            proposal = self.proposer.generate_reply(
                messages=[{
                    "role": "user",
                    "content": f"Critique received: {critique}\n\nImprove your proposal."
                }]
            )
            debate_history.append({"agent": "Proposer", "content": proposal})

        final_answer = self.arbiter.generate_reply(
            messages=[{
                "role": "user",
                "content": f"""Debate history:
{self._format_history(debate_history)}

Produce the final answer to the question: {question}"""
            }]
        )

        return {
            "answer": final_answer,
            "debate_rounds": len(debate_history),
            "history": debate_history
        }

    def _format_history(self, history: list) -> str:
        return "\n\n".join([f"[{h['agent']}]: {h['content']}" for h in history])
```
Memory Management
Shared Memory Between Agents
```python
import json
from typing import Dict, Any


class SharedMemory:
    """Shared memory between agents."""

    def __init__(self):
        self.facts = {}
        self.decisions = []
        self.sources = {}
        self.confidence_scores = {}

    def add_fact(self, key: str, value: Any, source: str, confidence: float):
        """Add a fact to memory."""
        self.facts[key] = value
        self.sources[key] = source
        self.confidence_scores[key] = confidence

    def get_context(self) -> str:
        """Return context for agents."""
        context = "Established facts:\n"
        for key, value in self.facts.items():
            conf = self.confidence_scores.get(key, 0)
            context += f"- {key}: {value} (confidence: {conf:.0%})\n"
        return context

    def add_decision(self, decision: str, reasoning: str):
        """Record a decision."""
        self.decisions.append({
            "decision": decision,
            "reasoning": reasoning
        })

    def to_json(self) -> str:
        return json.dumps({
            "facts": self.facts,
            "decisions": self.decisions,
            "sources": self.sources
        }, indent=2)


memory = SharedMemory()


def research_with_memory(agent: AssistantAgent, query: str) -> str:
    """Research with shared memory."""
    context = memory.get_context()
    response = agent.generate_reply(
        messages=[{
            "role": "user",
            "content": f"""Known context:
{context}

New question: {query}

Search and add new facts to memory."""
        }]
    )
    return response
```
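Note that `add_fact` overwrites blindly, so a late low-confidence agent can clobber a fact an earlier agent established with high confidence. One possible guard, sketched as a hypothetical standalone helper operating on the same dicts `SharedMemory` keeps (`facts` and `confidence_scores`):

```python
def merge_fact(facts: dict, scores: dict, key: str, value, confidence: float) -> bool:
    """Accept a new fact only if it is unknown or more confident
    than the stored value. Returns True if the value was accepted."""
    if key not in facts or confidence > scores.get(key, 0.0):
        facts[key] = value
        scores[key] = confidence
        return True
    return False
```

`add_fact` could call this instead of assigning directly; logging rejected writes would also surface disagreements between agents for the Validator to inspect.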
Monitoring and Observability
Conversation Tracing
```python
import logging
from datetime import datetime
from typing import List, Dict


class ConversationTracer:
    """Trace multi-agent conversations."""

    def __init__(self):
        self.traces = []
        self.logger = logging.getLogger("autogen_tracer")

    def trace_message(self, sender: str, receiver: str, content: str,
                      metadata: dict = None):
        """Record a message."""
        trace = {
            "timestamp": datetime.now().isoformat(),
            "sender": sender,
            "receiver": receiver,
            "content_length": len(content),
            "content_preview": content[:200],
            "metadata": metadata or {}
        }
        self.traces.append(trace)
        self.logger.info(f"{sender} -> {receiver}: {content[:100]}...")

    def get_summary(self) -> dict:
        """Conversation summary."""
        agents = set()
        for trace in self.traces:
            agents.add(trace["sender"])
            agents.add(trace["receiver"])
        return {
            "total_messages": len(self.traces),
            "agents_involved": list(agents),
            "duration_seconds": self._calculate_duration(),
            "average_message_length": self._avg_message_length()
        }

    def _calculate_duration(self) -> float:
        if len(self.traces) < 2:
            return 0
        start = datetime.fromisoformat(self.traces[0]["timestamp"])
        end = datetime.fromisoformat(self.traces[-1]["timestamp"])
        return (end - start).total_seconds()

    def _avg_message_length(self) -> float:
        if not self.traces:
            return 0
        return sum(t["content_length"] for t in self.traces) / len(self.traces)
```
Costs and Performance
Cost Estimation
| Configuration | Messages/request | Estimated cost | Latency |
|---|---|---|---|
| 2 simple agents | 4-6 | $0.02-0.04 | 5-10s |
| 3 agents + validation | 8-12 | $0.05-0.10 | 10-20s |
| Debate (2 rounds) | 10-15 | $0.08-0.15 | 15-30s |
| Full team (5 agents) | 15-25 | $0.15-0.30 | 25-45s |
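The figures above can be sanity-checked with a back-of-the-envelope model: each message in the GroupChat is roughly one LLM call billed on input and output tokens. The per-1K-token prices and token averages below are placeholders for illustration; check your provider's current pricing before relying on them:

```python
# Hypothetical (input, output) prices per 1K tokens; verify against
# your provider's current price list.
PRICE_PER_1K = {
    "gpt-4o": (0.0025, 0.01),
    "gpt-4o-mini": (0.00015, 0.0006),
}


def estimate_cost(model: str, n_messages: int,
                  avg_in_tokens: int = 800, avg_out_tokens: int = 300) -> float:
    """Rough cost in dollars of one multi-agent request,
    treating each message as one LLM call."""
    p_in, p_out = PRICE_PER_1K[model]
    per_call = (avg_in_tokens / 1000) * p_in + (avg_out_tokens / 1000) * p_out
    return n_messages * per_call
```

With these placeholder numbers, a 10-message request on gpt-4o comes out around $0.05, in line with the "3 agents + validation" row; input token counts grow with conversation length, so real costs skew higher in later rounds.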
Optimizations
```python
# 1. Limit rounds
groupchat = GroupChat(
    agents=agents,
    max_round=10
)

# 2. Search caching (note: lru_cache only hits on exact query strings)
from functools import lru_cache

@lru_cache(maxsize=100)
def cached_search(query: str) -> str:
    return search_documents(query)

# 3. Different models per agent: a cheap model for retrieval,
#    a stronger model for the final writing step
cheap_config = {"model": "gpt-4o-mini", "temperature": 0}
expensive_config = {"model": "gpt-4o", "temperature": 0.7}

research_agent = AssistantAgent(
    "Researcher",
    llm_config={"config_list": [cheap_config]}
)
writer_agent = AssistantAgent(
    "Writer",
    llm_config={"config_list": [expensive_config]}
)
```
Implementation Checklist
- Well-defined agents with clear roles
- Registered search function
- GroupChat configured with round limit
- Error and timeout handling
- Shared memory if needed
- Tracing for debugging
- Tests with different question types
- Cost monitoring
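The error-handling item in the checklist can be covered with a generic retry wrapper around the entry point (for example `query_rag_team`); the attempt count and backoff values below are arbitrary defaults, not recommendations:

```python
import time


def with_retries(fn, attempts: int = 3, backoff: float = 2.0,
                 exceptions=(Exception,)):
    """Call `fn`, retrying with exponential backoff on failure.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))


# Hypothetical usage:
# answer = with_retries(lambda: query_rag_team("What is our refund policy?"))
```

Catching a narrower exception tuple (e.g. the client library's timeout and rate-limit errors) is preferable in production, so that genuine bugs fail fast instead of being retried.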
Conclusion
AutoGen enables building sophisticated RAG systems with multi-agent collaboration. The key is to properly define each agent's roles and responsibilities, and to efficiently manage communication between them.
Further Reading
Need multi-agent RAG? Ailog offers RAG solutions with intelligent orchestration of specialized agents. Robust and scalable architecture.