Guide · Advanced

Conversational RAG: Memory and Multi-Session Context

March 1, 2026
18 min read
Ailog Team

Implement RAG with conversational memory: context management, multi-session history, and personalized responses.

A static RAG responds to each question in isolation. A conversational RAG maintains the thread of discussion, remembers user preferences, and personalizes responses over time. This guide explains how to implement effective memory.

Why Memory is Essential

Limitations of Memoryless RAG

Without memory, each request is processed independently:

User: What is your return policy?
Assistant: You have 30 days to return a product.

User: And for electronic products?
Assistant: [No context] What would you like to know about electronic products?

With memory:

User: What is your return policy?
Assistant: You have 30 days to return a product.

User: And for electronic products?
Assistant: For electronic products, the return period is also 30 days,
          but they must be unopened unless defective.

Types of Memory

Type               Scope          Duration    Usage
-----------------  -------------  ----------  --------------------------------
Immediate context  Current turn   Ephemeral   Co-references ("it", "that")
Session memory     Conversation   Session     Discussion thread tracking
Long-term memory   User           Permanent   Preferences, history
Semantic memory    Global base    Permanent   Facts learned from conversations

Conversational Architecture

┌─────────────────────────────────────────────────────────────┐
│                    USER REQUEST                              │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│               CONTEXT MANAGER                                │
│  ┌─────────────┐ ┌─────────────┐ ┌───────────────────────┐  │
│  │  Session    │ │   User      │ │      Long-term        │  │
│  │  History    │ │  Profile    │ │      Memory           │  │
│  └──────┬──────┘ └──────┬──────┘ └───────────┬───────────┘  │
│         └───────────────┼───────────────────┘               │
│                         ▼                                    │
│               ┌─────────────────┐                           │
│               │ Context Builder │                           │
│               └────────┬────────┘                           │
└────────────────────────┼────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    RAG PIPELINE                              │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│  │  Query       │ │   Retrieval  │ │     Generation       │ │
│  │  Rewriting   │ │   + Context  │ │     + Memory         │ │
│  └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
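
The Context Builder in the diagram is the piece that merges the three memory sources into a single block for the RAG prompt. A minimal sketch, assuming the `ConversationManager`, `UserMemory`, and `SemanticMemory` interfaces defined later in this guide:

```python
class ContextBuilder:
    """Merge session history, user profile, and long-term facts
    into one context block for the RAG prompt."""

    def __init__(self, conv_manager, user_memory, semantic_memory):
        self.conv_manager = conv_manager
        self.user_memory = user_memory
        self.semantic_memory = semantic_memory

    async def build(self, user_id: str, conversation_id: str, query: str) -> str:
        history = await self.conv_manager.get_context(conversation_id)
        preferences = await self.user_memory.get_preferences(user_id)
        facts = await self.semantic_memory.recall(query, top_k=3)

        parts = []
        if preferences:
            parts.append("User profile: " + ", ".join(
                f"{k}={v}" for k, v in preferences.items()
            ))
        if facts:
            parts.append("Known facts:\n" + "\n".join(
                f"- {f['fact']}" for f in facts
            ))
        if history:
            parts.append("Conversation so far:\n" + history)

        return "\n\n".join(parts)
```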

Session History Management

Conversation Structure

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)


@dataclass
class Conversation:
    id: str
    user_id: str
    messages: List[Message] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    context: dict = field(default_factory=dict)


class ConversationManager:
    def __init__(self, storage, max_messages: int = 20):
        self.storage = storage
        self.max_messages = max_messages

    async def add_message(
        self,
        conversation_id: str,
        role: str,
        content: str,
        metadata: dict = None
    ):
        """Add a message to the conversation"""
        conversation = await self.storage.get(conversation_id)

        message = Message(
            role=role,
            content=content,
            metadata=metadata or {}
        )
        conversation.messages.append(message)

        # Keep only the last N messages
        if len(conversation.messages) > self.max_messages:
            conversation.messages = conversation.messages[-self.max_messages:]

        await self.storage.save(conversation)

    async def get_context(
        self,
        conversation_id: str,
        max_turns: int = 5
    ) -> str:
        """Retrieve formatted conversation context"""
        conversation = await self.storage.get(conversation_id)

        # Take the last turns (one turn = user message + assistant reply)
        recent_messages = conversation.messages[-(max_turns * 2):]

        context_parts = []
        for msg in recent_messages:
            role = "User" if msg.role == "user" else "Assistant"
            context_parts.append(f"{role}: {msg.content}")

        return "\n".join(context_parts)
```
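
A quick usage sketch; `my_storage` is a hypothetical async backend exposing the `get`/`save` methods used above, and the conversation is assumed to already exist in it:

```python
manager = ConversationManager(storage=my_storage)  # my_storage is hypothetical

await manager.add_message("conv_42", "user", "What is your return policy?")
await manager.add_message("conv_42", "assistant", "You have 30 days to return a product.")

context = await manager.get_context("conv_42", max_turns=5)
# "User: What is your return policy?\nAssistant: You have 30 days to return a product."
```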

History Compression

For long conversations, compress old history:

```python
from typing import List


class HistoryCompressor:
    def __init__(self, llm):
        self.llm = llm

    async def compress(
        self,
        messages: List[Message],
        keep_recent: int = 4
    ) -> dict:
        """Compress history while keeping recent messages intact"""
        if len(messages) <= keep_recent:
            return {
                "summary": None,
                "recent_messages": messages
            }

        # Messages to compress
        to_compress = messages[:-keep_recent]
        recent = messages[-keep_recent:]

        # Generate summary
        history_text = "\n".join([
            f"{m.role}: {m.content}" for m in to_compress
        ])

        prompt = f"""
Summarize this conversation while preserving:
- Key information exchanged
- Decisions or commitments made
- Important context for continuation

Conversation:
{history_text}

Concise summary (max 200 words):
"""

        summary = await self.llm.generate(prompt, temperature=0)

        return {
            "summary": summary,
            "recent_messages": recent
        }
```
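
The compressed output can then be reassembled into a compact context block. A sketch of that assembly (`my_llm` is a hypothetical client with the `generate` method used above):

```python
compressor = HistoryCompressor(llm=my_llm)  # my_llm is hypothetical

result = await compressor.compress(conversation.messages, keep_recent=4)

context_parts = []
if result["summary"]:
    context_parts.append(f"Summary of earlier discussion:\n{result['summary']}")
for m in result["recent_messages"]:
    context_parts.append(f"{m.role}: {m.content}")

compact_context = "\n".join(context_parts)
```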

Query Rewriting with Context

Co-reference Resolution

```python
from typing import List


class QueryRewriter:
    def __init__(self, llm):
        self.llm = llm

    async def rewrite(
        self,
        query: str,
        conversation_context: str
    ) -> str:
        """Rewrite query by resolving co-references"""
        prompt = f"""
Rewrite the user's question to make it standalone,
by resolving pronouns and context references.

Conversation history:
{conversation_context}

Last question: {query}

Rewritten question (standalone, no ambiguous pronouns):
"""
        rewritten = await self.llm.generate(prompt, temperature=0)
        return rewritten.strip()

    async def extract_search_queries(
        self,
        query: str,
        context: str
    ) -> List[str]:
        """Generate multiple optimized search queries"""
        prompt = f"""
From this conversation and question, generate 2-3 optimized
search queries to find relevant information.

Context:
{context}

Question: {query}

Search queries (one per line):
"""
        result = await self.llm.generate(prompt, temperature=0.3)
        queries = [q.strip() for q in result.strip().split("\n") if q.strip()]
        return queries[:3]
```

Application Example

```python
# Before rewriting
context = """
User: I'm looking for a gaming laptop
Assistant: I recommend the ASUS ROG Strix G15, excellent value.
User: How much RAM does it have?
"""

query = "And the graphics card?"

rewritten = await rewriter.rewrite(query, context)
# "What is the graphics card of the ASUS ROG Strix G15?"
```

Long-term Memory

User Profile Storage

```python
import json
import re
from datetime import datetime, timedelta
from typing import Any


class UserMemory:
    def __init__(self, db):
        self.db = db

    async def update_preference(
        self,
        user_id: str,
        key: str,
        value: Any,
        confidence: float = 1.0
    ):
        """Update a user preference"""
        await self.db.upsert(
            "user_preferences",
            {
                "user_id": user_id,
                "key": key,
                "value": value,
                "confidence": confidence,
                "last_updated": datetime.now()
            },
            conflict_keys=["user_id", "key"]
        )

    async def get_preferences(self, user_id: str) -> dict:
        """Retrieve all user preferences"""
        prefs = await self.db.find(
            "user_preferences",
            {"user_id": user_id}
        )
        return {p["key"]: p["value"] for p in prefs}

    async def log_interaction(
        self,
        user_id: str,
        query: str,
        response: str,
        topic: str,
        satisfaction: float = None
    ):
        """Log an interaction for future analysis"""
        await self.db.insert("user_interactions", {
            "user_id": user_id,
            "query": query,
            "response": response,
            "topic": topic,
            "satisfaction": satisfaction,
            "timestamp": datetime.now()
        })

    async def extract_preferences_from_history(
        self,
        user_id: str,
        llm
    ) -> dict:
        """Extract preferences from past interactions"""
        # Retrieve recent interactions
        interactions = await self.db.find(
            "user_interactions",
            {
                "user_id": user_id,
                "timestamp": {"$gte": datetime.now() - timedelta(days=30)}
            },
            limit=50
        )

        if not interactions:
            return {}

        history = "\n".join([
            f"Q: {i['query']}\nA: {i['response']}" for i in interactions
        ])

        prompt = f"""
Analyze this conversation history and extract user preferences
and characteristics.

History:
{history}

Extract in JSON:
- preferred_language: preferred language
- technical_level: beginner/intermediate/expert
- topics_of_interest: list of frequent topics
- communication_style: formal/informal
- preferences: other detected preferences

JSON:
"""
        result = await llm.generate(prompt, temperature=0)
        return self._parse_json(result)

    def _parse_json(self, text: str) -> dict:
        """Parse the first JSON object in the LLM output"""
        match = re.search(r"\{.*\}", text, re.DOTALL)
        return json.loads(match.group()) if match else {}
```
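
Usage sketch, assuming a document-style `db` exposing the async `upsert`/`find`/`insert` methods called above:

```python
memory = UserMemory(db=my_db)  # my_db is hypothetical

await memory.update_preference("user_7", "technical_level", "expert", confidence=0.8)
await memory.log_interaction(
    user_id="user_7",
    query="How do I tune HNSW parameters?",
    response="Start by increasing ef_search...",
    topic="vector-search"
)

prefs = await memory.get_preferences("user_7")
# {"technical_level": "expert"}
```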

Response Personalization

```python
class PersonalizedRAG:
    def __init__(self, rag_pipeline, user_memory, conv_manager,
                 query_rewriter, llm):
        self.rag = rag_pipeline
        self.memory = user_memory
        self.conv_manager = conv_manager
        self.query_rewriter = query_rewriter
        self.llm = llm

    async def query(
        self,
        user_id: str,
        conversation_id: str,
        query: str
    ) -> dict:
        """Personalized RAG query"""
        # 1. Retrieve user profile
        preferences = await self.memory.get_preferences(user_id)

        # 2. Retrieve conversation context
        conv_context = await self.conv_manager.get_context(conversation_id)

        # 3. Rewrite query with context
        rewritten_query = await self.query_rewriter.rewrite(query, conv_context)

        # 4. Execute standard RAG
        rag_result = await self.rag.query(rewritten_query)

        # 5. Personalize response
        personalized = await self._personalize_response(
            query=query,
            rag_response=rag_result["answer"],
            preferences=preferences,
            context=conv_context
        )

        return {
            "answer": personalized,
            "sources": rag_result["sources"],
            "original_query": query,
            "rewritten_query": rewritten_query
        }

    async def _personalize_response(
        self,
        query: str,
        rag_response: str,
        preferences: dict,
        context: str
    ) -> str:
        """Adapt response to user preferences"""
        tech_level = preferences.get("technical_level", "intermediate")
        style = preferences.get("communication_style", "professional")

        prompt = f"""
Adapt this response according to the user profile.

Profile:
- Technical level: {tech_level}
- Preferred style: {style}

Conversation context:
{context}

Question: {query}

Original response:
{rag_response}

Adapted response:
"""
        return await self.llm.generate(prompt, temperature=0.3)
```
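
Wiring it together looks roughly like this, reusing the components built earlier (`memory`, `manager`, `rewriter`); `base_rag` and `my_llm` are hypothetical stand-ins:

```python
rag = PersonalizedRAG(
    rag_pipeline=base_rag,      # hypothetical standard RAG pipeline
    user_memory=memory,
    conv_manager=manager,
    query_rewriter=rewriter,
    llm=my_llm
)

result = await rag.query(
    user_id="user_7",
    conversation_id="conv_42",
    query="And the graphics card?"
)
print(result["rewritten_query"])  # standalone version of the question
```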

Shared Semantic Memory

Learning from Conversations

```python
import json
import re
from typing import List


class SemanticMemory:
    def __init__(self, vector_db, embedder, llm):
        self.vector_db = vector_db
        self.embedder = embedder
        self.llm = llm

    async def learn_from_conversation(
        self,
        conversation: Conversation
    ):
        """Extract and store facts learned from a conversation"""
        # Extract facts
        facts = await self._extract_facts(conversation)

        for fact in facts:
            # Check if fact already exists
            existing = await self._find_similar_fact(fact)

            if existing:
                # Update confidence
                await self._update_fact_confidence(existing, fact)
            else:
                # Store new fact
                await self._store_fact(fact)

    async def _extract_facts(self, conversation: Conversation) -> List[dict]:
        """Extract factual information from a conversation"""
        messages_text = "\n".join([
            f"{m.role}: {m.content}" for m in conversation.messages
        ])

        prompt = f"""
Extract new facts learned from this conversation.

A fact must be:
- Factual and verifiable
- Useful for future conversations
- New (not already known information)

Conversation:
{messages_text}

Facts (JSON array format):
[
  {{"fact": "...", "category": "...", "confidence": 0.9}},
  ...
]
"""
        result = await self.llm.generate(prompt, temperature=0)
        return self._parse_json(result)

    def _parse_json(self, text: str) -> List[dict]:
        """Parse the first JSON array in the LLM output"""
        match = re.search(r"\[.*\]", text, re.DOTALL)
        return json.loads(match.group()) if match else []

    async def recall(
        self,
        query: str,
        top_k: int = 5
    ) -> List[dict]:
        """Retrieve relevant facts for a query"""
        query_embedding = self.embedder.encode(query)

        results = await self.vector_db.search(
            collection="semantic_memory",
            query_vector=query_embedding,
            limit=top_k
        )

        return [r.payload for r in results]
```
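
The storage helpers are left undefined above. A minimal sketch of three of them against a generic vector DB; the `upsert`/`search` signatures, the `score`/`payload` result fields, and the 0.9 similarity cutoff are assumptions, not a specific client API:

```python
import uuid


class VectorSemanticMemory(SemanticMemory):
    SIMILARITY_THRESHOLD = 0.9  # assumed cutoff for "same fact"

    async def _store_fact(self, fact: dict):
        """Embed and store a new fact"""
        await self.vector_db.upsert(
            collection="semantic_memory",
            id=str(uuid.uuid4()),
            vector=self.embedder.encode(fact["fact"]),
            payload=fact
        )

    async def _find_similar_fact(self, fact: dict):
        """Return the closest stored fact if it is similar enough"""
        results = await self.vector_db.search(
            collection="semantic_memory",
            query_vector=self.embedder.encode(fact["fact"]),
            limit=1
        )
        if results and results[0].score >= self.SIMILARITY_THRESHOLD:
            return results[0]
        return None

    async def _update_fact_confidence(self, existing, fact: dict):
        """Reinforce a re-observed fact by raising its confidence"""
        payload = existing.payload
        payload["confidence"] = min(1.0, payload.get("confidence", 0.5) + 0.1)
        await self.vector_db.upsert(
            collection="semantic_memory",
            id=existing.id,
            vector=self.embedder.encode(payload["fact"]),
            payload=payload
        )
```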

Session Management

Multi-device and Continuity

```python
import json
from datetime import datetime

import redis.asyncio as redis  # async client, so the awaits below work


class SessionManager:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.session_ttl = 3600 * 24  # 24 hours

    async def create_session(
        self,
        user_id: str,
        device_id: str = None
    ) -> str:
        """Create a new session"""
        session_id = f"session_{user_id}_{datetime.now().timestamp()}"

        session_data = {
            "user_id": user_id,
            "device_id": device_id,
            "created_at": datetime.now().isoformat(),
            "messages": [],
            "context": {}
        }

        await self.redis.setex(
            session_id,
            self.session_ttl,
            json.dumps(session_data)
        )

        return session_id

    async def resume_or_create(
        self,
        user_id: str,
        device_id: str = None
    ) -> str:
        """Resume last session or create a new one"""
        # Look for a recent active session
        # (KEYS is O(n); prefer SCAN in production)
        pattern = f"session_{user_id}_*"
        keys = await self.redis.keys(pattern)

        if keys:
            # Sort by creation date
            sessions = []
            for key in keys:
                data = json.loads(await self.redis.get(key))
                sessions.append((key, data))

            # Take the most recent
            sessions.sort(key=lambda x: x[1]["created_at"], reverse=True)
            return sessions[0][0]

        # Create a new session
        return await self.create_session(user_id, device_id)
```
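
Usage sketch with the async Redis client:

```python
import redis.asyncio as redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)
sessions = SessionManager(client)

# A user reconnecting from a second device resumes the same thread
session_id = await sessions.resume_or_create("user_7", device_id="phone")
```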

Metrics and Evaluation

Conversational Quality

```python
class ConversationMetrics:
    def __init__(self, db):
        self.db = db

    async def calculate_metrics(
        self,
        conversation_id: str
    ) -> dict:
        """Calculate conversation metrics"""
        conversation = await self.db.get("conversations", conversation_id)
        messages = conversation["messages"]

        return {
            # Length and engagement
            "total_turns": len(messages) // 2,
            "avg_user_message_length": self._avg_length(
                [m for m in messages if m["role"] == "user"]
            ),
            "avg_assistant_message_length": self._avg_length(
                [m for m in messages if m["role"] == "assistant"]
            ),

            # Resolution
            "ended_naturally": self._check_natural_end(conversation),
            "required_clarification": self._count_clarifications(conversation),

            # Coherence
            "topic_coherence": await self._calculate_topic_coherence(conversation),
            "context_usage": self._measure_context_usage(conversation)
        }
```
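
The private helpers are left undefined above. Two of the simpler ones might look like this, shown as standalone functions; the clarification heuristic is our assumption, not a fixed rule:

```python
def avg_length(messages: list) -> float:
    """Average character length of a list of messages"""
    if not messages:
        return 0.0
    return sum(len(m["content"]) for m in messages) / len(messages)


def count_clarifications(messages: list) -> int:
    """Heuristic: count assistant turns that answer with a question"""
    return sum(
        1 for m in messages
        if m["role"] == "assistant" and m["content"].rstrip().endswith("?")
    )
```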

Best Practices

1. Limit Context Size

Don't overload the prompt with too much history. Compress or summarize.
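
A simple guard is to trim history to a token budget before prompting. A sketch using tiktoken; the 2000-token budget is illustrative:

```python
import tiktoken


def trim_to_budget(messages: list, budget: int = 2000) -> list:
    """Keep the most recent messages that fit within a token budget."""
    enc = tiktoken.get_encoding("cl100k_base")
    kept, used = [], 0
    for msg in reversed(messages):
        cost = len(enc.encode(msg.content))
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```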

2. Handle Topic Changes

Detect when the user changes topic to avoid polluting the context, as in the sketch below.
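
One lightweight approach is to compare the embedding of the new query against the recent context and reset when similarity drops. A sketch; the 0.5 threshold is an assumption to calibrate on your own conversations:

```python
import numpy as np


def is_topic_change(embedder, query: str, recent_context: str,
                    threshold: float = 0.5) -> bool:
    """Flag a topic change when the new query is semantically far
    from the recent conversation."""
    q = np.asarray(embedder.encode(query), dtype=float)
    c = np.asarray(embedder.encode(recent_context), dtype=float)
    similarity = float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c)))
    return similarity < threshold
```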

3. Respect Privacy

Allow users to delete their history and preferences.
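
In practice this means a hard-delete path across every store that holds user data. A sketch against the stores used in this guide; the `db.delete` method is an assumption matching the generic interface above:

```python
async def delete_user_data(user_id: str, db, redis_client):
    """Remove a user's preferences, interactions, and active sessions."""
    await db.delete("user_preferences", {"user_id": user_id})
    await db.delete("user_interactions", {"user_id": user_id})

    # Drop any active sessions (prefer SCAN over KEYS at scale)
    for key in await redis_client.keys(f"session_{user_id}_*"):
        await redis_client.delete(key)
```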

4. Graceful Fallback

If memory is unavailable, the system should continue to work in stateless mode.
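
A thin wrapper can catch memory-store failures and degrade to a plain single-turn RAG call. A sketch:

```python
async def query_with_fallback(personalized_rag, base_rag,
                              user_id: str, conversation_id: str, query: str):
    """Try the memory-backed pipeline; fall back to stateless RAG."""
    try:
        return await personalized_rag.query(user_id, conversation_id, query)
    except Exception:
        # Memory stores unreachable: answer the question in isolation
        return await base_rag.query(query)
```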

Conversational RAG with Ailog

Implementing robust conversational memory is complex. Ailog provides these features natively:

  • Automatic session history with compression
  • Persistent user memory and personalization
  • Query rewriting for co-reference resolution
  • Multi-device support with conversation resumption
  • Conversational analytics to measure engagement
  • GDPR compliance with data export and deletion

Try Ailog for free and deploy an intelligent conversational assistant.

Tags

RAG, conversation, memory, context, chat, LLM
