Query Routing: Direct Queries to the Right Source
Query routing is a technique that directs each query to the most relevant data source. Rather than searching all databases simultaneously, an intelligent router analyzes the query and decides where to search. This guide explores routing strategies, from simple heuristics to sophisticated LLM classifiers.
Why Query Routing?
In an enterprise RAG system, data comes from multiple sources:
┌─────────────────────────────────────────────────────────────┐
│ Data Sources │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ FAQ │ Docs │ Products │ Tickets │
│ Support │ Technical │ Catalog │ Resolved │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ │
│ "How to return?" → FAQ Support │
│ "API rate limits?" → Technical Docs │
│ "iPhone 15 price?" → Product Catalog │
│ "WiFi connection bug?" → Resolved Tickets │
│ │
└─────────────────────────────────────────────────────────────┘
Routing Benefits
| Without Routing | With Routing |
|---|---|
| Searches all sources | Targets relevant source |
| Diluted results | Precise results |
| High latency | Optimal latency |
| Costs proportional to sources | Optimized costs |
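The cost and dilution columns can be made concrete with a back-of-the-envelope model. All numbers below (4 sources, top-5 retrieval, one cost unit per index search) are assumptions chosen purely for illustration:

```python
# Back-of-the-envelope comparison of fan-out vs. routed search.
# All numbers are illustrative assumptions, not benchmarks.
N_SOURCES = 4
TOP_K = 5
COST_UNITS_PER_SEARCH = 1  # cost of querying one index, in arbitrary units

def fan_out_stats(n_sources=N_SOURCES, top_k=TOP_K):
    """Search every source: cost grows with the number of sources, and
    the generator must sift through top_k * n_sources mixed candidates."""
    return {"cost": n_sources * COST_UNITS_PER_SEARCH,
            "candidates": n_sources * top_k}

def routed_stats(top_k=TOP_K):
    """Search only the routed source: constant cost, focused candidates."""
    return {"cost": COST_UNITS_PER_SEARCH, "candidates": top_k}

print(fan_out_stats())  # {'cost': 4, 'candidates': 20}
print(routed_stats())   # {'cost': 1, 'candidates': 5}
```

The per-query saving is linear in the number of sources, which is why routing pays off quickly as an enterprise adds knowledge bases.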
Routing Strategies
1. Keyword Routing
The simplest method: detect patterns in the query.
```python
import re

class KeywordRouter:
    def __init__(self):
        self.routes = {
            "faq": [
                r"how\s+(do|can)\s+i",
                r"is\s+it\s+possible",
                r"return|refund|cancel",
                r"delivery|shipping"
            ],
            "docs": [
                r"api|endpoint|webhook",
                r"integrat|configur|install",
                r"authentication|token|oauth",
                r"error\s+\d{3}"
            ],
            "products": [
                r"price|cost|pricing",
                r"available|stock|inventory",
                r"features|specifications|specs",
                r"compare|versus|vs"
            ],
            "tickets": [
                r"bug|issue|error",
                r"(not|doesn't|won't)\s+work",
                r"stuck|blocked",
                r"resolved|solution|fix"
            ]
        }

    def route(self, query: str) -> str:
        query_lower = query.lower()
        scores = {}
        for route, patterns in self.routes.items():
            scores[route] = sum(
                1 for p in patterns if re.search(p, query_lower)
            )
        if max(scores.values()) == 0:
            return "default"
        return max(scores, key=scores.get)

router = KeywordRouter()
print(router.route("How do I return a product?"))  # "faq"
print(router.route("API rate limit exceeded"))     # "docs"
```
Advantages: fast, predictable, no LLM costs. Disadvantages: rigid, and the pattern lists require ongoing maintenance as new query types appear.
2. Embedding Routing
Classify queries by similarity with representative examples.
```python
from sentence_transformers import SentenceTransformer
import numpy as np

class EmbeddingRouter:
    def __init__(self, model_name: str = "BAAI/bge-m3"):
        self.model = SentenceTransformer(model_name)
        self.route_embeddings = {}
        self.route_examples = {
            "faq": [
                "How do I make a return?",
                "What are the delivery times?",
                "Can I cancel my order?",
                "How to contact support?"
            ],
            "docs": [
                "How to integrate the API?",
                "What is the request limit?",
                "How to configure OAuth?",
                "Webhook documentation"
            ],
            "products": [
                "What is the price of this product?",
                "Is it available in stock?",
                "Compare features",
                "Best sellers right now"
            ]
        }
        self._build_route_embeddings()

    def _build_route_embeddings(self):
        for route, examples in self.route_examples.items():
            embeddings = self.model.encode(examples)
            # Centroid of the examples
            self.route_embeddings[route] = np.mean(embeddings, axis=0)

    def route(self, query: str) -> tuple[str, float]:
        query_embedding = self.model.encode(query)
        best_route = None
        best_similarity = -1
        for route, centroid in self.route_embeddings.items():
            # Cosine similarity between the query and the route centroid
            similarity = np.dot(query_embedding, centroid) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(centroid)
            )
            if similarity > best_similarity:
                best_similarity = similarity
                best_route = route
        return best_route, best_similarity

router = EmbeddingRouter()
route, confidence = router.route("What is the monthly rate?")
print(f"Route: {route}, Confidence: {confidence:.3f}")
# Route: products, Confidence: 0.847
```
3. LLM Routing
The LLM analyzes the query and decides the optimal route.
```python
import json
from openai import OpenAI

class LLMRouter:
    def __init__(self):
        self.client = OpenAI()
        self.routes = {
            "faq": "General questions about service, returns, shipping, user accounts",
            "docs": "Technical documentation, API, integration, configuration, webhooks",
            "products": "Product catalog, pricing, availability, features, comparisons",
            "tickets": "Technical issues, bugs, incidents, specific errors"
        }

    def route(self, query: str) -> dict:
        routes_description = "\n".join(
            f"- {name}: {desc}" for name, desc in self.routes.items()
        )
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": f"""You are a query router. Analyze the question and determine the best source.

Available sources:
{routes_description}

Respond ONLY in JSON format:
{{"route": "route_name", "confidence": 0.0-1.0, "reasoning": "brief explanation"}}"""
                },
                {"role": "user", "content": query}
            ],
            temperature=0,
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

router = LLMRouter()
result = router.route("The API returns error 429, how do I increase my limit?")
# {"route": "docs", "confidence": 0.95, "reasoning": "Technical question about API and limits"}
```
4. Hierarchical Routing
Combine multiple routing levels for finer decisions.
```python
class HierarchicalRouter:
    def __init__(self):
        self.keyword_router = KeywordRouter()
        self.embedding_router = EmbeddingRouter()
        self.llm_router = LLMRouter()

    def route(self, query: str, use_llm_fallback: bool = True) -> dict:
        # Level 1: keywords (fast, free)
        keyword_route = self.keyword_router.route(query)
        if keyword_route != "default":
            return {
                "route": keyword_route,
                "method": "keyword",
                "confidence": 0.9
            }

        # Level 2: embeddings (slower, free)
        embed_route, embed_confidence = self.embedding_router.route(query)
        if embed_confidence > 0.85:
            return {
                "route": embed_route,
                "method": "embedding",
                "confidence": embed_confidence
            }

        # Level 3: LLM (slow, paid) - only if necessary
        if use_llm_fallback:
            llm_result = self.llm_router.route(query)
            return {
                "route": llm_result["route"],
                "method": "llm",
                "confidence": llm_result["confidence"]
            }

        # Fallback: best embedding route
        return {
            "route": embed_route,
            "method": "embedding_fallback",
            "confidence": embed_confidence
        }
```
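The cascade's control flow can be exercised without loading a model or calling an API by swapping in stubs. The stub classes below are placeholders invented for this sketch, not the real routers; only the three-level fallback logic is the same:

```python
class StubKeywordRouter:
    def route(self, query):
        # Pretend only refund questions match a keyword pattern.
        return "faq" if "refund" in query.lower() else "default"

class StubEmbeddingRouter:
    def route(self, query):
        # Pretend pricing questions score above the 0.85 threshold.
        if "price" in query.lower():
            return "products", 0.91
        return "docs", 0.42  # low confidence → falls through to the LLM

class StubLLMRouter:
    def route(self, query):
        return {"route": "tickets", "confidence": 0.8}

class CascadeRouter:
    """Same three-level fallback as HierarchicalRouter, wired to stubs."""
    def __init__(self):
        self.keyword_router = StubKeywordRouter()
        self.embedding_router = StubEmbeddingRouter()
        self.llm_router = StubLLMRouter()

    def route(self, query):
        # Level 1: keywords
        kw = self.keyword_router.route(query)
        if kw != "default":
            return {"route": kw, "method": "keyword"}
        # Level 2: embeddings, gated on confidence
        emb_route, emb_conf = self.embedding_router.route(query)
        if emb_conf > 0.85:
            return {"route": emb_route, "method": "embedding"}
        # Level 3: LLM fallback
        llm = self.llm_router.route(query)
        return {"route": llm["route"], "method": "llm"}

router = CascadeRouter()
print(router.route("Can I get a refund?"))         # stops at the keyword level
print(router.route("What is the price?"))          # stops at the embedding level
print(router.route("Something strange happened"))  # falls through to the LLM
```

Each query stops at the cheapest level that produces a confident answer, which is the whole point of the hierarchy.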
Multi-Route Routing
Sometimes a query requires multiple sources.
```python
import json
from openai import OpenAI

class MultiRouteRouter:
    def __init__(self):
        self.client = OpenAI()

    def route(self, query: str) -> list[dict]:
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": """Analyze the query and determine ALL relevant sources.
A query may need multiple sources.

Sources: faq, docs, products, tickets

Respond in JSON format:
{"routes": [{"name": "...", "relevance": 0.0-1.0}], "reasoning": "..."}"""
                },
                {"role": "user", "content": query}
            ],
            temperature=0,
            response_format={"type": "json_object"}
        )
        result = json.loads(response.choices[0].message.content)

        # Keep only routes with relevance > 0.5
        relevant_routes = [
            r for r in result["routes"] if r["relevance"] > 0.5
        ]
        return relevant_routes

# Example
router = MultiRouteRouter()
routes = router.route("Why does the product API return error 500 on certain items?")
# [
#   {"name": "docs", "relevance": 0.9},
#   {"name": "tickets", "relevance": 0.8},
#   {"name": "products", "relevance": 0.6}
# ]
```
Routing with Metadata
Enrich routing with user context.
```python
class ContextualRouter:
    def __init__(self):
        self.base_router = EmbeddingRouter()

    def route(self, query: str, user_context: dict) -> dict:
        # Base routing
        base_route, confidence = self.base_router.route(query)

        # Contextual adjustments
        adjustments = self._compute_adjustments(user_context)

        # Apply adjustments
        route_scores = {base_route: confidence}
        for route, adjustment in adjustments.items():
            if route in route_scores:
                route_scores[route] *= adjustment
            else:
                route_scores[route] = confidence * 0.5 * adjustment

        final_route = max(route_scores, key=route_scores.get)
        return {
            "route": final_route,
            "confidence": route_scores[final_route],
            "adjustments_applied": adjustments
        }

    def _compute_adjustments(self, user_context: dict) -> dict:
        adjustments = {}

        # Technical user → boost docs
        if user_context.get("is_developer"):
            adjustments["docs"] = 1.3

        # Ticket history → boost tickets
        if user_context.get("has_open_tickets"):
            adjustments["tickets"] = 1.2

        # Current page is a product page → boost products
        if "product" in user_context.get("current_page", ""):
            adjustments["products"] = 1.4

        return adjustments

# Example
router = ContextualRouter()
result = router.route(
    query="How do I fix this error?",
    user_context={
        "is_developer": True,
        "has_open_tickets": True,
        "current_page": "/docs/api"
    }
)
```
Complete Pipeline Implementation
```python
from openai import OpenAI

class RoutedRAGPipeline:
    def __init__(self):
        self.router = HierarchicalRouter()
        # Retriever classes are assumed to be implemented per source
        self.retrievers = {
            "faq": FAQRetriever(),
            "docs": DocsRetriever(),
            "products": ProductRetriever(),
            "tickets": TicketRetriever()
        }
        self.llm = OpenAI()

    def query(self, user_query: str, user_context: dict = None) -> dict:
        # 1. Routing
        route_result = self.router.route(user_query)
        selected_route = route_result["route"]

        # 2. Targeted retrieval
        retriever = self.retrievers[selected_route]
        documents = retriever.search(user_query, top_k=5)

        # 3. Generation
        context = "\n\n".join(d["content"] for d in documents)
        answer = self._generate_response(user_query, context)

        return {
            "answer": answer,
            "route": selected_route,
            "routing_method": route_result["method"],
            "routing_confidence": route_result["confidence"],
            "sources": documents
        }

    def _generate_response(self, query: str, context: str) -> str:
        response = self.llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": f"Answer the question based on the following context:\n\n{context}"
                },
                {"role": "user", "content": query}
            ]
        )
        return response.choices[0].message.content
```
Monitoring and Improvement
Logging Routing Decisions
```python
from datetime import datetime

class RoutingLogger:
    def __init__(self, analytics_client):
        self.analytics = analytics_client

    def log_routing_decision(
        self,
        query: str,
        route: str,
        method: str,
        confidence: float,
        user_feedback: str = None
    ):
        self.analytics.track("routing_decision", {
            "query": query,
            "route": route,
            "method": method,
            "confidence": confidence,
            "timestamp": datetime.now().isoformat(),
            "feedback": user_feedback
        })

    def analyze_routing_accuracy(self, days: int = 7) -> dict:
        # days_ago() is assumed to be a date helper defined elsewhere
        decisions = self.analytics.query(
            "routing_decision",
            filters={"timestamp": {"$gte": days_ago(days)}}
        )

        # Accuracy per routing method, based on user feedback
        by_method = {}
        for d in decisions:
            method = d["method"]
            if method not in by_method:
                by_method[method] = {"correct": 0, "incorrect": 0, "unknown": 0}
            if d.get("feedback") == "correct":
                by_method[method]["correct"] += 1
            elif d.get("feedback") == "incorrect":
                by_method[method]["incorrect"] += 1
            else:
                by_method[method]["unknown"] += 1
        return by_method
```
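The logger assumes an external analytics client. Its aggregation logic can be checked end to end with a minimal in-memory stand-in; the `InMemoryAnalytics` class and `accuracy_by_method` helper below are illustrative sketches, not a real analytics API:

```python
class InMemoryAnalytics:
    """Minimal stand-in for an analytics client: stores and returns events."""
    def __init__(self):
        self.events = []

    def track(self, event_name, payload):
        self.events.append((event_name, payload))

    def query(self, event_name, filters=None):
        # Date filtering is omitted in this sketch.
        return [p for name, p in self.events if name == event_name]

def accuracy_by_method(decisions):
    """Same per-method aggregation as analyze_routing_accuracy."""
    by_method = {}
    for d in decisions:
        m = by_method.setdefault(
            d["method"], {"correct": 0, "incorrect": 0, "unknown": 0}
        )
        feedback = d.get("feedback")
        if feedback == "correct":
            m["correct"] += 1
        elif feedback == "incorrect":
            m["incorrect"] += 1
        else:
            m["unknown"] += 1
    return by_method

analytics = InMemoryAnalytics()
analytics.track("routing_decision", {"method": "keyword", "feedback": "correct"})
analytics.track("routing_decision", {"method": "keyword", "feedback": "incorrect"})
analytics.track("routing_decision", {"method": "llm", "feedback": None})

print(accuracy_by_method(analytics.query("routing_decision")))
```

A routing method with a high `incorrect` share is a signal to tighten its patterns, examples, or confidence threshold.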
Feedback Loop for Improvement
```python
class AdaptiveRouter:
    def __init__(self):
        self.base_router = EmbeddingRouter()
        self.corrections = {}  # query_hash -> correct_route

    def _query_hash(self, query: str) -> int:
        # Note: built-in hash() is salted per process; switch to hashlib
        # if corrections are ever persisted across restarts.
        return hash(query.lower().strip())

    def route(self, query: str) -> dict:
        query_hash = self._query_hash(query)

        # A recorded user correction overrides the base router
        if query_hash in self.corrections:
            return {
                "route": self.corrections[query_hash],
                "method": "correction",
                "confidence": 1.0
            }

        route, confidence = self.base_router.route(query)
        return {
            "route": route,
            "method": "embedding",
            "confidence": confidence
        }

    def record_correction(self, query: str, correct_route: str):
        """Record a user correction"""
        self.corrections[self._query_hash(query)] = correct_route

        # Optional: periodically retrain the router
        if len(self.corrections) % 100 == 0:
            self._retrain_router()
```
Next Steps
Query routing optimizes your system by targeting the right sources. To go further:
- Self-Query Retrieval - Let the LLM structure the search
- Metadata Filtering - Refine with metadata
- Ensemble Retrieval - Combine multiple retrievers
Intelligent Query Routing with Ailog
Ailog implements query routing transparently:
- Automatic routing based on your data sources
- Real-time adaptation based on user feedback
- Smart multi-route when multiple sources are relevant
- Integrated monitoring for continuous optimization
Try for free and get automatically optimized routing.