Query Routing: Direct Queries to the Right Source
Query routing is a technique that directs each query to the most relevant data source. Rather than searching all databases simultaneously, an intelligent router analyzes the query and decides where to search. This guide explores routing strategies, from simple heuristics to sophisticated LLM classifiers.
Why Query Routing?
In an enterprise RAG system, data comes from multiple sources:
┌─────────────────────────────────────────────────────────────┐
│ Data Sources │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ FAQ │ Docs │ Products │ Tickets │
│ Support │ Technical │ Catalog │ Resolved │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ │
│ "How to return?" → FAQ Support │
│ "API rate limits?" → Technical Docs │
│ "iPhone 15 price?" → Product Catalog │
│ "WiFi connection bug?" → Resolved Tickets │
│ │
└─────────────────────────────────────────────────────────────┘
Routing Benefits
| Without Routing | With Routing |
|---|---|
| Searches all sources | Targets relevant source |
| Diluted results | Precise results |
| High latency | Optimal latency |
| Costs proportional to sources | Optimized costs |
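The cost and dilution columns can be made concrete with a back-of-the-envelope model. All numbers below (4 sources, top-5 retrieval, one cost unit per index search) are assumptions chosen purely for illustration:

```python
# Back-of-the-envelope comparison of fan-out vs. routed search.
# All numbers are illustrative assumptions, not benchmarks.
N_SOURCES = 4
TOP_K = 5
COST_UNITS_PER_SEARCH = 1  # cost of querying one index, in arbitrary units

def fan_out_stats(n_sources=N_SOURCES, top_k=TOP_K):
    """Search every source: cost grows with the number of sources, and
    the generator must sift through top_k * n_sources mixed candidates."""
    return {"cost": n_sources * COST_UNITS_PER_SEARCH,
            "candidates": n_sources * top_k}

def routed_stats(top_k=TOP_K):
    """Search only the routed source: constant cost, focused candidates."""
    return {"cost": COST_UNITS_PER_SEARCH, "candidates": top_k}

print(fan_out_stats())  # {'cost': 4, 'candidates': 20}
print(routed_stats())   # {'cost': 1, 'candidates': 5}
```

The per-query saving is linear in the number of sources, which is why routing pays off quickly as an enterprise adds knowledge bases.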
Routing Strategies
1. Keyword Routing
The simplest method: detect patterns in the query.
```python
import re

class KeywordRouter:
    def __init__(self):
        self.routes = {
            "faq": [
                r"how\s+(do|can)\s+i",
                r"is\s+it\s+possible",
                r"return|refund|cancel",
                r"delivery|shipping"
            ],
            "docs": [
                r"api|endpoint|webhook",
                r"integrat|configur|install",
                r"authentication|token|oauth",
                r"error\s+\d{3}"
            ],
            "products": [
                r"price|cost|pricing",
                r"available|stock|inventory",
                r"features|specifications|specs",
                r"compare|versus|vs"
            ],
            "tickets": [
                r"bug|issue|error",
                r"(not|doesn't|won't)\s+work",
                r"stuck|blocked",
                r"resolved|solution|fix"
            ]
        }

    def route(self, query: str) -> str:
        query_lower = query.lower()
        scores = {}
        for route, patterns in self.routes.items():
            scores[route] = sum(
                1 for p in patterns if re.search(p, query_lower)
            )
        if max(scores.values()) == 0:
            return "default"
        return max(scores, key=scores.get)

router = KeywordRouter()
print(router.route("How do I return a product?"))  # "faq"
print(router.route("API rate limit exceeded"))     # "docs"
```
Advantages: fast, predictable, no LLM costs. Disadvantages: rigid, and the pattern lists require ongoing maintenance as new query types appear.
2. Embedding Routing
Classify queries by similarity with representative examples.
```python
from sentence_transformers import SentenceTransformer
import numpy as np

class EmbeddingRouter:
    def __init__(self, model_name: str = "BAAI/bge-m3"):
        self.model = SentenceTransformer(model_name)
        self.route_embeddings = {}
        self.route_examples = {
            "faq": [
                "How do I make a return?",
                "What are the delivery times?",
                "Can I cancel my order?",
                "How to contact support?"
            ],
            "docs": [
                "How to integrate the API?",
                "What is the request limit?",
                "How to configure OAuth?",
                "Webhook documentation"
            ],
            "products": [
                "What is the price of this product?",
                "Is it available in stock?",
                "Compare features",
                "Best sellers right now"
            ]
        }
        self._build_route_embeddings()

    def _build_route_embeddings(self):
        for route, examples in self.route_examples.items():
            embeddings = self.model.encode(examples)
            # Centroid of the examples
            self.route_embeddings[route] = np.mean(embeddings, axis=0)

    def route(self, query: str) -> tuple[str, float]:
        query_embedding = self.model.encode(query)
        best_route = None
        best_similarity = -1
        for route, centroid in self.route_embeddings.items():
            # Cosine similarity between the query and the route centroid
            similarity = np.dot(query_embedding, centroid) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(centroid)
            )
            if similarity > best_similarity:
                best_similarity = similarity
                best_route = route
        return best_route, best_similarity

router = EmbeddingRouter()
route, confidence = router.route("What is the monthly rate?")
print(f"Route: {route}, Confidence: {confidence:.3f}")
# Route: products, Confidence: 0.847
```
3. LLM Routing
The LLM analyzes the query and decides the optimal route.
```python
import json
from openai import OpenAI

class LLMRouter:
    def __init__(self):
        self.client = OpenAI()
        self.routes = {
            "faq": "General questions about service, returns, shipping, user accounts",
            "docs": "Technical documentation, API, integration, configuration, webhooks",
            "products": "Product catalog, pricing, availability, features, comparisons",
            "tickets": "Technical issues, bugs, incidents, specific errors"
        }

    def route(self, query: str) -> dict:
        routes_description = "\n".join(
            f"- {name}: {desc}" for name, desc in self.routes.items()
        )
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": f"""You are a query router. Analyze the question and determine the best source.

Available sources:
{routes_description}

Respond ONLY in JSON format:
{{"route": "route_name", "confidence": 0.0-1.0, "reasoning": "brief explanation"}}"""
                },
                {"role": "user", "content": query}
            ],
            temperature=0,
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

router = LLMRouter()
result = router.route("The API returns error 429, how do I increase my limit?")
# {"route": "docs", "confidence": 0.95, "reasoning": "Technical question about API and limits"}
```
4. Hierarchical Routing
Combine multiple routing levels for finer decisions.
```python
class HierarchicalRouter:
    def __init__(self):
        self.keyword_router = KeywordRouter()
        self.embedding_router = EmbeddingRouter()
        self.llm_router = LLMRouter()

    def route(self, query: str, use_llm_fallback: bool = True) -> dict:
        # Level 1: keywords (fast, free)
        keyword_route = self.keyword_router.route(query)
        if keyword_route != "default":
            return {
                "route": keyword_route,
                "method": "keyword",
                "confidence": 0.9
            }

        # Level 2: embeddings (slower, free)
        embed_route, embed_confidence = self.embedding_router.route(query)
        if embed_confidence > 0.85:
            return {
                "route": embed_route,
                "method": "embedding",
                "confidence": embed_confidence
            }

        # Level 3: LLM (slow, paid) - only if necessary
        if use_llm_fallback:
            llm_result = self.llm_router.route(query)
            return {
                "route": llm_result["route"],
                "method": "llm",
                "confidence": llm_result["confidence"]
            }

        # Fallback: best embedding route
        return {
            "route": embed_route,
            "method": "embedding_fallback",
            "confidence": embed_confidence
        }
```
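The cascade's control flow can be exercised without loading a model or calling an API by swapping in stubs. The stub classes below are placeholders invented for this sketch, not the real routers; only the three-level fallback logic is the same:

```python
class StubKeywordRouter:
    def route(self, query):
        # Pretend only refund questions match a keyword pattern.
        return "faq" if "refund" in query.lower() else "default"

class StubEmbeddingRouter:
    def route(self, query):
        # Pretend pricing questions score above the 0.85 threshold.
        if "price" in query.lower():
            return "products", 0.91
        return "docs", 0.42  # low confidence → falls through to the LLM

class StubLLMRouter:
    def route(self, query):
        return {"route": "tickets", "confidence": 0.8}

class CascadeRouter:
    """Same three-level fallback as HierarchicalRouter, wired to stubs."""
    def __init__(self):
        self.keyword_router = StubKeywordRouter()
        self.embedding_router = StubEmbeddingRouter()
        self.llm_router = StubLLMRouter()

    def route(self, query):
        # Level 1: keywords
        kw = self.keyword_router.route(query)
        if kw != "default":
            return {"route": kw, "method": "keyword"}
        # Level 2: embeddings, gated on confidence
        emb_route, emb_conf = self.embedding_router.route(query)
        if emb_conf > 0.85:
            return {"route": emb_route, "method": "embedding"}
        # Level 3: LLM fallback
        llm = self.llm_router.route(query)
        return {"route": llm["route"], "method": "llm"}

router = CascadeRouter()
print(router.route("Can I get a refund?"))         # stops at the keyword level
print(router.route("What is the price?"))          # stops at the embedding level
print(router.route("Something strange happened"))  # falls through to the LLM
```

Each query stops at the cheapest level that produces a confident answer, which is the whole point of the hierarchy.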
Multi-Route Routing
Sometimes a query requires multiple sources.
```python
import json
from openai import OpenAI

class MultiRouteRouter:
    def __init__(self):
        self.client = OpenAI()

    def route(self, query: str) -> list[dict]:
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": """Analyze the query and determine ALL relevant sources.
A query may need multiple sources.

Sources: faq, docs, products, tickets

Respond in JSON format:
{"routes": [{"name": "...", "relevance": 0.0-1.0}], "reasoning": "..."}"""
                },
                {"role": "user", "content": query}
            ],
            temperature=0,
            response_format={"type": "json_object"}
        )
        result = json.loads(response.choices[0].message.content)

        # Keep only routes with relevance > 0.5
        relevant_routes = [
            r for r in result["routes"] if r["relevance"] > 0.5
        ]
        return relevant_routes

# Example
router = MultiRouteRouter()
routes = router.route("Why does the product API return error 500 on certain items?")
# [
#   {"name": "docs", "relevance": 0.9},
#   {"name": "tickets", "relevance": 0.8},
#   {"name": "products", "relevance": 0.6}
# ]
```
Routing with Metadata
Enrich routing with user context.
```python
class ContextualRouter:
    def __init__(self):
        self.base_router = EmbeddingRouter()

    def route(self, query: str, user_context: dict) -> dict:
        # Base routing
        base_route, confidence = self.base_router.route(query)

        # Contextual adjustments
        adjustments = self._compute_adjustments(user_context)

        # Apply adjustments
        route_scores = {base_route: confidence}
        for route, adjustment in adjustments.items():
            if route in route_scores:
                route_scores[route] *= adjustment
            else:
                route_scores[route] = confidence * 0.5 * adjustment

        final_route = max(route_scores, key=route_scores.get)
        return {
            "route": final_route,
            "confidence": route_scores[final_route],
            "adjustments_applied": adjustments
        }

    def _compute_adjustments(self, user_context: dict) -> dict:
        adjustments = {}

        # Technical user → boost docs
        if user_context.get("is_developer"):
            adjustments["docs"] = 1.3

        # Ticket history → boost tickets
        if user_context.get("has_open_tickets"):
            adjustments["tickets"] = 1.2

        # Current page is a product page → boost products
        if "product" in user_context.get("current_page", ""):
            adjustments["products"] = 1.4

        return adjustments

# Example
router = ContextualRouter()
result = router.route(
    query="How do I fix this error?",
    user_context={
        "is_developer": True,
        "has_open_tickets": True,
        "current_page": "/docs/api"
    }
)
```
Complete Pipeline Implementation
```python
from openai import OpenAI

class RoutedRAGPipeline:
    def __init__(self):
        self.router = HierarchicalRouter()
        # Retriever classes are assumed to be implemented per source
        self.retrievers = {
            "faq": FAQRetriever(),
            "docs": DocsRetriever(),
            "products": ProductRetriever(),
            "tickets": TicketRetriever()
        }
        self.llm = OpenAI()

    def query(self, user_query: str, user_context: dict = None) -> dict:
        # 1. Routing
        route_result = self.router.route(user_query)
        selected_route = route_result["route"]

        # 2. Targeted retrieval
        retriever = self.retrievers[selected_route]
        documents = retriever.search(user_query, top_k=5)

        # 3. Generation
        context = "\n\n".join(d["content"] for d in documents)
        answer = self._generate_response(user_query, context)

        return {
            "answer": answer,
            "route": selected_route,
            "routing_method": route_result["method"],
            "routing_confidence": route_result["confidence"],
            "sources": documents
        }

    def _generate_response(self, query: str, context: str) -> str:
        response = self.llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": f"Answer the question based on the following context:\n\n{context}"
                },
                {"role": "user", "content": query}
            ]
        )
        return response.choices[0].message.content
```
Monitoring and Improvement
Logging Routing Decisions
```python
from datetime import datetime

class RoutingLogger:
    def __init__(self, analytics_client):
        self.analytics = analytics_client

    def log_routing_decision(
        self,
        query: str,
        route: str,
        method: str,
        confidence: float,
        user_feedback: str = None
    ):
        self.analytics.track("routing_decision", {
            "query": query,
            "route": route,
            "method": method,
            "confidence": confidence,
            "timestamp": datetime.now().isoformat(),
            "feedback": user_feedback
        })

    def analyze_routing_accuracy(self, days: int = 7) -> dict:
        # days_ago() is assumed to be a date helper defined elsewhere
        decisions = self.analytics.query(
            "routing_decision",
            filters={"timestamp": {"$gte": days_ago(days)}}
        )

        # Accuracy per routing method, based on user feedback
        by_method = {}
        for d in decisions:
            method = d["method"]
            if method not in by_method:
                by_method[method] = {"correct": 0, "incorrect": 0, "unknown": 0}
            if d.get("feedback") == "correct":
                by_method[method]["correct"] += 1
            elif d.get("feedback") == "incorrect":
                by_method[method]["incorrect"] += 1
            else:
                by_method[method]["unknown"] += 1
        return by_method
```
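The logger assumes an external analytics client. Its aggregation logic can be checked end to end with a minimal in-memory stand-in; the `InMemoryAnalytics` class and `accuracy_by_method` helper below are illustrative sketches, not a real analytics API:

```python
class InMemoryAnalytics:
    """Minimal stand-in for an analytics client: stores and returns events."""
    def __init__(self):
        self.events = []

    def track(self, event_name, payload):
        self.events.append((event_name, payload))

    def query(self, event_name, filters=None):
        # Date filtering is omitted in this sketch.
        return [p for name, p in self.events if name == event_name]

def accuracy_by_method(decisions):
    """Same per-method aggregation as analyze_routing_accuracy."""
    by_method = {}
    for d in decisions:
        m = by_method.setdefault(
            d["method"], {"correct": 0, "incorrect": 0, "unknown": 0}
        )
        feedback = d.get("feedback")
        if feedback == "correct":
            m["correct"] += 1
        elif feedback == "incorrect":
            m["incorrect"] += 1
        else:
            m["unknown"] += 1
    return by_method

analytics = InMemoryAnalytics()
analytics.track("routing_decision", {"method": "keyword", "feedback": "correct"})
analytics.track("routing_decision", {"method": "keyword", "feedback": "incorrect"})
analytics.track("routing_decision", {"method": "llm", "feedback": None})

print(accuracy_by_method(analytics.query("routing_decision")))
```

A routing method with a high `incorrect` share is a signal to tighten its patterns, examples, or confidence threshold.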
Feedback Loop for Improvement
```python
class AdaptiveRouter:
    def __init__(self):
        self.base_router = EmbeddingRouter()
        self.corrections = {}  # query_hash -> correct_route

    def _query_hash(self, query: str) -> int:
        # Note: built-in hash() is salted per process; switch to hashlib
        # if corrections are ever persisted across restarts.
        return hash(query.lower().strip())

    def route(self, query: str) -> dict:
        query_hash = self._query_hash(query)

        # A recorded user correction overrides the base router
        if query_hash in self.corrections:
            return {
                "route": self.corrections[query_hash],
                "method": "correction",
                "confidence": 1.0
            }

        route, confidence = self.base_router.route(query)
        return {
            "route": route,
            "method": "embedding",
            "confidence": confidence
        }

    def record_correction(self, query: str, correct_route: str):
        """Record a user correction"""
        self.corrections[self._query_hash(query)] = correct_route

        # Optional: periodically retrain the router
        if len(self.corrections) % 100 == 0:
            self._retrain_router()
```
Next Steps
Query routing optimizes your system by targeting the right sources. To go further:
- Self-Query Retrieval - Let the LLM structure the search
- Metadata Filtering - Refine with metadata
- Ensemble Retrieval - Combine multiple retrievers
Intelligent Query Routing with Ailog
Ailog implements query routing transparently:
- Automatic routing based on your data sources
- Real-time adaptation based on user feedback
- Smart multi-route when multiple sources are relevant
- Integrated monitoring for continuous optimization
Try for free and get automatically optimized routing.