Query Optimization: Making Retrieval More Effective
Techniques to optimize user queries for better retrieval: query rewriting, expansion, decomposition, and routing strategies.
The Query Problem
Users don't always ask questions in the optimal format for retrieval:
- Too vague: "How does it work?"
- Too specific: "What's the RGB hex code for the blue used in the logo of our mobile app in dark mode?"
- Ambiguous: "What about the other one?"
- Misspelled: "How do I conifgure the setings?"
- Multi-intent: "What are the pricing plans and how do I upgrade and can I get a refund?"
Query optimization bridges the gap between how users ask and how the system searches.
Query Preprocessing
Normalization
```python
def normalize_query(query: str) -> str:
    # Lowercase
    query = query.lower()
    # Collapse extra whitespace
    query = ' '.join(query.split())
    # Fix common typos (optional)
    query = spell_check(query)
    # Remove stop words (optional, be careful)
    # query = remove_stop_words(query)
    return query
```
Spell Checking
```python
from symspellpy import SymSpell, Verbosity

sym_spell = SymSpell(max_dictionary_edit_distance=2)
sym_spell.load_dictionary("frequency_dictionary.txt", term_index=0, count_index=1)

def correct_spelling(query: str) -> str:
    corrected_words = []
    for word in query.split():
        # lookup requires a Verbosity level; CLOSEST returns the best matches
        suggestions = sym_spell.lookup(word, Verbosity.CLOSEST, max_edit_distance=2)
        if suggestions:
            corrected_words.append(suggestions[0].term)
        else:
            corrected_words.append(word)
    return ' '.join(corrected_words)
```
Query Rewriting
Template-Based Rewriting
```python
import re

REWRITE_TEMPLATES = {
    r"how (?:do|can) i (.+)\?": "Steps to {}",
    r"what is (.+)\?": "{} definition and explanation",
    r"why (.+)\?": "Reasons and explanation for {}",
}

def template_rewrite(query: str) -> str:
    for pattern, template in REWRITE_TEMPLATES.items():
        match = re.match(pattern, query, re.IGNORECASE)
        if match:
            return template.format(match.group(1))
    return query

# Example
query = "How do I reset my password?"
rewritten = template_rewrite(query)
# "Steps to reset my password"
```
LLM-Based Rewriting
```python
async def llm_rewrite_query(query: str, llm) -> str:
    prompt = f"""Rewrite this question to be more specific and search-friendly.

Original: {query}
Rewritten:"""
    return await llm.generate(prompt, max_tokens=50)

# Example
query = "How does it work?"
context = get_conversation_context()
rewritten = await llm_rewrite_query(f"{context}\n{query}", llm)
# "How does the password reset feature work?"
```
Query Expansion
Synonym Expansion
```python
from typing import List

from nltk.corpus import wordnet

def expand_with_synonyms(query: str, max_synonyms=2) -> List[str]:
    words = query.split()
    expanded_queries = [query]  # Keep the original
    for word in words:
        synsets = wordnet.synsets(word)
        for synset in synsets[:max_synonyms]:
            for lemma in synset.lemmas()[:1]:  # One synonym per synset
                synonym = lemma.name().replace('_', ' ')
                if synonym.lower() != word.lower():
                    # Replace word with synonym
                    new_query = query.replace(word, synonym)
                    expanded_queries.append(new_query)
    return list(set(expanded_queries))

# Example
queries = expand_with_synonyms("repair broken device")
# ["repair broken device", "fix broken device", "repair damaged device"]
```
LLM-Based Expansion
```python
from typing import List

async def generate_query_variations(query: str, llm, num_variations=3) -> List[str]:
    prompt = f"""Generate {num_variations} different ways to ask this question:

Original: {query}

Variations:
1."""
    response = await llm.generate(prompt)
    variations = parse_numbered_list(response)
    return [query] + variations  # Include the original

# Example
variations = await generate_query_variations("database performance issues", llm)
# [
#     "database performance issues",
#     "slow database queries",
#     "how to optimize database speed",
#     "database latency problems"
# ]
```
Query Decomposition
Break complex queries into simpler sub-queries.
Rule-Based Decomposition
```python
import re
from typing import List

def decompose_query(query: str) -> List[str]:
    # Split by "and"
    if " and " in query.lower():
        return [q.strip() for q in re.split(r'\s+and\s+', query, flags=re.IGNORECASE)]
    # Split by comma
    if ", " in query:
        return [q.strip() for q in query.split(", ")]
    # Single query
    return [query]

# Example
decompose_query("What are the pricing plans and how do I upgrade?")
# ["What are the pricing plans", "how do I upgrade?"]
```
LLM-Based Decomposition
```python
from typing import List

async def llm_decompose(complex_query: str, llm) -> List[str]:
    prompt = f"""Break this complex question into simpler sub-questions:

Question: {complex_query}

Sub-questions:
1."""
    response = await llm.generate(prompt)
    return parse_numbered_list(response)

# Example
sub_questions = await llm_decompose(
    "What are the system requirements and how much does it cost and is there a free trial?",
    llm
)
# [
#     "What are the system requirements?",
#     "How much does it cost?",
#     "Is there a free trial?"
# ]
```
Multi-Step Retrieval
```python
async def multi_step_retrieval(complex_query: str, llm, vector_db):
    # Decompose
    sub_queries = await llm_decompose(complex_query, llm)

    # Retrieve for each sub-query
    all_contexts = []
    for sub_q in sub_queries:
        contexts = await vector_db.search(sub_q, k=3)
        all_contexts.extend(contexts)

    # Deduplicate
    unique_contexts = deduplicate_by_id(all_contexts)

    # Generate a comprehensive answer
    answer = await llm.generate(
        query=complex_query,
        contexts=unique_contexts
    )
    return answer
```
Query Routing
Direct different queries to different retrieval strategies.
Intent Classification
```python
class QueryRouter:
    def __init__(self, llm):
        self.llm = llm

    async def classify_intent(self, query: str) -> str:
        prompt = f"""Classify the intent of this query:

Query: {query}

Intent (choose one):
- factual: Asking for specific facts
- procedural: How to do something
- troubleshooting: Fixing a problem
- comparison: Comparing options
- explanation: Understanding a concept

Intent:"""
        intent = await self.llm.generate(prompt, max_tokens=10)
        return intent.strip().lower()

    async def route_query(self, query: str, retrievers: dict):
        intent = await self.classify_intent(query)

        # Route based on intent
        if intent == "procedural":
            return await retrievers['docs'].retrieve(query)
        elif intent == "troubleshooting":
            return await retrievers['tickets'].retrieve(query)
        elif intent == "factual":
            return await retrievers['knowledge_base'].retrieve(query)
        else:
            # Default: try all and merge
            return await self.ensemble_retrieve(query, retrievers)
```
Complexity-Based Routing
```python
def estimate_complexity(query: str) -> str:
    # Simple heuristics
    tokens = query.lower().split()
    word_count = len(tokens)
    # Match whole tokens, not substrings ("or" would otherwise match "support")
    has_and_or = any(word in tokens for word in ['and', 'or', 'also'])
    has_multiple_questions = query.count('?') > 1

    if word_count > 20 or has_and_or or has_multiple_questions:
        return 'complex'
    elif word_count > 10:
        return 'medium'
    else:
        return 'simple'

async def complexity_based_retrieval(query: str):
    complexity = estimate_complexity(query)

    if complexity == 'simple':
        # Simple: vector search only
        return await vector_retrieve(query, k=3)
    elif complexity == 'medium':
        # Medium: hybrid search
        return await hybrid_retrieve(query, k=5)
    else:
        # Complex: decompose and multi-step
        return await multi_step_retrieval(query, llm, vector_db)
```
Contextual Query Enhancement
Use conversation history to improve queries.
Session Context
```python
class ContextualQueryEnhancer:
    def __init__(self):
        self.conversation_history = []

    def add_turn(self, query: str, answer: str):
        self.conversation_history.append({
            'query': query,
            'answer': answer
        })

    async def enhance_query(self, current_query: str, llm) -> str:
        if not self.conversation_history:
            return current_query

        # Get recent context
        recent = self.conversation_history[-3:]  # Last 3 turns
        context = "\n".join([
            f"User: {turn['query']}\nAssistant: {turn['answer']}"
            for turn in recent
        ])

        prompt = f"""Given the conversation history, rewrite the current query to be standalone and clear.

Conversation:
{context}

Current query: {current_query}

Standalone query:"""
        enhanced = await llm.generate(prompt, max_tokens=100)
        return enhanced.strip()

# Example usage
enhancer = ContextualQueryEnhancer()
enhancer.add_turn(
    "What are the pricing plans?",
    "We offer Basic ($10/mo), Pro ($25/mo), and Enterprise (custom)."
)
enhanced = await enhancer.enhance_query("What about the features?", llm)
# "What are the features included in each pricing plan?"
```
Query Filtering
Inappropriate Query Detection
```python
async def filter_inappropriate(query: str, llm) -> bool:
    """Check if the query is appropriate for the RAG system."""
    prompt = f"""Is this query appropriate for a customer support system?

Query: {query}

Answer 'yes' or 'no':"""
    response = await llm.generate(prompt, max_tokens=5)
    return 'yes' in response.lower()

# Usage
if not await filter_inappropriate(user_query, llm):
    return "I can only help with product-related questions."
```
Out-of-Scope Detection
```python
SCOPE_KEYWORDS = {
    'in_scope': ['pricing', 'features', 'setup', 'troubleshooting'],
    'out_of_scope': ['weather', 'news', 'politics', 'recipes']
}

def is_in_scope(query: str) -> bool:
    query_lower = query.lower()

    # Check for out-of-scope keywords
    if any(keyword in query_lower for keyword in SCOPE_KEYWORDS['out_of_scope']):
        return False

    # Check for in-scope keywords
    if any(keyword in query_lower for keyword in SCOPE_KEYWORDS['in_scope']):
        return True

    # Default: assume in scope (can also use an LLM for better accuracy)
    return True
```
Query Augmentation
Add context to improve retrieval.
Metadata Injection
```python
def augment_with_metadata(query: str, user_context: dict) -> str:
    """Add user-specific context to the query."""
    plan = user_context.get('plan', 'basic')
    role = user_context.get('role', 'user')

    # Add metadata that might help retrieval
    return f"{query} [user_plan:{plan}] [role:{role}]"

# Example
query = "How do I export data?"
user_context = {'plan': 'enterprise', 'role': 'admin'}
augmented = augment_with_metadata(query, user_context)
# "How do I export data? [user_plan:enterprise] [role:admin]"
```
Temporal Context
```python
from datetime import datetime

def add_temporal_context(query: str) -> str:
    """Add the current date to the query for time-sensitive retrieval."""
    now = datetime.now()
    return f"{query} [date:{now.strftime('%Y-%m-%d')}]"

# Useful for queries like:
# "What's new?"     → "What's new? [date:2025-02-25]"
# "Latest features" → "Latest features [date:2025-02-25]"
```
Optimizing Multiple Queries
When using query expansion or multi-query approaches:
Parallel Retrieval
```python
import asyncio
from collections import Counter
from typing import List

async def parallel_multi_query(queries: List[str], vector_db, k=5):
    """Retrieve for multiple queries in parallel."""
    tasks = [vector_db.search(q, k=k) for q in queries]
    results = await asyncio.gather(*tasks)

    # Merge and deduplicate
    all_docs = []
    for result in results:
        all_docs.extend(result)
    unique_docs = deduplicate_by_id(all_docs)

    # Re-rank by frequency (documents retrieved by multiple queries rank first)
    doc_counts = Counter(doc['id'] for doc in all_docs)
    sorted_docs = sorted(
        unique_docs,
        key=lambda doc: doc_counts[doc['id']],
        reverse=True
    )
    return sorted_docs[:k]
```
Score Fusion
```python
from typing import List

def fuse_results(multi_query_results: List[List[dict]], method='rrf') -> List[dict]:
    """Combine results from multiple queries."""
    if method == 'rrf':
        # Reciprocal Rank Fusion
        doc_scores = {}
        for results in multi_query_results:
            for rank, doc in enumerate(results, start=1):
                doc_id = doc['id']
                if doc_id not in doc_scores:
                    doc_scores[doc_id] = {'doc': doc, 'score': 0}
                doc_scores[doc_id]['score'] += 1 / (60 + rank)
        ranked = sorted(
            doc_scores.values(),
            key=lambda x: x['score'],
            reverse=True
        )
        return [item['doc'] for item in ranked]

    elif method == 'max':
        # Keep the best score seen for each document
        doc_scores = {}
        for results in multi_query_results:
            for doc in results:
                doc_id = doc['id']
                score = doc.get('score', 0)
                if doc_id not in doc_scores or score > doc_scores[doc_id]['score']:
                    doc_scores[doc_id] = {'doc': doc, 'score': score}
        ranked = sorted(
            doc_scores.values(),
            key=lambda x: x['score'],
            reverse=True
        )
        return [item['doc'] for item in ranked]

    raise ValueError(f"Unknown fusion method: {method}")
```
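To make the RRF arithmetic concrete, here is a minimal, self-contained sketch (document ids and both ranked lists are invented for illustration). A document that appears in both lists accumulates a contribution from each, so it outranks a document that tops only one list:

```python
def rrf_fuse(result_lists, k=60):
    """Minimal Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two hypothetical ranked lists from two query variants
list_a = ['doc_a', 'doc_b']  # doc_a tops the first list
list_b = ['doc_b', 'doc_c']  # doc_b appears again here

fused = rrf_fuse([list_a, list_b])
# doc_b scores 1/61 + 1/62 ≈ 0.0325, beating doc_a's 1/61 ≈ 0.0164
print(fused)  # ['doc_b', 'doc_a', 'doc_c']
```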
Best Practices
- Start simple: Normalize and spell-check before complex optimizations
- Measure impact: A/B test query optimizations
- Don't over-optimize: Sometimes simple queries work best
- Preserve original: Keep original query for fallback
- User feedback: Track which optimizations improve satisfaction
- Context matters: Use conversation history when available
- Async everywhere: Parallelize multiple query variants
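The "preserve original" practice can be sketched as a thin wrapper (the `retrieve` callable, the `MIN_RESULTS` threshold, and the stub index are all hypothetical): try the optimized query first, and if retrieval comes back thin, retry with the user's own words.

```python
MIN_RESULTS = 2  # hypothetical threshold for "retrieval worked"

def retrieve_with_fallback(original_query, optimized_query, retrieve):
    """Try the optimized query first; fall back to the original if results are thin."""
    docs = retrieve(optimized_query)
    if len(docs) < MIN_RESULTS:
        # Optimization may have distorted the intent; retry with the user's wording
        docs = retrieve(original_query)
    return docs

# Usage with a stub retriever
index = {
    "steps to reset my password": ["doc1", "doc2"],
    "How do I reset my password?": ["doc3"],
}
retrieve = lambda q: index.get(q, [])

docs = retrieve_with_fallback("How do I reset my password?",
                              "steps to reset my password", retrieve)
print(docs)  # ['doc1', 'doc2']
```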
When to Use Each Technique
| Technique | Use When | Impact |
|---|---|---|
| Normalization | Always | Low (foundation) |
| Spell checking | User-facing apps | Medium |
| Query rewriting | Vague queries common | Medium |
| Query expansion | Recall is priority | High |
| Decomposition | Complex multi-part queries | High |
| Routing | Multiple data sources | Medium-High |
| Contextual | Chat/conversation | High |
Next Steps
After optimizing queries, managing the context window effectively is crucial for staying within token limits and optimizing costs. The final guide covers context window optimization strategies.