Query Optimization: Making Retrieval More Effective
Techniques to optimize user queries for better retrieval: query rewriting, expansion, decomposition, and routing strategies.
- Author: Ailog Research Team
- Reading time: 10 min read
- Level: intermediate
The Query Problem
Users don't always ask questions in the optimal format for retrieval:
- Too vague: "How does it work?"
- Too specific: "What's the RGB hex code for the blue used in the logo of our mobile app in dark mode?"
- Ambiguous: "What about the other one?"
- Misspelled: "How do I conifgure the setings?"
- Multi-intent: "What are the pricing plans and how do I upgrade and can I get a refund?"
Query optimization bridges the gap between how users ask and how the system searches.
Query Preprocessing
Normalization
```python
def normalize_query(query: str) -> str:
    # Lowercase
    query = query.lower()

    # Remove extra whitespace
    query = ' '.join(query.split())

    # Fix common typos (optional); correct_spelling is defined under Spell Checking
    query = correct_spelling(query)

    # Remove stop words (optional, be careful); remove_stop_words is a helper you supply
    query = remove_stop_words(query)

    return query
```
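The "be careful" caveat on stop-word removal matters because stock stop-word lists often contain negations such as "not" and "no", and dropping them can invert the meaning of a query. The sketch below shows one way a `remove_stop_words` helper could guard against that. The word lists are illustrative assumptions, not a recommended vocabulary.

```python
# Illustrative lists only; a real system would tune these for its domain.
BASE_STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "for", "not", "no"}
NEGATIONS = {"not", "no", "never", "without"}

# Keep negations even if the base list would remove them.
STOP_WORDS = BASE_STOP_WORDS - NEGATIONS

def remove_stop_words(query: str) -> str:
    kept = [word for word in query.split() if word not in STOP_WORDS]
    # Fall back to the original query if every word was filtered out
    return " ".join(kept) if kept else query
```

With these lists, "export is not available in the basic plan" keeps its "not", and a query made only of stop words is returned unchanged rather than emptied.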
Spell Checking
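```python
from symspellpy import SymSpell, Verbosity

sym_spell = SymSpell(max_dictionary_edit_distance=2)
# Columns: term at index 0, frequency count at index 1
sym_spell.load_dictionary("frequency_dictionary.txt", 0, 1)

def correct_spelling(query: str) -> str:
    words = query.split()
    corrected_words = []

    for word in words:
        # lookup() requires a verbosity argument; TOP returns the single best suggestion
        suggestions = sym_spell.lookup(word, Verbosity.TOP, max_edit_distance=2)

        if suggestions:
            corrected_words.append(suggestions[0].term)
        else:
            # No suggestion within the edit distance: keep the word as typed
            corrected_words.append(word)

    return ' '.join(corrected_words)
```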
Query Rewriting
Template-Based Rewriting
```python
import re

REWRITE_TEMPLATES = {
    r"how (?:do|can) i (.+)\?": "Steps to {}",
    r"what is (.+)\?": "{} definition and explanation",
    r"why (.+)\?": "Reasons and explanation for {}",
}

def template_rewrite(query: str) -> str:
    for pattern, template in REWRITE_TEMPLATES.items():
        match = re.match(pattern, query, re.IGNORECASE)
        if match:
            return template.format(match.group(1))

    # No template matched: leave the query unchanged
    return query

# Example
query = "How do I reset my password?"
rewritten = template_rewrite(query)
# "Steps to reset my password"
```
LLM-Based Rewriting
```python
async def llm_rewrite_query(query: str, llm) -> str:
    prompt = f"""Rewrite this question to be more specific and search-friendly.

Original: {query}

Rewritten:"""

    return await llm.generate(prompt, max_tokens=50)

# Example (run inside an async function)
query = "How does it work?"
context = get_conversation_context()

rewritten = await llm_rewrite_query(f"{context}\n{query}", llm)
# "How does the password reset feature work?"
```
Query Expansion
Synonym Expansion
```python
from typing import List

from nltk.corpus import wordnet  # requires nltk.download('wordnet')

def expand_with_synonyms(query: str, max_synonyms=2) -> List[str]:
    words = query.split()
    expanded_queries = [query]  # Original

    for i, word in enumerate(words):
        synsets = wordnet.synsets(word)

        for synset in synsets[:max_synonyms]:
            for lemma in synset.lemmas()[:1]:  # One synonym per synset
                synonym = lemma.name().replace('_', ' ')

                if synonym.lower() != word.lower():
                    # Replace only this word, not substrings of other words
                    new_query = ' '.join(words[:i] + [synonym] + words[i + 1:])
                    expanded_queries.append(new_query)

    # Deduplicate while keeping the original query first
    return list(dict.fromkeys(expanded_queries))

# Example
queries = expand_with_synonyms("repair broken device")
# Illustrative output; actual synonyms depend on WordNet:
# ["repair broken device", "fix broken device", "repair damaged device"]
```
LLM-Based Expansion
```python
from typing import List

async def generate_query_variations(query: str, llm, num_variations=3) -> List[str]:
    prompt = f"""Generate {num_variations} different ways to ask this question:

Original: {query}

Variations:
1."""

    response = await llm.generate(prompt)
    variations = parse_numbered_list(response)

    return [query] + variations  # Include original

# Example (run inside an async function)
variations = await generate_query_variations("database performance issues", llm)
# [
#     "database performance issues",
#     "slow database queries",
#     "how to optimize database speed",
#     "database latency problems"
# ]
```
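The snippets in this guide call `parse_numbered_list` without defining it. A minimal sketch follows, assuming the LLM returns one item per line. Because the prompt already ends in "1.", the first line may arrive without a number, so the marker is treated as optional. The regular expression and the quote stripping are assumptions, not part of any library.

```python
import re
from typing import List

# Optional leading list marker: "1.", "2)", "-", "*" or "•", followed by whitespace
_ITEM_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-•*])\s+")

def parse_numbered_list(response: str) -> List[str]:
    items = []
    for line in response.splitlines():
        # Drop the marker if present, then trim whitespace and wrapping quotes
        text = _ITEM_PREFIX.sub("", line).strip().strip('"')
        if text:
            items.append(text)
    return items
```

Blank lines are skipped, so an empty or whitespace-only response yields an empty list rather than an error.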
Query Decomposition
Break complex queries into simpler sub-queries.
Rule-Based Decomposition
```python
import re
from typing import List

def decompose_query(query: str) -> List[str]:
    # Split by "and"
    if " and " in query.lower():
        return [q.strip() for q in re.split(r'\s+and\s+', query, flags=re.IGNORECASE)]

    # Split by comma
    if ", " in query:
        return [q.strip() for q in query.split(", ")]

    # Single query
    return [query]

# Example
decompose_query("What are the pricing plans and how do I upgrade?")
# ["What are the pricing plans", "how do I upgrade?"]
```
LLM-Based Decomposition
```python
from typing import List

async def llm_decompose(complex_query: str, llm) -> List[str]:
    prompt = f"""Break this complex question into simpler sub-questions:

Question: {complex_query}

Sub-questions:
1."""

    response = await llm.generate(prompt)
    return parse_numbered_list(response)

# Example (run inside an async function)
sub_questions = await llm_decompose(
    "What are the system requirements and how much does it cost and is there a free trial?",
    llm
)
# [
#     "What are the system requirements?",
#     "How much does it cost?",
#     "Is there a free trial?"
# ]
```
Multi-Step Retrieval
```python
async def multi_step_retrieval(complex_query: str, llm, vector_db):
    # Decompose
    sub_queries = await llm_decompose(complex_query, llm)

    # Retrieve for each sub-query
    all_contexts = []
    for sub_q in sub_queries:
        contexts = await vector_db.search(sub_q, k=3)
        all_contexts.extend(contexts)

    # Deduplicate
    unique_contexts = deduplicate_by_id(all_contexts)

    # Generate comprehensive answer
    answer = await llm.generate(
        query=complex_query,
        contexts=unique_contexts
    )

    return answer
```
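The `deduplicate_by_id` helper used above is not defined in this guide. One plausible version is sketched here, assuming each retrieved context is a dict carrying an `id` key (the same shape the later ranking code reads with `doc['id']`). It keeps the first occurrence of each document, so the order of the earliest sub-query results is preserved.

```python
from typing import Dict, List

def deduplicate_by_id(docs: List[Dict]) -> List[Dict]:
    seen = set()
    unique = []
    for doc in docs:
        # Assumes every document has a hashable "id" field
        if doc["id"] not in seen:
            seen.add(doc["id"])
            unique.append(doc)
    return unique
```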
Query Routing
Direct different queries to different retrieval strategies.
Intent Classification
```python
class QueryRouter:
    def __init__(self, llm):
        self.llm = llm

    async def classify_intent(self, query: str) -> str:
        prompt = f"""Classify the intent of this query:

Query: {query}

Intent (choose one):
- factual: Asking for specific facts
- procedural: How to do something
- troubleshooting: Fixing a problem
- comparison: Comparing options
- explanation: Understanding a concept

Intent:"""

        intent = await self.llm.generate(prompt, max_tokens=10)
        return intent.strip().lower()

    async def route_query(self, query: str, retrievers: dict):
        intent = await self.classify_intent(query)

        # Route based on intent
        if intent == "procedural":
            return await retrievers['docs'].retrieve(query)
        elif intent == "troubleshooting":
            return await retrievers['tickets'].retrieve(query)
        elif intent == "factual":
            return await retrievers['knowledge_base'].retrieve(query)
        else:
            # Default: try all and merge
            return await self.ensemble_retrieve(query, retrievers)
```
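The router falls back to `ensemble_retrieve`, which the class does not define. A rough sketch of that fallback is given below as a standalone function, which could equally be attached to `QueryRouter` as a method. It assumes every retriever exposes an async `retrieve(query)` returning a list of dicts with an `id` key. It queries all retrievers concurrently and merges the results without re-scoring them.

```python
import asyncio
from typing import Dict, List

async def ensemble_retrieve(query: str, retrievers: Dict[str, object]) -> List[dict]:
    # Query every retriever concurrently; gather() preserves input order
    results = await asyncio.gather(
        *(retriever.retrieve(query) for retriever in retrievers.values())
    )

    # Merge in retriever order, keeping the first copy of each document id
    merged, seen = [], set()
    for docs in results:
        for doc in docs:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged
```

Because scores from different retrievers are rarely comparable, a rank-based merge such as reciprocal rank fusion is a common refinement over this plain concatenation.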
Complexity-Based Routing
```python
import re

def estimate_complexity(query: str) -> str:
    # Simple heuristics
    word_count = len(query.split())
    # Compare whole words so that "or" does not match inside "password"
    tokens = re.findall(r"\w+", query.lower())
    has_and_or = any(word in tokens for word in ['and', 'or', 'also'])
    has_multiple_questions = query.count('?') > 1

    if word_count > 20 or has_and_or or has_multiple_questions:
        return 'complex'
    elif word_count > 10:
        return 'medium'
    else:
        return 'simple'

async def complexity_based_retrieval(query: str, llm, vector_db):
    complexity = estimate_complexity(query)

    if complexity == 'simple':
        # Simple: vector search only
        return await vector_retrieve(query, k=3)

    elif complexity == 'medium':
        # Medium: hybrid search
        return await hybrid_retrieve(query, k=5)

    else:
        # Complex: decompose and multi-step
        return await multi_step_retrieval(query, llm, vector_db)
```
Contextual Query Enhancement
Use conversation history to improve queries.
Session Context
```python
class ContextualQueryEnhancer:
    def __init__(self):
        self.conversation_history = []

    def add_turn(self, query: str, answer: str):
        self.conversation_history.append({
            'query': query,
            'answer': answer
        })

    async def enhance_query(self, current_query: str, llm) -> str:
        if not self.conversation_history:
            return current_query

        # Get recent context
        recent = self.conversation_history[-3:]  # Last 3 turns

        context = "\n".join([
            f"User: {turn['query']}\nAssistant: {turn['answer']}"
            for turn in recent
        ])

        prompt = f"""Given the conversation history, rewrite the current query to be standalone and clear.

Conversation:
{context}

Current query: {current_query}

Standalone query:"""

        enhanced = await llm.generate(prompt, max_tokens=100)
        return enhanced.strip()

# Example usage (run inside an async function)
enhancer = ContextualQueryEnhancer()

enhancer.add_turn(
    "What are the pricing plans?",
    "We offer Basic ($10/mo), Pro ($25/mo), and Enterprise (custom)."
)

enhanced = await enhancer.enhance_query("What about the features?", llm)
# "What are the features included in each pricing plan?"
```
Query Filtering
Inappropriate Query Detection
```python
async def filter_inappropriate(query: str, llm) -> bool:
    """
    Check if query is appropriate for the RAG system.
    Returns True when the query is appropriate, False when it should be rejected.
    """
    prompt = f"""Is this query appropriate for a customer support system?

Query: {query}

Answer 'yes' or 'no':"""

    response = await llm.generate(prompt, max_tokens=5)

    return 'yes' in response.lower()

# Usage (inside an async request handler)
if not await filter_inappropriate(user_query, llm):
    return "I can only help with product-related questions."
```
Out-of-Scope Detection
```python
SCOPE_KEYWORDS = {
    'in_scope': ['pricing', 'features', 'setup', 'troubleshooting'],
    'out_of_scope': ['weather', 'news', 'politics', 'recipes']
}

def is_in_scope(query: str) -> bool:
    query_lower = query.lower()

    # Check for out-of-scope keywords
    if any(keyword in query_lower for keyword in SCOPE_KEYWORDS['out_of_scope']):
        return False

    # Check for in-scope keywords
    if any(keyword in query_lower for keyword in SCOPE_KEYWORDS['in_scope']):
        return True

    # Default: assume in scope (can also use LLM for better accuracy)
    return True
```
Query Augmentation
Add context to improve retrieval.
Metadata Injection
```python
def augment_with_metadata(query: str, user_context: dict) -> str:
    """
    Add user-specific context to query
    """
    plan = user_context.get('plan', 'basic')
    role = user_context.get('role', 'user')

    # Add metadata that might help retrieval
    augmented = f"{query} [user_plan:{plan}] [role:{role}]"

    return augmented

# Example
query = "How do I export data?"
user_context = {'plan': 'enterprise', 'role': 'admin'}

augmented = augment_with_metadata(query, user_context)
# "How do I export data? [user_plan:enterprise] [role:admin]"
```
Temporal Context
```python
from datetime import datetime

def add_temporal_context(query: str) -> str:
    """
    Add current date/time to query for time-sensitive retrieval
    """
    now = datetime.now()
    temporal_query = f"{query} [date:{now.strftime('%Y-%m-%d')}]"

    return temporal_query

# Useful for queries like:
# "What's new?" → "What's new? [date:2025-02-25]"
# "Latest features" → "Latest features [date:2025-02-25]"
```
Optimizing Multiple Queries
When using query expansion or multi-query approaches:
Parallel Retrieval
```python
import asyncio
from collections import Counter
from typing import List

async def parallel_multi_query(queries: List[str], vector_db, k=5):
    """
    Retrieve for multiple queries in parallel
    """
    tasks = [vector_db.search(q, k=k) for q in queries]
    results = await asyncio.gather(*tasks)

    # Merge and deduplicate
    all_docs = []
    for result in results:
        all_docs.extend(result)

    unique_docs = deduplicate_by_id(all_docs)

    # Re-rank by frequency (documents appearing in multiple queries)
    doc_counts = Counter([doc['id'] for doc in all_docs])

    sorted_docs = sorted(
        unique_docs,
        key=lambda doc: doc_counts[doc['id']],
        reverse=True
    )

    return sorted_docs[:k]
```
Score Fusion
```python
from typing import List

def fuse_results(multi_query_results: List[List[dict]], method='rrf') -> List[dict]:
    """
    Combine results from multiple queries
    """
    if method == 'rrf':
        # Reciprocal Rank Fusion
        doc_scores = {}

        for results in multi_query_results:
            for rank, doc in enumerate(results, start=1):
                doc_id = doc['id']
                if doc_id not in doc_scores:
                    doc_scores[doc_id] = {'doc': doc, 'score': 0}

                # 60 is the smoothing constant conventionally used with RRF
                doc_scores[doc_id]['score'] += 1 / (60 + rank)

        ranked = sorted(
            doc_scores.values(),
            key=lambda x: x['score'],
            reverse=True
        )

        return [item['doc'] for item in ranked]

    elif method == 'max':
        # Take best score
        doc_scores = {}

        for results in multi_query_results:
            for doc in results:
                doc_id = doc['id']
                score = doc.get('score', 0)

                if doc_id not in doc_scores or score > doc_scores[doc_id]['score']:
                    doc_scores[doc_id] = {'doc': doc, 'score': score}

        ranked = sorted(
            doc_scores.values(),
            key=lambda x: x['score'],
            reverse=True
        )

        return [item['doc'] for item in ranked]

    # Fail loudly instead of silently returning None
    raise ValueError(f"Unknown fusion method: {method}")
```
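To make the reciprocal rank fusion arithmetic concrete, here is a small worked example. Each document scores 1 / (60 + rank) for every result list it appears in, and the scores are summed. The document ids and rankings are made up for illustration, and `rrf_scores` is a hypothetical helper operating on bare ids rather than document dicts.

```python
from collections import defaultdict

def rrf_scores(rankings, k=60):
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1 / (k + rank)
    return dict(scores)

# Two query variants: "b" is ranked second by both, "a" and "c" first by one each
rankings = [["a", "b"], ["c", "b"]]
scores = rrf_scores(rankings)

# a: 1/61 ≈ 0.0164
# c: 1/61 ≈ 0.0164
# b: 1/62 + 1/62 ≈ 0.0323
best = max(scores, key=scores.get)  # "b"
```

A document that several query variants agree on can outrank one that a single variant placed first, which is the behaviour that makes RRF attractive for multi-query retrieval.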
Best Practices
- Start simple: Normalize and spell-check before complex optimizations
- Measure impact: A/B test query optimizations
- Don't over-optimize: Sometimes simple queries work best
- Preserve original: Keep original query for fallback
- User feedback: Track which optimizations improve satisfaction
- Context matters: Use conversation history when available
- Async everywhere: Parallelize multiple query variants
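The "preserve original" practice can be sketched as a thin wrapper that tries the optimized query first and retries with the user's own wording when too few results come back. The function name, the `search` callable, and the `min_results` threshold are all assumptions for illustration. The demo search function is a stand-in for a real vector store.

```python
import asyncio
from typing import Awaitable, Callable, List

async def retrieve_with_fallback(
    original_query: str,
    optimized_query: str,
    search: Callable[[str], Awaitable[List[dict]]],
    min_results: int = 1,
) -> List[dict]:
    results = await search(optimized_query)
    if len(results) >= min_results:
        return results
    # The rewrite may have drifted from the user's intent; retry as typed
    return await search(original_query)

async def _demo_search(query: str) -> List[dict]:
    # Hypothetical stand-in for a vector store that only knows one phrasing
    return [{"id": 1}] if query == "reset password" else []

results = asyncio.run(
    retrieve_with_fallback(
        "reset password",
        "steps to recover account credentials",
        _demo_search,
    )
)
```

Here the optimized query returns nothing, so the wrapper falls back to the original wording and `results` holds the single matching document.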
When to Use Each Technique
| Technique | Use When | Impact |
|-----------|----------|--------|
| Normalization | Always | Low (foundation) |
| Spell checking | User-facing apps | Medium |
| Query rewriting | Vague queries common | Medium |
| Query expansion | Recall is priority | High |
| Decomposition | Complex multi-part queries | High |
| Routing | Multiple data sources | Medium-High |
| Contextual | Chat/conversation | High |
Next Steps
After optimizing queries, managing the context window effectively is crucial for staying within token limits and optimizing costs. The final guide covers context window optimization strategies.