
Self-Query Retrieval: Let the LLM Structure the Search

March 8, 2026
Ailog Team

Implement self-query retrieval to transform natural language queries into structured filters. Covers LLM filter extraction, vector-store integration, and optimization.


Self-query retrieval uses an LLM to transform a natural language query into a combination of semantic search and structured filters. Instead of searching the literal string "Samsung smartphones under 500 dollars", the system automatically decomposes it: search for "Samsung smartphones" + filter price < 500. This guide explains how the technique works and how to implement it.

The Problem with Complex Queries

User queries often mix semantic intent with factual constraints:

"Machine learning articles published in 2024 by French authors"

Decomposition:
├── Semantic search: "machine learning"
├── Date filter: year = 2024
└── Author filter: country = "France"

Pure vector search cannot efficiently handle these constraints. Self-query retrieval solves this problem.
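The target of the decomposition can be pictured as a plain structure (a sketch; the field names are illustrative, not a fixed API):

```python
# Sketch: the decomposition target for the example query above.
# Field names here are illustrative.
query = "Machine learning articles published in 2024 by French authors"

decomposed = {
    "semantic_query": "machine learning articles",  # embedded and used for vector search
    "filters": {                                    # applied as exact metadata conditions
        "year": 2024,
        "author_country": "France",
    },
}
```

The semantic part goes to the vector index; the filters are applied as exact-match conditions on document metadata.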

How Self-Query Works

┌─────────────────────────────────────────────────────────────┐
│                   Self-Query Pipeline                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  User Query                                                  │
│  "Apple products < $1000 released this year"                │
│                        │                                     │
│                        ▼                                     │
│                 ┌─────────────┐                             │
│                 │     LLM     │                             │
│                 │  Extractor  │                             │
│                 └─────────────┘                             │
│                        │                                     │
│          ┌─────────────┴─────────────┐                      │
│          ▼                           ▼                      │
│   ┌─────────────┐           ┌─────────────┐                │
│   │  Semantic   │           │  Structured │                │
│   │   Query     │           │   Filters   │                │
│   │ "Apple      │           │ brand=Apple │                │
│   │  products"  │           │ price<1000  │                │
│   └─────────────┘           │ year=2024   │                │
│          │                  └─────────────┘                │
│          │                         │                        │
│          └───────────┬─────────────┘                        │
│                      ▼                                      │
│              ┌─────────────┐                                │
│              │  Combined   │                                │
│              │   Search    │                                │
│              └─────────────┘                                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Basic Implementation

1. Define the Metadata Schema

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class ProductCategory(str, Enum):
    ELECTRONICS = "electronics"
    CLOTHING = "clothing"
    HOME = "home"
    SPORTS = "sports"


class ProductMetadata(BaseModel):
    """Schema of available metadata for filtering"""
    brand: Optional[str] = Field(None, description="Product brand (e.g., Apple, Samsung, Nike)")
    category: Optional[ProductCategory] = Field(None, description="Product category")
    price_min: Optional[float] = Field(None, description="Minimum price in dollars")
    price_max: Optional[float] = Field(None, description="Maximum price in dollars")
    year: Optional[int] = Field(None, description="Product release year")
    in_stock: Optional[bool] = Field(None, description="Stock availability")
    rating_min: Optional[float] = Field(None, description="Minimum rating (1-5)")


class SelfQueryOutput(BaseModel):
    """LLM self-query output"""
    semantic_query: str = Field(description="The part of the query used for semantic search")
    filters: ProductMetadata = Field(
        default_factory=ProductMetadata,
        description="Structured filters extracted from the query",
    )
```

2. Create the LLM Extractor

```python
import json
from datetime import datetime

from openai import OpenAI


class SelfQueryExtractor:
    def __init__(self):
        self.client = OpenAI()
        self.schema = ProductMetadata.model_json_schema()

    def extract(self, query: str) -> SelfQueryOutput:
        current_year = datetime.now().year
        system_prompt = f"""You are a query extractor. Analyze the user question and extract:
1. The semantic part (what we're conceptually searching for)
2. Structured filters (factual constraints)

Available filter schema:
{json.dumps(self.schema, indent=2)}

Rules:
- Don't invent filters not present in the query
- For prices, use price_min and/or price_max
- For "this year", use year: {current_year}
- For "recent", use year: {current_year - 1} or {current_year}

Respond in JSON format:
{{
    "semantic_query": "description of what we're looking for",
    "filters": {{"brand": "...", "category": "...", ...}}
}}"""

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query},
            ],
            temperature=0,
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        return SelfQueryOutput(**result)


# Test
extractor = SelfQueryExtractor()
result = extractor.extract("Samsung smartphones under $500 with good ratings")
print(f"Semantic search: {result.semantic_query}")
# e.g. "Samsung smartphones"
print(f"Filters: {result.filters}")
# e.g. brand="Samsung", price_max=500, rating_min=4.0, category="electronics"
```

3. Integrate with the Retriever

```python
from typing import Optional

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range
from sentence_transformers import SentenceTransformer


class SelfQueryRetriever:
    def __init__(self, collection: str):
        self.client = QdrantClient("localhost", port=6333)
        self.collection = collection
        self.extractor = SelfQueryExtractor()
        self.embedder = SentenceTransformer("BAAI/bge-m3")

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        # 1. Extract the semantic query and filters
        extracted = self.extractor.extract(query)

        # 2. Build Qdrant filters
        qdrant_filter = self._build_filter(extracted.filters)

        # 3. Encode the semantic query
        query_embedding = self.embedder.encode(extracted.semantic_query)

        # 4. Combined search
        results = self.client.search(
            collection_name=self.collection,
            query_vector=query_embedding.tolist(),
            query_filter=qdrant_filter,
            limit=top_k,
        )

        return [
            {
                "content": hit.payload["content"],
                "metadata": hit.payload,
                "score": hit.score,
                "extracted_filters": extracted.filters.model_dump(exclude_none=True),
            }
            for hit in results
        ]

    def _build_filter(self, filters: ProductMetadata) -> Optional[Filter]:
        conditions = []
        if filters.brand:
            conditions.append(FieldCondition(key="brand", match=MatchValue(value=filters.brand)))
        if filters.category:
            conditions.append(FieldCondition(key="category", match=MatchValue(value=filters.category.value)))
        # Use explicit None checks: a price or rating of 0 is falsy but valid
        if filters.price_max is not None:
            conditions.append(FieldCondition(key="price", range=Range(lte=filters.price_max)))
        if filters.price_min is not None:
            conditions.append(FieldCondition(key="price", range=Range(gte=filters.price_min)))
        if filters.year:
            conditions.append(FieldCondition(key="year", match=MatchValue(value=filters.year)))
        if filters.in_stock is not None:
            conditions.append(FieldCondition(key="in_stock", match=MatchValue(value=filters.in_stock)))
        if filters.rating_min is not None:
            conditions.append(FieldCondition(key="rating", range=Range(gte=filters.rating_min)))
        return Filter(must=conditions) if conditions else None
```

Handling Complex Queries

Logical Operators (OR, NOT)

```python
class AdvancedSelfQueryOutput(BaseModel):
    semantic_query: str
    must_filters: list[dict] = Field(default_factory=list, description="AND conditions")
    should_filters: list[dict] = Field(default_factory=list, description="OR conditions")
    must_not_filters: list[dict] = Field(default_factory=list, description="Exclusions")


class AdvancedSelfQueryExtractor:
    def extract(self, query: str) -> AdvancedSelfQueryOutput:
        system_prompt = """Analyze the query and extract filters with their logic:
- must_filters: mandatory conditions (AND)
- should_filters: optional conditions (OR)
- must_not_filters: exclusions (NOT)

Example for "Apple or Dell laptops, not gaming, under $1500":
{
    "semantic_query": "laptops",
    "must_filters": [{"price_max": 1500}],
    "should_filters": [{"brand": "Apple"}, {"brand": "Dell"}],
    "must_not_filters": [{"category": "gaming"}]
}"""
        # ... LLM call similar to SelfQueryExtractor.extract


def build_advanced_filter(extracted: AdvancedSelfQueryOutput) -> Filter:
    """Build a Qdrant filter with logical operators"""
    must_conditions = [_condition_to_qdrant(f) for f in extracted.must_filters]
    should_conditions = [_condition_to_qdrant(f) for f in extracted.should_filters]
    must_not_conditions = [_condition_to_qdrant(f) for f in extracted.must_not_filters]

    return Filter(
        must=must_conditions or None,
        should=should_conditions or None,
        must_not=must_not_conditions or None,
    )
```

Relative Temporal Filters

```python
from datetime import datetime, timedelta
from typing import Optional

from pydantic import BaseModel, Field


class TemporalFilters(BaseModel):
    """Handling relative temporal expressions"""
    after_date: Optional[datetime] = None
    before_date: Optional[datetime] = None
    relative_period: Optional[str] = Field(
        None,
        description="Relative period: 'last_week', 'last_month', 'last_year', 'this_year'",
    )


def resolve_temporal_filter(period: str) -> tuple[Optional[datetime], Optional[datetime]]:
    """Convert a relative period to absolute dates"""
    now = datetime.now()
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    periods = {
        "today": (today_start, now),
        "yesterday": (today_start - timedelta(days=1), today_start),
        "last_week": (now - timedelta(weeks=1), now),
        "last_month": (now - timedelta(days=30), now),
        "last_year": (now - timedelta(days=365), now),
        "this_year": (datetime(now.year, 1, 1), now),
        "this_month": (datetime(now.year, now.month, 1), now),
    }
    return periods.get(period, (None, None))
```

Advanced Optimizations

Extraction Caching

```python
import hashlib


class CachedSelfQueryExtractor:
    def __init__(self, cache_size: int = 1000):
        self.base_extractor = SelfQueryExtractor()
        self._cache = {}
        self.cache_size = cache_size

    def extract(self, query: str) -> SelfQueryOutput:
        # Normalize the query for caching
        normalized = self._normalize(query)
        cache_key = hashlib.md5(normalized.encode()).hexdigest()

        if cache_key in self._cache:
            return self._cache[cache_key]

        result = self.base_extractor.extract(query)

        # Manage cache size: dicts preserve insertion order, so the first
        # key is the oldest entry (FIFO eviction)
        if len(self._cache) >= self.cache_size:
            oldest_key = next(iter(self._cache))
            del self._cache[oldest_key]

        self._cache[cache_key] = result
        return result

    def _normalize(self, query: str) -> str:
        return query.lower().strip()
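The cache key only sees the normalized query, so superficial variants of the same question collapse into one entry and one LLM call. A self-contained demonstration of that normalization:

```python
import hashlib

def cache_key(query: str) -> str:
    # Lowercase + strip, matching the _normalize step, then hash.
    normalized = query.lower().strip()
    return hashlib.md5(normalized.encode()).hexdigest()

# Case and surrounding whitespace do not trigger extra extractions:
k1 = cache_key("Samsung phones under $500")
k2 = cache_key("  samsung PHONES under $500 ")
```

A stronger normalizer (collapsing internal whitespace, stripping punctuation) would raise the hit rate further, at the cost of occasionally merging queries that differ in meaning.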

Local Extraction with Fine-tuned Model

To reduce latency and costs, use a local model:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class LocalSelfQueryExtractor:
    def __init__(self, model_path: str = "your-org/self-query-extractor"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

    def extract(self, query: str) -> SelfQueryOutput:
        inputs = self.tokenizer(
            f"extract filters: {query}",
            return_tensors="pt",
            max_length=256,
            truncation=True,
        )
        outputs = self.model.generate(**inputs, max_length=128, num_beams=4)
        result_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return self._parse_output(result_text)

    def _parse_output(self, text: str) -> SelfQueryOutput:
        # Parse the structured format generated by the model.
        # Expected format: "query: ... | brand: ... | price_max: ..."
        parts = dict(p.split(": ", 1) for p in text.split(" | "))
        return SelfQueryOutput(
            semantic_query=parts.get("query", ""),
            filters=ProductMetadata(**{k: v for k, v in parts.items() if k != "query"}),
        )
```

Validation and Fallback

```python
class RobustSelfQueryRetriever:
    def __init__(self, collection: str):
        self.extractor = SelfQueryExtractor()
        # VectorRetriever: a basic vector-search wrapper (not shown)
        self.retriever = VectorRetriever(collection)

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        try:
            # Attempt self-query extraction
            extracted = self.extractor.extract(query)

            # Validate the extracted filters
            if not self._validate_filters(extracted.filters):
                raise ValueError("Invalid filters")

            # Search with filters
            results = self.retriever.search(
                query=extracted.semantic_query,
                filters=extracted.filters,
                top_k=top_k,
            )

            # Check that we have enough results
            if len(results) < 2:
                # Fallback: relax the filters
                results = self._search_with_relaxed_filters(extracted, top_k)

            return results

        except Exception as e:
            # Fallback: pure vector search
            print(f"Self-query failed: {e}, falling back to vector search")
            return self.retriever.search(query=query, top_k=top_k)

    def _validate_filters(self, filters: ProductMetadata) -> bool:
        """Validate filter consistency"""
        if filters.price_min is not None and filters.price_max is not None:
            if filters.price_min > filters.price_max:
                return False
        if filters.rating_min is not None and not 1 <= filters.rating_min <= 5:
            return False
        return True

    def _search_with_relaxed_filters(self, extracted: SelfQueryOutput, top_k: int) -> list[dict]:
        """Progressively relax filters until there are enough results"""
        filters = extracted.filters.model_copy()

        # Relaxation order: price -> date -> rating -> brand
        relaxation_order = ["price_max", "price_min", "year", "rating_min", "brand"]

        for field in relaxation_order:
            setattr(filters, field, None)
            results = self.retriever.search(
                query=extracted.semantic_query,
                filters=filters,
                top_k=top_k,
            )
            if len(results) >= 2:
                return results

        # Last resort: no filters at all
        return self.retriever.search(query=extracted.semantic_query, top_k=top_k)
```

Monitoring and Improvement

```python
from datetime import datetime


class SelfQueryAnalytics:
    def __init__(self, analytics_client):
        self.analytics = analytics_client

    def log_extraction(
        self,
        original_query: str,
        extracted: SelfQueryOutput,
        results_count: int,
        latency_ms: float,
    ):
        self.analytics.track("self_query_extraction", {
            "original_query": original_query,
            "semantic_query": extracted.semantic_query,
            "filters_count": len([v for v in extracted.filters.model_dump().values() if v]),
            "results_count": results_count,
            "latency_ms": latency_ms,
            "timestamp": datetime.now().isoformat(),
        })

    def get_common_filters(self, days: int = 7) -> list[tuple[str, int]]:
        """Identify the most used filters"""
        extractions = self.analytics.query("self_query_extraction", days=days)

        filter_counts: dict[str, int] = {}
        for e in extractions:
            for field, value in e.get("filters", {}).items():
                if value:
                    filter_counts[field] = filter_counts.get(field, 0) + 1

        return sorted(filter_counts.items(), key=lambda x: x[1], reverse=True)
```

Next Steps

Self-query retrieval transforms complex queries into structured searches that pure vector retrieval cannot express.


Intelligent Self-Query with Ailog

Ailog implements self-query retrieval automatically:

  • Intelligent filter extraction based on your metadata
  • Automatic fallback if filters are too restrictive
  • Optimized caching for frequent queries
  • Integrated monitoring to improve extraction

Try for free and transform your complex queries into precise searches.

Tags

rag, retrieval, self-query, llm, structured filters
