
Self-Query Retrieval: Let the LLM Structure the Search

March 8, 2026
Ailog Team

Implement self-query retrieval to transform natural language queries into structured filters. Covers LLM filter extraction, vector-store integration, and optimization.


Self-query retrieval uses an LLM to transform a natural language query into a combination of semantic search and structured filters. Instead of searching the literal string "Samsung smartphones under 500 dollars", the system automatically decomposes it: search for "Samsung smartphones" + filter price < 500. This guide explains how the technique works and how to implement it.

The Problem with Complex Queries

User queries often mix semantic intent with factual constraints:

"Machine learning articles published in 2024 by French authors"

Decomposition:
├── Semantic search: "machine learning"
├── Date filter: year = 2024
└── Author filter: country = "France"

Pure vector search cannot efficiently handle these constraints. Self-query retrieval solves this problem.
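The target of the decomposition can be pictured as a plain structure (a sketch; the field names are illustrative, not a fixed API):

```python
# Sketch: the decomposition target for the example query above.
# Field names here are illustrative.
query = "Machine learning articles published in 2024 by French authors"

decomposed = {
    "semantic_query": "machine learning articles",  # embedded and used for vector search
    "filters": {                                    # applied as exact metadata conditions
        "year": 2024,
        "author_country": "France",
    },
}
```

The semantic part goes to the vector index; the filters are applied as exact-match conditions on document metadata.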

How Self-Query Works

┌─────────────────────────────────────────────────────────────┐
│                   Self-Query Pipeline                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  User Query                                                  │
│  "Apple products < $1000 released this year"                │
│                        │                                     │
│                        ▼                                     │
│                 ┌─────────────┐                             │
│                 │     LLM     │                             │
│                 │  Extractor  │                             │
│                 └─────────────┘                             │
│                        │                                     │
│          ┌─────────────┴─────────────┐                      │
│          ▼                           ▼                      │
│   ┌─────────────┐           ┌─────────────┐                │
│   │  Semantic   │           │  Structured │                │
│   │   Query     │           │   Filters   │                │
│   │ "Apple      │           │ brand=Apple │                │
│   │  products"  │           │ price<1000  │                │
│   └─────────────┘           │ year=2024   │                │
│          │                  └─────────────┘                │
│          │                         │                        │
│          └───────────┬─────────────┘                        │
│                      ▼                                      │
│              ┌─────────────┐                                │
│              │  Combined   │                                │
│              │   Search    │                                │
│              └─────────────┘                                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Basic Implementation

1. Define the Metadata Schema

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class ProductCategory(str, Enum):
    ELECTRONICS = "electronics"
    CLOTHING = "clothing"
    HOME = "home"
    SPORTS = "sports"


class ProductMetadata(BaseModel):
    """Schema of available metadata for filtering"""
    brand: Optional[str] = Field(None, description="Product brand (e.g., Apple, Samsung, Nike)")
    category: Optional[ProductCategory] = Field(None, description="Product category")
    price_min: Optional[float] = Field(None, description="Minimum price in dollars")
    price_max: Optional[float] = Field(None, description="Maximum price in dollars")
    year: Optional[int] = Field(None, description="Product release year")
    in_stock: Optional[bool] = Field(None, description="Stock availability")
    rating_min: Optional[float] = Field(None, description="Minimum rating (1-5)")


class SelfQueryOutput(BaseModel):
    """LLM self-query output"""
    semantic_query: str = Field(description="The part of the query used for semantic search")
    filters: ProductMetadata = Field(
        default_factory=ProductMetadata,
        description="Structured filters extracted from the query",
    )
```

2. Create the LLM Extractor

```python
import json
from datetime import datetime

from openai import OpenAI


class SelfQueryExtractor:
    def __init__(self):
        self.client = OpenAI()
        self.schema = ProductMetadata.model_json_schema()

    def extract(self, query: str) -> SelfQueryOutput:
        current_year = datetime.now().year
        system_prompt = f"""You are a query extractor. Analyze the user question and extract:
1. The semantic part (what we're conceptually searching for)
2. Structured filters (factual constraints)

Available filter schema:
{json.dumps(self.schema, indent=2)}

Rules:
- Don't invent filters not present in the query
- For prices, use price_min and/or price_max
- For "this year", use year: {current_year}
- For "recent", use year: {current_year - 1} or {current_year}

Respond in JSON format:
{{
    "semantic_query": "description of what we're looking for",
    "filters": {{"brand": "...", "category": "...", ...}}
}}"""

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query},
            ],
            temperature=0,
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        return SelfQueryOutput(**result)


# Test
extractor = SelfQueryExtractor()
result = extractor.extract("Samsung smartphones under $500 with good ratings")
print(f"Semantic search: {result.semantic_query}")
# e.g. "Samsung smartphones"
print(f"Filters: {result.filters}")
# e.g. brand="Samsung", price_max=500, rating_min=4.0, category="electronics"
```

3. Integrate with the Retriever

```python
from typing import Optional

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range
from sentence_transformers import SentenceTransformer


class SelfQueryRetriever:
    def __init__(self, collection: str):
        self.client = QdrantClient("localhost", port=6333)
        self.collection = collection
        self.extractor = SelfQueryExtractor()
        self.embedder = SentenceTransformer("BAAI/bge-m3")

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        # 1. Extract the semantic query and filters
        extracted = self.extractor.extract(query)

        # 2. Build Qdrant filters
        qdrant_filter = self._build_filter(extracted.filters)

        # 3. Encode the semantic query
        query_embedding = self.embedder.encode(extracted.semantic_query)

        # 4. Combined search
        results = self.client.search(
            collection_name=self.collection,
            query_vector=query_embedding.tolist(),
            query_filter=qdrant_filter,
            limit=top_k,
        )

        return [
            {
                "content": hit.payload["content"],
                "metadata": hit.payload,
                "score": hit.score,
                "extracted_filters": extracted.filters.model_dump(exclude_none=True),
            }
            for hit in results
        ]

    def _build_filter(self, filters: ProductMetadata) -> Optional[Filter]:
        conditions = []
        if filters.brand:
            conditions.append(FieldCondition(key="brand", match=MatchValue(value=filters.brand)))
        if filters.category:
            conditions.append(FieldCondition(key="category", match=MatchValue(value=filters.category.value)))
        # Use explicit None checks: a price or rating of 0 is falsy but valid
        if filters.price_max is not None:
            conditions.append(FieldCondition(key="price", range=Range(lte=filters.price_max)))
        if filters.price_min is not None:
            conditions.append(FieldCondition(key="price", range=Range(gte=filters.price_min)))
        if filters.year:
            conditions.append(FieldCondition(key="year", match=MatchValue(value=filters.year)))
        if filters.in_stock is not None:
            conditions.append(FieldCondition(key="in_stock", match=MatchValue(value=filters.in_stock)))
        if filters.rating_min is not None:
            conditions.append(FieldCondition(key="rating", range=Range(gte=filters.rating_min)))
        return Filter(must=conditions) if conditions else None
```

Handling Complex Queries

Logical Operators (OR, NOT)

```python
class AdvancedSelfQueryOutput(BaseModel):
    semantic_query: str
    must_filters: list[dict] = Field(default_factory=list, description="AND conditions")
    should_filters: list[dict] = Field(default_factory=list, description="OR conditions")
    must_not_filters: list[dict] = Field(default_factory=list, description="Exclusions")


class AdvancedSelfQueryExtractor:
    def extract(self, query: str) -> AdvancedSelfQueryOutput:
        system_prompt = """Analyze the query and extract filters with their logic:
- must_filters: mandatory conditions (AND)
- should_filters: optional conditions (OR)
- must_not_filters: exclusions (NOT)

Example for "Apple or Dell laptops, not gaming, under $1500":
{
    "semantic_query": "laptops",
    "must_filters": [{"price_max": 1500}],
    "should_filters": [{"brand": "Apple"}, {"brand": "Dell"}],
    "must_not_filters": [{"category": "gaming"}]
}"""
        # ... LLM call similar to SelfQueryExtractor.extract


def build_advanced_filter(extracted: AdvancedSelfQueryOutput) -> Filter:
    """Build a Qdrant filter with logical operators"""
    must_conditions = [_condition_to_qdrant(f) for f in extracted.must_filters]
    should_conditions = [_condition_to_qdrant(f) for f in extracted.should_filters]
    must_not_conditions = [_condition_to_qdrant(f) for f in extracted.must_not_filters]

    return Filter(
        must=must_conditions or None,
        should=should_conditions or None,
        must_not=must_not_conditions or None,
    )
```

Relative Temporal Filters

```python
from datetime import datetime, timedelta
from typing import Optional

from pydantic import BaseModel, Field


class TemporalFilters(BaseModel):
    """Handling relative temporal expressions"""
    after_date: Optional[datetime] = None
    before_date: Optional[datetime] = None
    relative_period: Optional[str] = Field(
        None,
        description="Relative period: 'last_week', 'last_month', 'last_year', 'this_year'",
    )


def resolve_temporal_filter(period: str) -> tuple[Optional[datetime], Optional[datetime]]:
    """Convert a relative period to absolute dates"""
    now = datetime.now()
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    periods = {
        "today": (today_start, now),
        "yesterday": (today_start - timedelta(days=1), today_start),
        "last_week": (now - timedelta(weeks=1), now),
        "last_month": (now - timedelta(days=30), now),
        "last_year": (now - timedelta(days=365), now),
        "this_year": (datetime(now.year, 1, 1), now),
        "this_month": (datetime(now.year, now.month, 1), now),
    }
    return periods.get(period, (None, None))
```

Advanced Optimizations

Extraction Caching

```python
import hashlib


class CachedSelfQueryExtractor:
    def __init__(self, cache_size: int = 1000):
        self.base_extractor = SelfQueryExtractor()
        self._cache = {}
        self.cache_size = cache_size

    def extract(self, query: str) -> SelfQueryOutput:
        # Normalize the query for caching
        normalized = self._normalize(query)
        cache_key = hashlib.md5(normalized.encode()).hexdigest()

        if cache_key in self._cache:
            return self._cache[cache_key]

        result = self.base_extractor.extract(query)

        # Manage cache size: dicts preserve insertion order, so the first
        # key is the oldest entry (FIFO eviction)
        if len(self._cache) >= self.cache_size:
            oldest_key = next(iter(self._cache))
            del self._cache[oldest_key]

        self._cache[cache_key] = result
        return result

    def _normalize(self, query: str) -> str:
        return query.lower().strip()
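The cache key only sees the normalized query, so superficial variants of the same question collapse into one entry and one LLM call. A self-contained demonstration of that normalization:

```python
import hashlib

def cache_key(query: str) -> str:
    # Lowercase + strip, matching the _normalize step, then hash.
    normalized = query.lower().strip()
    return hashlib.md5(normalized.encode()).hexdigest()

# Case and surrounding whitespace do not trigger extra extractions:
k1 = cache_key("Samsung phones under $500")
k2 = cache_key("  samsung PHONES under $500 ")
```

A stronger normalizer (collapsing internal whitespace, stripping punctuation) would raise the hit rate further, at the cost of occasionally merging queries that differ in meaning.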

Local Extraction with Fine-tuned Model

To reduce latency and costs, use a local model:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class LocalSelfQueryExtractor:
    def __init__(self, model_path: str = "your-org/self-query-extractor"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

    def extract(self, query: str) -> SelfQueryOutput:
        inputs = self.tokenizer(
            f"extract filters: {query}",
            return_tensors="pt",
            max_length=256,
            truncation=True,
        )
        outputs = self.model.generate(**inputs, max_length=128, num_beams=4)
        result_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return self._parse_output(result_text)

    def _parse_output(self, text: str) -> SelfQueryOutput:
        # Parse the structured format generated by the model.
        # Expected format: "query: ... | brand: ... | price_max: ..."
        parts = dict(p.split(": ", 1) for p in text.split(" | "))
        return SelfQueryOutput(
            semantic_query=parts.get("query", ""),
            filters=ProductMetadata(**{k: v for k, v in parts.items() if k != "query"}),
        )
```

Validation and Fallback

```python
class RobustSelfQueryRetriever:
    def __init__(self, collection: str):
        self.extractor = SelfQueryExtractor()
        # VectorRetriever: a basic vector-search wrapper (not shown)
        self.retriever = VectorRetriever(collection)

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        try:
            # Attempt self-query extraction
            extracted = self.extractor.extract(query)

            # Validate the extracted filters
            if not self._validate_filters(extracted.filters):
                raise ValueError("Invalid filters")

            # Search with filters
            results = self.retriever.search(
                query=extracted.semantic_query,
                filters=extracted.filters,
                top_k=top_k,
            )

            # Check that we have enough results
            if len(results) < 2:
                # Fallback: relax the filters
                results = self._search_with_relaxed_filters(extracted, top_k)

            return results

        except Exception as e:
            # Fallback: pure vector search
            print(f"Self-query failed: {e}, falling back to vector search")
            return self.retriever.search(query=query, top_k=top_k)

    def _validate_filters(self, filters: ProductMetadata) -> bool:
        """Validate filter consistency"""
        if filters.price_min is not None and filters.price_max is not None:
            if filters.price_min > filters.price_max:
                return False
        if filters.rating_min is not None and not 1 <= filters.rating_min <= 5:
            return False
        return True

    def _search_with_relaxed_filters(self, extracted: SelfQueryOutput, top_k: int) -> list[dict]:
        """Progressively relax filters until there are enough results"""
        filters = extracted.filters.model_copy()

        # Relaxation order: price -> date -> rating -> brand
        relaxation_order = ["price_max", "price_min", "year", "rating_min", "brand"]

        for field in relaxation_order:
            setattr(filters, field, None)
            results = self.retriever.search(
                query=extracted.semantic_query,
                filters=filters,
                top_k=top_k,
            )
            if len(results) >= 2:
                return results

        # Last resort: no filters at all
        return self.retriever.search(query=extracted.semantic_query, top_k=top_k)
```

Monitoring and Improvement

```python
from datetime import datetime


class SelfQueryAnalytics:
    def __init__(self, analytics_client):
        self.analytics = analytics_client

    def log_extraction(
        self,
        original_query: str,
        extracted: SelfQueryOutput,
        results_count: int,
        latency_ms: float,
    ):
        self.analytics.track("self_query_extraction", {
            "original_query": original_query,
            "semantic_query": extracted.semantic_query,
            "filters_count": len([v for v in extracted.filters.model_dump().values() if v]),
            "results_count": results_count,
            "latency_ms": latency_ms,
            "timestamp": datetime.now().isoformat(),
        })

    def get_common_filters(self, days: int = 7) -> list[tuple[str, int]]:
        """Identify the most used filters"""
        extractions = self.analytics.query("self_query_extraction", days=days)

        filter_counts: dict[str, int] = {}
        for e in extractions:
            for field, value in e.get("filters", {}).items():
                if value:
                    filter_counts[field] = filter_counts.get(field, 0) + 1

        return sorted(filter_counts.items(), key=lambda x: x[1], reverse=True)
```

Next Steps

Self-query retrieval transforms complex queries into structured searches that pure vector retrieval cannot express.


Intelligent Self-Query with Ailog

Ailog implements self-query retrieval automatically:

  • Intelligent filter extraction based on your metadata
  • Automatic fallback if filters are too restrictive
  • Optimized caching for frequent queries
  • Integrated monitoring to improve extraction

Try for free and transform your complex queries into precise searches.

Tags

rag, retrieval, self-query, llm, structured filters
