
Sensitive Data: Filtering and Protecting Information

March 16, 2026
22 min read
Ailog Team

Techniques to detect, filter and protect sensitive data in RAG systems. PII, financial data, medical information.


RAG systems handle large volumes of data, some of which may be sensitive. This guide presents techniques to identify, filter, and protect this critical information throughout the RAG pipeline.

Prerequisites: First consult the fundamentals of RAG and our guide on RAG security and compliance.

What is Sensitive Data?

Definition and Categories

Sensitive data in a RAG context is divided into several categories:

| Category | Examples | Disclosure Risk |
|---|---|---|
| PII (Personally Identifiable Information) | Name, email, phone, address | High |
| Financial data | Card numbers, IBAN, salaries | Very high |
| Health data | Diagnoses, treatments, allergies | Very high |
| Government identifiers | Social security number, passport | Critical |
| Authentication data | Passwords, API tokens, keys | Critical |
| Business data | Strategies, contracts, pricing | High |
| Biometric data | Fingerprints, facial recognition | Very high |

Exposure Points in a RAG Pipeline

┌─────────────────────────────────────────────────────────────────┐
│                    RAG PIPELINE - EXPOSURE POINTS               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. INGESTION        2. STORAGE          3. RETRIEVAL          │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐          │
│  │   Raw   │ ──────► │Embeddings│ ──────► │ Relevant│          │
│  │  docs   │         │Vector DB │         │ chunks  │          │
│  └────┬────┘         └────┬────┘         └────┬────┘          │
│       │                   │                   │                │
│   RISK:                RISK:               RISK:              │
│   PII in               Leak via            Exposure           │
│   unfiltered docs      embedding           in context         │
│                        inversion                              │
│                                                                │
│  4. GENERATION       5. OUTPUT           6. LOGS              │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐          │
│  │   LLM   │ ──────► │  Final  │ ──────► │ Session │          │
│  │ prompt  │         │ response│         │ history │          │
│  └────┬────┘         └────┬────┘         └────┬────┘          │
│       │                   │                   │                │
│   RISK:                RISK:               RISK:              │
│   Prompt               PII                 Unencrypted        │
│   injection            hallucination       storage            │
│                                                                │
└─────────────────────────────────────────────────────────────────┘

Detecting Sensitive Data

Regular Expression Detection

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class SensitiveDataType(Enum):
    """Detectable sensitive data types."""
    EMAIL = "email"
    PHONE_US = "phone_us"
    PHONE_INTL = "phone_intl"
    CREDIT_CARD = "credit_card"
    IBAN = "iban"
    SSN_US = "ssn_us"
    SSN_UK = "ssn_uk"  # National Insurance Number
    PASSPORT = "passport"
    IP_ADDRESS = "ip_address"
    API_KEY = "api_key"
    PASSWORD = "password"
    DATE_OF_BIRTH = "date_of_birth"
    ADDRESS = "address"


@dataclass
class DetectionResult:
    """Sensitive data detection result."""
    data_type: SensitiveDataType
    value: str
    start_pos: int
    end_pos: int
    confidence: float
    context: str


class RegexSensitiveDetector:
    """Regex-based sensitive data detector."""

    PATTERNS = {
        SensitiveDataType.EMAIL: {
            "pattern": r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
            "confidence": 0.95
        },
        SensitiveDataType.PHONE_US: {
            "pattern": r'\b(?:\+1\s?)?(?:\([0-9]{3}\)|[0-9]{3})[\s.-]?[0-9]{3}[\s.-]?[0-9]{4}\b',
            "confidence": 0.90
        },
        SensitiveDataType.PHONE_INTL: {
            "pattern": r'\b\+?[1-9]\d{1,14}\b',
            "confidence": 0.70
        },
        SensitiveDataType.CREDIT_CARD: {
            "pattern": r'\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})\b',
            "confidence": 0.98
        },
        SensitiveDataType.IBAN: {
            "pattern": r'\b[A-Z]{2}[0-9]{2}(?:\s?[A-Z0-9]{4}){4,7}\b',
            "confidence": 0.95
        },
        SensitiveDataType.SSN_US: {
            "pattern": r'\b(?!000|666|9\d{2})\d{3}[-\s]?(?!00)\d{2}[-\s]?(?!0000)\d{4}\b',
            "confidence": 0.90
        },
        SensitiveDataType.SSN_UK: {
            "pattern": r'\b[A-CEGHJ-PR-TW-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]?\b',
            "confidence": 0.90
        },
        SensitiveDataType.IP_ADDRESS: {
            "pattern": r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b',
            "confidence": 0.99
        },
        SensitiveDataType.API_KEY: {
            "pattern": r'\b(?:sk|pk|api|key|token|secret|auth)[_-]?[A-Za-z0-9]{20,}\b',
            "confidence": 0.85
        },
        SensitiveDataType.DATE_OF_BIRTH: {
            "pattern": r'\b(?:0[1-9]|1[0-2])[-/.](?:0[1-9]|[12][0-9]|3[01])[-/.](?:19|20)\d{2}\b',
            "confidence": 0.80
        }
    }

    def __init__(self, enabled_types: List[SensitiveDataType] = None):
        """Initialize the detector with the types to detect."""
        self.enabled_types = enabled_types or list(SensitiveDataType)
        self._compile_patterns()

    def _compile_patterns(self):
        """Compile regex patterns once for performance."""
        self.compiled_patterns = {}
        for data_type, config in self.PATTERNS.items():
            if data_type in self.enabled_types:
                self.compiled_patterns[data_type] = {
                    "regex": re.compile(config["pattern"], re.IGNORECASE),
                    "confidence": config["confidence"]
                }

    def detect(self, text: str, context_chars: int = 50) -> List[DetectionResult]:
        """Detect sensitive data in text."""
        results = []
        for data_type, config in self.compiled_patterns.items():
            for match in config["regex"].finditer(text):
                # Extract context around the detection
                start_ctx = max(0, match.start() - context_chars)
                end_ctx = min(len(text), match.end() + context_chars)
                results.append(DetectionResult(
                    data_type=data_type,
                    value=match.group(),
                    start_pos=match.start(),
                    end_pos=match.end(),
                    confidence=config["confidence"],
                    context=text[start_ctx:end_ctx]
                ))
        return results

    def detect_in_documents(
        self,
        documents: List[Dict[str, Any]]
    ) -> Dict[str, List[DetectionResult]]:
        """Detect sensitive data in a list of documents."""
        results = {}
        for doc in documents:
            doc_id = doc.get("id", doc.get("_id", str(hash(str(doc)))))
            content = doc.get("content", doc.get("text", ""))
            detections = self.detect(content)
            if detections:
                results[doc_id] = detections
        return results
```
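
Regex matching for card numbers in particular is prone to false positives: any digit run with the right prefix and length matches. A common complement, not part of the detector above, is to confirm hits with a Luhn checksum before flagging them. A minimal standalone sketch:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    # Double every second digit from the right, subtracting 9 when > 9
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 12 and checksum % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (a well-known test number)
print(luhn_valid("4111111111111112"))  # False
```

Running this check on credit-card regex matches lets you keep the high-confidence score honest: a match that fails Luhn is almost certainly not a real card number.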

Machine Learning Detection (NER)

```python
from typing import Any, Dict, List

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline


class NERSensitiveDetector:
    """Sensitive data detector based on NER (Named Entity Recognition)."""

    # NER entity groups treated as sensitive by default. There is no
    # dedicated PERSON type in SensitiveDataType, so person names and
    # locations are flagged directly by entity group in detect().
    # Organizations ("ORG") and "MISC" are not always sensitive.
    SENSITIVE_ENTITY_GROUPS = {"PER", "LOC"}

    def __init__(self, model_name: str = "dslim/bert-base-NER"):
        """
        Initialize the NER detector.

        Args:
            model_name: NER model to use
        """
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForTokenClassification.from_pretrained(model_name)
        self.ner_pipeline = pipeline(
            "ner",
            model=self.model,
            tokenizer=self.tokenizer,
            aggregation_strategy="simple"
        )

    def detect(self, text: str, min_confidence: float = 0.8) -> List[Dict[str, Any]]:
        """Detect sensitive named entities."""
        entities = self.ner_pipeline(text)
        sensitive_entities = []
        for entity in entities:
            if entity["score"] >= min_confidence:
                # Persons and locations are treated as sensitive
                if entity["entity_group"] in self.SENSITIVE_ENTITY_GROUPS:
                    sensitive_entities.append({
                        "text": entity["word"],
                        "type": entity["entity_group"],
                        "confidence": entity["score"],
                        "start": entity["start"],
                        "end": entity["end"],
                        "is_sensitive": True
                    })
        return sensitive_entities

    def detect_pii_names(self, text: str) -> List[str]:
        """Specifically detect person names."""
        entities = self.ner_pipeline(text)
        return [
            e["word"] for e in entities
            if e["entity_group"] == "PER" and e["score"] > 0.85
        ]


class HybridSensitiveDetector:
    """Combines regex and NER detection for better coverage."""

    def __init__(self):
        self.regex_detector = RegexSensitiveDetector()
        self.ner_detector = NERSensitiveDetector()

    def detect(self, text: str) -> Dict[str, Any]:
        """Hybrid detection of sensitive data."""
        # Regex detection (fast, precise for known patterns)
        regex_results = self.regex_detector.detect(text)
        # NER detection (for names, locations, organizations)
        ner_results = self.ner_detector.detect(text)
        # Merge and deduplicate
        all_detections = self._merge_results(regex_results, ner_results)
        return {
            "total_detections": len(all_detections),
            "by_type": self._group_by_type(all_detections),
            "detections": all_detections,
            "risk_level": self._calculate_risk_level(all_detections)
        }

    def _merge_results(self, regex_results, ner_results) -> List[Any]:
        """Merge both result lists, dropping NER hits that overlap a regex hit."""
        regex_spans = [(r.start_pos, r.end_pos) for r in regex_results]
        merged = list(regex_results)
        for entity in ner_results:
            overlaps = any(
                entity["start"] < end and entity["end"] > start
                for start, end in regex_spans
            )
            if not overlaps:
                merged.append(entity)
        return merged

    def _group_by_type(self, detections: List[Any]) -> Dict[str, int]:
        """Count detections per type."""
        counts: Dict[str, int] = {}
        for d in detections:
            key = d.data_type.value if hasattr(d, "data_type") else d["type"]
            counts[key] = counts.get(key, 0) + 1
        return counts

    def _calculate_risk_level(self, detections: List) -> str:
        """Calculate the overall risk level."""
        if not detections:
            return "low"
        high_risk_types = {
            SensitiveDataType.CREDIT_CARD,
            SensitiveDataType.SSN_US,
            SensitiveDataType.SSN_UK,
            SensitiveDataType.API_KEY
        }
        for detection in detections:
            if hasattr(detection, 'data_type') and detection.data_type in high_risk_types:
                return "critical"
        if len(detections) > 10:
            return "high"
        elif len(detections) > 3:
            return "medium"
        return "low"
```
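
The merge step has to deduplicate regex and NER hits that cover the same span of text. Reduced to plain `(start, end, label)` tuples, the overlap logic looks like this (a sketch; the real detection objects carry more fields):

```python
def merge_spans(primary, secondary):
    """Merge two span lists, keeping secondary spans only when they
    do not overlap any primary span."""
    merged = list(primary)
    for s in secondary:
        # Two half-open spans overlap iff each starts before the other ends
        if not any(s[0] < p[1] and s[1] > p[0] for p in primary):
            merged.append(s)
    return merged

regex_hits = [(10, 25, "email")]
ner_hits = [(12, 20, "PER"), (40, 48, "LOC")]
print(merge_spans(regex_hits, ner_hits))
# [(10, 25, 'email'), (40, 48, 'LOC')]
```

Regex hits win ties here because their type labels are more precise; the NER hit at (12, 20) is subsumed by the email match and dropped.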

Filtering and Protection Techniques

Redaction (Masking)

```python
import hashlib
from typing import Any, Dict, List


class SensitiveDataRedactor:
    """Masks detected sensitive data."""

    REDACTION_STRATEGIES = {
        "full": lambda x, t: f"[{t.value.upper()}_REDACTED]",
        "partial": lambda x, t: (
            x[:2] + "*" * (len(x) - 4) + x[-2:] if len(x) > 4 else "****"
        ),
        "hash": lambda x, t: hashlib.sha256(x.encode()).hexdigest()[:12],
        "placeholder": lambda x, t: f"<{t.value}>",
        "category": lambda x, t: f"[{t.value.upper()}_DATA]"
    }

    def __init__(
        self,
        detector: RegexSensitiveDetector,
        default_strategy: str = "full"
    ):
        self.detector = detector
        self.default_strategy = default_strategy
        self.type_strategies: Dict[SensitiveDataType, str] = {}

    def set_strategy(self, data_type: SensitiveDataType, strategy: str):
        """Set the masking strategy for a specific type."""
        if strategy not in self.REDACTION_STRATEGIES:
            raise ValueError(f"Unknown strategy: {strategy}")
        self.type_strategies[data_type] = strategy

    def redact(
        self,
        text: str,
        return_mapping: bool = False
    ) -> Dict[str, Any]:
        """
        Mask sensitive data in text.

        Args:
            text: Text to process
            return_mapping: If True, return the original -> masked mapping

        Returns:
            Dict with the masked text and metadata
        """
        detections = self.detector.detect(text)
        if not detections:
            return {
                "redacted_text": text,
                "redaction_count": 0,
                "types_redacted": [],
                "mapping": {} if return_mapping else None
            }

        # Sort by descending position so replacements do not shift the
        # offsets of detections still to be processed
        detections_sorted = sorted(detections, key=lambda x: x.start_pos, reverse=True)
        redacted_text = text
        mapping = {}
        for detection in detections_sorted:
            strategy_name = self.type_strategies.get(
                detection.data_type, self.default_strategy
            )
            strategy = self.REDACTION_STRATEGIES[strategy_name]
            replacement = strategy(detection.value, detection.data_type)
            if return_mapping:
                mapping[detection.value] = replacement
            redacted_text = (
                redacted_text[:detection.start_pos]
                + replacement
                + redacted_text[detection.end_pos:]
            )

        return {
            "redacted_text": redacted_text,
            "redaction_count": len(detections),
            "types_redacted": list(set(d.data_type.value for d in detections)),
            "mapping": mapping if return_mapping else None
        }

    def redact_documents(
        self,
        documents: List[Dict[str, Any]],
        content_field: str = "content"
    ) -> List[Dict[str, Any]]:
        """Mask sensitive data in a list of documents."""
        redacted_docs = []
        for doc in documents:
            redacted_doc = doc.copy()
            content = doc.get(content_field, "")
            result = self.redact(content)
            redacted_doc[content_field] = result["redacted_text"]
            redacted_doc["_redaction_metadata"] = {
                "count": result["redaction_count"],
                "types": result["types_redacted"]
            }
            redacted_docs.append(redacted_doc)
        return redacted_docs
```
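
The "partial" strategy above keeps the first and last two characters when the value is long enough, and falls back to full masking otherwise so that short values never leak most of their content. Standalone, the same logic:

```python
def partial_mask(value: str) -> str:
    """Keep the first and last two characters, mask the middle.
    Values of four characters or fewer are fully masked."""
    if len(value) <= 4:
        return "****"
    return value[:2] + "*" * (len(value) - 4) + value[-2:]

print(partial_mask("alice@example.com"))  # al*************om
print(partial_mask("abc"))                # ****
```

Partial masking preserves enough shape for a human to confirm "yes, that was my email" without exposing the value itself.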

Selective Encryption

```python
import base64
from datetime import datetime
from typing import Any, Dict, List

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


class SelectiveEncryption:
    """Selectively encrypts sensitive data."""

    def __init__(self, master_key: str, salt: bytes = None):
        """
        Initialize the encryption system.

        Args:
            master_key: Master key for key derivation
            salt: Salt for derivation (a fixed default is used if not provided)
        """
        self.salt = salt or b'ailog_sensitive_v1'
        self.cipher = self._derive_cipher(master_key)
        self.detector = RegexSensitiveDetector()

    def _derive_cipher(self, master_key: str) -> Fernet:
        """Derive a Fernet key from the master key."""
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA256(),
            length=32,
            salt=self.salt,
            iterations=100000,
        )
        key = base64.urlsafe_b64encode(kdf.derive(master_key.encode()))
        return Fernet(key)

    def encrypt_sensitive_fields(
        self,
        document: Dict[str, Any],
        fields_to_check: List[str]
    ) -> Dict[str, Any]:
        """
        Encrypt fields containing sensitive data.

        Args:
            document: Document to process
            fields_to_check: Fields to analyze

        Returns:
            Document with its sensitive fields encrypted
        """
        encrypted_doc = document.copy()
        encrypted_fields = []
        for field in fields_to_check:
            if field not in document:
                continue
            value = document[field]
            if isinstance(value, str):
                detections = self.detector.detect(value)
                if detections:
                    # Encrypt the entire field
                    encrypted_value = self.cipher.encrypt(value.encode()).decode()
                    encrypted_doc[field] = encrypted_value
                    encrypted_fields.append({
                        "field": field,
                        "detection_count": len(detections),
                        "types": [d.data_type.value for d in detections]
                    })
        encrypted_doc["_encryption_metadata"] = {
            "encrypted_fields": encrypted_fields,
            "algorithm": "Fernet (AES-128-CBC)",
            "encrypted_at": datetime.utcnow().isoformat()
        }
        return encrypted_doc

    def decrypt_field(self, encrypted_value: str) -> str:
        """Decrypt an encrypted field."""
        return self.cipher.decrypt(encrypted_value.encode()).decode()

    def encrypt_inline(self, text: str) -> Dict[str, Any]:
        """
        Encrypt sensitive data inline in the text.
        Useful for preserving context while protecting the data.
        """
        detections = self.detector.detect(text)
        if not detections:
            return {"text": text, "encrypted_values": {}}
        detections_sorted = sorted(detections, key=lambda x: x.start_pos, reverse=True)
        encrypted_text = text
        encrypted_mapping = {}
        for i, detection in enumerate(detections_sorted):
            placeholder = f"[[ENCRYPTED_{i}]]"
            encrypted_value = self.cipher.encrypt(detection.value.encode()).decode()
            encrypted_mapping[placeholder] = {
                "encrypted": encrypted_value,
                "type": detection.data_type.value
            }
            encrypted_text = (
                encrypted_text[:detection.start_pos]
                + placeholder
                + encrypted_text[detection.end_pos:]
            )
        return {"text": encrypted_text, "encrypted_values": encrypted_mapping}
```
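
The key-derivation step can be reproduced with the standard library alone: `hashlib.pbkdf2_hmac` with the same SHA-256 / 100,000-iteration parameters produces the 32 raw bytes that are then urlsafe-base64-encoded into the form Fernet expects. A sketch, with an illustrative salt and passphrase:

```python
import base64
import hashlib

def derive_fernet_key(master_key: str, salt: bytes = b"ailog_sensitive_v1") -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 and urlsafe-base64
    encode it, matching the shape Fernet requires."""
    raw = hashlib.pbkdf2_hmac("sha256", master_key.encode(), salt, 100_000)
    return base64.urlsafe_b64encode(raw)

key = derive_fernet_key("correct horse battery staple")
print(len(key))  # 44 (urlsafe base64 of 32 bytes)
```

The derivation is deterministic: the same master key and salt always yield the same Fernet key, which is what makes fields decryptable later without storing the key itself.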

Anonymization (K-anonymity)

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class DataAnonymizer:
    """Anonymizes data to guarantee k-anonymity."""

    #: Minimal city -> region lookup; extend it for your own data.
    CITY_REGIONS: Dict[str, str] = {}

    def __init__(self, k: int = 5):
        """
        Initialize the anonymizer.

        Args:
            k: K-anonymity level (each record must be indistinguishable
               from at least k-1 others)
        """
        self.k = k
        self.generalization_rules = self._default_generalization_rules()

    def _default_generalization_rules(self) -> Dict[str, Callable]:
        """Default generalization rules."""
        return {
            "age": lambda x: f"{(int(x) // 10) * 10}-{(int(x) // 10) * 10 + 9}",
            "zip_code": lambda x: x[:3] + "**" if len(x) >= 3 else "***",
            "date": lambda x: x[:7] if len(x) >= 7 else x[:4],  # YYYY-MM
            "city": lambda x: "Region " + self._get_region(x),
            "salary": lambda x: self._salary_range(float(x))
        }

    def _get_region(self, city: str) -> str:
        """Map a city to a coarser region."""
        return self.CITY_REGIONS.get(city, "Unknown")

    def _salary_range(self, salary: float) -> str:
        """Generalize a salary to a range."""
        ranges = [
            (0, 25000, "< 25k"),
            (25000, 35000, "25k-35k"),
            (35000, 50000, "35k-50k"),
            (50000, 75000, "50k-75k"),
            (75000, 100000, "75k-100k"),
            (100000, float('inf'), "> 100k")
        ]
        for low, high, label in ranges:
            if low <= salary < high:
                return label
        return "Unspecified"

    def generalize(
        self,
        data: List[Dict[str, Any]],
        quasi_identifiers: List[str]
    ) -> List[Dict[str, Any]]:
        """
        Generalize quasi-identifiers to achieve k-anonymity.

        Args:
            data: List of records
            quasi_identifiers: Fields that, combined, could identify someone

        Returns:
            Anonymized data
        """
        anonymized = []
        for record in data:
            anon_record = record.copy()
            for qi in quasi_identifiers:
                if qi in record and qi in self.generalization_rules:
                    try:
                        anon_record[qi] = self.generalization_rules[qi](record[qi])
                    except (ValueError, TypeError):
                        anon_record[qi] = "[GENERALIZED]"
            anonymized.append(anon_record)

        # Verify k-anonymity; if generalization alone is not enough,
        # fall back to suppressing the offending records
        if not self._verify_k_anonymity(anonymized, quasi_identifiers):
            anonymized = self.suppress_outliers(anonymized, quasi_identifiers)
        return anonymized

    def _verify_k_anonymity(
        self,
        data: List[Dict[str, Any]],
        quasi_identifiers: List[str]
    ) -> bool:
        """Verify whether the data respects k-anonymity."""
        equivalence_classes = defaultdict(list)
        for record in data:
            # Build a key from the quasi-identifier values
            key = tuple(str(record.get(qi, "")) for qi in quasi_identifiers)
            equivalence_classes[key].append(record)
        # Every class must contain at least k elements
        return all(len(records) >= self.k for records in equivalence_classes.values())

    def suppress_outliers(
        self,
        data: List[Dict[str, Any]],
        quasi_identifiers: List[str]
    ) -> List[Dict[str, Any]]:
        """Remove records that violate k-anonymity."""
        equivalence_classes = defaultdict(list)
        for record in data:
            key = tuple(str(record.get(qi, "")) for qi in quasi_identifiers)
            equivalence_classes[key].append(record)
        # Keep only classes with k or more elements
        valid_data = []
        for key, records in equivalence_classes.items():
            if len(records) >= self.k:
                valid_data.extend(records)
        return valid_data
```
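
The verification step reduces to counting equivalence classes over the quasi-identifiers. A self-contained sketch of that check, with toy records and hypothetical field names:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check that every combination of quasi-identifier values
    occurs at least k times in the dataset."""
    classes = Counter(
        tuple(str(r.get(qi, "")) for qi in quasi_identifiers)
        for r in records
    )
    return all(count >= k for count in classes.values())

rows = [
    {"age": "30-39", "zip_code": "750**"},
    {"age": "30-39", "zip_code": "750**"},
    {"age": "40-49", "zip_code": "690**"},
]
print(is_k_anonymous(rows, ["age", "zip_code"], k=2))
# False: the (40-49, 690**) class contains a single record
```

That lone record is exactly what `suppress_outliers` would drop: a class of size one re-identifies its member no matter how generalized the values are.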

Protection in the RAG Pipeline

Filtering at Ingestion

```python
from datetime import datetime
from typing import Any, Dict, List


class SecureDocumentIngestion:
    """Secure ingestion pipeline with sensitive data filtering."""

    def __init__(
        self,
        vector_store,
        detector: HybridSensitiveDetector,
        redactor: SensitiveDataRedactor,
        encryption: SelectiveEncryption = None,  # required for the "encrypt" policy
        policy: str = "redact"  # "redact", "reject", "encrypt", "warn"
    ):
        self.vector_store = vector_store
        self.detector = detector
        self.redactor = redactor
        self.encryption = encryption
        self.policy = policy

    async def ingest_document(
        self,
        document: Dict[str, Any],
        force: bool = False
    ) -> Dict[str, Any]:
        """
        Ingest a document, applying the security policy.

        Args:
            document: Document to ingest
            force: If True, ignore warnings

        Returns:
            Ingestion result
        """
        content = document.get("content", "")

        # Step 1: detection
        detection_result = self.detector.detect(content)
        if detection_result["total_detections"] == 0:
            # No sensitive data: normal ingestion
            return await self._ingest_clean(document)

        # Step 2: apply the policy
        if self.policy == "reject":
            return {
                "status": "rejected",
                "reason": "Sensitive data detected",
                "detections": detection_result["total_detections"],
                "risk_level": detection_result["risk_level"]
            }
        elif self.policy == "redact":
            redacted = self.redactor.redact(content)
            document_clean = document.copy()
            document_clean["content"] = redacted["redacted_text"]
            document_clean["_original_had_sensitive"] = True
            result = await self._ingest_clean(document_clean)
            result["redaction_applied"] = True
            result["redaction_count"] = redacted["redaction_count"]
            return result
        elif self.policy == "warn":
            if not force:
                return {
                    "status": "warning",
                    "message": "Sensitive data detected, use force=True to continue",
                    "detections": detection_result
                }
            return await self._ingest_clean(document)
        elif self.policy == "encrypt":
            # Encrypt the sensitive parts in place
            encrypted = self.encryption.encrypt_inline(content)
            document_enc = document.copy()
            document_enc["content"] = encrypted["text"]
            document_enc["_encrypted_values"] = encrypted["encrypted_values"]
            return await self._ingest_clean(document_enc)
        raise ValueError(f"Unknown policy: {self.policy}")

    async def _ingest_clean(self, document: Dict[str, Any]) -> Dict[str, Any]:
        """Ingest a cleaned document."""
        doc_id = await self.vector_store.add_document(document)
        return {
            "status": "success",
            "document_id": doc_id,
            "indexed_at": datetime.utcnow().isoformat()
        }

    async def bulk_ingest(
        self,
        documents: List[Dict[str, Any]],
        on_sensitive: str = "skip"  # "skip", "redact", "fail"
    ) -> Dict[str, Any]:
        """Bulk ingestion with handling of sensitive documents."""
        results = {
            "successful": 0,
            "skipped": 0,
            "redacted": 0,
            "failed": 0,
            "details": []
        }
        for doc in documents:
            try:
                doc_result = await self.ingest_document(doc)
                if doc_result["status"] == "success":
                    results["successful"] += 1
                    if doc_result.get("redaction_applied"):
                        results["redacted"] += 1
                elif doc_result["status"] == "rejected":
                    results["skipped"] += 1
                else:
                    results["failed"] += 1
                results["details"].append({
                    "doc_id": doc.get("id"),
                    "result": doc_result["status"]
                })
            except Exception as e:
                results["failed"] += 1
                results["details"].append({
                    "doc_id": doc.get("id"),
                    "result": "error",
                    "error": str(e)
                })
        return results
```

Filtering in LLM Responses

```python
import re
from typing import Any, Dict, List


class OutputSanitizer:
    """Filters sensitive data out of LLM responses."""

    def __init__(self, detector: RegexSensitiveDetector):
        self.detector = detector
        self.hallucination_patterns = self._compile_hallucination_patterns()

    def _compile_hallucination_patterns(self) -> Dict[str, re.Pattern]:
        """Patterns for potentially hallucinated data."""
        return {
            "fake_email": re.compile(r'[a-z]+\.(example|test|demo)@'),
            "placeholder_phone": re.compile(r'555[-\s]?\d{3}[-\s]?\d{4}'),
            "example_data": re.compile(r'(example|sample|test|dummy|fake)', re.I)
        }

    def sanitize_response(
        self,
        response: str,
        context_documents: List[str] = None
    ) -> Dict[str, Any]:
        """
        Clean an LLM response of sensitive data.

        Args:
            response: LLM response
            context_documents: Source documents for validation

        Returns:
            Cleaned response with metadata
        """
        # Step 1: detect sensitive data
        detections = self.detector.detect(response)

        # Step 2: check whether the data comes from the context
        if context_documents and detections:
            hallucinated = self._identify_hallucinated_data(
                detections, context_documents
            )
        else:
            hallucinated = detections

        # Step 3: filter hallucinated or sensitive data
        # (replace from the end so earlier offsets stay valid)
        sanitized = response
        for detection in sorted(hallucinated, key=lambda d: d.start_pos, reverse=True):
            # Replace with an informative placeholder
            replacement = self._get_safe_replacement(detection)
            sanitized = (
                sanitized[:detection.start_pos]
                + replacement
                + sanitized[detection.end_pos:]
            )

        # Step 4: additional checks
        sanitized = self._remove_potential_hallucinations(sanitized)

        return {
            "sanitized_response": sanitized,
            "original_response": response,
            "modifications": len(detections),
            "hallucinated_data_removed": len(hallucinated),
            "is_modified": sanitized != response
        }

    def _identify_hallucinated_data(
        self,
        detections: List[DetectionResult],
        context_docs: List[str]
    ) -> List[DetectionResult]:
        """Identify data that does not appear in the context."""
        hallucinated = []
        context_combined = " ".join(context_docs).lower()
        for detection in detections:
            # A value absent from the context is potentially hallucinated
            if detection.value.lower() not in context_combined:
                hallucinated.append(detection)
        return hallucinated

    def _get_safe_replacement(self, detection: DetectionResult) -> str:
        """Generate a safe replacement for sensitive data."""
        replacements = {
            SensitiveDataType.EMAIL: "[email not available]",
            SensitiveDataType.PHONE_US: "[number not available]",
            SensitiveDataType.CREDIT_CARD: "[card data masked]",
            SensitiveDataType.SSN_US: "[confidential number]",
            SensitiveDataType.API_KEY: "[key masked]"
        }
        return replacements.get(detection.data_type, "[data masked]")

    def _remove_potential_hallucinations(self, text: str) -> str:
        """Flag common hallucinated-data patterns."""
        for pattern_name, pattern in self.hallucination_patterns.items():
            # Just flag, do not automatically remove
            if pattern.search(text):
                # Log for later analysis
                pass
        return text
```
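
The context check at the heart of `_identify_hallucinated_data` is just case-insensitive substring membership against the concatenated source documents. Stripped down to plain strings (the addresses here are hypothetical):

```python
def not_in_context(detected_values, context_docs):
    """Return the detected values that never appear in the source
    context: candidates for hallucinated PII."""
    context = " ".join(context_docs).lower()
    return [v for v in detected_values if v.lower() not in context]

docs = ["Contact the support desk at support@acme.example for help."]
found = ["support@acme.example", "ceo@acme.example"]
print(not_in_context(found, docs))
# ['ceo@acme.example']
```

A value that is present in the retrieved chunks may still be sensitive, but a value absent from them is doubly suspect: it is both PII and unsupported by the sources.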

Best Practices and Checklist

Secure Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    SECURE RAG ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DATA LAYER                                                    │
│  ├── Original documents (encrypted, restricted access)        │
│  ├── Masked documents (for indexing)                          │
│  └── Masking metadata (reversible mapping if needed)          │
│                                                                 │
│  PROCESSING LAYER                                              │
│  ├── Detection at ingestion (mandatory)                       │
│  ├── Detection at retrieval (optional)                        │
│  └── Detection at generation (mandatory)                      │
│                                                                 │
│  ACCESS LAYER                                                  │
│  ├── Strong authentication                                     │
│  ├── Role-based authorization                                  │
│  └── Complete audit trail                                      │
│                                                                 │
│  MONITORING LAYER                                              │
│  ├── Alerts on sensitive data detection                       │
│  ├── Compliance dashboard                                      │
│  └── Periodic reports                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Sensitive Data Security Checklist

Detection:

  • Regex detector for known patterns (email, phone, card, etc.)
  • NER detector for names and locations
  • Hybrid detection in production
  • Custom patterns for business data
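
Custom business patterns slot in the same way as the built-in ones: compile a regex for the internal format and scan with it. A sketch with a hypothetical internal employee-ID format ("EMP-" followed by six digits):

```python
import re

# Hypothetical internal identifier format; adapt to your own conventions
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

def find_employee_ids(text: str):
    """Return all employee-ID-shaped tokens found in the text."""
    return EMPLOYEE_ID.findall(text)

print(find_employee_ids("Ticket opened by EMP-204831 and EMP-990001."))
# ['EMP-204831', 'EMP-990001']
```

In the detector above this would become one more `PATTERNS` entry with its own confidence score, so redaction and policy enforcement apply to it automatically.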

Ingestion Protection:

  • Mandatory scan before indexing
  • Defined processing policy (redact/reject/encrypt)
  • Masking metadata retention
  • Rejected document logs

Generation Protection:

  • LLM response filtering
  • PII hallucination detection
  • Validation against source documents

Compliance:

  • Documentation of processed data types
  • Access request response procedures
  • Automatic retention and purging
  • Audit trail for sensitive data access

Conclusion

Protecting sensitive data in a RAG system requires a multi-layered approach: robust detection, context-appropriate filtering, and continuous monitoring. The techniques presented in this guide allow you to build a secure pipeline while preserving your assistant's usefulness.

Key points:

  1. Hybrid detection - Combine regex and ML for maximum coverage
  2. Clear policy - Define how to handle each data type
  3. Filter at every step - Ingestion, retrieval, and generation
  4. Continuous audit - Monitor and constantly improve

Need RAG with built-in data protection? Ailog offers RAG solutions with automatic sensitive data filtering, GDPR compliant and hosted in France. Deploy with confidence.

Tags

RAG · security · PII · sensitive data · filtering
