
TL;DR

Hallucination = a response not supported by the provided context
3 types: extrinsic (inventions), intrinsic (contradictions), and unjustified extrapolations
Detection: NLI, LLM-as-judge, grounding metrics
Prevention: better retrieval, strict prompts, guardrails
Monitor hallucinations in real time on Ailog

What Is a RAG Hallucination?

In a RAG context, a hallucination is information generated by the LLM that is not supported by the retrieved documents: it is either absent from them or contradicts them.

Types of Hallucinations

1. Extrinsic Hallucinations (Inventions)

Context: "Our company was founded in 2010 in Paris."
Question: "When and where was the company founded?"
Response: "The company was founded in 2010 in Paris by Jean Dupont."
                                                    ^^^^^^^^^^^^^^
                                                    Invented: not in the context

2. Intrinsic Hallucinations (Contradictions)

Context: "The product costs €99 and is available in blue."
Question: "What is the price of the product?"
Response: "The product costs €89."
                             ^^^
                             Contradicts the context (€99)

3. Extrapolation Hallucinations

Context: "Sales grew by 20% in Q1."
Question: "How are sales doing?"
Response: "Sales are excellent and should hit a record this year."
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               Unjustified extrapolation

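Detection with Natural Language Inference (NLI)

An NLI model classifies a premise/hypothesis pair as entailment, contradiction, or neutral. Here the context is the premise and the response (or a single claim from it) is the hypothesis, so the check is whether the context entails the response.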

from transformers import pipeline

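# facebook/bart-large-mnli is trained on English MNLI data.
nli_classifier = pipeline(
    "text-classification",
    model="facebook/bart-large-mnli"
)

def check_entailment(context: str, claim: str) -> dict:
    """
    Check whether the context entails the claim.

    Labels: entailment, contradiction, neutral
    """
    # Pass premise and hypothesis as a sentence pair so the tokenizer
    # inserts the separator tokens itself.
    result = nli_classifier({"text": context, "text_pair": claim})

    # Depending on the transformers version, a single input may come back
    # as a dict or as a one-element list.
    if isinstance(result, list):
        result = result[0]

    # Normalize the case: some MNLI checkpoints use upper-case label names.
    label = result['label'].lower()
    score = result['score']

    return {
        "label": label,
        "confidence": score,
        "is_grounded": label == "entailment",
        "is_contradiction": label == "contradiction"
    }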

# Example
context = "Delivery takes 3 to 5 business days."
claim = "Delivery takes one week."

result = check_entailment(context, claim)
# Illustrative output (the exact label and score depend on the model):
# {"label": "contradiction", "confidence": 0.92, ...}
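A retrieved context is often longer than an NLI model's input window, and a claim is usually supported by one passage rather than by the context as a whole. The sketch below scores the claim against each context sentence and keeps the strongest evidence. It assumes a hypothetical `score_pair` callable that returns entailment, contradiction, and neutral probabilities for one premise/hypothesis pair (for example a thin wrapper around the classifier above). The sentence splitter, the 0.5 cut-off, and the `toy_scorer` stub are illustrative assumptions, not part of the original pipeline.

```python
import re
from typing import Callable, Dict, List

Scores = Dict[str, float]  # {"entailment": p, "contradiction": p, "neutral": p}

def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; swap in a proper tokenizer for real data."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def best_support(context: str, claim: str,
                 score_pair: Callable[[str, str], Scores]) -> dict:
    """Score the claim against each context sentence and keep the strongest evidence."""
    best_entail, best_contra = 0.0, 0.0
    best_sentence = None
    for sentence in split_sentences(context):
        scores = score_pair(sentence, claim)
        if scores["entailment"] > best_entail:
            best_entail, best_sentence = scores["entailment"], sentence
        best_contra = max(best_contra, scores["contradiction"])
    return {
        "is_grounded": best_entail >= 0.5,  # assumed cut-off
        "is_contradiction": best_contra >= 0.5 and best_entail < 0.5,
        "evidence": best_sentence,
        "entailment": best_entail,
    }

def toy_scorer(premise: str, hypothesis: str) -> Scores:
    """Stand-in for an NLI model: an exact match counts as entailment."""
    if premise == hypothesis:
        return {"entailment": 0.9, "contradiction": 0.05, "neutral": 0.05}
    return {"entailment": 0.1, "contradiction": 0.1, "neutral": 0.8}

context = "Delivery takes 3 to 5 business days. Returns are free for 30 days."
report = best_support(context, "Returns are free for 30 days.", toy_scorer)
print(report["is_grounded"], report["evidence"])
```

Returning the best-matching sentence as `evidence` also gives reviewers something concrete to check when a claim is flagged.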

Claim Decomposition

For precise detection, break the response into atomic claims and check each one separately:

import re

def extract_claims(response: str, llm_client) -> list:
    """
    Extract the atomic claims from a response.
    """
    prompt = f"""Extract all factual claims from this text.
Each claim should be a single, verifiable statement.

Text: {response}

Output as a numbered list:
1. [First claim]
2. [Second claim]
..."""

    result = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    claims_text = result.choices[0].message.content or ""

    # Keep only lines that look like "1. claim" and strip the numbering,
    # so stray prose lines containing ". " are not mistaken for claims.
    claims = []
    for line in claims_text.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.+)", line)
        if match:
            claims.append(match.group(1).strip())

    return claims

def check_all_claims(context: str, response: str, llm_client) -> dict:
    """
    Check every claim in the response against the context.
    """
    claims = extract_claims(response, llm_client)

    results = []
    for claim in claims:
        check = check_entailment(context, claim)
        results.append({
            "claim": claim,
            **check
        })

    hallucinated = [r for r in results if not r["is_grounded"]]
    contradictions = [r for r in results if r["is_contradiction"]]

    return {
        "total_claims": len(claims),
        "grounded_claims": len(claims) - len(hallucinated),
        "hallucinations": hallucinated,
        "contradictions": contradictions,
        "hallucination_rate": len(hallucinated) / len(claims) if claims else 0
    }
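The per-claim NLI labels can be mapped back to the hallucination types described earlier. A minimal sketch, assuming each result carries the `label` field returned by `check_entailment`; the `NLI_TO_TYPE` mapping and `summarize_by_type` helper are hypothetical names introduced here. Note that NLI alone cannot separate an invention from an extrapolation: both come back as "neutral".

```python
from collections import Counter

# Assumed mapping from NLI labels to the hallucination types of this article.
NLI_TO_TYPE = {
    "entailment": "supported",
    "contradiction": "intrinsic",              # conflicts with the context
    "neutral": "extrinsic_or_extrapolation",   # not supported by the context
}

def summarize_by_type(results: list) -> dict:
    """Count claims per hallucination type from per-claim NLI results."""
    counts = Counter(
        NLI_TO_TYPE.get(r["label"].lower(), "unknown") for r in results
    )
    return dict(counts)

# Hand-written results standing in for the output of a claim-level check.
sample_results = [
    {"claim": "The company was founded in 2010.", "label": "entailment"},
    {"claim": "The company was founded by Jean Dupont.", "label": "neutral"},
    {"claim": "The product costs €89.", "label": "contradiction"},
]

summary = summarize_by_type(sample_results)
print(summary)
```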

LLM-as-Judge Detection

Use an LLM to evaluate grounding:

def llm_judge_hallucination(
    context: str,
    question: str,
    response: str,
    llm_client
) -> dict:
    """
    Use an LLM as a judge to detect hallucinations.
    """
    prompt = f"""You are a fact-checking expert. Analyze if the response contains hallucinations.

Context (source of truth):
{context}

Question: {question}

Response to check: {response}

For each piece of information in the response, classify as:
- SUPPORTED: Directly stated or clearly implied by context
- HALLUCINATION: Not in context (made up)
- CONTRADICTION: Conflicts with context
- EXTRAPOLATION: Goes beyond what context states

Output format:
VERDICT: [CLEAN / HAS_HALLUCINATIONS / HAS_CONTRADICTIONS]
ANALYSIS:
- [Quote from response]: [SUPPORTED/HALLUCINATION/etc] - [reason]
SUMMARY: Brief explanation"""

    result = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    analysis = result.choices[0].message.content

    has_issues = "HAS_HALLUCINATIONS" in analysis or "HAS_CONTRADICTIONS" in analysis

    return {
        "has_hallucinations": has_issues,
        "analysis": analysis,
        "should_regenerate": has_issues
    }
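The substring test above only yields a boolean. Because the prompt requests a fixed output format, the verdict line and the per-quote labels can also be parsed into structured data. A sketch, assuming the judge follows the requested format closely (real outputs may drift, so lines that do not match are simply ignored); `parse_judge_output` is a hypothetical helper name.

```python
import re

LABELS = ("SUPPORTED", "HALLUCINATION", "CONTRADICTION", "EXTRAPOLATION")

def parse_judge_output(analysis: str) -> dict:
    """Parse the VERDICT / ANALYSIS / SUMMARY format requested in the judge prompt."""
    verdict_match = re.search(r"^VERDICT:\s*\[?([A-Z_]+)\]?", analysis, re.MULTILINE)
    verdict = verdict_match.group(1) if verdict_match else "UNKNOWN"

    findings = []
    # Expected line shape: - "quote": LABEL - reason
    pattern = re.compile(r"^-\s*(.+?):\s*(" + "|".join(LABELS) + r")\b\s*-?\s*(.*)$")
    for line in analysis.splitlines():
        match = pattern.match(line.strip())
        if match:
            quote, label, reason = match.groups()
            findings.append({
                "quote": quote.strip('"[] '),
                "label": label,
                "reason": reason,
            })

    unsupported = [f for f in findings if f["label"] != "SUPPORTED"]
    return {"verdict": verdict, "findings": findings, "unsupported": unsupported}

# Hand-written judge output used only to exercise the parser.
sample = """VERDICT: HAS_HALLUCINATIONS
ANALYSIS:
- "costs €99": SUPPORTED - stated in context
- "free express shipping": HALLUCINATION - not in context
SUMMARY: One invented detail."""

parsed = parse_judge_output(sample)
print(parsed["verdict"], [f["quote"] for f in parsed["unsupported"]])
```

The `unsupported` list includes EXTRAPOLATION findings, which the three-way verdict line has no slot for.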

Detection Metrics

ROUGE-L for Overlap

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)

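def check_overlap(context: str, response: str, threshold: float = 0.3) -> dict:
    """
    Check the lexical overlap between context and response.
    A very low score can indicate hallucinations.
    """
    # RougeScorer.score takes (target, prediction).
    scores = scorer.score(context, response)
    # Use precision (the share of the response covered by the context).
    # The F-measure also includes recall, which penalizes a short answer
    # drawn from a long context.
    rouge_l = scores['rougeL'].precision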

    return {
        "rouge_l": rouge_l,
        "potential_hallucination": rouge_l < threshold,
        "interpretation": (
            "High overlap - likely grounded" if rouge_l > 0.5
            else "Low overlap - potential hallucinations" if rouge_l < threshold
            else "Moderate overlap - review recommended"
        )
    }
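To make explicit what ROUGE-L measures: it is built on the longest common subsequence (LCS) of tokens shared by two texts. The dependency-free sketch below uses whitespace tokenization and no stemming, so its values will differ somewhat from the `rouge_score` package; the function names are illustrative.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(reference: str, candidate: str) -> dict:
    """ROUGE-L precision, recall and F-measure on lower-cased whitespace tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand) if cand else 0.0  # share of the candidate found in the reference
    recall = lcs / len(ref) if ref else 0.0       # share of the reference found in the candidate
    f = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": f}

scores = rouge_l(
    "the product costs 99 euros and ships in 3 days",
    "the product costs 89 euros",
)
print(scores)
```

In this example four of the five response tokens appear in order in the context, so precision is 0.8 even though the price is wrong. Overlap is therefore a weak signal: it can flag inventions but not contradictions.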

BERTScore for Semantic Similarity

from bert_score import score as bert_score

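def semantic_similarity_check(
    context: str,
    response: str,
    threshold: float = 0.7
) -> dict:
    """
    Check the semantic similarity between context and response.
    """
    # bert_score's score() takes (candidates, references).
    P, R, F1 = bert_score(
        [response],
        [context],
        lang="fr",  # language of the texts being compared; change it to match your data
        rescale_with_baseline=True
    )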

    f1 = F1[0].item()

    return {
        "bert_score": f1,
        "potential_hallucination": f1 < threshold,
        "precision": P[0].item(),
        "recall": R[0].item()
    }

SelfCheckGPT

A technique that relies on the consistency between several sampled responses:

def selfcheck_hallucination(
    question: str,
    context: str,
    llm_client,
    num_samples: int = 5
) -> dict:
    """
    Generate several responses and check their consistency.
    Hallucinations tend to be inconsistent across samples.
    """
    # At least two samples are needed; this also avoids dividing by zero below.
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2 to compare responses")

    # Generate several responses
    responses = []
    for _ in range(num_samples):
        result = llm_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": f"Answer based on: {context}"},
                {"role": "user", "content": question}
            ],
            temperature=0.7  # Sampling variation exposes inconsistencies
        )
        responses.append(result.choices[0].message.content)

    # Extract the claims from the first response
    main_claims = extract_claims(responses[0], llm_client)

    # Check each claim against the other responses
    claim_consistency = []
    for claim in main_claims:
        present_count = 0
        for other_response in responses[1:]:
            if is_claim_present(claim, other_response, llm_client):
                present_count += 1

        consistency = present_count / (num_samples - 1)
        claim_consistency.append({
            "claim": claim,
            "consistency": consistency,
            "likely_hallucination": consistency < 0.5
        })

    # Inconsistent claims are likely hallucinations
    hallucinations = [c for c in claim_consistency if c["likely_hallucination"]]

    return {
        "claims_checked": len(main_claims),
        "consistent_claims": len(main_claims) - len(hallucinations),
        "potential_hallucinations": hallucinations,
        "overall_reliability": 1 - (len(hallucinations) / len(main_claims)) if main_claims else 1
    }

def is_claim_present(claim: str, text: str, llm_client) -> bool:
    """
    Check whether a claim is present in a text.
    """
    prompt = f"""Does this text contain or imply this claim?

Claim: {claim}

Text: {text}

Answer only YES or NO."""

    result = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=3,
        temperature=0
    )

    return "YES" in result.choices[0].message.content.upper()
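The scoring step can be separated from the LLM calls so that it can be tested offline. A sketch, assuming a pluggable `is_present` callable; the `naive_presence` substring check is only a stand-in for the LLM-based check above and would miss paraphrases in practice.

```python
from typing import Callable, List

def consistency_scores(claims: List[str], samples: List[str],
                       is_present: Callable[[str, str], bool]) -> List[dict]:
    """For each claim, compute the fraction of sampled responses that contain it."""
    if not samples:
        raise ValueError("need at least one sampled response to compare against")
    scored = []
    for claim in claims:
        hits = sum(1 for sample in samples if is_present(claim, sample))
        consistency = hits / len(samples)
        scored.append({
            "claim": claim,
            "consistency": consistency,
            "likely_hallucination": consistency < 0.5,
        })
    return scored

def naive_presence(claim: str, text: str) -> bool:
    """Offline stand-in for the LLM check: case-insensitive substring match."""
    return claim.lower() in text.lower()

# Hand-written samples standing in for responses drawn at temperature 0.7.
samples = [
    "The product costs €99.",
    "It costs €99 and ships in 3-5 days.",
    "The product costs €99. It was launched in 2021.",
]
claims = ["costs €99", "launched in 2021"]

scored = consistency_scores(claims, samples, naive_presence)
for item in scored:
    print(item)
```

The price appears in every sample, while the launch year appears in only one of three and falls under the 0.5 threshold used above.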

Preventing Hallucinations

1. Improve Retrieval

def enhanced_retrieval(query: str, retriever, threshold: float = 0.7) -> dict:
    """
    Retrieval with a confidence threshold.
    Returning nothing is better than returning noise.
    """
    results = retriever.retrieve(query, k=10)

    # Filter by score
    confident_results = [
        r for r in results
        if r['score'] > threshold
    ]

    if not confident_results:
        return {
            "docs": [],
            "confidence": "low",
            "should_fallback": True
        }

    # Same keys in both branches so callers can read should_fallback directly
    return {
        "docs": confident_results,
        "confidence": "high",
        "should_fallback": False
    }
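One way to act on the retrieval result is to skip generation entirely when nothing passed the threshold. A minimal sketch, assuming documents are dicts with a `content` key and that `generate` is any callable wrapping the LLM call; `answer_or_fallback`, `FALLBACK_MESSAGE`, and `fake_generate` are hypothetical names used for illustration.

```python
FALLBACK_MESSAGE = "I don't have this information in my sources."

def answer_or_fallback(retrieval: dict, generate) -> dict:
    """Call the generator only when retrieval returned confident documents."""
    if retrieval.get("should_fallback") or not retrieval.get("docs"):
        return {"answer": FALLBACK_MESSAGE, "used_llm": False}
    context = "\n\n".join(doc["content"] for doc in retrieval["docs"])
    return {"answer": generate(context), "used_llm": True}

def fake_generate(context: str) -> str:
    """Stand-in for the LLM call."""
    return f"Based on the sources: {context}"

empty = answer_or_fallback(
    {"docs": [], "confidence": "low", "should_fallback": True},
    fake_generate,
)
found = answer_or_fallback(
    {"docs": [{"content": "Delivery takes 3 to 5 business days.", "score": 0.91}],
     "confidence": "high"},
    fake_generate,
)
print(empty["answer"])
print(found["answer"])
```

Checking for an empty `docs` list as well as the flag keeps the guard working even when the flag is missing from the result.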

2. Strict Prompts

ANTI_HALLUCINATION_PROMPT = """You are a precise assistant that ONLY uses information from the provided context.

STRICT RULES:
1. ONLY state facts that are EXPLICITLY written in the context
2. If the context doesn't contain the answer, say "I don't have this information in my sources"
3. NEVER add information from your general knowledge
4. NEVER extrapolate or make assumptions
5. When uncertain, express uncertainty

Context:
{context}

Question: {question}

Answer based ONLY on the context above:"""
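The template is filled with `str.format` before being sent to the model. A sketch of one way to assemble the context slot from retrieved documents under a character budget; the `build_prompt` helper, the `content` key, and the 6000-character default are assumptions, and the short `demo_template` stands in for `ANTI_HALLUCINATION_PROMPT`.

```python
def build_prompt(template: str, docs: list, question: str, max_chars: int = 6000) -> str:
    """Join document texts into the context slot, stopping before the budget is exceeded."""
    parts, used = [], 0
    for doc in docs:
        text = doc["content"].strip()
        if used + len(text) > max_chars:
            break
        parts.append(text)
        used += len(text)
    # The template must contain {context} and {question} and no other braces.
    return template.format(context="\n\n".join(parts), question=question)

demo_template = (
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer based ONLY on the context above:"
)

prompt = build_prompt(
    demo_template,
    [{"content": "The product costs €99."}, {"content": "It ships in 3-5 days."}],
    "What is the price?",
)
print(prompt)
```

Documents are added in retrieval order, so the highest-ranked passages are the ones kept when the budget runs out.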

3. Source Citations

def generate_with_citations(
    question: str,
    docs: list,
    llm_client
) -> dict:
    """
    Force the LLM to cite its sources, which reduces hallucinations.
    """
    numbered_docs = "\n\n".join([
        f"[{i+1}] {doc['content']}"
        for i, doc in enumerate(docs)
    ])

    prompt = f"""Answer the question using ONLY the numbered sources below.
For each fact, add a citation like [1] or [2].
If a fact isn't in any source, don't mention it.

Sources:
{numbered_docs}

Question: {question}

Answer with citations:"""

    result = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    response = result.choices[0].message.content

    # Check that every cited source number exists (sources are numbered from 1)
    import re
    citations = re.findall(r'\[(\d+)\]', response)
    valid_citations = [c for c in citations if 1 <= int(c) <= len(docs)]

    return {
        "response": response,
        "citations_found": len(set(citations)),
        "all_citations_valid": len(valid_citations) == len(citations)
    }
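Validating citation numbers does not reveal sentences that carry no citation at all. The sketch below flags those as well. It assumes citations appear before the sentence-ending punctuation, as in "... [1]."; the naive sentence splitter and the `uncited_sentences` helper name are illustrative.

```python
import re

def uncited_sentences(response: str, num_sources: int) -> dict:
    """Split the answer into sentences and flag those without a valid [n] citation."""
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", response.strip())
        if s.strip()
    ]
    uncited, invalid = [], []
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited:
            uncited.append(sentence)       # no citation at all
        elif any(n < 1 or n > num_sources for n in cited):
            invalid.append(sentence)       # cites a source that does not exist
    return {"sentences": len(sentences), "uncited": uncited, "invalid": invalid}

answer = (
    "The product costs €99 [1]. It ships in 3-5 days [2]. "
    "It comes with a lifetime warranty. Returns are free [5]."
)
report = uncited_sentences(answer, num_sources=2)
print(report)
```

Uncited sentences are natural candidates for the claim-level checks described earlier, since the model gave no source for them.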

Complete Detection Pipeline

class HallucinationDetector:
    def __init__(self, llm_client, nli_model=None):
        self.llm = llm_client
        self.nli = nli_model

    def analyze(
        self,
        context: str,
        question: str,
        response: str
    ) -> dict:
        """
        Full hallucination analysis.
        """
        results = {
            "response": response,
            "checks": {}
        }

        # 1. NLI check (fast)
        # Note: check_entailment uses the module-level nli_classifier;
        # nli_model only switches this step on.
        if self.nli:
            claims = extract_claims(response, self.llm)
            nli_results = []
            for claim in claims:
                check = check_entailment(context, claim)
                nli_results.append(check)

            results["checks"]["nli"] = {
                "claims_count": len(claims),
                "grounded": sum(1 for r in nli_results if r["is_grounded"]),
                "hallucinations": sum(1 for r in nli_results if not r["is_grounded"])
            }

        # 2. LLM judge (accurate but slow)
        judge_result = llm_judge_hallucination(
            context, question, response, self.llm
        )
        results["checks"]["llm_judge"] = judge_result

        # 3. Lexical overlap (ROUGE-L)
        overlap = check_overlap(context, response)
        results["checks"]["overlap"] = overlap

        # 4. Final verdict
        hallucination_signals = 0
        total_signals = 0

        if "nli" in results["checks"]:
            if results["checks"]["nli"]["hallucinations"] > 0:
                hallucination_signals += 1
            total_signals += 1

        if results["checks"]["llm_judge"]["has_hallucinations"]:
            hallucination_signals += 1
        total_signals += 1

        if results["checks"]["overlap"]["potential_hallucination"]:
            hallucination_signals += 1
        total_signals += 1

        results["verdict"] = {
            "has_hallucinations": hallucination_signals >= 2,
            "confidence": hallucination_signals / total_signals,
            "recommendation": (
                "REJECT" if hallucination_signals >= 2
                else "REVIEW" if hallucination_signals == 1
                else "ACCEPT"
            )
        }

        return results

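# Usage
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pass nli_model to enable the NLI step; without it only the judge and overlap checks run.
detector = HallucinationDetector(llm_client=openai_client, nli_model=nli_classifier)

analysis = detector.analyze(
    context="Our product costs €99 and ships in 3-5 days.",
    question="What is the price?",
    response="The premium product costs €99 with free express shipping."
)

if analysis["verdict"]["recommendation"] == "REJECT":
    # Regenerate the response
    pass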
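The "regenerate" branch can be written as a bounded retry loop. A sketch, assuming `generate` and `analyze` are callables supplied by the caller (in practice `analyze` could wrap `detector.analyze` with the context and question already bound); the stubs, the attempt limit, and the fallback message are illustrative assumptions.

```python
from typing import Callable

FALLBACK = "I don't have this information in my sources."

def generate_with_retries(generate: Callable[[int], str],
                          analyze: Callable[[str], dict],
                          max_attempts: int = 3) -> dict:
    """Regenerate until the detector stops recommending REJECT, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        response = generate(attempt)
        recommendation = analyze(response)["verdict"]["recommendation"]
        if recommendation != "REJECT":
            return {"response": response,
                    "recommendation": recommendation,
                    "attempts": attempt}
    # Every attempt was rejected: return a safe fallback instead of a flagged answer.
    return {"response": FALLBACK, "recommendation": "FALLBACK", "attempts": max_attempts}

# Stubs standing in for the LLM and the detector.
drafts = {
    1: "The premium product costs €99 with free express shipping.",
    2: "The product costs €99.",
}

def fake_generate(attempt: int) -> str:
    return drafts[attempt]

def fake_analyze(response: str) -> dict:
    flagged = "express" in response
    return {"verdict": {"recommendation": "REJECT" if flagged else "ACCEPT"}}

outcome = generate_with_retries(fake_generate, fake_analyze)
print(outcome)
```

Capping the attempts bounds latency and cost, and the fallback avoids ever returning an answer the detector rejected.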

Detection Benchmarks

Method	Precision	Recall	Latency	Cost
ROUGE-L	60%	75%	5ms	Free
NLI	78%	82%	50ms	Free
BERTScore	72%	70%	100ms	Free
GPT-4o Judge	92%	88%	500ms	$$$
SelfCheckGPT	85%	80%	2s	$$
Ensemble	94%	90%	600ms	$$
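Precision and recall can be combined into a single F1 score, F1 = 2PR / (P + R), to rank the methods. The sketch below only recomputes this from the figures in the table; it adds no new measurements.

```python
# (precision, recall) pairs copied from the table above
benchmarks = {
    "ROUGE-L": (0.60, 0.75),
    "NLI": (0.78, 0.82),
    "BERTScore": (0.72, 0.70),
    "GPT-4o Judge": (0.92, 0.88),
    "SelfCheckGPT": (0.85, 0.80),
    "Ensemble": (0.94, 0.90),
}

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1_by_method = {name: f1_score(p, r) for name, (p, r) in benchmarks.items()}

# Rank methods from highest to lowest F1
ranked = sorted(f1_by_method, key=f1_by_method.get, reverse=True)
for name in ranked:
    print(f"{name:<14}F1 = {f1_by_method[name]:.2f}")
```

F1 ignores latency and cost, so the ranking should be read alongside the last two columns of the table.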


Hallucination Detection in RAG Systems