News

Anthropic API: New RAG Features

April 29, 2026
6 min read
Ailog Team

Anthropic enriches its Claude API with native RAG features: automatic citations, extended context, and improved tool use.

Anthropic Strengthens RAG Capabilities

Anthropic has just announced a major update to its Claude API, with particular focus on RAG use cases. New features include automatic citations, extended context, and improved tool use capabilities.

"RAG is the number one use case for Claude in enterprise," explains Dario Amodei, CEO of Anthropic. "These new features directly address our customers' needs."

New Features

Automatic Citations

Claude can now generate inline citations automatically:

```python
import anthropic

client = anthropic.Client()

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": document_base64
                    },
                    "citation_mode": "inline"  # New parameter
                },
                {
                    "type": "text",
                    "text": "Summarize this document with citations."
                }
            ]
        }
    ]
)

# Response with automatic citations
# "According to the document [1], revenue increased by 15%..."
```

Citations include:

  • Reference to source document
  • Page number (for PDFs)
  • Confidence score

This feature is essential for applications where traceability is critical. Check our guide on hallucination detection.
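The article does not show the citation payload itself; assuming each citation carries the three fields listed above (the `document`, `page`, and `confidence` key names are hypothetical), a backend might render and filter them like this:

```python
def format_citations(citations: list[dict], min_confidence: float = 0.7) -> str:
    """Render citation payloads as a source list, dropping low-confidence ones.

    Each citation dict is assumed to carry `document`, `page`, and
    `confidence` keys (hypothetical field names based on the list above).
    """
    kept = [c for c in citations if c["confidence"] >= min_confidence]
    return "\n".join(
        f"[{i}] {c['document']}, p. {c['page']} (confidence {c['confidence']:.2f})"
        for i, c in enumerate(kept, start=1)
    )

citations = [
    {"document": "annual_report.pdf", "page": 12, "confidence": 0.95},
    {"document": "annual_report.pdf", "page": 47, "confidence": 0.41},
]
print(format_citations(citations))
# [1] annual_report.pdf, p. 12 (confidence 0.95)
```

The 0.7 threshold is an arbitrary starting point; tune it against your own traceability requirements.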

400K Context Window

Claude 4 Opus extends its context window to 400K tokens:

| Model | Previous Context | Current Context |
|---|---|---|
| Claude 3 Opus | 200K | 200K |
| Claude 4 Opus | 200K | 400K |
| Claude 4 Sonnet | 200K | 300K |

This extension allows processing:

  • Documents of 300+ pages in a single request
  • Entire codebases for analysis
  • Very long conversations with history

For documents that exceed even this window, our chunking strategies remain necessary.
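A minimal chunker for such cases might look like this; the 4-characters-per-token ratio is a rough heuristic, not a real tokenizer, and the function name is our own:

```python
def chunk_text(text: str, max_tokens: int = 100_000, overlap_tokens: int = 1_000) -> list[str]:
    """Split text into overlapping chunks sized by an approximate token count.

    Assumes ~4 characters per token as a rough heuristic; swap in a real
    tokenizer when you need exact budgets.
    """
    max_chars = max_tokens * 4
    overlap_chars = overlap_tokens * 4
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Overlap the next chunk so no sentence is cut without context
        start = end - overlap_chars
    return chunks
```

Each chunk can then be summarized or embedded independently before a final synthesis pass.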

Improved Tool Use

Tool use becomes more robust:

1. Parallel Execution

```python
tools = [
    {"name": "search_database", ...},
    {"name": "fetch_user_profile", ...}
]

# Claude can now call multiple tools in parallel
response = client.messages.create(
    model="claude-3-opus-20240229",
    tools=tools,
    tool_choice={"type": "parallel"}  # New
)
```

2. Automatic Retry

When a tool fails, Claude can:

  • Reformulate the request
  • Try an alternative tool
  • Ask for clarifications
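The retry behavior above is driven by the model, but the client still has to surface tool failures instead of crashing. One common pattern (a sketch with hypothetical helper names, not official SDK code) is to catch the exception and hand the error back as a tool result, so the model can reformulate, switch tools, or ask for clarification:

```python
def run_tool_with_fallback(name: str, args: dict, tools: dict, max_attempts: int = 2) -> dict:
    """Execute a tool call, returning an error payload instead of raising.

    Feeding the error back as a tool_result lets the model decide whether to
    reformulate the call, try another tool, or ask the user for clarification.
    `tools` maps tool names to plain Python callables (hypothetical helpers).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return {"is_error": False, "content": tools[name](**args)}
        except Exception as exc:
            last_error = f"{type(exc).__name__}: {exc}"
    return {"is_error": True, "content": last_error}

# Usage: a failing tool surfaces its error instead of crashing the loop
def flaky_search(query: str) -> str:
    raise TimeoutError("db down")

result = run_tool_with_fallback("search_database", {"query": "users"},
                                {"search_database": flaky_search})
# result["is_error"] is True; the message can be returned as a tool_result block
```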

3. Tool Call Streaming

```python
with client.messages.stream(...) as stream:
    for event in stream:
        if event.type == "tool_use_start":
            print(f"Calling {event.tool_name}...")
        elif event.type == "tool_use_result":
            print(f"Result: {event.result}")
```

These improvements directly benefit agentic RAG systems.

Guaranteed Structured Outputs

New mode to guarantee output format:

```python
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    in_stock: bool

response = client.messages.create(
    model="claude-3-opus-20240229",
    messages=[...],
    response_format={
        "type": "json_schema",
        "schema": ProductInfo.model_json_schema()
    }
)

# Guarantee: response always respects the schema
```

Performance and Pricing

RAG Benchmarks

Anthropic publishes RAG-specific benchmarks:

| Metric | Claude 3 Opus | Claude 4 Opus | Improvement |
|---|---|---|---|
| Attribution accuracy | 89% | 96% | +7.9% |
| Hallucination rate | 4.2% | 1.8% | -57% |
| Context utilization | 78% | 92% | +18% |
| Multi-doc reasoning | 72% | 88% | +22% |

New Pricing

| Model | Input/1M tokens | Output/1M tokens |
|---|---|---|
| Claude 4 Opus | $15 | $75 |
| Claude 4 Sonnet | $3 | $15 |
| Claude 4 Haiku | $0.25 | $1.25 |
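The table above translates directly into a per-request cost estimate. A quick sketch (the model keys are our own shorthand, not official API identifiers):

```python
# Per-million-token prices from the table above: (input, output), in USD
PRICING = {
    "claude-4-opus": (15.00, 75.00),
    "claude-4-sonnet": (3.00, 15.00),
    "claude-4-haiku": (0.25, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from token counts."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A typical RAG call: 20K tokens of retrieved context in, 1K tokens out
print(round(request_cost("claude-4-sonnet", 20_000, 1_000), 4))
# 0.075
```

At RAG-scale context sizes, input tokens dominate the bill, which is why the context-structuring advice below matters for cost as much as for quality.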

To optimize costs, check our guide on RAG cost optimization.

Integration with RAG Pipelines

Complete Example

```python
import anthropic
from qdrant_client import QdrantClient

# 1. Search in vector database
qdrant = QdrantClient(host="localhost")
search_results = qdrant.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5
)

# 2. Context construction
context = "\n\n".join([
    f"Document {i+1}:\n{r.payload['content']}"
    for i, r in enumerate(search_results)
])

# 3. Generation with Claude
# Note: system instructions go in the top-level `system` parameter,
# not as a "system" role message
client = anthropic.Client()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="You are an assistant that responds by citing sources.",
    messages=[
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}"
        }
    ],
    extra_headers={
        "anthropic-beta": "citations-2024-05-01"
    }
)
```

Best Practices

1. Use the Right Model

  • Opus: Complex reasoning, long documents
  • Sonnet: Quality/cost balance
  • Haiku: High volume, simple tasks
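These heuristics can be wired into a simple routing function; the token thresholds below are illustrative assumptions, not official guidance:

```python
def pick_model(input_tokens: int, needs_complex_reasoning: bool) -> str:
    """Route a request to a model tier following the heuristics above.

    The 50K and 5K token cutoffs are illustrative, not official limits.
    """
    if needs_complex_reasoning or input_tokens > 50_000:
        return "claude-4-opus"    # complex reasoning, long documents
    if input_tokens > 5_000:
        return "claude-4-sonnet"  # quality/cost balance
    return "claude-4-haiku"       # high volume, simple tasks
```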

2. Structure the Context

  • Clearly separate documents
  • Include metadata (title, date, source)
  • Limit to 5-10 relevant documents
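One way to apply these three rules is a small formatting helper; the field names are hypothetical and should be adapted to your metadata schema:

```python
def build_context(documents: list[dict], max_docs: int = 10) -> str:
    """Format retrieved documents into a clearly delimited prompt context.

    Each document dict is assumed to carry `title`, `date`, `source`, and
    `content` keys (hypothetical field names).
    """
    blocks = []
    for i, doc in enumerate(documents[:max_docs], start=1):
        blocks.append(
            f'<document index="{i}">\n'
            f"Title: {doc['title']}\nDate: {doc['date']}\nSource: {doc['source']}\n\n"
            f"{doc['content']}\n"
            f"</document>"
        )
    return "\n\n".join(blocks)
```

Explicit delimiters and per-document metadata make it much easier for the model to attribute claims to the right source.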

3. Leverage Citations

  • Enable citation mode for traceability
  • Validate citations on backend
  • Display sources to user
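Backend validation can start very small: check that every cited index actually refers to a document you supplied. A minimal sketch:

```python
import re

def validate_citations(answer: str, documents: list[str]) -> list[int]:
    """Return the cited indices (1-based, parsed from inline [n] markers)
    that do NOT match any supplied document, so the backend can flag or
    strip them before display."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if n < 1 or n > len(documents))

answer = "Revenue grew [1], margins fell [3]."
invalid = validate_citations(answer, ["doc A", "doc B"])
# invalid == [3]: the answer cites a document that was never provided
```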

Comparison with Competition

Claude vs GPT-4

| Aspect | Claude 4 Opus | GPT-4 Turbo |
|---|---|---|
| Context | 400K | 128K |
| Native citations | Yes | Partial |
| Pricing (input) | $15/M | $10/M |
| Hallucinations | 1.8% | 2.4% |
| Multi-doc | Excellent | Good |

Claude Advantages for RAG

  • Larger context window
  • Native automatic citations
  • Better handling of long documents
  • More reliable system instructions

Our Take

These updates make Claude an even more relevant choice for RAG:

Strengths:

  • Automatic citations (game changer)
  • 400K context
  • Reduced hallucinations

Points of attention:

  • Higher price than GPT-4 Turbo
  • Slightly higher latency
  • Fewer third-party integrations

For production RAG applications, Claude 4 Opus becomes our recommendation for cases requiring precision and traceability.

Platforms like Ailog automatically integrate the latest Claude models, allowing you to benefit from these improvements effortlessly.

Check our RAG introduction guide to get started.

Tags

RAG · Anthropic · Claude · API · LLM
