OpenAI Assistants v2: Improved Integrated RAG
OpenAI launches Assistants v2 with enhanced native RAG capabilities: improved file search, source annotations, and integrated vector stores.
OpenAI Strengthens Native RAG Offering
OpenAI launches version 2 of its Assistants API with significant improvements to RAG capabilities. File search becomes more powerful, source annotations more precise, and vector stores more flexible.
"Assistants v2 represents our vision of turnkey RAG," explains Sam Altman during the keynote. "Developers can build production-ready RAG applications in just a few lines of code."
Assistants v2 API Updates
Improved File Search
File search v2 brings major improvements:
| Feature | v1 | v2 |
|---|---|---|
| Files per vector store | 100 | 10,000 |
| Max size per file | 512MB | 2GB |
| Supported formats | 12 | 25+ |
| Table parsing | Basic | Advanced |
| Image parsing | No | Yes (OCR) |
```python
from openai import OpenAI

client = OpenAI()

# Create a vector store
vector_store = client.beta.vector_stores.create(
    name="knowledge-base",
    chunking_strategy={
        "type": "semantic",  # New: semantic chunking
        "min_chunk_size": 100,
        "max_chunk_size": 800
    }
)

# Upload files
client.beta.vector_stores.files.upload(
    vector_store_id=vector_store.id,
    file=open("document.pdf", "rb")
)

# Create an assistant with RAG
assistant = client.beta.assistants.create(
    name="RAG Assistant",
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)
```
Chunking strategies are now configurable directly in the API.
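The semantic strategy above is one option; a fixed-size strategy is available when you want deterministic chunk boundaries. A minimal sketch, assuming the `static` chunking type with token-based sizes and overlap (check your SDK version for the exact parameter names):

```python
# Fixed-size chunking: deterministic token-based boundaries with overlap.
# Assumes the "static" chunking type; parameter names may vary by SDK version.
fixed_store = client.beta.vector_stores.create(
    name="knowledge-base-fixed",
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 800,  # upper bound per chunk
            "chunk_overlap_tokens": 400    # overlap preserves context across chunks
        }
    }
)
```

Overlap trades storage for recall: text near a boundary is indexed twice, so queries landing on chunk edges still retrieve complete context.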
Source Annotations
Responses now include precise annotations:
```python
# Response with annotations
{
    "content": "Revenue increased by 15%[1].",
    "annotations": [
        {
            "type": "file_citation",
            "text": "[1]",
            "file_id": "file-abc123",
            "quote": "Annual revenue shows 15% growth",
            "page": 12,
            "confidence": 0.94
        }
    ]
}
```
Annotations include:
- Exact quote from source document
- Page number (for PDFs)
- Confidence score
- Link to source file
This feature is crucial for hallucination detection.
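Because annotations ship with the message itself, a lightweight post-processing pass can flag claims whose support is weak. A minimal sketch, assuming the response shape from the example above (the `quote`, `page`, and `confidence` fields follow that example, not a guaranteed schema):

```python
def flag_weak_citations(response: dict, min_confidence: float = 0.8) -> list[str]:
    """Return warnings for citations below a confidence threshold.

    Assumes the annotation shape shown above: each file_citation carries
    a marker (`text`), a source `quote`, a `page`, and a `confidence`.
    """
    warnings = []
    for ann in response.get("annotations", []):
        if ann.get("type") != "file_citation":
            continue
        confidence = ann.get("confidence", 0.0)
        if confidence < min_confidence:
            warnings.append(
                f"{ann['text']} -> {ann['file_id']} "
                f"(p. {ann.get('page', '?')}, confidence {confidence:.2f})"
            )
    return warnings

response = {
    "content": "Revenue increased by 15%[1].",
    "annotations": [{
        "type": "file_citation", "text": "[1]", "file_id": "file-abc123",
        "quote": "Annual revenue shows 15% growth", "page": 12, "confidence": 0.94,
    }],
}
print(flag_weak_citations(response))  # [] -- 0.94 clears the 0.8 threshold
```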
Shared Vector Stores
Vector stores can now be shared between assistants:
```python
# Create a shared vector store
shared_store = client.beta.vector_stores.create(
    name="company-knowledge",
    sharing="organization"  # New
)

# Use in multiple assistants
for assistant_id in [assistant1, assistant2, assistant3]:
    client.beta.assistants.update(
        assistant_id,
        tool_resources={
            "file_search": {
                "vector_store_ids": [shared_store.id]
            }
        }
    )
```
Improved Streaming
RAG response streaming is more granular:
```python
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for event in stream:
        if event.event == "thread.message.delta":
            print(event.data.delta.content[0].text.value, end="")
        elif event.event == "file_search.start":
            print(f"\n[Searching in {len(event.data.files)} files...]")
        elif event.event == "file_search.results":
            print(f"\n[{len(event.data.results)} results found]")
```
Performance and Limits
Benchmarks
OpenAI publishes benchmarks on standard RAG tasks:
| Metric | Assistants v1 | Assistants v2 |
|---|---|---|
| Recall@5 | 72% | 86% |
| Precision@5 | 68% | 81% |
| Median latency | 2.1s | 1.4s |
| Citation accuracy | 78% | 91% |
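For context, Recall@5 and Precision@5 are the standard top-k retrieval metrics. OpenAI has not published its evaluation harness, so the sketch below is just the textbook per-query definition:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant) if relevant else 0.0

def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

# Example: 2 of the 3 relevant documents retrieved in the top 5
retrieved = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d5"}
print(recall_at_k(retrieved, relevant))     # 0.67
print(precision_at_k(retrieved, relevant))  # 0.4
```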
Current Limits
| Limit | Value |
|---|---|
| Vector stores per organization | 100 |
| Files per vector store | 10,000 |
| Tokens per file | 5M |
| Parallel requests | 50 |
| Vector store retention | 30 days (configurable) |
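The 50-request parallelism cap is the limit most likely to bite during bulk ingestion. A minimal client-side throttle, sketched with `asyncio.Semaphore` (the limit value comes from the table above; the upload call mirrors the sync example earlier):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
MAX_PARALLEL = 50  # parallel request limit from the table above
semaphore = asyncio.Semaphore(MAX_PARALLEL)

async def upload_one(vector_store_id: str, path: str):
    # At most MAX_PARALLEL uploads are in flight at any time
    async with semaphore:
        with open(path, "rb") as f:
            return await client.beta.vector_stores.files.upload(
                vector_store_id=vector_store_id,
                file=f,
            )

async def upload_all(vector_store_id: str, paths: list[str]):
    return await asyncio.gather(
        *(upload_one(vector_store_id, p) for p in paths)
    )
```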
Pricing
New Pricing Model
| Component | Price |
|---|---|
| Vector store (GB/day) | $0.10 |
| File search (1K requests) | $0.03 |
| Input tokens | $10/M |
| Output tokens | $30/M |
Comparison with Custom Solutions
| Approach | Estimated Monthly Cost* |
|---|---|
| Assistants v2 | $200-500 |
| Pinecone + GPT-4 | $300-700 |
| Qdrant self-hosted + GPT-4 | $150-400 |
| Ailog RAG-as-a-Service | $50-200 |
*For 100K requests/month, 1000 documents
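Plugging the published prices into that scenario shows where the Assistants v2 estimate lands. The per-request token counts below are illustrative assumptions, not published figures (real RAG prompts include retrieved chunks and are often larger):

```python
# Rough monthly estimate for 100K requests / 1,000 documents.
requests = 100_000
storage_gb, days = 1, 30       # assumption: ~1 GB for 1,000 documents
in_tok, out_tok = 300, 60      # assumption: average tokens per request

storage = storage_gb * 0.10 * days       # $0.10 per GB per day
search = requests / 1_000 * 0.03         # $0.03 per 1K requests
inputs = requests * in_tok / 1e6 * 10    # $10 per M input tokens
outputs = requests * out_tok / 1e6 * 30  # $30 per M output tokens

print(f"${storage + search + inputs + outputs:.0f}/month")  # ≈ $486/month
```

Token costs dominate: storage and search together come to a few dollars, so prompt size is the main cost lever.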
Check our guide on RAG cost optimization.
Use Cases
When to Use Assistants v2
Ideal for:
- Rapid prototypes
- Teams without RAG expertise
- Moderate traffic applications
- All-in-one integration
Less suitable for:
- Very high volume (> 1M requests/month)
- Advanced customization needs
- Data sovereignty constraints
- Multi-LLM architectures
Complete Example
```python
from openai import OpenAI

client = OpenAI()

# 1. Create vector store with documents
vector_store = client.beta.vector_stores.create(name="docs")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open(f, "rb") for f in ["doc1.pdf", "doc2.pdf"]]
)

# 2. Create the assistant
assistant = client.beta.assistants.create(
    name="Support Bot",
    model="gpt-4-turbo",
    instructions="You are a support assistant. Always cite your sources.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)

# 3. Create a conversation
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How do I configure product X?"
)

# 4. Execute and stream
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for text in stream.text_deltas:
        print(text, end="")
```
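When streaming is unnecessary, the run can also be executed synchronously with `create_and_poll`, then the final annotated message read back from the thread. A short variant of step 4, reusing the thread and assistant above:

```python
# 4-bis. Execute without streaming, then read back the final message
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    answer = messages.data[0].content[0].text  # newest message comes first
    print(answer.value)
    for annotation in answer.annotations:      # the citations described earlier
        print(annotation)
```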
Migration from v1
Breaking Changes
- `retrieval` renamed to `file_search`
- New annotation structure
- Vector stores mandatory (no more direct file attachments)
Migration Guide
```python
# Before (v1)
assistant = client.beta.assistants.create(
    tools=[{"type": "retrieval"}],
    file_ids=["file-123"]
)

# After (v2)
vector_store = client.beta.vector_stores.create()
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id="file-123"
)
assistant = client.beta.assistants.create(
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)
```
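For more than a handful of assistants, the same transformation can be scripted. A hedged sketch that updates every assistant still on the old tool name; `store_for` is a hypothetical mapping from assistant ID to the vector store that now holds its former file attachments:

```python
# Hypothetical lookup: assistant ID -> migrated vector store ID
store_for = {"asst_abc123": "vs_xyz789"}

# client.beta.assistants.list() pages through all assistants
for assistant in client.beta.assistants.list():
    if not any(tool.type == "retrieval" for tool in assistant.tools):
        continue  # already migrated
    client.beta.assistants.update(
        assistant.id,
        tools=[{"type": "file_search"}],
        tool_resources={
            "file_search": {"vector_store_ids": [store_for[assistant.id]]}
        },
    )
```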
Our Take
Assistants v2 represents a significant improvement:
Strengths:
- Simplified turnkey RAG
- Precise source annotations
- Good integration with OpenAI ecosystem
Points of attention:
- OpenAI lock-in
- Limited customization
- Data hosted at OpenAI
For projects requiring more control or sovereignty, solutions like Ailog offer an alternative with French hosting and advanced customization.
Check our guide to the best RAG platforms for a full comparison.