OpenAI Assistants v2: Improved Integrated RAG
OpenAI launches Assistants v2 with enhanced native RAG capabilities: improved file search, source annotations, and integrated vector stores.
OpenAI Strengthens Native RAG Offering
OpenAI launches version 2 of its Assistants API with significant improvements to RAG capabilities. File search becomes more powerful, source annotations more precise, and vector stores more flexible.
"Assistants v2 represents our vision of turnkey RAG," explains Sam Altman during the keynote. "Developers can build production-ready RAG applications in just a few lines of code."
Assistants v2 API Updates
Improved File Search
File search v2 brings major improvements:
| Feature | v1 | v2 |
|---|---|---|
| Files per vector store | 100 | 10,000 |
| Max size per file | 512MB | 2GB |
| Supported formats | 12 | 25+ |
| Table parsing | Basic | Advanced |
| Image parsing | No | Yes (OCR) |
```python
from openai import OpenAI

client = OpenAI()

# Create a vector store
vector_store = client.beta.vector_stores.create(
    name="knowledge-base",
    chunking_strategy={
        "type": "semantic",  # New: semantic chunking
        "min_chunk_size": 100,
        "max_chunk_size": 800
    }
)

# Upload files
client.beta.vector_stores.files.upload(
    vector_store_id=vector_store.id,
    file=open("document.pdf", "rb")
)

# Create an assistant with RAG
assistant = client.beta.assistants.create(
    name="RAG Assistant",
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)
```
Chunking strategies are now configurable directly in the API.
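The semantic strategy above is one option; a fixed-size strategy is available when you want deterministic chunk boundaries. A minimal sketch, assuming the `static` chunking type with token-based sizes and overlap (check your SDK version for the exact parameter names):

```python
# Fixed-size chunking: deterministic token-based boundaries with overlap.
# Assumes the "static" chunking type; parameter names may vary by SDK version.
fixed_store = client.beta.vector_stores.create(
    name="knowledge-base-fixed",
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 800,  # upper bound per chunk
            "chunk_overlap_tokens": 400    # overlap preserves context across chunks
        }
    }
)
```

Overlap trades storage for recall: text near a boundary is indexed twice, so queries landing on chunk edges still retrieve complete context.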
Source Annotations
Responses now include precise annotations:
```python
# Response with annotations
{
    "content": "Revenue increased by 15%[1].",
    "annotations": [
        {
            "type": "file_citation",
            "text": "[1]",
            "file_id": "file-abc123",
            "quote": "Annual revenue shows 15% growth",
            "page": 12,
            "confidence": 0.94
        }
    ]
}
```
Annotations include:
- Exact quote from source document
- Page number (for PDFs)
- Confidence score
- Link to source file
This feature is crucial for hallucination detection.
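Because annotations ship with the message itself, a lightweight post-processing pass can flag claims whose support is weak. A minimal sketch, assuming the response shape from the example above (the `quote`, `page`, and `confidence` fields follow that example, not a guaranteed schema):

```python
def flag_weak_citations(response: dict, min_confidence: float = 0.8) -> list[str]:
    """Return warnings for citations below a confidence threshold.

    Assumes the annotation shape shown above: each file_citation carries
    a marker (`text`), a source `quote`, a `page`, and a `confidence`.
    """
    warnings = []
    for ann in response.get("annotations", []):
        if ann.get("type") != "file_citation":
            continue
        confidence = ann.get("confidence", 0.0)
        if confidence < min_confidence:
            warnings.append(
                f"{ann['text']} -> {ann['file_id']} "
                f"(p. {ann.get('page', '?')}, confidence {confidence:.2f})"
            )
    return warnings

response = {
    "content": "Revenue increased by 15%[1].",
    "annotations": [{
        "type": "file_citation", "text": "[1]", "file_id": "file-abc123",
        "quote": "Annual revenue shows 15% growth", "page": 12, "confidence": 0.94,
    }],
}
print(flag_weak_citations(response))  # [] -- 0.94 clears the 0.8 threshold
```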
Shared Vector Stores
Vector stores can now be shared between assistants:
```python
# Create a shared vector store
shared_store = client.beta.vector_stores.create(
    name="company-knowledge",
    sharing="organization"  # New
)

# Use in multiple assistants
for assistant_id in [assistant1, assistant2, assistant3]:
    client.beta.assistants.update(
        assistant_id,
        tool_resources={
            "file_search": {
                "vector_store_ids": [shared_store.id]
            }
        }
    )
```
Improved Streaming
RAG response streaming is more granular:
```python
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for event in stream:
        if event.event == "thread.message.delta":
            print(event.data.delta.content[0].text.value, end="")
        elif event.event == "file_search.start":
            print(f"\n[Searching in {len(event.data.files)} files...]")
        elif event.event == "file_search.results":
            print(f"\n[{len(event.data.results)} results found]")
```
Performance and Limits
Benchmarks
OpenAI publishes benchmarks on standard RAG tasks:
| Metric | Assistants v1 | Assistants v2 |
|---|---|---|
| Recall@5 | 72% | 86% |
| Precision@5 | 68% | 81% |
| Median latency | 2.1s | 1.4s |
| Citation accuracy | 78% | 91% |
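For context, Recall@5 and Precision@5 are the standard top-k retrieval metrics. OpenAI has not published its evaluation harness, so the sketch below is just the textbook per-query definition:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant) if relevant else 0.0

def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

# Example: 2 of the 3 relevant documents retrieved in the top 5
retrieved = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d5"}
print(recall_at_k(retrieved, relevant))     # 0.67
print(precision_at_k(retrieved, relevant))  # 0.4
```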
Current Limits
| Limit | Value |
|---|---|
| Vector stores per organization | 100 |
| Files per vector store | 10,000 |
| Tokens per file | 5M |
| Parallel requests | 50 |
| Vector store retention | 30 days (configurable) |
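The 50-request parallelism cap is the limit most likely to bite during bulk ingestion. A minimal client-side throttle, sketched with `asyncio.Semaphore` (the limit value comes from the table above; the upload call mirrors the sync example earlier):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
MAX_PARALLEL = 50  # parallel request limit from the table above
semaphore = asyncio.Semaphore(MAX_PARALLEL)

async def upload_one(vector_store_id: str, path: str):
    # At most MAX_PARALLEL uploads are in flight at any time
    async with semaphore:
        with open(path, "rb") as f:
            return await client.beta.vector_stores.files.upload(
                vector_store_id=vector_store_id,
                file=f,
            )

async def upload_all(vector_store_id: str, paths: list[str]):
    return await asyncio.gather(
        *(upload_one(vector_store_id, p) for p in paths)
    )
```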
Pricing
New Pricing Model
| Component | Price |
|---|---|
| Vector store (GB/day) | $0.10 |
| File search (1K requests) | $0.03 |
| Input tokens | $10/M |
| Output tokens | $30/M |
Comparison with Custom Solutions
| Approach | Estimated Monthly Cost* |
|---|---|
| Assistants v2 | $200-500 |
| Pinecone + GPT-4 | $300-700 |
| Qdrant self-hosted + GPT-4 | $150-400 |
| Ailog RAG-as-a-Service | $50-200 |
*For 100K requests/month, 1000 documents
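Plugging the published prices into that scenario shows where the Assistants v2 estimate lands. The per-request token counts below are illustrative assumptions, not published figures (real RAG prompts include retrieved chunks and are often larger):

```python
# Rough monthly estimate for 100K requests / 1,000 documents.
requests = 100_000
storage_gb, days = 1, 30       # assumption: ~1 GB for 1,000 documents
in_tok, out_tok = 300, 60      # assumption: average tokens per request

storage = storage_gb * 0.10 * days       # $0.10 per GB per day
search = requests / 1_000 * 0.03         # $0.03 per 1K requests
inputs = requests * in_tok / 1e6 * 10    # $10 per M input tokens
outputs = requests * out_tok / 1e6 * 30  # $30 per M output tokens

print(f"${storage + search + inputs + outputs:.0f}/month")  # ≈ $486/month
```

Token costs dominate: storage and search together come to a few dollars, so prompt size is the main cost lever.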
Check our guide on RAG cost optimization.
Use Cases
When to Use Assistants v2
Ideal for:
- Rapid prototypes
- Teams without RAG expertise
- Moderate traffic applications
- All-in-one integration
Less suitable for:
- Very high volume (> 1M requests/month)
- Advanced customization needs
- Data sovereignty constraints
- Multi-LLM architectures
Complete Example
```python
from openai import OpenAI

client = OpenAI()

# 1. Create vector store with documents
vector_store = client.beta.vector_stores.create(name="docs")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open(f, "rb") for f in ["doc1.pdf", "doc2.pdf"]]
)

# 2. Create the assistant
assistant = client.beta.assistants.create(
    name="Support Bot",
    model="gpt-4-turbo",
    instructions="You are a support assistant. Always cite your sources.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)

# 3. Create a conversation
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How do I configure product X?"
)

# 4. Execute and stream
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for text in stream.text_deltas:
        print(text, end="")
```
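When streaming is unnecessary, the run can also be executed synchronously with `create_and_poll`, then the final annotated message read back from the thread. A short variant of step 4, reusing the thread and assistant above:

```python
# 4-bis. Execute without streaming, then read back the final message
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    answer = messages.data[0].content[0].text  # newest message comes first
    print(answer.value)
    for annotation in answer.annotations:      # the citations described earlier
        print(annotation)
```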
Migration from v1
Breaking Changes
- `retrieval` renamed to `file_search`
- New annotation structure
- Vector stores mandatory (no more direct file attachments)
Migration Guide
```python
# Before (v1)
assistant = client.beta.assistants.create(
    tools=[{"type": "retrieval"}],
    file_ids=["file-123"]
)

# After (v2)
vector_store = client.beta.vector_stores.create()
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id="file-123"
)
assistant = client.beta.assistants.create(
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)
```
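For more than a handful of assistants, the same transformation can be scripted. A hedged sketch that updates every assistant still on the old tool name; `store_for` is a hypothetical mapping from assistant ID to the vector store that now holds its former file attachments:

```python
# Hypothetical lookup: assistant ID -> migrated vector store ID
store_for = {"asst_abc123": "vs_xyz789"}

# client.beta.assistants.list() pages through all assistants
for assistant in client.beta.assistants.list():
    if not any(tool.type == "retrieval" for tool in assistant.tools):
        continue  # already migrated
    client.beta.assistants.update(
        assistant.id,
        tools=[{"type": "file_search"}],
        tool_resources={
            "file_search": {"vector_store_ids": [store_for[assistant.id]]}
        },
    )
```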
Our Take
Assistants v2 represents a significant improvement:
Strengths:
- Simplified turnkey RAG
- Precise source annotations
- Good integration with OpenAI ecosystem
Points of attention:
- OpenAI lock-in
- Limited customization
- Data hosted at OpenAI
For projects requiring more control or sovereignty, solutions like Ailog offer an alternative with French hosting and advanced customization.
Check our guide to the best RAG platforms for a full comparison.