
Semantic Chunking for Better Retrieval

November 8, 2025
12 min read
Ailog Research Team

Split documents intelligently based on meaning, not just length. Learn semantic chunking techniques for RAG.

The Problem with Fixed-Size Chunking

Traditional chunking splits text every N characters or tokens:

  • ❌ Breaks sentences mid-thought
  • ❌ Separates related content
  • ❌ No context awareness

Semantic chunking splits based on meaning, not length.
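
To see the failure concretely, here is a toy fixed-size splitter cutting a word in half (the sample text is illustrative):

python
# Naive fixed-size chunking: split every 40 characters, no matter what
text = "Photosynthesis converts light into chemical energy. Plants store it as glucose."

chunks = [text[i:i + 40] for i in range(0, len(text), 40)]
for c in chunks:
    print(repr(c))
# 'Photosynthesis converts light into chemi'
# 'cal energy. Plants store it as glucose.'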

How Semantic Chunking Works

  1. Embed each sentence using a sentence encoder
  2. Calculate similarity between consecutive sentences
  3. Split where similarity drops (topic change)
python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_chunk(text, similarity_threshold=0.5):
    # Split into sentences
    sentences = text.split('. ')

    # Embed all sentences
    embeddings = model.encode(sentences)

    # Calculate cosine similarity between consecutive sentences
    chunks = []
    current_chunk = [sentences[0]]

    for i in range(1, len(sentences)):
        similarity = np.dot(embeddings[i-1], embeddings[i]) / (
            np.linalg.norm(embeddings[i-1]) * np.linalg.norm(embeddings[i])
        )

        if similarity < similarity_threshold:
            # Topic changed - start new chunk
            chunks.append('. '.join(current_chunk))
            current_chunk = [sentences[i]]
        else:
            current_chunk.append(sentences[i])

    # Add final chunk
    chunks.append('. '.join(current_chunk))
    return chunks
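
A quick usage sketch (the sample text and threshold are illustrative; exactly where the split lands depends on the embedding model):

python
doc = (
    "Photosynthesis converts sunlight into chemical energy. "
    "Chlorophyll absorbs light in the red and blue wavelengths. "
    "Regular exercise improves cardiovascular health. "
    "It also reduces stress and lowers blood pressure."
)

for i, chunk in enumerate(semantic_chunk(doc, similarity_threshold=0.5)):
    print(f"--- Chunk {i} ---")
    print(chunk)

With this text you would expect a single break between the chlorophyll and exercise sentences, where consecutive-sentence similarity drops.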

LangChain Semantic Chunking (2025)

LangChain ships a built-in SemanticChunker (in the langchain_experimental package):

python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

text_splitter = SemanticChunker(
    OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",  # or "standard_deviation"
    breakpoint_threshold_amount=95
)

chunks = text_splitter.create_documents([long_text])

Advanced: Multi-Level Semantic Chunking

Combine semantic splits with size constraints:

python
def smart_semantic_chunk(text, max_chunk_size=1000, min_chunk_size=200):
    # First: semantic split
    semantic_chunks = semantic_chunk(text)

    final_chunks = []
    for chunk in semantic_chunks:
        # If chunk too large, further split
        if len(chunk) > max_chunk_size:
            # Split by paragraphs within this semantic section
            paragraphs = chunk.split('\n\n')
            sub_chunk = ""
            for para in paragraphs:
                if len(sub_chunk) + len(para) < max_chunk_size:
                    sub_chunk += para + "\n\n"
                else:
                    if sub_chunk:
                        final_chunks.append(sub_chunk.strip())
                    sub_chunk = para + "\n\n"
            if sub_chunk:
                final_chunks.append(sub_chunk.strip())
        # If chunk too small, merge into the previous chunk
        elif len(chunk) < min_chunk_size and final_chunks:
            final_chunks[-1] += "\n\n" + chunk
        else:
            final_chunks.append(chunk)

    return final_chunks

LlamaIndex Semantic Splitter

python
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()

splitter = SemanticSplitterNodeParser(
    buffer_size=1,  # Sentences to group
    breakpoint_percentile_threshold=95,
    embed_model=embed_model
)

nodes = splitter.get_nodes_from_documents(documents)

When to Use Semantic Chunking

Use semantic chunking when:

  • Documents have clear topic transitions
  • You need high precision retrieval
  • Content is narrative or explanatory
  • You can afford the compute cost

Stick to fixed-size when:

  • Speed is critical
  • Documents are very uniform
  • Budget is limited
  • Content is tabular or structured

Performance Considerations

Embedding cost:

  • Semantic chunking requires embedding every sentence
  • For a 10,000-word document, that's roughly 300-500 sentences to embed
  • Consider caching embeddings (see the caching sketch below)
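
A minimal caching sketch, assuming the `model` from the first example; only sentences not seen before hit the encoder, in a single batched call:

python
_embedding_cache = {}

def embed_all_cached(sentences):
    # Encode only unseen sentences in one batch, then serve
    # everything from the in-memory cache
    missing = [s for s in sentences if s not in _embedding_cache]
    if missing:
        vectors = model.encode(missing)
        _embedding_cache.update(zip(missing, vectors))
    return [_embedding_cache[s] for s in sentences]

For repeated re-indexing you would persist this cache (e.g., keyed by a hash of the sentence text) rather than keep it in memory.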

Speed comparison (November 2025):

  • Fixed-size: ~1ms per document
  • Semantic: ~100-500ms per document, depending on the model (a timing sketch follows below)
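
Numbers like these vary with hardware and model, so measure on your own documents. A minimal timing sketch, reusing `semantic_chunk` from above; `fixed_chunk` is a stand-in baseline and `long_text` is your document:

python
import time

def fixed_chunk(text, size=500):
    # Naive fixed-size baseline
    return [text[i:i + size] for i in range(0, len(text), size)]

def ms_per_doc(chunker, text, runs=5):
    # Average wall-clock time per document, in milliseconds
    start = time.perf_counter()
    for _ in range(runs):
        chunker(text)
    return (time.perf_counter() - start) / runs * 1000

print(f"Fixed-size: {ms_per_doc(fixed_chunk, long_text):.2f} ms")
print(f"Semantic:   {ms_per_doc(semantic_chunk, long_text):.2f} ms")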

Hybrid Approach: Best of Both Worlds

python
def hybrid_chunk(text, target_size=500):
    # 1. Semantic split first
    semantic_chunks = semantic_chunk(text, similarity_threshold=0.6)

    # 2. Greedily merge small semantic chunks up to ~1.5x the target
    #    size; an already-oversized chunk passes through on its own
    final_chunks = []
    buffer = ""

    for chunk in semantic_chunks:
        if len(buffer) + len(chunk) < target_size * 1.5:
            buffer += "\n\n" + chunk if buffer else chunk
        else:
            if buffer:
                final_chunks.append(buffer)
            buffer = chunk

    if buffer:
        final_chunks.append(buffer)

    return final_chunks

Evaluation

Test retrieval quality with semantic vs fixed chunking:

python
# Your test queries
queries = [
    "How does photosynthesis work?",
    "What are the benefits of exercise?"
]

# Compare retrieval accuracy (evaluate_chunking is your own
# evaluation harness; a sketch of one follows below)
semantic_results = evaluate_chunking(semantic_chunks, queries)
fixed_results = evaluate_chunking(fixed_chunks, queries)

print(f"Semantic MRR: {semantic_results['mrr']}")
print(f"Fixed MRR: {fixed_results['mrr']}")
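
`evaluate_chunking` above stands in for whatever evaluation harness you use. Below is a minimal sketch of one: it assumes each query is paired with a substring known to appear in the relevant chunk (a relevance label the plain query list above does not carry), and retrieves by cosine similarity over chunk embeddings using the `model` from the first example:

python
import numpy as np

def evaluate_chunking(chunks, labeled_queries, top_k=5):
    # Mean Reciprocal Rank over (query, relevant_substring) pairs.
    # A retrieved chunk counts as relevant if it contains the substring;
    # swap in real relevance judgments for a serious evaluation.
    chunk_embs = model.encode(chunks)
    chunk_embs = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)

    reciprocal_ranks = []
    for query, relevant in labeled_queries:
        q = model.encode(query)
        q = q / np.linalg.norm(q)
        ranking = np.argsort(-(chunk_embs @ q))[:top_k]
        rr = next((1.0 / rank for rank, idx in enumerate(ranking, 1)
                   if relevant in chunks[idx]), 0.0)
        reciprocal_ranks.append(rr)

    return {"mrr": float(np.mean(reciprocal_ranks))}

labeled = [
    ("How does photosynthesis work?", "photosynthesis"),
    ("What are the benefits of exercise?", "exercise"),
]
print(evaluate_chunking(semantic_chunks, labeled)["mrr"])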

Semantic chunking typically improves retrieval quality by 15-30%, but at roughly 100x the compute cost at indexing time. Choose based on your accuracy/cost tradeoff.

Tags

chunking, semantic, nlp, embeddings
