Semantic Chunking for Better Retrieval

Split documents intelligently based on meaning, not just length. Learn semantic chunking techniques for RAG.

Author
Ailog Research Team
Published
November 8, 2025
Reading time
12 min read
Level
advanced
RAG Pipeline Step
Chunking

The Problem with Fixed-Size Chunking

Traditional chunking splits text every N characters or tokens:

  • ❌ Breaks sentences mid-thought
  • ❌ Separates related content
  • ❌ No context awareness

Semantic chunking splits based on meaning, not length.
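
For contrast, here is a minimal sketch of the fixed-size approach this article argues against (the 200-character window is an arbitrary choice for illustration):

```python
def fixed_size_chunk(text, chunk_size=200):
    # Cut blindly every chunk_size characters, ignoring sentence boundaries
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Any sentence that straddles a 200-character boundary is split mid-thought, and its two halves can land in different retrieval results.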

How Semantic Chunking Works

  1. Embed each sentence using a sentence encoder
  2. Calculate similarity between consecutive sentences
  3. Split where similarity drops (topic change)

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_chunk(text, similarity_threshold=0.5):
    # Split into sentences
    sentences = text.split('. ')

    # Embed all sentences
    embeddings = model.encode(sentences)

    # Calculate cosine similarity between consecutive sentences
    chunks = []
    current_chunk = [sentences[0]]

    for i in range(1, len(sentences)):
        similarity = np.dot(embeddings[i-1], embeddings[i]) / (
            np.linalg.norm(embeddings[i-1]) * np.linalg.norm(embeddings[i])
        )

        if similarity < similarity_threshold:
            # Topic changed - start new chunk
            chunks.append('. '.join(current_chunk))
            current_chunk = [sentences[i]]
        else:
            current_chunk.append(sentences[i])

    # Add final chunk
    chunks.append('. '.join(current_chunk))

    return chunks
```

LangChain Semantic Chunking (2025)

LangChain ships a built-in semantic chunker in the langchain_experimental package:

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

text_splitter = SemanticChunker(
    OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",  # or "standard_deviation"
    breakpoint_threshold_amount=95
)

chunks = text_splitter.create_documents([long_text])
```

Advanced: Multi-Level Semantic Chunking

Combine semantic splits with size constraints:

```python
def smart_semantic_chunk(text, max_chunk_size=1000, min_chunk_size=200):
    # First: semantic split
    semantic_chunks = semantic_chunk(text)

    final_chunks = []

    for chunk in semantic_chunks:
        # If chunk too large, split it further
        if len(chunk) > max_chunk_size:
            # Split by paragraphs within this semantic section
            paragraphs = chunk.split('\n\n')
            sub_chunk = ""

            for para in paragraphs:
                if len(sub_chunk) + len(para) < max_chunk_size:
                    sub_chunk += para + "\n\n"
                else:
                    final_chunks.append(sub_chunk.strip())
                    sub_chunk = para + "\n\n"

            if sub_chunk:
                final_chunks.append(sub_chunk.strip())

        # If chunk too small, merge it into the previous chunk
        elif len(chunk) < min_chunk_size and final_chunks:
            final_chunks[-1] += "\n\n" + chunk
        else:
            final_chunks.append(chunk)

    return final_chunks
```

LlamaIndex Semantic Splitter

```python
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()

splitter = SemanticSplitterNodeParser(
    buffer_size=1,  # Sentences to group
    breakpoint_percentile_threshold=95,
    embed_model=embed_model
)

nodes = splitter.get_nodes_from_documents(documents)
```
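
The `documents` variable is assumed to already exist; one common way to produce it, assuming your files sit in a local `./data` directory, is LlamaIndex's `SimpleDirectoryReader`:

```python
from llama_index.core import SimpleDirectoryReader

# Load every file in ./data as Document objects for the splitter above
documents = SimpleDirectoryReader("./data").load_data()
```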

When to Use Semantic Chunking

Use semantic chunking when:

  • Documents have clear topic transitions
  • You need high-precision retrieval
  • Content is narrative or explanatory
  • You can afford the compute cost

Stick to fixed-size when:

  • Speed is critical
  • Documents are very uniform
  • Budget is limited
  • Content is tabular or structured

Performance Considerations

Embedding cost:

  • Semantic chunking requires embedding every sentence
  • For a 10,000-word document: ~300 sentences to embed
  • Consider caching embeddings (see the sketch below)
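
A minimal sketch of the caching idea, reusing the `model` defined earlier: a plain in-memory dict keyed by sentence text (the helper name `embed_cached` is ours, not a library function):

```python
_embedding_cache = {}

def embed_cached(sentence):
    # Reuse a stored vector when the same sentence appears again
    if sentence not in _embedding_cache:
        _embedding_cache[sentence] = model.encode(sentence)
    return _embedding_cache[sentence]
```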

Speed comparison (November 2025):

  • Fixed-size: ~1ms per document
  • Semantic: ~100-500ms per document (depending on model)
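
Numbers like these depend heavily on hardware and model choice, so it is worth measuring on your own corpus; a minimal timing harness (`chunk_fn` is whichever chunker you want to test):

```python
import time

def avg_chunk_time(chunk_fn, docs):
    # Average wall-clock seconds per document for a given chunking function
    start = time.perf_counter()
    for doc in docs:
        chunk_fn(doc)
    return (time.perf_counter() - start) / len(docs)
```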

Hybrid Approach: Best of Both Worlds

```python
def hybrid_chunk(text, target_size=500):
    # 1. Semantic split first
    semantic_chunks = semantic_chunk(text, similarity_threshold=0.6)

    # 2. Merge adjacent chunks until they approach the target size
    final_chunks = []
    buffer = ""

    for chunk in semantic_chunks:
        if len(buffer) + len(chunk) < target_size * 1.5:
            buffer += "\n\n" + chunk if buffer else chunk
        else:
            if buffer:
                final_chunks.append(buffer)
            buffer = chunk

    if buffer:
        final_chunks.append(buffer)

    return final_chunks
```

Evaluation

Test retrieval quality with semantic vs fixed chunking:

```python
# Your test queries
queries = [
    "How does photosynthesis work?",
    "What are the benefits of exercise?"
]

# Compare retrieval accuracy
semantic_results = evaluate_chunking(semantic_chunks, queries)
fixed_results = evaluate_chunking(fixed_chunks, queries)

print(f"Semantic MRR: {semantic_results['mrr']}")
print(f"Fixed MRR: {fixed_results['mrr']}")
```

Semantic chunking typically improves retrieval quality by 15-30% but costs roughly 100x more compute. Choose based on your accuracy/cost tradeoff.

Tags

  • chunking
  • semantic
  • nlp
  • embeddings