
Notion + RAG: Connect Your Company Wiki

March 24, 2026
Ailog Team

Complete guide to integrating Notion as a knowledge source for a RAG chatbot. Synchronization, indexing, semantic search, and practical use cases.

Notion has become the go-to wiki for thousands of companies. Its flexibility, intuitive interface, and collaboration features make it an essential tool for centralizing team knowledge. But as your workspace grows, a problem emerges: finding information becomes a nightmare. With hundreds of pages, sub-pages, and databases, even the most experienced users spend precious minutes searching for what they know exists somewhere.

A RAG chatbot connected to Notion transforms this document mass into an intelligent assistant. Instead of navigating, you ask a question in natural language and get a synthesized, sourced, and contextualized answer. This guide shows you how to achieve this integration step by step.

Why Connect Notion to RAG?

Limitations of Native Notion Search

Notion's built-in search, while useful, has significant limitations for large organizations:

| Problem | Concrete Impact |
| --- | --- |
| Keyword search only | "How to request vacation" doesn't find "absence procedure" |
| No database search | Properties and fields are not indexed |
| Results not ranked by relevance | Recent pages prioritized over the most relevant ones |
| No synthesis | User must open and read each page |
| No conversational context | Each search starts from scratch |

What RAG Provides

The RAG (Retrieval-Augmented Generation) approach solves these limitations by combining semantic search and language generation:

  • Semantic search: Finds information even when phrased differently. "How to request PTO" will match "Vacation day request procedure"
  • Intelligent synthesis: Answers directly without forcing navigation through 5 pages
  • Multi-page aggregation: Combines information from multiple sources for a complete answer
  • Conversational memory: Each question benefits from the context of previous exchanges
  • Sourced citations: Each claim links back to the original page

Notion + RAG Architecture

The integration follows a three-layer architecture that separates extraction, indexing, and querying:

+-------------------------------------------------------------------------+
|                        Notion + RAG Architecture                         |
+-------------------------------------------------------------------------+
|                                                                         |
|   EXTRACTION                   INDEXING                  QUERYING       |
|   +--------------+            +--------------+          +------------+  |
|   |  Notion API  |----------->|   Chunking   |--------->|   Qdrant   |  |
|   |              |            |              |          |            |  |
|   |  - Pages     |            |  - Sections  |          |  Vectors   |  |
|   |  - DBs       |            |  - 500 tokens|          |            |  |
|   |  - Blocks    |            |  - Overlap   |          +------+-----+  |
|   +--------------+            +--------------+                 |        |
|                                      |                         |        |
|                               +------+------+                  |        |
|                               |  Embeddings |                  |        |
|                               |   BGE-M3    |                  |        |
|                               +-------------+                  |        |
|                               +--------------------------------+        |
|   CHATBOT                     v                                         |
|   +--------------+     +-------------+     +--------------+             |
|   |   User       |---->|  Retrieval  |---->|   Reranker   |             |
|   |  Question    |     |   Top-20    |     |   Top-5      |             |
|   +--------------+     +-------------+     +------+-------+             |
|                                                   |                     |
|                                            +------+------+              |
|                                            |     LLM     |              |
|                                            |   Response  |              |
|                                            +-------------+              |
|                                                                         |
+-------------------------------------------------------------------------+
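The query path in the diagram is a two-stage funnel: a fast vector search returns 20 candidates, then a slower but more precise reranker keeps the best 5. The sketch below shows only that control flow. `embed` and `rerank_score` are hypothetical stand-ins (in production, BGE-M3 and a reranking model), and the plain list `index` stands in for Qdrant.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    question: str,
    index: list[tuple[str, Vector]],           # (chunk_text, chunk_vector) pairs
    embed: Callable[[str], Vector],             # hypothetical embedding function
    rerank_score: Callable[[str, str], float],  # hypothetical (question, chunk) scorer
    retrieve_k: int = 20,
    rerank_k: int = 5,
) -> list[str]:
    """Retrieve the top `retrieve_k` chunks by similarity, then rerank down to `rerank_k`."""
    q_vec = embed(question)
    # Stage 1: cheap similarity ranking over the whole index (Qdrant's job in production)
    candidates = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:retrieve_k]
    # Stage 2: expensive pairwise scoring, applied to the short list only
    reranked = sorted(candidates, key=lambda item: rerank_score(question, item[0]), reverse=True)
    return [text for text, _ in reranked[:rerank_k]]
```

The split exists for cost reasons: a reranker scores each (question, chunk) pair individually, which is affordable for 20 candidates but not for the whole index.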

Key Components

  1. Extraction: The Notion connector uses the official API to retrieve pages and databases
  2. Chunking: Long documents are split into 500-token segments with overlap
  3. Embeddings: Each chunk is transformed into a semantic vector (BGE-M3 for multilingual)
  4. Vector database: Qdrant stores and indexes vectors for fast search
  5. Reranking: A second model reorders the 20 retrieved chunks by relevance and keeps the top 5
  6. Generation: The LLM synthesizes a response from relevant chunks
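The chunking step (item 2) can be sketched as a sliding window. This is a minimal sketch that assumes whitespace-separated words as a stand-in for model tokens (a real pipeline would count tokens with the embedding model's tokenizer) and an overlap of 50, a value the architecture does not specify.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, so the tail is not emitted twice
        if start + chunk_size >= len(words):
            break
    return chunks
```

With these defaults, a 1,200-word page yields three chunks starting at words 0, 450 and 900; the repeated 50 words keep a sentence that straddles a boundary retrievable from either side.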

Complete Notion Connector

Here's a reference implementation for extracting Notion content:

```python
import hashlib
from typing import Optional

from notion_client import Client


class NotionConnector:
    def __init__(self, token: str):
        """Initialize connector with integration token."""
        self.client = Client(auth=token)

    def get_all_pages(self, filter_by_parent: Optional[str] = None) -> list:
        """
        Retrieve all pages accessible by the integration.

        Args:
            filter_by_parent: Parent page ID to filter (optional)

        Returns:
            List of documents formatted for RAG
        """
        pages = []
        # Track seen IDs per call: a set stored on the instance would make
        # every later sync skip the pages processed during the first one
        processed_ids = set()
        has_more = True
        cursor = None

        while has_more:
            results = self.client.search(
                filter={"property": "object", "value": "page"},
                start_cursor=cursor,
                page_size=100
            )

            for page in results['results']:
                # Avoid duplicates
                if page['id'] in processed_ids:
                    continue

                # Filter by parent if specified (direct children only)
                if filter_by_parent:
                    parent = page.get('parent', {})
                    if parent.get('page_id') != filter_by_parent:
                        continue

                doc = self._format_page(page)
                if doc and len(doc['content']) > 50:  # Ignore empty pages
                    pages.append(doc)
                processed_ids.add(page['id'])

            has_more = results['has_more']
            cursor = results.get('next_cursor')

        return pages

    def _format_page(self, page: dict) -> dict:
        """Format a Notion page as a RAG document."""
        title = self._extract_title(page)
        content = self._extract_content(page['id'])

        # Generate hash to detect changes
        content_hash = hashlib.md5(content.encode()).hexdigest()

        return {
            "id": f"notion_{page['id']}",
            "title": title,
            "content": f"# {title}\n\n{content}",
            "metadata": {
                "source": "notion",
                "source_type": "wiki",
                "page_id": page['id'],
                "url": page.get('url', ''),
                "last_edited": page['last_edited_time'],
                "created_time": page['created_time'],
                "content_hash": content_hash,
                "parent_type": page.get('parent', {}).get('type'),
                "icon": self._extract_icon(page)
            }
        }

    def _extract_title(self, page: dict) -> str:
        """Extract page title."""
        props = page.get('properties', {})
        # The title property can have any name ('title', 'Name', ...),
        # so look it up by type instead of by key
        for prop in props.values():
            if prop.get('type') == 'title' and prop.get('title'):
                return ''.join(t['plain_text'] for t in prop['title'])
        return "Untitled"

    def _extract_content(self, page_id: str) -> str:
        """Extract full textual content of a page."""
        content_parts = []

        def process_blocks(block_id: str, depth: int = 0):
            """Recursive to handle nested blocks."""
            if depth > 5:  # Depth limit
                return

            cursor = None
            while True:
                # Block children are paginated (at most 100 per call)
                blocks = self.client.blocks.children.list(
                    block_id=block_id, start_cursor=cursor, page_size=100
                )
                for block in blocks['results']:
                    text = self._block_to_text(block, depth)
                    if text:
                        content_parts.append(text)
                    # Process children if block has any
                    if block.get('has_children'):
                        process_blocks(block['id'], depth + 1)
                if not blocks.get('has_more'):
                    break
                cursor = blocks['next_cursor']

        process_blocks(page_id)
        return "\n\n".join(content_parts)

    def _block_to_text(self, block: dict, depth: int = 0) -> str:
        """Convert a Notion block to Markdown."""
        block_type = block['type']
        indent = "  " * depth

        handlers = {
            'paragraph': lambda b: self._rich_text(b['paragraph']['rich_text']),
            'heading_1': lambda b: f"# {self._rich_text(b['heading_1']['rich_text'])}",
            'heading_2': lambda b: f"## {self._rich_text(b['heading_2']['rich_text'])}",
            'heading_3': lambda b: f"### {self._rich_text(b['heading_3']['rich_text'])}",
            'bulleted_list_item': lambda b: f"{indent}- {self._rich_text(b['bulleted_list_item']['rich_text'])}",
            'numbered_list_item': lambda b: f"{indent}1. {self._rich_text(b['numbered_list_item']['rich_text'])}",
            'to_do': lambda b: f"{indent}- [{'x' if b['to_do']['checked'] else ' '}] {self._rich_text(b['to_do']['rich_text'])}",
            'toggle': lambda b: f"{indent}> {self._rich_text(b['toggle']['rich_text'])}",
            'quote': lambda b: f"> {self._rich_text(b['quote']['rich_text'])}",
            'callout': self._callout_to_text,
            'code': lambda b: f"```{b['code']['language']}\n{self._rich_text(b['code']['rich_text'])}\n```",
            'divider': lambda b: "---",
            'table_row': lambda b: self._table_row_to_text(b),
        }

        handler = handlers.get(block_type)
        return handler(block) if handler else ""

    def _callout_to_text(self, block: dict) -> str:
        """Convert a callout; its icon may be null or a non-emoji icon."""
        icon = block['callout'].get('icon') or {}
        return f"> {icon.get('emoji', '')} {self._rich_text(block['callout']['rich_text'])}"

    def _rich_text(self, rich_text: list) -> str:
        """Convert Notion rich text to text with Markdown formatting."""
        parts = []
        for rt in rich_text:
            text = rt['plain_text']
            annotations = rt.get('annotations', {})

            if annotations.get('bold'):
                text = f"**{text}**"
            if annotations.get('italic'):
                text = f"*{text}*"
            if annotations.get('code'):
                text = f"`{text}`"
            if rt.get('href'):
                text = f"[{text}]({rt['href']})"

            parts.append(text)

        return ''.join(parts)

    def _table_row_to_text(self, block: dict) -> str:
        """Convert a table row."""
        cells = block['table_row']['cells']
        row = [self._rich_text(cell) for cell in cells]
        return "| " + " | ".join(row) + " |"

    def _extract_icon(self, page: dict) -> str:
        """Extract page icon."""
        # The API returns "icon": null when a page has no icon
        icon = page.get('icon') or {}
        if icon.get('type') == 'emoji':
            return icon.get('emoji', '')
        return ''


class NotionDatabaseConnector(NotionConnector):
    """Extension for extracting Notion databases."""

    def get_database_entries(self, database_id: str) -> list:
        """
        Retrieve all entries from a database.
        Each entry becomes a document with its properties
        as structured metadata.
        """
        entries = []
        has_more = True
        cursor = None

        while has_more:
            results = self.client.databases.query(
                database_id=database_id,
                start_cursor=cursor,
                page_size=100
            )

            for entry in results['results']:
                doc = self._format_database_entry(entry, database_id)
                if doc:
                    entries.append(doc)

            has_more = results['has_more']
            cursor = results.get('next_cursor')

        return entries

    def _format_database_entry(self, entry: dict, db_id: str) -> dict:
        """Format a database entry."""
        props = entry.get('properties', {})

        # Extract all properties as structured text
        prop_texts = []
        metadata_props = {}

        for name, prop in props.items():
            value = self._extract_property_value(prop)
            if value:
                prop_texts.append(f"**{name}**: {value}")
                metadata_props[name] = value

        # Use the title-typed property, whatever its name
        title = self._extract_title(entry)
        content = "\n".join(prop_texts)

        # Add page content if it has any
        page_content = self._extract_content(entry['id'])
        if page_content:
            content += f"\n\n{page_content}"

        return {
            "id": f"notion_db_{entry['id']}",
            "title": title,
            "content": f"# {title}\n\n{content}",
            "metadata": {
                # Spread user properties first so a column named e.g. "source"
                # cannot overwrite the fixed keys below
                **metadata_props,
                "source": "notion",
                "source_type": "database",
                "database_id": db_id,
                "entry_id": entry['id'],
                "url": entry.get('url', ''),
                "last_edited": entry['last_edited_time'],
            }
        }

    def _extract_property_value(self, prop: dict) -> str:
        """Extract value from a Notion property."""
        prop_type = prop.get('type')

        # Empty properties come back as null, so guard against None everywhere
        extractors = {
            'title': lambda p: self._rich_text(p.get('title', [])),
            'rich_text': lambda p: self._rich_text(p.get('rich_text', [])),
            'number': lambda p: '' if p.get('number') is None else str(p['number']),
            'select': lambda p: p.get('select', {}).get('name', '') if p.get('select') else '',
            'multi_select': lambda p: ', '.join([s['name'] for s in p.get('multi_select', [])]),
            'date': lambda p: p.get('date', {}).get('start', '') if p.get('date') else '',
            'checkbox': lambda p: 'Yes' if p.get('checkbox') else 'No',
            'url': lambda p: p.get('url') or '',
            'email': lambda p: p.get('email') or '',
            'phone_number': lambda p: p.get('phone_number') or '',
            'status': lambda p: p.get('status', {}).get('name', '') if p.get('status') else '',
        }

        extractor = extractors.get(prop_type)
        return extractor(prop) if extractor else ''
```

Intelligent Synchronization

Synchronization can be triggered in several ways depending on your needs:

Polling Synchronization

```python
from datetime import datetime, timezone


class NotionSyncManager:
    def __init__(self, connector: "NotionConnector", indexer):
        self.connector = connector
        self.indexer = indexer  # any object exposing upsert_documents() and replace_all()
        self.last_sync = None

    def sync_incremental(self):
        """
        Incremental synchronization: only processes pages
        modified since the last synchronization.
        """
        # Record the start time before fetching, so edits made during the
        # sync are picked up next time. Use UTC: comparing a naive
        # datetime.now() with Notion's timezone-aware timestamps raises TypeError.
        started_at = datetime.now(timezone.utc)
        # Note: get_all_pages() still downloads every page; filtering happens afterwards
        pages = self.connector.get_all_pages()
        updated = []

        for page in pages:
            last_edited = datetime.fromisoformat(
                page['metadata']['last_edited'].replace('Z', '+00:00')
            )
            if self.last_sync is None or last_edited > self.last_sync:
                updated.append(page)

        if updated:
            self.indexer.upsert_documents(updated)
            print(f"Synchronized {len(updated)} pages")

        self.last_sync = started_at

    def sync_full(self):
        """Full synchronization: re-indexes everything."""
        started_at = datetime.now(timezone.utc)
        pages = self.connector.get_all_pages()
        self.indexer.replace_all(pages)
        self.last_sync = started_at
        print(f"Indexed {len(pages)} pages")
```
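The connector stores a `content_hash` in each page's metadata, but the timestamp-based sync never uses it. One possible refinement, sketched below, diffs fresh hashes against those saved at the previous run: only new or changed documents are re-embedded, and documents that vanished from Notion are flagged for deletion, which a timestamp-only incremental sync cannot detect. The `previous_hashes` mapping (document ID to hash) is an assumption about what your indexer can return; documents without a hash (database entries, as written above) are always treated as changed.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    to_upsert: list = field(default_factory=list)  # new or changed documents
    to_delete: list = field(default_factory=list)  # IDs no longer returned by Notion
    unchanged: int = 0


def plan_sync(documents: list[dict], previous_hashes: dict[str, str]) -> SyncPlan:
    """Diff freshly extracted documents against the hashes stored at the previous sync."""
    plan = SyncPlan()
    seen = set()
    for doc in documents:
        doc_id = doc["id"]
        seen.add(doc_id)
        current = doc["metadata"].get("content_hash")
        # A missing hash can never match, so such documents are always re-indexed
        if current is not None and previous_hashes.get(doc_id) == current:
            plan.unchanged += 1
        else:
            plan.to_upsert.append(doc)
    plan.to_delete = [doc_id for doc_id in previous_hashes if doc_id not in seen]
    return plan
```

Because embedding is usually the expensive step, skipping unchanged pages matters more than skipping the API calls themselves.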

Real-Time Synchronization

For near-real-time synchronization, use Notion webhooks (available through the API) to be notified of changes, or run a worker that polls at short intervals, like the one below:

```python
import time

import schedule


def start_sync_worker(sync_manager: "NotionSyncManager"):
    """Start synchronization worker."""
    # Incremental sync every 5 minutes
    schedule.every(5).minutes.do(sync_manager.sync_incremental)

    # Daily full sync (cleanup)
    schedule.every().day.at("03:00").do(sync_manager.sync_full)

    while True:
        schedule.run_pending()
        time.sleep(60)
```
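Polling every five minutes re-reads every page and block, and the Notion API rate-limits integrations (HTTP 429 responses), so a large workspace can hit the limit mid-sync. A generic exponential-backoff wrapper is one way to keep the worker alive. This is a sketch: the delays and attempt counts are illustrative, and `is_retryable` is where you would recognize a rate-limit error from your client (with `notion_client`, for example, an `APIResponseError` whose `code` is `APIErrorCode.RateLimited`).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors, or once the attempt budget is spent
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Wait 1x, 2x, 4x ... base_delay, plus up to base_delay of random jitter
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

Each API call in the connector could then be wrapped, e.g. `with_backoff(lambda: client.search(...), is_rate_limited)`; the jitter keeps several workers from retrying in lockstep.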

Optimized System Prompt for Notion

The system prompt is crucial for quality responses. Here's an optimized version for company wikis:

```python
NOTION_KB_SYSTEM_PROMPT = """You are the knowledge base assistant for {company_name}.
You help employees quickly find information in our Notion wiki.

## Your Mission
- Answer questions based ONLY on wiki content
- Systematically cite sources with the page title
- Direct to the right person/team if the question goes beyond the wiki

## Strict Rules
1. Never invent information absent from the wiki
2. If you're not sure, say so clearly
3. Prioritize the most recent information
4. Adapt detail level to the question (summary vs detail)

## Response Format
- Start by directly answering the question
- Add details if relevant
- End with "Source: [Page Title]" for each piece of information

## Available Context
{context}

## Conversation History
{chat_history}
"""


def build_notion_prompt(company: str, docs: list, history: list) -> str:
    """Build prompt with context."""
    context_parts = []
    for doc in docs:
        # Include the last-edited date so rule 3 (prefer recent info) is actionable
        edited = doc.get('metadata', {}).get('last_edited', 'unknown')
        # Truncate each document to 1,000 characters to bound prompt size
        context_parts.append(
            f"### {doc['title']} (last edited: {edited})\n{doc['content'][:1000]}"
        )

    history_parts = []
    for msg in history[-5:]:  # Last 5 messages
        role = "User" if msg['role'] == 'user' else "Assistant"
        history_parts.append(f"{role}: {msg['content']}")

    return NOTION_KB_SYSTEM_PROMPT.format(
        company_name=company,
        context="\n\n---\n\n".join(context_parts),
        chat_history="\n".join(history_parts) if history_parts else "None"
    )
```

Concrete Use Cases

1. New Employee Onboarding

Question: "How do I set up my development environment?"

RAG Response:

To set up your dev environment, here are the steps:

  1. Clone the main repo: git clone git@<git-host>:company/main.git
  2. Install dependencies: npm install
  3. Copy .env.example to .env
  4. Request credentials from the DevOps team on Slack (#devops-support)

Source: [Dev Setup Guide] | [Tech Onboarding]

2. Frequent HR Questions

Question: "How do I request time off?"

RAG Response:

To request time off:

  1. Go to Payfit > Absences > New request
  2. Select the type (PTO, comp time, unpaid)
  3. Enter dates and submit

Approval: by your manager, within 48 hours. For emergencies, contact HR.

Source: [Vacation and Absence Procedure]

3. Technical Documentation

Question: "What's the payment service architecture?"

RAG Response:

The payment service follows an event-driven architecture:

  • API Gateway: Authentication and routing
  • Payment Service: Transaction orchestration
  • Stripe Adapter: Stripe integration
  • Event Bus: Kafka for notifications

Full diagram available on the dedicated page.

Source: [Payment Service Architecture] | [Technical Diagrams]

Best Practices

Structure Notion for RAG

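| Practice | Why |
| --- | --- |
| Descriptive titles | Improves search |
| Clear hierarchical structure | Facilitates chunking |
| Pages kept up to date | Last-edited dates let the assistant prioritize recent content |
| Tags and categories | Enriches metadata |
| Internal links | Connect related pages, giving the assistant more context |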
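A clear heading hierarchy pays off because the connector renders Notion headings as Markdown (`#`, `##`, `###`), so chunks can follow section boundaries instead of cutting mid-topic. A minimal sketch that splits a page into sections and keeps each section's heading path as context; the output format is an assumption, and very long sections would still need a size-based split (500 tokens in the architecture above).

```python
import re

HEADING = re.compile(r"^(#{1,3}) (.+)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections, each tagged with its heading path (e.g. 'Setup > Install')."""
    sections, path, buffer = [], [], []
    in_code = False

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            sections.append({"heading_path": " > ".join(path), "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        if line.startswith("```"):
            in_code = not in_code  # '#' lines inside code fences are comments, not headings
        match = None if in_code else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Keep only this heading's ancestors, then append the heading itself
            path[:] = path[: level - 1] + [match.group(2).strip()]
        else:
            buffer.append(line)
    flush()
    return sections
```

Prefixing each chunk with its heading path (for instance "Dev Setup > Install") gives the embedding model context that a bare paragraph lacks.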

Manage Permissions

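The RAG index contains everything the Notion integration can read, and without extra filtering every chatbot user can query all of it. For granular control:

  1. Create a dedicated integration for RAG
  2. Share with the integration only pages that every chatbot user is allowed to see
  3. Manage access by workspace if multi-tenant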
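If one index serves several workspaces or teams, one way to enforce step 3 is to tag every chunk with its origin at indexing time and filter before anything reaches the LLM. This sketch assumes a hypothetical `workspace_id` metadata field (the connector above does not set one) and a hypothetical `USER_WORKSPACES` access list. In Qdrant, the same rule would normally be applied as a payload filter inside the vector query, so that the top-k results are not depleted by chunks the user cannot see.

```python
# Hypothetical access-control list: which workspaces each user may query
USER_WORKSPACES: dict[str, set[str]] = {
    "alice": {"hr", "engineering"},
    "bob": {"engineering"},
}


def visible_chunks(user: str, chunks: list[dict]) -> list[dict]:
    """Drop retrieved chunks from workspaces the user cannot access (deny by default)."""
    allowed = USER_WORKSPACES.get(user, set())
    return [
        chunk for chunk in chunks
        # Chunks without a workspace_id never match, so untagged content stays hidden
        if chunk.get("metadata", {}).get("workspace_id") in allowed
    ]
```

Denying by default means a misconfigured or untagged page disappears from answers instead of leaking to the wrong team.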

Monitor Quality

  • Track unanswered questions
  • Collect user feedback
  • Identify most cited pages
  • Detect outdated content
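Three of these signals can be computed from data the pipeline already has. The sketch below assumes a hypothetical interaction log in which each entry records the question and the page IDs cited in the answer, and it treats an answer with no citation as unanswered. It uses the connector's `last_edited` metadata to flag pages untouched for 180 days, an arbitrary threshold.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Optional


def quality_report(
    interactions: list[dict],
    pages: list[dict],
    stale_after_days: int = 180,
    now: Optional[datetime] = None,
) -> dict:
    """Summarize unanswered questions, most-cited pages, and possibly outdated pages."""
    now = now or datetime.now(timezone.utc)
    # Assumption: an answer that cites no page counts as unanswered
    unanswered = [i["question"] for i in interactions if not i["cited_pages"]]
    citations = Counter(pid for i in interactions for pid in i["cited_pages"])
    cutoff = now - timedelta(days=stale_after_days)
    stale = [
        p["metadata"]["page_id"]
        for p in pages
        # Notion timestamps end in 'Z'; fromisoformat only accepts it from Python 3.11
        if datetime.fromisoformat(p["metadata"]["last_edited"].replace("Z", "+00:00")) < cutoff
    ]
    return {
        "unanswered": unanswered,
        "most_cited": citations.most_common(5),
        "stale_pages": stale,
    }
```

Cross-referencing the lists is often the most useful step: a heavily cited page that is also stale is the first candidate for a review.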


Connect Notion with Ailog

Transform your Notion wiki into an intelligent assistant without writing a single line of code. Ailog simplifies integration:

  • Native Notion connector: Automatic synchronization in a few clicks
  • Semantic search: Find info with your words, not the wiki's
  • Multi-workspace: Manage multiple Notion spaces in one interface
  • Access control: Respect your organization's permissions
  • French hosting: Data on French servers, native GDPR compliance

Try Ailog for free and deploy your Notion assistant in 10 minutes.

Tags

rag, notion, knowledge base, company wiki, integration, internal chatbot
