Notion + RAG: Connect Your Company Wiki
Complete guide to integrating Notion as a knowledge source for a RAG chatbot. Synchronization, indexing, semantic search, and practical use cases.
Notion has become the go-to wiki for thousands of companies. Its flexibility, intuitive interface, and collaboration features make it an essential tool for centralizing team knowledge. But as your workspace grows, a problem emerges: finding information becomes a nightmare. With hundreds of pages, sub-pages, and databases, even the most experienced users spend precious minutes searching for what they know exists somewhere.
A RAG chatbot connected to Notion turns this mass of documents into an intelligent assistant. Instead of navigating, you ask a question in natural language and get a synthesized, sourced, and contextualized answer. This guide shows you how to build this integration step by step.
Why Connect Notion to RAG?
Limitations of Native Notion Search
Notion's built-in search, while useful, has significant limitations for large organizations:
| Problem | Concrete Impact |
|---|---|
| Keyword search only | "How to request vacation" doesn't find "absence procedure" |
| No database search | Properties and fields are not indexed |
| Results not ranked by relevance | Recent pages prioritized over most relevant |
| No synthesis | User must open and read each page |
| No conversational context | Each search starts from scratch |
What RAG Provides
The RAG (Retrieval-Augmented Generation) approach solves these limitations by combining semantic search and language generation:
- Semantic search: Finds information even when phrased differently. "How to request PTO" will match "Vacation day request procedure"
- Intelligent synthesis: Answers directly without forcing navigation through 5 pages
- Multi-page aggregation: Combines information from multiple sources for a complete answer
- Conversational memory: Each question benefits from the context of previous exchanges
- Sourced citations: Each claim links back to the original page
Notion + RAG Architecture
The integration follows a three-layer architecture that separates extraction, indexing, and querying:
+-------------------------------------------------------------------------+
|                        Notion + RAG Architecture                        |
+-------------------------------------------------------------------------+
|                                                                         |
|  EXTRACTION                 INDEXING                   QUERYING         |
|  +--------------+           +--------------+           +------------+   |
|  |  Notion API  |---------->|   Chunking   |---------->|   Qdrant   |   |
|  |              |           |              |           |            |   |
|  |  - Pages     |           |  - Sections  |           |  Vectors   |   |
|  |  - DBs       |           |  - 500 tokens|           |            |   |
|  |  - Blocks    |           |  - Overlap   |           +------+-----+   |
|  +--------------+           +-------+------+                  |         |
|                                     |                         |         |
|                              +------+------+                  |         |
|                              | Embeddings  |                  |         |
|                              |   BGE-M3    |                  |         |
|                              +-------------+                  |         |
|                                                               |         |
|  CHATBOT                                                      |         |
|  +--------------+     +-------------+     +--------------+    |         |
|  |     User     |---->|  Retrieval  |<----|   Reranker   |<---+         |
|  |   Question   |     |   Top-20    |     |    Top-5     |              |
|  +--------------+     +-------------+     +------+-------+              |
|                                                  |                      |
|                                           +------+------+               |
|                                           |     LLM     |               |
|                                           |  Response   |               |
|                                           +-------------+               |
|                                                                         |
+-------------------------------------------------------------------------+
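The query path in the diagram (retrieve the top 20 chunks, then rerank down to the top 5) can be sketched in a few lines. This is a minimal illustration, not the production pipeline: the 2-dimensional vectors are made up, and `overlap_score` is a hypothetical stand-in for a real reranking model.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: list[float],
    chunks: list[dict],
    rerank_score: Callable[[dict], float],
    retrieve_k: int = 20,
    final_k: int = 5,
) -> list[dict]:
    """Stage 1: keep the retrieve_k chunks closest to the query vector.
    Stage 2: reorder those candidates with rerank_score and keep final_k."""
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True
    )[:retrieve_k]
    return sorted(candidates, key=rerank_score, reverse=True)[:final_k]


# Toy data: vectors invented for illustration, not real embeddings
chunks = [
    {"title": "Vacation and Absence Procedure", "vector": [0.9, 0.1], "text": "request time off"},
    {"title": "Dev Setup Guide", "vector": [0.1, 0.9], "text": "clone the repo"},
    {"title": "Office Map", "vector": [0.7, 0.3], "text": "meeting rooms"},
]
question = "how do I request time off"
query_vec = [1.0, 0.0]


def overlap_score(chunk: dict) -> float:
    """Hypothetical reranker: counts question words found in the chunk text."""
    words = set(question.lower().split())
    return float(len(words & set(chunk["text"].lower().split())))


top = retrieve_then_rerank(query_vec, chunks, overlap_score, retrieve_k=2, final_k=1)
print([c["title"] for c in top])
```

The first stage is cheap and recall-oriented; the second stage only scores the shortlisted candidates, which is why a slower, more precise model is affordable there.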
Key Components
- Extraction: The Notion connector uses the official API to retrieve pages and databases
- Chunking: Long documents are split into 500-token segments with overlap
- Embeddings: Each chunk is transformed into a semantic vector (BGE-M3 for multilingual)
- Vector database: Qdrant stores and indexes vectors for fast search
- Reranking: A second model reorders results by relevance
- Generation: The LLM synthesizes a response from relevant chunks
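The chunking step can be sketched as a sliding window over tokens. This is a simplified illustration under two assumptions: whitespace-separated words stand in for model tokens (a real pipeline would count tokens with the embedding model's tokenizer), and the 50-token overlap is an arbitrary example value, since only the 500-token size is specified above.

```python
def chunk_tokens(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size tokens; consecutive windows
    share `overlap` tokens so a sentence cut at a boundary keeps its context."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    tokens = text.split()  # assumption: whitespace words approximate tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_tokens(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

On a 1,200-word sample this yields three chunks of 500, 500, and 300 words, with the last 50 words of each chunk repeated at the start of the next.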
Complete Notion Connector
Here's a reference implementation for extracting Notion content:
```python
import hashlib
from typing import Optional

from notion_client import Client


class NotionConnector:
    def __init__(self, token: str):
        """Initialize connector with integration token."""
        self.client = Client(auth=token)
        self.processed_ids = set()

    def get_all_pages(self, filter_by_parent: Optional[str] = None) -> list:
        """
        Retrieve all pages accessible by the integration.

        Args:
            filter_by_parent: Parent page ID to filter (optional)

        Returns:
            List of documents formatted for RAG
        """
        # Reset per call; otherwise a second sync would skip every page
        self.processed_ids = set()
        pages = []
        has_more = True
        cursor = None

        while has_more:
            params = {
                "filter": {"property": "object", "value": "page"},
                "page_size": 100,
            }
            if cursor:  # Only send a cursor after the first request
                params["start_cursor"] = cursor
            results = self.client.search(**params)

            for page in results['results']:
                # Avoid duplicates
                if page['id'] in self.processed_ids:
                    continue

                # Filter by parent if specified
                if filter_by_parent:
                    parent = page.get('parent', {})
                    if parent.get('page_id') != filter_by_parent:
                        continue

                doc = self._format_page(page)
                if doc and len(doc['content']) > 50:  # Ignore empty pages
                    pages.append(doc)
                    self.processed_ids.add(page['id'])

            has_more = results['has_more']
            cursor = results.get('next_cursor')

        return pages

    def _format_page(self, page: dict) -> dict:
        """Format a Notion page as a RAG document."""
        title = self._extract_title(page)
        content = self._extract_content(page['id'])

        # Generate hash to detect changes
        content_hash = hashlib.md5(content.encode()).hexdigest()

        return {
            "id": f"notion_{page['id']}",
            "title": title,
            "content": f"# {title}\n\n{content}",
            "metadata": {
                "source": "notion",
                "source_type": "wiki",
                "page_id": page['id'],
                "url": page.get('url', ''),
                "last_edited": page['last_edited_time'],
                "created_time": page['created_time'],
                "content_hash": content_hash,
                "parent_type": page.get('parent', {}).get('type'),
                "icon": self._extract_icon(page)
            }
        }

    def _extract_title(self, page: dict) -> str:
        """Extract page title."""
        props = page.get('properties', {})

        # The title property can have any name, so match on its type
        for prop in props.values():
            if prop.get('type') == 'title' and prop.get('title'):
                return ''.join(t['plain_text'] for t in prop['title'])

        return "Untitled"

    def _extract_content(self, page_id: str) -> str:
        """Extract full textual content of a page."""
        content_parts = []

        def process_blocks(block_id: str, depth: int = 0):
            """Recursive to handle nested blocks."""
            if depth > 5:  # Depth limit
                return

            cursor = None
            while True:  # Paginate: the API returns at most 100 blocks per call
                params = {"block_id": block_id, "page_size": 100}
                if cursor:
                    params["start_cursor"] = cursor
                blocks = self.client.blocks.children.list(**params)

                for block in blocks['results']:
                    text = self._block_to_text(block, depth)
                    if text:
                        content_parts.append(text)

                    # Process children if block has any
                    if block.get('has_children'):
                        process_blocks(block['id'], depth + 1)

                if not blocks.get('has_more'):
                    break
                cursor = blocks.get('next_cursor')

        process_blocks(page_id)
        return "\n\n".join(content_parts)

    def _block_to_text(self, block: dict, depth: int = 0) -> str:
        """Convert a Notion block to Markdown."""
        block_type = block['type']
        indent = "  " * depth
        fence = "`" * 3  # Markdown code fence

        handlers = {
            'paragraph': lambda b: self._rich_text(b['paragraph']['rich_text']),
            'heading_1': lambda b: f"# {self._rich_text(b['heading_1']['rich_text'])}",
            'heading_2': lambda b: f"## {self._rich_text(b['heading_2']['rich_text'])}",
            'heading_3': lambda b: f"### {self._rich_text(b['heading_3']['rich_text'])}",
            'bulleted_list_item': lambda b: f"{indent}- {self._rich_text(b['bulleted_list_item']['rich_text'])}",
            'numbered_list_item': lambda b: f"{indent}1. {self._rich_text(b['numbered_list_item']['rich_text'])}",
            'to_do': lambda b: f"{indent}- [{'x' if b['to_do']['checked'] else ' '}] {self._rich_text(b['to_do']['rich_text'])}",
            'toggle': lambda b: f"{indent}> {self._rich_text(b['toggle']['rich_text'])}",
            'quote': lambda b: f"> {self._rich_text(b['quote']['rich_text'])}",
            # The icon can be null in the API response, hence the `or {}`
            'callout': lambda b: f"> {(b['callout'].get('icon') or {}).get('emoji', '')} {self._rich_text(b['callout']['rich_text'])}",
            'code': lambda b: f"{fence}{b['code']['language']}\n{self._rich_text(b['code']['rich_text'])}\n{fence}",
            'divider': lambda b: "---",
            'table_row': lambda b: self._table_row_to_text(b),
        }

        # Unsupported block types (images, embeds, ...) are skipped
        handler = handlers.get(block_type)
        return handler(block) if handler else ""

    def _rich_text(self, rich_text: list) -> str:
        """Convert Notion rich text to text with Markdown formatting."""
        parts = []
        for rt in rich_text:
            text = rt['plain_text']
            annotations = rt.get('annotations', {})

            if annotations.get('bold'):
                text = f"**{text}**"
            if annotations.get('italic'):
                text = f"*{text}*"
            if annotations.get('code'):
                text = f"`{text}`"
            if rt.get('href'):
                text = f"[{text}]({rt['href']})"

            parts.append(text)

        return ''.join(parts)

    def _table_row_to_text(self, block: dict) -> str:
        """Convert a table row."""
        cells = block['table_row']['cells']
        row = [self._rich_text(cell) for cell in cells]
        return "| " + " | ".join(row) + " |"

    def _extract_icon(self, page: dict) -> str:
        """Extract page icon."""
        icon = page.get('icon') or {}  # 'icon' is null for pages without one
        if icon.get('type') == 'emoji':
            return icon.get('emoji', '')
        return ''


class NotionDatabaseConnector(NotionConnector):
    """Extension for extracting Notion databases."""

    def get_database_entries(self, database_id: str) -> list:
        """
        Retrieve all entries from a database.
        Each entry becomes a document with its properties
        as structured metadata.
        """
        entries = []
        has_more = True
        cursor = None

        while has_more:
            params = {"database_id": database_id, "page_size": 100}
            if cursor:
                params["start_cursor"] = cursor
            results = self.client.databases.query(**params)

            for entry in results['results']:
                doc = self._format_database_entry(entry, database_id)
                if doc:
                    entries.append(doc)

            has_more = results['has_more']
            cursor = results.get('next_cursor')

        return entries

    def _format_database_entry(self, entry: dict, db_id: str) -> dict:
        """Format a database entry."""
        props = entry.get('properties', {})

        # Extract all properties as structured text
        prop_texts = []
        metadata_props = {}

        for name, prop in props.items():
            value = self._extract_property_value(prop)
            if value:
                prop_texts.append(f"**{name}**: {value}")
                metadata_props[name] = value

        title = metadata_props.get('Name', metadata_props.get('Title', 'Entry'))
        content = "\n".join(prop_texts)

        # Add page content if it has any
        page_content = self._extract_content(entry['id'])
        if page_content:
            content += f"\n\n{page_content}"

        return {
            "id": f"notion_db_{entry['id']}",
            "title": title,
            "content": f"# {title}\n\n{content}",
            "metadata": {
                # Properties first, so a property named "source" or "url"
                # cannot overwrite the reserved keys below
                **metadata_props,
                "source": "notion",
                "source_type": "database",
                "database_id": db_id,
                "entry_id": entry['id'],
                "url": entry.get('url', ''),
                "last_edited": entry['last_edited_time'],
            }
        }

    def _extract_property_value(self, prop: dict) -> str:
        """Extract value from a Notion property."""
        prop_type = prop.get('type')

        # Empty properties come back as null, hence the `or` fallbacks
        extractors = {
            'title': lambda p: self._rich_text(p.get('title', [])),
            'rich_text': lambda p: self._rich_text(p.get('rich_text', [])),
            'number': lambda p: '' if p.get('number') is None else str(p['number']),
            'select': lambda p: (p.get('select') or {}).get('name', ''),
            'multi_select': lambda p: ', '.join(s['name'] for s in p.get('multi_select', [])),
            'date': lambda p: (p.get('date') or {}).get('start', ''),
            'checkbox': lambda p: 'Yes' if p.get('checkbox') else 'No',
            'url': lambda p: p.get('url') or '',
            'email': lambda p: p.get('email') or '',
            'phone_number': lambda p: p.get('phone_number') or '',
            'status': lambda p: (p.get('status') or {}).get('name', ''),
        }

        extractor = extractors.get(prop_type)
        return extractor(prop) if extractor else ''
```
Intelligent Synchronization
Synchronization can be triggered in several ways depending on your needs:
Polling Synchronization
```python
from datetime import datetime, timezone


class NotionSyncManager:
    def __init__(self, connector: NotionConnector, indexer):
        self.connector = connector
        self.indexer = indexer
        self.last_sync = None

    def sync_incremental(self):
        """
        Incremental synchronization: only processes pages
        modified since the last synchronization.
        """
        # Timezone-aware, like Notion timestamps; a naive datetime.now()
        # cannot be compared with them. Taken before fetching so edits made
        # during the sync are picked up by the next run.
        sync_started = datetime.now(timezone.utc)

        pages = self.connector.get_all_pages()
        updated = []

        for page in pages:
            last_edited = datetime.fromisoformat(
                page['metadata']['last_edited'].replace('Z', '+00:00')
            )

            if self.last_sync is None or last_edited > self.last_sync:
                updated.append(page)

        if updated:
            self.indexer.upsert_documents(updated)
            print(f"Synchronized {len(updated)} pages")

        self.last_sync = sync_started

    def sync_full(self):
        """Full synchronization: re-indexes everything."""
        sync_started = datetime.now(timezone.utc)
        pages = self.connector.get_all_pages()
        self.indexer.replace_all(pages)
        self.last_sync = sync_started
        print(f"Indexed {len(pages)} pages")
```
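Timestamp comparison only finds edited pages; it does not tell the indexer which pages were deleted or unshared since the last run. One possible complement, sketched below, is to compare the `content_hash` that the connector already stores in each document's metadata against the hashes saved by the previous sync. The `known_hashes` mapping is a hypothetical store (in practice it might live in a small table or in the vector database's payloads), and `diff_by_hash` is an illustrative helper, not part of the connector above.

```python
import hashlib


def content_hash(text: str) -> str:
    """Same kind of fingerprint the connector stores in metadata['content_hash']."""
    return hashlib.md5(text.encode()).hexdigest()


def diff_by_hash(
    docs: list[dict], known_hashes: dict[str, str]
) -> tuple[list[dict], list[str]]:
    """Compare fresh documents against hashes saved by the previous sync.

    Returns (documents to upsert, ids of documents to delete).
    """
    changed = [
        d for d in docs
        if known_hashes.get(d["id"]) != d["metadata"]["content_hash"]
    ]
    current_ids = {d["id"] for d in docs}
    removed = [doc_id for doc_id in known_hashes if doc_id not in current_ids]
    return changed, removed


# Toy data for illustration
known = {
    "notion_a": content_hash("old text"),
    "notion_b": content_hash("same text"),
    "notion_gone": content_hash("deleted page"),
}
docs = [
    {"id": "notion_a", "metadata": {"content_hash": content_hash("new text")}},
    {"id": "notion_b", "metadata": {"content_hash": content_hash("same text")}},
    {"id": "notion_c", "metadata": {"content_hash": content_hash("brand new")}},
]

to_upsert, to_delete = diff_by_hash(docs, known)
print([d["id"] for d in to_upsert], to_delete)
```

Here the edited page and the new page are queued for upsert, the unchanged page is skipped (so it is not re-embedded), and the missing page is flagged for deletion.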
Real-Time Synchronization
For near-real-time synchronization, use Notion webhooks (available via API) or, as in this example, a worker that polls at short intervals:
```python
import time

import schedule


def start_sync_worker(sync_manager: NotionSyncManager):
    """Start synchronization worker."""
    # Incremental sync every 5 minutes
    schedule.every(5).minutes.do(sync_manager.sync_incremental)

    # Daily full sync (cleanup)
    schedule.every().day.at("03:00").do(sync_manager.sync_full)

    # Blocks forever; run it in a dedicated process or thread
    while True:
        schedule.run_pending()
        time.sleep(60)
```
Optimized System Prompt for Notion
The system prompt is crucial for quality responses. Here's an optimized version for company wikis:
```python
NOTION_KB_SYSTEM_PROMPT = """You are the knowledge base assistant for {company_name}.
You help employees quickly find information in our Notion wiki.

## Your Mission
- Answer questions based ONLY on wiki content
- Systematically cite sources with the page title
- Direct to the right person/team if the question goes beyond the wiki

## Strict Rules
1. Never invent information absent from the wiki
2. If you're not sure, say so clearly
3. Prioritize the most recent information
4. Adapt detail level to the question (summary vs detail)

## Response Format
- Start by directly answering the question
- Add details if relevant
- End with "Source: [Page Title]" for each piece of information

## Available Context
{context}

## Conversation History
{chat_history}
"""


def build_notion_prompt(company: str, docs: list, history: list) -> str:
    """Build prompt with context."""
    context_parts = []
    for doc in docs:
        # Cap each document at 1,000 characters to bound prompt size
        context_parts.append(f"### {doc['title']}\n{doc['content'][:1000]}")

    history_parts = []
    for msg in history[-5:]:  # Last 5 messages
        role = "User" if msg['role'] == 'user' else "Assistant"
        history_parts.append(f"{role}: {msg['content']}")

    return NOTION_KB_SYSTEM_PROMPT.format(
        company_name=company,
        context="\n\n---\n\n".join(context_parts),
        chat_history="\n".join(history_parts) if history_parts else "None"
    )
```
Concrete Use Cases
1. New Employee Onboarding
Question: "How do I set up my development environment?"
RAG Response:
To set up your dev environment, here are the steps:
- Clone the main repo: `git clone [email protected]:company/main.git`
- Install dependencies: `npm install`
- Copy `.env.example` to `.env`
- Request credentials from the DevOps team on Slack (#devops-support)
Source: [Dev Setup Guide] | [Tech Onboarding]
2. Frequent HR Questions
Question: "How do I request time off?"
RAG Response:
To request time off:
- Go to Payfit > Absences > New request
- Select the type (PTO, comp time, unpaid)
- Enter dates and submit
Approval time: up to 48 hours, by your manager. For emergencies, contact HR.
Source: [Vacation and Absence Procedure]
3. Technical Documentation
Question: "What's the payment service architecture?"
RAG Response:
The payment service follows an event-driven architecture:
- API Gateway: Authentication and routing
- Payment Service: Transaction orchestration
- Stripe Adapter: Stripe integration
- Event Bus: Kafka for notifications
Full diagram available on the dedicated page.
Source: [Payment Service Architecture] | [Technical Diagrams]
Best Practices
Structure Notion for RAG
| Practice | Why |
|---|---|
| Descriptive titles | Improves search |
| Clear hierarchical structure | Facilitates chunking |
| Updated dates | Allows prioritizing recent content |
| Tags and categories | Enriches metadata |
| Internal links | Helps context |
Manage Permissions
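The chatbot inherits the permissions of its Notion integration: it can read every page shared with that integration and may surface that content to anyone who asks. For granular control:
- Create a dedicated integration for RAG
- Share only pages that every chatbot user is allowed to read
- Manage access by workspace if multi-tenant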
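If different users should see different content, one option is to filter retrieved chunks before they reach the LLM. The sketch below assumes a hypothetical `allowed_groups` metadata field attached to each chunk at indexing time; neither that field nor the `filter_by_access` helper comes from the connector above, and the group names are invented.

```python
def filter_by_access(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only chunks whose allowed_groups overlap the user's groups.

    Chunks without the (hypothetical) allowed_groups field are dropped,
    so untagged content is denied by default.
    """
    return [
        c for c in chunks
        if user_groups & set(c["metadata"].get("allowed_groups", []))
    ]


# Toy data for illustration
retrieved = [
    {"title": "Vacation and Absence Procedure", "metadata": {"allowed_groups": ["everyone"]}},
    {"title": "Payment Service Architecture", "metadata": {"allowed_groups": ["engineering"]}},
    {"title": "Untagged Draft", "metadata": {}},
]

visible = filter_by_access(retrieved, user_groups={"everyone", "hr"})
print([c["title"] for c in visible])
```

Filtering after retrieval is the simplest variant to reason about. Applying the same condition as a metadata filter inside the vector database query avoids spending the top-k budget on chunks the user cannot see.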
Monitor Quality
- Track unanswered questions
- Collect user feedback
- Identify most cited pages
- Detect outdated content
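These four signals can be collected with very little code. The following is a minimal in-memory sketch; `QualityLog`, `stale_pages`, the one-year threshold, and the sample page names are illustrative assumptions, and a real deployment would persist the data rather than keep it in memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class QualityLog:
    """In-memory log of chatbot interactions (hypothetical helper)."""
    unanswered: list = field(default_factory=list)
    citations: Counter = field(default_factory=Counter)
    feedback: Counter = field(default_factory=Counter)

    def record(self, question: str, sources: list, helpful: Optional[bool] = None) -> None:
        """Store one exchange: cited page titles and optional thumbs up/down."""
        if not sources:  # no source retrieved: treat the question as unanswered
            self.unanswered.append(question)
        self.citations.update(sources)
        if helpful is not None:
            self.feedback["up" if helpful else "down"] += 1


def stale_pages(docs: list, now: datetime, max_age_days: int = 365) -> list:
    """Titles of documents not edited within max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        d["title"] for d in docs
        if datetime.fromisoformat(d["metadata"]["last_edited"].replace("Z", "+00:00")) < cutoff
    ]


# Toy data for illustration
log = QualityLog()
log.record("How do I request time off?", ["Vacation and Absence Procedure"], helpful=True)
log.record("What is the parking policy?", [])
log.record("Who approves PTO?", ["Vacation and Absence Procedure", "HR Contacts"], helpful=False)

docs = [
    {"title": "Old Policy", "metadata": {"last_edited": "2022-01-10T09:00:00.000Z"}},
    {"title": "Dev Setup Guide", "metadata": {"last_edited": "2024-05-01T09:00:00.000Z"}},
]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

print(log.unanswered, log.citations.most_common(1), stale_pages(docs, now))
```

Unanswered questions point to gaps in the wiki, while pages that are both heavily cited and stale are the first candidates for a content review.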
Related Resources
- Enterprise Knowledge Base - Complete pillar guide
- Confluence + RAG - For Atlassian environments
- SharePoint + RAG - For Microsoft 365
- Introduction to RAG - The fundamentals
Connect Notion with Ailog
Transform your Notion wiki into an intelligent assistant without writing a line of code. Ailog simplifies integration:
- Native Notion connector: Automatic synchronization in a few clicks
- Semantic search: Find info with your words, not the wiki's
- Multi-workspace: Manage multiple Notion spaces in one interface
- Access control: Respect your organization's permissions
- French hosting: Data on French servers, native GDPR compliance
Try Ailog for free and deploy your Notion assistant in 10 minutes.