Diagrams and Schemas: Extracting Visual Information
Complete guide to integrating diagrams, technical schemas and infographics into your RAG system: extraction, interpretation and indexing with vision models.
Diagrams, technical schemas and infographics contain enormous information density. A system architecture, flowchart or electrical schema encodes hours of documentation in a single image. This guide shows you how to make this visual content searchable in your RAG system.
The Diagram Challenge
Why It's Different from Photos
| Image Type | Characteristics | RAG Challenge |
|---|---|---|
| Photo | Continuous pixels, recognizable objects | Vision models excel |
| Diagram | Geometric shapes, relationships, text | Structure to understand |
| Technical schema | Standardized symbols, conventions | Specific vocabulary |
| Infographic | Text/visual mix, hierarchy | Ordered extraction |
Business Use Cases
- IT: System architectures, UML diagrams, network schemas
- Industry: Technical plans, electrical schemas, P&ID
- Business: Org charts, process maps, flowcharts
- Data: ERD diagrams, data lineage, pipelines
- Marketing: Infographics, presentations, visual reports
Indexing ROI
- -80% search time in technical documentation
- +50% understanding of complex systems
- Traceability: Find the origin of an architecture decision
Diagram Types and Strategies
Classification by Complexity
```
┌────────────────────────────────────────────────────────────┐
│                     DIAGRAM COMPLEXITY                     │
├────────────────────────────────────────────────────────────┤
│                                                            │
│   SIMPLE            MEDIUM             COMPLEX             │
│     │                 │                   │                │
│     ▼                 ▼                   ▼                │
│ ┌─────────┐     ┌────────────┐     ┌──────────┐            │
│ │ Simple  │     │   System   │     │Industrial│            │
│ │Flowchart│     │Architecture│     │   P&ID   │            │
│ └─────────┘     └────────────┘     └──────────┘            │
│                                                            │
│ Approach:       Approach:          Approach:               │
│ - Vision model  - Multi-pass       - OCR + Vision          │
│ - Direct        - Zone detection   - Symbols DB            │
│   description   - Hierarchy        - Expert rules          │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
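The routing above can be sketched as a small dispatcher. A minimal sketch with illustrative thresholds (the element count could come from the region detection shown later in this guide; the cutoffs are assumptions to tune, not benchmarks):

```python
def classify_complexity(element_count: int, has_standard_symbols: bool = False) -> str:
    """Route a diagram to an extraction approach from a rough element count.

    Thresholds are illustrative: tune them against your own corpus.
    """
    if has_standard_symbols:
        # P&ID / electrical schemas: OCR + vision + symbol database
        return "complex"
    if element_count <= 10:
        # Single-pass vision description is usually enough
        return "simple"
    if element_count <= 30:
        # Multi-pass with zone detection and hierarchy
        return "medium"
    return "complex"
```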
Strategy by Type
| Type | Primary Extraction | Enrichment |
|---|---|---|
| Flowchart | GPT-4V description | Mermaid code |
| UML | Vision + OCR | PlantUML code |
| Architecture | Zones + relations | DOT/Graphviz |
| Electrical schema | Symbols + OCR | Netlist |
| Infographic | Sections + text | Markdown structure |
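The table above can be carried into code as a simple lookup. A sketch, where the strategy labels are our own illustrative names rather than library identifiers:

```python
# Maps diagram type to (primary extraction, enrichment) strategy labels
EXTRACTION_STRATEGIES = {
    "flowchart": {"primary": "vision_description", "enrichment": "mermaid"},
    "uml": {"primary": "vision_plus_ocr", "enrichment": "plantuml"},
    "architecture": {"primary": "zones_and_relations", "enrichment": "dot"},
    "electrical": {"primary": "symbols_plus_ocr", "enrichment": "netlist"},
    "infographic": {"primary": "sections_and_text", "enrichment": "markdown"},
}

def pick_strategy(diagram_type: str) -> dict:
    """Fall back to a generic vision pass for unrecognized types."""
    return EXTRACTION_STRATEGIES.get(
        diagram_type,
        {"primary": "vision_description", "enrichment": "markdown"},
    )
```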
Extraction with Vision Models
Prompt Engineering for Diagrams
```python
from openai import OpenAI
import base64

def extract_diagram_info(
    image_path: str,
    diagram_type: str = "auto",
    client: OpenAI = None
) -> dict:
    """
    Extract structured information from a diagram.
    """
    if client is None:
        client = OpenAI()

    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    # Type-specific prompts
    prompts = {
        "flowchart": """Analyze this flowchart/flow diagram:
1. **Main steps**: List each step in order
2. **Decision points**: Identify conditions/branches
3. **Data flow**: Describe the main path and alternatives
4. **Input/Output**: Identify start and end points
Structure your response for easy search.""",
        "architecture": """Analyze this architecture diagram:
1. **Components**: List all elements (services, databases, APIs)
2. **Connections**: Describe links between components (protocols, flows)
3. **Layers**: Identify layers (frontend, backend, data, infra)
4. **Technologies**: Spot mentioned technologies
Provide a hierarchical view.""",
        "uml": """Analyze this UML diagram:
1. **Diagram type**: Classes, sequences, use cases, etc.
2. **Entities**: List classes/objects/actors
3. **Relationships**: Inheritance, composition, association, dependencies
4. **Methods/Attributes**: If visible, list them
Structure in technical format.""",
        "infographic": """Analyze this infographic:
1. **Main theme**: What is the subject?
2. **Sections**: Break down into logical zones
3. **Key data**: Numbers, statistics, facts
4. **Hierarchy**: Suggested reading order
Extract content in a structured way.""",
        "auto": """Analyze this diagram/schema:
1. **Type**: Identify the diagram type
2. **Elements**: List all visible components
3. **Relationships**: Describe connections/flows between elements
4. **Text**: Extract all visible text
5. **Context**: What domain/use does this diagram represent?
Be exhaustive and structure your response."""
    }

    prompt = prompts.get(diagram_type, prompts["auto"])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                        "detail": "high"
                    }
                }
            ]
        }],
        max_tokens=2000
    )

    return {
        "diagram_type": diagram_type,
        "extraction": response.choices[0].message.content,
        "tokens_used": response.usage.total_tokens
    }
```
Diagram Code Generation
````python
def diagram_to_code(
    image_path: str,
    output_format: str = "mermaid",
    client: OpenAI = None
) -> str:
    """
    Convert a visual diagram to reproducible code.
    """
    if client is None:
        client = OpenAI()

    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    format_instructions = {
        "mermaid": """Convert this diagram to Mermaid code.
Syntax example:
```mermaid
graph TD
    A[Start] --> B{Decision}
    B -->|Yes| C[Action 1]
    B -->|No| D[Action 2]
```""",
        "plantuml": """Convert this diagram to PlantUML code.
Example:
```plantuml
@startuml
class User {
    +name: String
    +login(): void
}
@enduml
```""",
        "dot": """Convert this diagram to DOT/Graphviz code.
Example:
```dot
digraph G {
    A -> B -> C;
    B -> D;
}
```"""
    }

    prompt = f"""{format_instructions.get(output_format, format_instructions['mermaid'])}

Analyze the diagram and generate the corresponding {output_format} code.
Be precise about relationships and labels."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                        "detail": "high"
                    }
                }
            ]
        }],
        max_tokens=2000
    )

    return response.choices[0].message.content
````
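Model output often arrives wrapped in markdown fences, so it is worth normalizing and sanity-checking before indexing the generated code. A minimal sketch (the helper names are ours, and the header list covers only common Mermaid diagram types):

```python
import re

def clean_mermaid_output(raw: str) -> str:
    """Strip markdown fences the model may wrap around generated Mermaid code."""
    match = re.search(r"```(?:mermaid)?\s*(.*?)```", raw, re.DOTALL)
    code = match.group(1) if match else raw
    return code.strip()

def looks_like_mermaid(code: str) -> bool:
    """Cheap sanity check before storing generated code in the index."""
    headers = ("graph ", "flowchart ", "sequenceDiagram", "classDiagram", "erDiagram")
    return code.lstrip().startswith(headers)
```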
Advanced Multi-Zone Extraction
Region of Interest Detection
```python
import cv2
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DiagramRegion:
    bbox: Tuple[int, int, int, int]  # x, y, w, h
    region_type: str                 # box, text, connector, icon
    content: np.ndarray
    confidence: float

def detect_diagram_regions(image_path: str) -> List[DiagramRegion]:
    """
    Detect regions of interest in a diagram.
    """
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Edge detection
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    regions = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 100:  # Filter noise
            continue

        x, y, w, h = cv2.boundingRect(contour)

        # Classify region type
        aspect_ratio = w / h if h > 0 else 0
        if 0.8 < aspect_ratio < 1.2 and area > 500:
            region_type = "box"
        elif aspect_ratio > 3:
            region_type = "connector"
        else:
            region_type = "unknown"

        regions.append(DiagramRegion(
            bbox=(x, y, w, h),
            region_type=region_type,
            content=img[y:y+h, x:x+w],
            confidence=0.8
        ))

    return regions

def analyze_regions_separately(
    image_path: str,
    regions: List[DiagramRegion],
    client: OpenAI
) -> List[dict]:
    """
    Analyze each region separately for more precision.
    """
    results = []

    for i, region in enumerate(regions):
        if region.region_type != "box":
            continue

        # Encode region
        _, buffer = cv2.imencode('.png', region.content)
        img_base64 = base64.b64encode(buffer).decode('utf-8')

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe the content of this diagram element in one sentence."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{img_base64}",
                            "detail": "low"
                        }
                    }
                ]
            }],
            max_tokens=100
        )

        results.append({
            "region_id": i,
            "bbox": region.bbox,
            "type": region.region_type,
            "description": response.choices[0].message.content
        })

    return results
```
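The aspect-ratio heuristic used when classifying contours can be factored out so it is testable without OpenCV. A sketch using the same illustrative thresholds as above:

```python
def classify_region(w: int, h: int, area: float) -> str:
    """Classify a bounding box by shape: node box, connector line, or unknown.

    Same thresholds as in detect_diagram_regions; they are heuristics to tune.
    """
    aspect_ratio = w / h if h > 0 else 0
    if 0.8 < aspect_ratio < 1.2 and area > 500:
        return "box"        # roughly square and large enough: a diagram node
    if aspect_ratio > 3:
        return "connector"  # long and thin: likely an arrow or line
    return "unknown"
```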
Diagram-Specific OCR
```python
import pytesseract
from PIL import Image

def extract_diagram_text(
    image_path: str,
    preprocess: bool = True
) -> dict:
    """
    Extract text from a diagram with optimized preprocessing.
    """
    img = cv2.imread(image_path)

    if preprocess:
        # Convert to grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Adaptive binarization (better for diagrams)
        binary = cv2.adaptiveThreshold(
            gray, 255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY, 11, 2
        )

        # Dilation to connect fragmented characters
        kernel = np.ones((1, 1), np.uint8)
        processed = cv2.dilate(binary, kernel, iterations=1)
    else:
        processed = img

    # OCR with diagram-optimized config
    custom_config = r'--oem 3 --psm 11'  # PSM 11 = sparse text
    text = pytesseract.image_to_string(processed, config=custom_config)

    # Extraction with positions
    data = pytesseract.image_to_data(processed, output_type=pytesseract.Output.DICT)

    text_elements = []
    for i, word in enumerate(data['text']):
        if word.strip():
            text_elements.append({
                "text": word,
                "x": data['left'][i],
                "y": data['top'][i],
                "width": data['width'][i],
                "height": data['height'][i],
                "confidence": data['conf'][i]
            })

    return {
        "full_text": text,
        "elements": text_elements,
        "element_count": len(text_elements)
    }
```
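The `elements` list returned above lends itself to simple post-processing. A sketch that drops low-confidence words and regroups the rest into reading lines by vertical position (the confidence cutoff and pixel tolerance are assumptions to tune per diagram):

```python
def filter_ocr_elements(elements: list, min_confidence: int = 60) -> list:
    """Drop low-confidence words; Tesseract reports conf on a 0-100 scale."""
    return [e for e in elements if int(e["confidence"]) >= min_confidence]

def group_into_lines(elements: list, y_tolerance: int = 10) -> list:
    """Group words whose top coordinates fall within y_tolerance pixels
    into the same line, then order each line left to right."""
    lines = []
    for e in sorted(elements, key=lambda e: (e["y"], e["x"])):
        if lines and abs(lines[-1][0]["y"] - e["y"]) <= y_tolerance:
            lines[-1].append(e)
        else:
            lines.append([e])
    return [
        " ".join(w["text"] for w in sorted(line, key=lambda w: w["x"]))
        for line in lines
    ]
```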
RAG Indexing
Enriched Data Structure
```python
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class DiagramDocument:
    """Indexable document for a diagram."""
    diagram_id: str
    source_file: str
    diagram_type: str

    # Main extraction
    description: str
    extracted_text: str

    # Structure
    components: List[str]
    relationships: List[dict]
    hierarchy: Optional[dict]

    # Generated code
    mermaid_code: Optional[str]
    plantuml_code: Optional[str]

    # Metadata
    domain: str  # IT, business, industrial
    technologies: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def to_embedding_text(self) -> str:
        """Optimized text for embedding."""
        parts = [
            f"Type: {self.diagram_type}",
            f"Domain: {self.domain}",
            f"Description: {self.description}",
        ]
        if self.components:
            parts.append(f"Components: {', '.join(self.components)}")
        if self.technologies:
            parts.append(f"Technologies: {', '.join(self.technologies)}")
        if self.extracted_text:
            parts.append(f"Text content: {self.extracted_text}")
        return "\n".join(parts)

    def to_searchable_chunks(self) -> List[dict]:
        """Split into chunks for granular indexing."""
        chunks = []

        # Main chunk
        chunks.append({
            "chunk_type": "overview",
            "content": self.to_embedding_text(),
            "metadata": {
                "diagram_id": self.diagram_id,
                "diagram_type": self.diagram_type
            }
        })

        # Per-component chunks
        for component in self.components:
            related_rels = [
                r for r in self.relationships
                if component in str(r)
            ]
            chunks.append({
                "chunk_type": "component",
                "content": f"Component: {component}. Relations: {related_rels}",
                "metadata": {
                    "diagram_id": self.diagram_id,
                    "component_name": component
                }
            })

        return chunks
```
Complete Indexing Pipeline
```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
import hashlib
import json

class DiagramRAGPipeline:
    def __init__(self):
        self.qdrant = QdrantClient(url="http://localhost:6333")
        self.openai = OpenAI()
        self.collection_name = "diagram_rag"

    def create_collection(self):
        self.qdrant.recreate_collection(
            collection_name=self.collection_name,
            vectors_config=VectorParams(
                size=1536,
                distance=Distance.COSINE
            )
        )

    def process_diagram(
        self,
        image_path: str,
        diagram_type: str = "auto",
        domain: str = "general"
    ) -> DiagramDocument:
        """Complete processing pipeline."""
        diagram_id = hashlib.md5(image_path.encode()).hexdigest()

        print("1. Main extraction...")
        extraction = extract_diagram_info(image_path, diagram_type, self.openai)

        print("2. OCR...")
        ocr_result = extract_diagram_text(image_path)

        print("3. Mermaid code generation...")
        mermaid = diagram_to_code(image_path, "mermaid", self.openai)

        print("4. Component and relationship extraction...")
        components_prompt = f"""
From this diagram description, extract:
1. List of components/elements (JSON array)
2. List of relationships (JSON array of objects with from, to, type)
3. Mentioned technologies (JSON array)
4. Keywords (JSON array)

Description: {extraction['extraction']}
OCR Text: {ocr_result['full_text']}

JSON format:
{{
    "components": [],
    "relationships": [],
    "technologies": [],
    "keywords": []
}}
"""

        response = self.openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": components_prompt}],
            response_format={"type": "json_object"}
        )
        structured = json.loads(response.choices[0].message.content)

        return DiagramDocument(
            diagram_id=diagram_id,
            source_file=image_path,
            diagram_type=extraction['diagram_type'],
            description=extraction['extraction'],
            extracted_text=ocr_result['full_text'],
            components=structured.get('components', []),
            relationships=structured.get('relationships', []),
            hierarchy=None,
            mermaid_code=mermaid,
            plantuml_code=None,
            domain=domain,
            technologies=structured.get('technologies', []),
            keywords=structured.get('keywords', [])
        )

    def index_diagram(self, doc: DiagramDocument):
        """Index a diagram."""
        chunks = doc.to_searchable_chunks()
        points = []

        for i, chunk in enumerate(chunks):
            # Embedding
            response = self.openai.embeddings.create(
                model="text-embedding-3-small",
                input=chunk["content"]
            )
            embedding = response.data[0].embedding

            # Deterministic point ID: Python's built-in hash() is salted
            # per process, so re-indexing would create duplicate points
            point_id = int.from_bytes(
                hashlib.sha256(f"{doc.diagram_id}_{i}".encode()).digest()[:8],
                "big"
            ) % (2**63)

            point = PointStruct(
                id=point_id,
                vector=embedding,
                payload={
                    **chunk["metadata"],
                    "content": chunk["content"],
                    "chunk_type": chunk["chunk_type"],
                    "source_file": doc.source_file,
                    "mermaid_code": doc.mermaid_code
                }
            )
            points.append(point)

        self.qdrant.upsert(
            collection_name=self.collection_name,
            points=points
        )
        print(f"Indexed {len(points)} chunks")
```
Search and Generation
Search with Visual Context
```python
def search_diagrams(
    query: str,
    pipeline: DiagramRAGPipeline,
    limit: int = 5,
    filter_type: str = None
) -> List[dict]:
    """Search indexed diagrams."""
    response = pipeline.openai.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_embedding = response.data[0].embedding

    filter_conditions = None
    if filter_type:
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        filter_conditions = Filter(
            must=[FieldCondition(key="diagram_type", match=MatchValue(value=filter_type))]
        )

    results = pipeline.qdrant.search(
        collection_name=pipeline.collection_name,
        query_vector=query_embedding,
        query_filter=filter_conditions,
        limit=limit
    )

    return [
        {
            "source": r.payload["source_file"],
            "diagram_type": r.payload.get("diagram_type"),
            "content": r.payload["content"][:300],
            "mermaid": r.payload.get("mermaid_code"),
            "score": r.score
        }
        for r in results
    ]
```
Response Generation with Diagram
```python
def answer_with_diagrams(
    query: str,
    retrieved: List[dict],
    client: OpenAI
) -> str:
    """Generate a response including relevant diagrams."""
    context = "\n\n".join([
        f"**Diagram: {r['source']}** (type: {r['diagram_type']})\n{r['content']}"
        for r in retrieved
    ])

    # Include Mermaid code if available
    mermaid_codes = [r['mermaid'] for r in retrieved if r.get('mermaid')]

    prompt = f"""You are a technical assistant that answers questions using diagrams as source.

Available diagrams:
{context}

Question: {query}

Instructions:
1. Base your answer on the provided diagrams
2. If relevant, include Mermaid code so users can reproduce
3. Explain relationships between components
4. Cite sources [Diagram: name]"""

    if mermaid_codes:
        prompt += "\n\nAvailable Mermaid codes:\n" + "\n".join(mermaid_codes[:2])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1500
    )

    return response.choices[0].message.content
```
Specific Use Cases
IT Architecture
```python
def analyze_architecture_diagram(image_path: str, client: OpenAI) -> dict:
    """Specialized analysis for IT architectures."""
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = """Analyze this system architecture in detail:

1. **Services/Applications**: List with their role
2. **Databases**: Types (SQL, NoSQL, cache)
3. **Communication**: Protocols (REST, gRPC, MQ)
4. **Infrastructure**: Cloud, on-premise, containers
5. **Security**: Visible firewalls, auth, encryption
6. **Scalability**: Load balancers, replicas

Structured technical format."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_base64}",
                               "detail": "high"}}
            ]
        }],
        max_tokens=2000
    )

    return {"analysis": response.choices[0].message.content}
```
Electrical/P&ID Schemas
```python
ELECTRICAL_SYMBOLS = {
    "resistor": "Resistor",
    "capacitor": "Capacitor",
    "inductor": "Inductor",
    "diode": "Diode",
    "transistor": "Transistor",
    "ground": "Ground",
    "battery": "Battery/Power Supply"
}

def analyze_electrical_diagram(image_path: str, client: OpenAI) -> dict:
    """Specialized analysis for electrical schemas."""
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    symbols_context = "\n".join([f"- {k}: {v}" for k, v in ELECTRICAL_SYMBOLS.items()])

    prompt = f"""Analyze this electrical/electronic schema:

Common symbols:
{symbols_context}

Provide:
1. **Components**: List with values if visible (R1=10k, C1=100uF)
2. **Connections**: How components are linked
3. **Functional blocks**: Power supply, amplification, filtering
4. **Signals**: Identifiable inputs/outputs
5. **Overall function**: What does this circuit do?

Technical format."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_base64}",
                               "detail": "high"}}
            ]
        }],
        max_tokens=2000
    )

    return {"analysis": response.choices[0].message.content}
```
Costs and Performance
Costs per Diagram
| Operation | Cost | Notes |
|---|---|---|
| GPT-4o extraction | $0.02-0.05 | Depends on complexity |
| OCR (local) | $0 | Tesseract |
| Code generation | $0.01-0.03 | Mermaid/PlantUML |
| Embedding | $0.0001 | text-embedding-3-small |
| Total | ~$0.05-0.10 | Per diagram |
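For budgeting a batch, the ranges above can be turned into a rough estimator. A sketch where the defaults are midpoints of those ranges, not measured values; substitute your own observed averages:

```python
def estimate_indexing_cost(
    n_diagrams: int,
    avg_extraction: float = 0.035,  # midpoint of $0.02-0.05 (GPT-4o extraction)
    avg_codegen: float = 0.02,      # midpoint of $0.01-0.03 (Mermaid/PlantUML)
    avg_embedding: float = 0.0001,  # text-embedding-3-small
) -> float:
    """Rough USD budget for indexing a batch of diagrams (OCR is local, $0)."""
    per_diagram = avg_extraction + avg_codegen + avg_embedding
    return round(n_diagrams * per_diagram, 2)
```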
Accuracy by Type
| Diagram Type | Extraction Accuracy | Code Accuracy |
|---|---|---|
| Simple flowchart | 95% | 90% |
| IT Architecture | 85% | 75% |
| UML Classes | 80% | 70% |
| Electrical schema | 70% | 50% |
| Infographic | 90% | N/A |
Integration with Ailog
Ailog supports diagram indexing:
- Upload: PNG, JPG, SVG, PDF
- Auto detection: Diagram type
- Smart extraction: Components and relationships
- Generated code: Reproducible Mermaid