
Diagrams and Schemas: Extracting Visual Information

March 22, 2026
20 min read
Ailog Team

Complete guide to integrating diagrams, technical schemas and infographics into your RAG system: extraction, interpretation and indexing with vision models.


Diagrams, technical schemas and infographics contain enormous information density. A system architecture, flowchart or electrical schema encodes hours of documentation in a single image. This guide shows you how to make this visual content searchable in your RAG system.

The Diagram Challenge

Why It's Different from Photos

| Image Type | Characteristics | RAG Challenge |
|---|---|---|
| Photo | Continuous pixels, recognizable objects | Vision models excel |
| Diagram | Geometric shapes, relationships, text | Structure to understand |
| Technical schema | Standardized symbols, conventions | Specific vocabulary |
| Infographic | Text/visual mix, hierarchy | Ordered extraction |

Business Use Cases

  • IT: System architectures, UML diagrams, network schemas
  • Industry: Technical plans, electrical schemas, P&ID
  • Business: Org charts, process maps, flowcharts
  • Data: ERD diagrams, data lineage, pipelines
  • Marketing: Infographics, presentations, visual reports

Indexing ROI

  • -80% search time in technical documentation
  • +50% understanding of complex systems
  • Traceability: Find the origin of an architecture decision

Diagram Types and Strategies

Classification by Complexity

┌──────────────────────────────────────────────────────────────┐
│                 DIAGRAM COMPLEXITY                           │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  SIMPLE                    MEDIUM                 COMPLEX    │
│    │                         │                      │        │
│    ▼                         ▼                      ▼        │
│  ┌────────┐            ┌──────────┐          ┌───────────┐  │
│  │ Simple │            │ System   │          │Industrial │  │
│  │Flowchart│           │Architecture│        │ P&ID      │  │
│  └────────┘            └──────────┘          └───────────┘  │
│                                                              │
│  Approach:              Approach:             Approach:     │
│  - Vision model         - Multi-pass          - OCR + Vision│
│  - Direct               - Zone detection      - Symbols DB  │
│    description          - Hierarchy           - Expert rules│
│                                                              │
└──────────────────────────────────────────────────────────────┘

Strategy by Type

| Type | Primary Extraction | Enrichment |
|---|---|---|
| Flowchart | GPT-4V description | Mermaid code |
| UML | Vision + OCR | PlantUML code |
| Architecture | Zones + relations | DOT/Graphviz |
| Electrical schema | Symbols + OCR | Netlist |
| Infographic | Sections + text | Markdown structure |
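
The routing in the table above can be expressed as a small dispatch helper. A minimal sketch; the strategy labels are illustrative placeholders, not a fixed API:

```python
# Map each diagram type to its extraction strategy and enrichment format.
# The strategy names are illustrative, not part of any library.
EXTRACTION_STRATEGY = {
    "flowchart":    {"primary": "vision_description", "enrichment": "mermaid"},
    "uml":          {"primary": "vision_plus_ocr",    "enrichment": "plantuml"},
    "architecture": {"primary": "zones_relations",    "enrichment": "dot"},
    "electrical":   {"primary": "symbols_plus_ocr",   "enrichment": "netlist"},
    "infographic":  {"primary": "sections_text",      "enrichment": "markdown"},
}

def pick_strategy(diagram_type: str) -> dict:
    # Unknown types fall back to a generic vision pass with Mermaid enrichment
    return EXTRACTION_STRATEGY.get(
        diagram_type,
        {"primary": "vision_description", "enrichment": "mermaid"},
    )
```

Keeping the routing in data rather than in an if/elif chain makes it easy to add new diagram types without touching the extraction code.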

Extraction with Vision Models

Prompt Engineering for Diagrams

```python
from openai import OpenAI
import base64


def extract_diagram_info(
    image_path: str,
    diagram_type: str = "auto",
    client: OpenAI = None
) -> dict:
    """Extract structured information from a diagram."""
    if client is None:
        client = OpenAI()

    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    # Type-specific prompts
    prompts = {
        "flowchart": """Analyze this flowchart/flow diagram:
1. **Main steps**: List each step in order
2. **Decision points**: Identify conditions/branches
3. **Data flow**: Describe the main path and alternatives
4. **Input/Output**: Identify start and end points

Structure your response for easy search.""",
        "architecture": """Analyze this architecture diagram:
1. **Components**: List all elements (services, databases, APIs)
2. **Connections**: Describe links between components (protocols, flows)
3. **Layers**: Identify layers (frontend, backend, data, infra)
4. **Technologies**: Spot mentioned technologies

Provide a hierarchical view.""",
        "uml": """Analyze this UML diagram:
1. **Diagram type**: Classes, sequences, use cases, etc.
2. **Entities**: List classes/objects/actors
3. **Relationships**: Inheritance, composition, association, dependencies
4. **Methods/Attributes**: If visible, list them

Structure in technical format.""",
        "infographic": """Analyze this infographic:
1. **Main theme**: What is the subject?
2. **Sections**: Break down into logical zones
3. **Key data**: Numbers, statistics, facts
4. **Hierarchy**: Suggested reading order

Extract content in a structured way.""",
        "auto": """Analyze this diagram/schema:
1. **Type**: Identify the diagram type
2. **Elements**: List all visible components
3. **Relationships**: Describe connections/flows between elements
4. **Text**: Extract all visible text
5. **Context**: What domain/use does this diagram represent?

Be exhaustive and structure your response."""
    }

    prompt = prompts.get(diagram_type, prompts["auto"])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                        "detail": "high"
                    }
                }
            ]
        }],
        max_tokens=2000
    )

    return {
        "diagram_type": diagram_type,
        "extraction": response.choices[0].message.content,
        "tokens_used": response.usage.total_tokens
    }
```

Diagram Code Generation

````python
def diagram_to_code(
    image_path: str,
    output_format: str = "mermaid",
    client: OpenAI = None
) -> str:
    """Convert a visual diagram to reproducible code."""
    if client is None:
        client = OpenAI()

    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    format_instructions = {
        "mermaid": """Convert this diagram to Mermaid code.
Syntax example:
```mermaid
graph TD
    A[Start] --> B{Decision}
    B -->|Yes| C[Action 1]
    B -->|No| D[Action 2]
```""",
        "plantuml": """Convert this diagram to PlantUML code.
Example:
```plantuml
@startuml
class User {
    +name: String
    +login(): void
}
@enduml
```""",
        "dot": """Convert this diagram to DOT/Graphviz code.
Example:
```dot
digraph G {
    A -> B -> C;
    B -> D;
}
```"""
    }

    prompt = f"""{format_instructions.get(output_format, format_instructions['mermaid'])}

Analyze the diagram and generate the corresponding {output_format} code.
Be precise about relationships and labels."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                        "detail": "high"
                    }
                }
            ]
        }],
        max_tokens=2000
    )

    return response.choices[0].message.content
````

Advanced Multi-Zone Extraction

Region of Interest Detection

```python
import cv2
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DiagramRegion:
    bbox: Tuple[int, int, int, int]  # x, y, w, h
    region_type: str  # box, text, connector, icon
    content: np.ndarray
    confidence: float


def detect_diagram_regions(image_path: str) -> List[DiagramRegion]:
    """Detect regions of interest in a diagram."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Edge detection
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    regions = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 100:  # Filter noise
            continue

        x, y, w, h = cv2.boundingRect(contour)

        # Classify region type
        aspect_ratio = w / h if h > 0 else 0
        if 0.8 < aspect_ratio < 1.2 and area > 500:
            region_type = "box"
        elif aspect_ratio > 3:
            region_type = "connector"
        else:
            region_type = "unknown"

        regions.append(DiagramRegion(
            bbox=(x, y, w, h),
            region_type=region_type,
            content=img[y:y+h, x:x+w],
            confidence=0.8
        ))

    return regions


def analyze_regions_separately(
    image_path: str,
    regions: List[DiagramRegion],
    client: OpenAI
) -> List[dict]:
    """Analyze each region separately for more precision."""
    results = []

    for i, region in enumerate(regions):
        if region.region_type != "box":
            continue

        # Encode region
        _, buffer = cv2.imencode('.png', region.content)
        img_base64 = base64.b64encode(buffer).decode('utf-8')

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe the content of this diagram element in one sentence."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{img_base64}",
                            "detail": "low"
                        }
                    }
                ]
            }],
            max_tokens=100
        )

        results.append({
            "region_id": i,
            "bbox": region.bbox,
            "type": region.region_type,
            "description": response.choices[0].message.content
        })

    return results
```

Diagram-Specific OCR

```python
import pytesseract
from PIL import Image


def extract_diagram_text(
    image_path: str,
    preprocess: bool = True
) -> dict:
    """Extract text from a diagram with optimized preprocessing."""
    img = cv2.imread(image_path)

    if preprocess:
        # Convert to grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Adaptive binarization (better for diagrams)
        binary = cv2.adaptiveThreshold(
            gray, 255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY, 11, 2
        )

        # Dilation to connect fragmented characters
        kernel = np.ones((1, 1), np.uint8)
        processed = cv2.dilate(binary, kernel, iterations=1)
    else:
        processed = img

    # OCR with diagram-optimized config
    custom_config = r'--oem 3 --psm 11'  # PSM 11 = sparse text
    text = pytesseract.image_to_string(processed, config=custom_config)

    # Extraction with positions
    data = pytesseract.image_to_data(processed, output_type=pytesseract.Output.DICT)

    text_elements = []
    for i, word in enumerate(data['text']):
        if word.strip():
            text_elements.append({
                "text": word,
                "x": data['left'][i],
                "y": data['top'][i],
                "width": data['width'][i],
                "height": data['height'][i],
                "confidence": data['conf'][i]
            })

    return {
        "full_text": text,
        "elements": text_elements,
        "element_count": len(text_elements)
    }
```

RAG Indexing

Enriched Data Structure

```python
from dataclasses import dataclass, field
from typing import Optional, List


@dataclass
class DiagramDocument:
    """Indexable document for a diagram."""
    diagram_id: str
    source_file: str
    diagram_type: str

    # Main extraction
    description: str
    extracted_text: str

    # Structure
    components: List[str]
    relationships: List[dict]
    hierarchy: Optional[dict]

    # Generated code
    mermaid_code: Optional[str]
    plantuml_code: Optional[str]

    # Metadata
    domain: str  # IT, business, industrial
    technologies: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def to_embedding_text(self) -> str:
        """Optimized text for embedding."""
        parts = [
            f"Type: {self.diagram_type}",
            f"Domain: {self.domain}",
            f"Description: {self.description}",
        ]

        if self.components:
            parts.append(f"Components: {', '.join(self.components)}")
        if self.technologies:
            parts.append(f"Technologies: {', '.join(self.technologies)}")
        if self.extracted_text:
            parts.append(f"Text content: {self.extracted_text}")

        return "\n".join(parts)

    def to_searchable_chunks(self) -> List[dict]:
        """Split into chunks for granular indexing."""
        chunks = []

        # Main chunk
        chunks.append({
            "chunk_type": "overview",
            "content": self.to_embedding_text(),
            "metadata": {
                "diagram_id": self.diagram_id,
                "diagram_type": self.diagram_type
            }
        })

        # Per-component chunks
        for component in self.components:
            related_rels = [
                r for r in self.relationships
                if component in str(r)
            ]
            chunks.append({
                "chunk_type": "component",
                "content": f"Component: {component}. Relations: {related_rels}",
                "metadata": {
                    "diagram_id": self.diagram_id,
                    "component_name": component
                }
            })

        return chunks
```

Complete Indexing Pipeline

```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
import hashlib
import json


class DiagramRAGPipeline:
    def __init__(self):
        self.qdrant = QdrantClient(url="http://localhost:6333")
        self.openai = OpenAI()
        self.collection_name = "diagram_rag"

    def create_collection(self):
        self.qdrant.recreate_collection(
            collection_name=self.collection_name,
            vectors_config=VectorParams(
                size=1536,
                distance=Distance.COSINE
            )
        )

    def process_diagram(
        self,
        image_path: str,
        diagram_type: str = "auto",
        domain: str = "general"
    ) -> DiagramDocument:
        """Complete processing pipeline."""
        diagram_id = hashlib.md5(image_path.encode()).hexdigest()

        print("1. Main extraction...")
        extraction = extract_diagram_info(image_path, diagram_type, self.openai)

        print("2. OCR...")
        ocr_result = extract_diagram_text(image_path)

        print("3. Mermaid code generation...")
        mermaid = diagram_to_code(image_path, "mermaid", self.openai)

        print("4. Component and relationship extraction...")
        components_prompt = f"""
From this diagram description, extract:
1. List of components/elements (JSON array)
2. List of relationships (JSON array of objects with from, to, type)
3. Mentioned technologies (JSON array)
4. Keywords (JSON array)

Description: {extraction['extraction']}
OCR Text: {ocr_result['full_text']}

JSON format:
{{
    "components": [],
    "relationships": [],
    "technologies": [],
    "keywords": []
}}
"""
        response = self.openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": components_prompt}],
            response_format={"type": "json_object"}
        )
        structured = json.loads(response.choices[0].message.content)

        return DiagramDocument(
            diagram_id=diagram_id,
            source_file=image_path,
            diagram_type=extraction['diagram_type'],
            description=extraction['extraction'],
            extracted_text=ocr_result['full_text'],
            components=structured.get('components', []),
            relationships=structured.get('relationships', []),
            hierarchy=None,
            mermaid_code=mermaid,
            plantuml_code=None,
            domain=domain,
            technologies=structured.get('technologies', []),
            keywords=structured.get('keywords', [])
        )

    def index_diagram(self, doc: DiagramDocument):
        """Index a diagram."""
        chunks = doc.to_searchable_chunks()

        points = []
        for i, chunk in enumerate(chunks):
            # Embedding
            response = self.openai.embeddings.create(
                model="text-embedding-3-small",
                input=chunk["content"]
            )
            embedding = response.data[0].embedding

            point = PointStruct(
                id=hash(f"{doc.diagram_id}_{i}") % (2**63),
                vector=embedding,
                payload={
                    **chunk["metadata"],
                    "content": chunk["content"],
                    "chunk_type": chunk["chunk_type"],
                    "source_file": doc.source_file,
                    "mermaid_code": doc.mermaid_code
                }
            )
            points.append(point)

        self.qdrant.upsert(
            collection_name=self.collection_name,
            points=points
        )
        print(f"Indexed {len(points)} chunks")
```
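
One caveat worth flagging: the point IDs above come from Python's built-in `hash()`, which is salted per process (`PYTHONHASHSEED`), so the same chunk gets a different ID on every run and re-indexing duplicates points instead of overwriting them. A minimal fix is to derive the ID deterministically with `hashlib`; the helper below is our sketch, not part of the pipeline as written:

```python
import hashlib

def stable_point_id(diagram_id: str, chunk_index: int) -> int:
    """Deterministic 63-bit point ID for a (diagram, chunk) pair.

    Unlike the built-in hash(), this survives process restarts, so
    re-indexing the same diagram upserts over its existing points.
    """
    digest = hashlib.sha256(f"{diagram_id}_{chunk_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2**63)
```

Swapping `id=hash(...) % (2**63)` for `id=stable_point_id(doc.diagram_id, i)` makes `index_diagram` idempotent.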

Search and Generation

Search with Visual Context

```python
def search_diagrams(
    query: str,
    pipeline: DiagramRAGPipeline,
    limit: int = 5,
    filter_type: str = None
) -> List[dict]:
    """Search indexed diagrams."""
    response = pipeline.openai.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_embedding = response.data[0].embedding

    filter_conditions = None
    if filter_type:
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        filter_conditions = Filter(
            must=[FieldCondition(key="diagram_type", match=MatchValue(value=filter_type))]
        )

    results = pipeline.qdrant.search(
        collection_name=pipeline.collection_name,
        query_vector=query_embedding,
        query_filter=filter_conditions,
        limit=limit
    )

    return [
        {
            "source": r.payload["source_file"],
            "diagram_type": r.payload.get("diagram_type"),
            "content": r.payload["content"][:300],
            "mermaid": r.payload.get("mermaid_code"),
            "score": r.score
        }
        for r in results
    ]
```

Response Generation with Diagram

```python
def answer_with_diagrams(
    query: str,
    retrieved: List[dict],
    client: OpenAI
) -> str:
    """Generate a response including relevant diagrams."""
    context = "\n\n".join([
        f"**Diagram: {r['source']}** (type: {r['diagram_type']})\n{r['content']}"
        for r in retrieved
    ])

    # Include Mermaid code if available
    mermaid_codes = [r['mermaid'] for r in retrieved if r.get('mermaid')]

    prompt = f"""You are a technical assistant that answers questions using diagrams as source.

Available diagrams:
{context}

Question: {query}

Instructions:
1. Base your answer on the provided diagrams
2. If relevant, include Mermaid code so users can reproduce
3. Explain relationships between components
4. Cite sources [Diagram: name]"""

    if mermaid_codes:
        prompt += "\n\nAvailable Mermaid codes:\n" + "\n".join(mermaid_codes[:2])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1500
    )

    return response.choices[0].message.content
```

Specific Use Cases

IT Architecture

```python
def analyze_architecture_diagram(image_path: str, client: OpenAI) -> dict:
    """Specialized analysis for IT architectures."""
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = """Analyze this system architecture in detail:
1. **Services/Applications**: List with their role
2. **Databases**: Types (SQL, NoSQL, cache)
3. **Communication**: Protocols (REST, gRPC, MQ)
4. **Infrastructure**: Cloud, on-premise, containers
5. **Security**: Visible firewalls, auth, encryption
6. **Scalability**: Load balancers, replicas

Structured technical format."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_base64}", "detail": "high"}}
            ]
        }],
        max_tokens=2000
    )

    return {"analysis": response.choices[0].message.content}
```

Electrical/P&ID Schemas

```python
ELECTRICAL_SYMBOLS = {
    "resistor": "Resistor",
    "capacitor": "Capacitor",
    "inductor": "Inductor",
    "diode": "Diode",
    "transistor": "Transistor",
    "ground": "Ground",
    "battery": "Battery/Power Supply"
}


def analyze_electrical_diagram(image_path: str, client: OpenAI) -> dict:
    """Specialized analysis for electrical schemas."""
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    symbols_context = "\n".join([f"- {k}: {v}" for k, v in ELECTRICAL_SYMBOLS.items()])

    prompt = f"""Analyze this electrical/electronic schema:

Common symbols:
{symbols_context}

Provide:
1. **Components**: List with values if visible (R1=10k, C1=100uF)
2. **Connections**: How components are linked
3. **Functional blocks**: Power supply, amplification, filtering
4. **Signals**: Identifiable inputs/outputs
5. **Overall function**: What does this circuit do?

Technical format."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_base64}", "detail": "high"}}
            ]
        }],
        max_tokens=2000
    )

    return {"analysis": response.choices[0].message.content}
```

Costs and Performance

Costs per Diagram

| Operation | Cost | Notes |
|---|---|---|
| GPT-4o extraction | $0.02-0.05 | Depends on complexity |
| OCR (local) | $0 | Tesseract |
| Code generation | $0.01-0.03 | Mermaid/PlantUML |
| Embedding | $0.0001 | text-embedding-3-small |
| Total | ~$0.05-0.10 | Per diagram |
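
To budget a batch ingestion, the table above can be turned into a quick estimator. The default figures below are our reading of the midpoints of those ranges, not exact prices:

```python
def estimate_batch_cost(
    n_diagrams: int,
    extraction: float = 0.035,   # midpoint of the $0.02-0.05 range
    codegen: float = 0.02,       # midpoint of the $0.01-0.03 range
    embedding: float = 0.0001,   # text-embedding-3-small
) -> float:
    """Rough USD cost for ingesting n diagrams; local OCR is free."""
    return n_diagrams * (extraction + codegen + embedding)
```

For 1,000 diagrams this lands around $55, consistent with the ~$0.05-0.10 per-diagram total above.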

Accuracy by Type

| Diagram Type | Extraction Accuracy | Code Accuracy |
|---|---|---|
| Simple flowchart | 95% | 90% |
| IT Architecture | 85% | 75% |
| UML Classes | 80% | 70% |
| Electrical schema | 70% | 50% |
| Infographic | 90% | N/A |

Integration with Ailog

Ailog supports diagram indexing:

  1. Upload: PNG, JPG, SVG, PDF
  2. Auto detection: Diagram type
  3. Smart extraction: Components and relationships
  4. Generated code: Reproducible Mermaid

Try Diagram RAG on Ailog

Tags

RAG, multimodal, diagrams, schemas, vision, infographics, extraction
