Diagrams and Schemas: Extracting Visual Information
Complete guide to integrating diagrams, technical schemas and infographics into your RAG system: extraction, interpretation and indexing with vision models.
Diagrams, technical schemas and infographics contain enormous information density. A system architecture, flowchart or electrical schema encodes hours of documentation in a single image. This guide shows you how to make this visual content searchable in your RAG system.
The Diagram Challenge
Why It's Different from Photos
| Image Type | Characteristics | RAG Challenge |
|---|---|---|
| Photo | Continuous pixels, recognizable objects | Vision models excel |
| Diagram | Geometric shapes, relationships, text | Structure to understand |
| Technical schema | Standardized symbols, conventions | Specific vocabulary |
| Infographic | Text/visual mix, hierarchy | Ordered extraction |
Business Use Cases
- IT: System architectures, UML diagrams, network schemas
- Industry: Technical plans, electrical schemas, P&ID
- Business: Org charts, process maps, flowcharts
- Data: ERD diagrams, data lineage, pipelines
- Marketing: Infographics, presentations, visual reports
Indexing ROI
- -80% search time in technical documentation
- +50% understanding of complex systems
- Traceability: Find the origin of an architecture decision
Diagram Types and Strategies
Classification by Complexity
```
┌────────────────────────────────────────────────────────────┐
│                     DIAGRAM COMPLEXITY                     │
├────────────────────────────────────────────────────────────┤
│                                                            │
│   SIMPLE            MEDIUM             COMPLEX             │
│     │                 │                   │                │
│     ▼                 ▼                   ▼                │
│ ┌─────────┐     ┌────────────┐     ┌──────────┐            │
│ │ Simple  │     │   System   │     │Industrial│            │
│ │Flowchart│     │Architecture│     │   P&ID   │            │
│ └─────────┘     └────────────┘     └──────────┘            │
│                                                            │
│ Approach:       Approach:          Approach:               │
│ - Vision model  - Multi-pass       - OCR + Vision          │
│ - Direct        - Zone detection   - Symbols DB            │
│   description   - Hierarchy        - Expert rules          │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
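The routing above can be sketched as a small dispatcher. A minimal sketch with illustrative thresholds (the element count could come from the region detection shown later in this guide; the cutoffs are assumptions to tune, not benchmarks):

```python
def classify_complexity(element_count: int, has_standard_symbols: bool = False) -> str:
    """Route a diagram to an extraction approach from a rough element count.

    Thresholds are illustrative: tune them against your own corpus.
    """
    if has_standard_symbols:
        # P&ID / electrical schemas: OCR + vision + symbol database
        return "complex"
    if element_count <= 10:
        # Single-pass vision description is usually enough
        return "simple"
    if element_count <= 30:
        # Multi-pass with zone detection and hierarchy
        return "medium"
    return "complex"
```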
Strategy by Type
| Type | Primary Extraction | Enrichment |
|---|---|---|
| Flowchart | GPT-4V description | Mermaid code |
| UML | Vision + OCR | PlantUML code |
| Architecture | Zones + relations | DOT/Graphviz |
| Electrical schema | Symbols + OCR | Netlist |
| Infographic | Sections + text | Markdown structure |
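The table above can be carried into code as a simple lookup. A sketch, where the strategy labels are our own illustrative names rather than library identifiers:

```python
# Maps diagram type to (primary extraction, enrichment) strategy labels
EXTRACTION_STRATEGIES = {
    "flowchart": {"primary": "vision_description", "enrichment": "mermaid"},
    "uml": {"primary": "vision_plus_ocr", "enrichment": "plantuml"},
    "architecture": {"primary": "zones_and_relations", "enrichment": "dot"},
    "electrical": {"primary": "symbols_plus_ocr", "enrichment": "netlist"},
    "infographic": {"primary": "sections_and_text", "enrichment": "markdown"},
}

def pick_strategy(diagram_type: str) -> dict:
    """Fall back to a generic vision pass for unrecognized types."""
    return EXTRACTION_STRATEGIES.get(
        diagram_type,
        {"primary": "vision_description", "enrichment": "markdown"},
    )
```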
Extraction with Vision Models
Prompt Engineering for Diagrams
```python
from openai import OpenAI
import base64

def extract_diagram_info(
    image_path: str,
    diagram_type: str = "auto",
    client: OpenAI = None
) -> dict:
    """
    Extract structured information from a diagram.
    """
    if client is None:
        client = OpenAI()

    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    # Type-specific prompts
    prompts = {
        "flowchart": """Analyze this flowchart/flow diagram:
1. **Main steps**: List each step in order
2. **Decision points**: Identify conditions/branches
3. **Data flow**: Describe the main path and alternatives
4. **Input/Output**: Identify start and end points
Structure your response for easy search.""",
        "architecture": """Analyze this architecture diagram:
1. **Components**: List all elements (services, databases, APIs)
2. **Connections**: Describe links between components (protocols, flows)
3. **Layers**: Identify layers (frontend, backend, data, infra)
4. **Technologies**: Spot mentioned technologies
Provide a hierarchical view.""",
        "uml": """Analyze this UML diagram:
1. **Diagram type**: Classes, sequences, use cases, etc.
2. **Entities**: List classes/objects/actors
3. **Relationships**: Inheritance, composition, association, dependencies
4. **Methods/Attributes**: If visible, list them
Structure in technical format.""",
        "infographic": """Analyze this infographic:
1. **Main theme**: What is the subject?
2. **Sections**: Break down into logical zones
3. **Key data**: Numbers, statistics, facts
4. **Hierarchy**: Suggested reading order
Extract content in a structured way.""",
        "auto": """Analyze this diagram/schema:
1. **Type**: Identify the diagram type
2. **Elements**: List all visible components
3. **Relationships**: Describe connections/flows between elements
4. **Text**: Extract all visible text
5. **Context**: What domain/use does this diagram represent?
Be exhaustive and structure your response."""
    }

    prompt = prompts.get(diagram_type, prompts["auto"])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                        "detail": "high"
                    }
                }
            ]
        }],
        max_tokens=2000
    )

    return {
        "diagram_type": diagram_type,
        "extraction": response.choices[0].message.content,
        "tokens_used": response.usage.total_tokens
    }
```
Diagram Code Generation
````python
def diagram_to_code(
    image_path: str,
    output_format: str = "mermaid",
    client: OpenAI = None
) -> str:
    """
    Convert a visual diagram to reproducible code.
    """
    if client is None:
        client = OpenAI()

    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    format_instructions = {
        "mermaid": """Convert this diagram to Mermaid code.
Syntax example:
```mermaid
graph TD
    A[Start] --> B{Decision}
    B -->|Yes| C[Action 1]
    B -->|No| D[Action 2]
```""",
        "plantuml": """Convert this diagram to PlantUML code.
Example:
```plantuml
@startuml
class User {
    +name: String
    +login(): void
}
@enduml
```""",
        "dot": """Convert this diagram to DOT/Graphviz code.
Example:
```dot
digraph G {
    A -> B -> C;
    B -> D;
}
```"""
    }

    prompt = f"""{format_instructions.get(output_format, format_instructions['mermaid'])}

Analyze the diagram and generate the corresponding {output_format} code.
Be precise about relationships and labels."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                        "detail": "high"
                    }
                }
            ]
        }],
        max_tokens=2000
    )

    return response.choices[0].message.content
````
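Model output often arrives wrapped in markdown fences, so it is worth normalizing and sanity-checking before indexing the generated code. A minimal sketch (the helper names are ours, and the header list covers only common Mermaid diagram types):

```python
import re

def clean_mermaid_output(raw: str) -> str:
    """Strip markdown fences the model may wrap around generated Mermaid code."""
    match = re.search(r"```(?:mermaid)?\s*(.*?)```", raw, re.DOTALL)
    code = match.group(1) if match else raw
    return code.strip()

def looks_like_mermaid(code: str) -> bool:
    """Cheap sanity check before storing generated code in the index."""
    headers = ("graph ", "flowchart ", "sequenceDiagram", "classDiagram", "erDiagram")
    return code.lstrip().startswith(headers)
```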
Advanced Multi-Zone Extraction
Region of Interest Detection
```python
import cv2
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DiagramRegion:
    bbox: Tuple[int, int, int, int]  # x, y, w, h
    region_type: str                 # box, text, connector, icon
    content: np.ndarray
    confidence: float

def detect_diagram_regions(image_path: str) -> List[DiagramRegion]:
    """
    Detect regions of interest in a diagram.
    """
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Edge detection
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    regions = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 100:  # Filter noise
            continue

        x, y, w, h = cv2.boundingRect(contour)

        # Classify region type
        aspect_ratio = w / h if h > 0 else 0
        if 0.8 < aspect_ratio < 1.2 and area > 500:
            region_type = "box"
        elif aspect_ratio > 3:
            region_type = "connector"
        else:
            region_type = "unknown"

        regions.append(DiagramRegion(
            bbox=(x, y, w, h),
            region_type=region_type,
            content=img[y:y+h, x:x+w],
            confidence=0.8
        ))

    return regions

def analyze_regions_separately(
    image_path: str,
    regions: List[DiagramRegion],
    client: OpenAI
) -> List[dict]:
    """
    Analyze each region separately for more precision.
    """
    results = []

    for i, region in enumerate(regions):
        if region.region_type != "box":
            continue

        # Encode region
        _, buffer = cv2.imencode('.png', region.content)
        img_base64 = base64.b64encode(buffer).decode('utf-8')

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe the content of this diagram element in one sentence."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{img_base64}",
                            "detail": "low"
                        }
                    }
                ]
            }],
            max_tokens=100
        )

        results.append({
            "region_id": i,
            "bbox": region.bbox,
            "type": region.region_type,
            "description": response.choices[0].message.content
        })

    return results
```
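The aspect-ratio heuristic used when classifying contours can be factored out so it is testable without OpenCV. A sketch using the same illustrative thresholds as above:

```python
def classify_region(w: int, h: int, area: float) -> str:
    """Classify a bounding box by shape: node box, connector line, or unknown.

    Same thresholds as in detect_diagram_regions; they are heuristics to tune.
    """
    aspect_ratio = w / h if h > 0 else 0
    if 0.8 < aspect_ratio < 1.2 and area > 500:
        return "box"        # roughly square and large enough: a diagram node
    if aspect_ratio > 3:
        return "connector"  # long and thin: likely an arrow or line
    return "unknown"
```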
Diagram-Specific OCR
```python
import pytesseract
from PIL import Image

def extract_diagram_text(
    image_path: str,
    preprocess: bool = True
) -> dict:
    """
    Extract text from a diagram with optimized preprocessing.
    """
    img = cv2.imread(image_path)

    if preprocess:
        # Convert to grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Adaptive binarization (better for diagrams)
        binary = cv2.adaptiveThreshold(
            gray, 255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY, 11, 2
        )

        # Dilation to connect fragmented characters
        kernel = np.ones((1, 1), np.uint8)
        processed = cv2.dilate(binary, kernel, iterations=1)
    else:
        processed = img

    # OCR with diagram-optimized config
    custom_config = r'--oem 3 --psm 11'  # PSM 11 = sparse text
    text = pytesseract.image_to_string(processed, config=custom_config)

    # Extraction with positions
    data = pytesseract.image_to_data(processed, output_type=pytesseract.Output.DICT)

    text_elements = []
    for i, word in enumerate(data['text']):
        if word.strip():
            text_elements.append({
                "text": word,
                "x": data['left'][i],
                "y": data['top'][i],
                "width": data['width'][i],
                "height": data['height'][i],
                "confidence": data['conf'][i]
            })

    return {
        "full_text": text,
        "elements": text_elements,
        "element_count": len(text_elements)
    }
```
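The `elements` list returned above lends itself to simple post-processing. A sketch that drops low-confidence words and regroups the rest into reading lines by vertical position (the confidence cutoff and pixel tolerance are assumptions to tune per diagram):

```python
def filter_ocr_elements(elements: list, min_confidence: int = 60) -> list:
    """Drop low-confidence words; Tesseract reports conf on a 0-100 scale."""
    return [e for e in elements if int(e["confidence"]) >= min_confidence]

def group_into_lines(elements: list, y_tolerance: int = 10) -> list:
    """Group words whose top coordinates fall within y_tolerance pixels
    into the same line, then order each line left to right."""
    lines = []
    for e in sorted(elements, key=lambda e: (e["y"], e["x"])):
        if lines and abs(lines[-1][0]["y"] - e["y"]) <= y_tolerance:
            lines[-1].append(e)
        else:
            lines.append([e])
    return [
        " ".join(w["text"] for w in sorted(line, key=lambda w: w["x"]))
        for line in lines
    ]
```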
RAG Indexing
Enriched Data Structure
```python
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class DiagramDocument:
    """Indexable document for a diagram."""
    diagram_id: str
    source_file: str
    diagram_type: str

    # Main extraction
    description: str
    extracted_text: str

    # Structure
    components: List[str]
    relationships: List[dict]
    hierarchy: Optional[dict]

    # Generated code
    mermaid_code: Optional[str]
    plantuml_code: Optional[str]

    # Metadata
    domain: str  # IT, business, industrial
    technologies: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def to_embedding_text(self) -> str:
        """Optimized text for embedding."""
        parts = [
            f"Type: {self.diagram_type}",
            f"Domain: {self.domain}",
            f"Description: {self.description}",
        ]
        if self.components:
            parts.append(f"Components: {', '.join(self.components)}")
        if self.technologies:
            parts.append(f"Technologies: {', '.join(self.technologies)}")
        if self.extracted_text:
            parts.append(f"Text content: {self.extracted_text}")
        return "\n".join(parts)

    def to_searchable_chunks(self) -> List[dict]:
        """Split into chunks for granular indexing."""
        chunks = []

        # Main chunk
        chunks.append({
            "chunk_type": "overview",
            "content": self.to_embedding_text(),
            "metadata": {
                "diagram_id": self.diagram_id,
                "diagram_type": self.diagram_type
            }
        })

        # Per-component chunks
        for component in self.components:
            related_rels = [
                r for r in self.relationships
                if component in str(r)
            ]
            chunks.append({
                "chunk_type": "component",
                "content": f"Component: {component}. Relations: {related_rels}",
                "metadata": {
                    "diagram_id": self.diagram_id,
                    "component_name": component
                }
            })

        return chunks
```
Complete Indexing Pipeline
```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
import hashlib
import json

class DiagramRAGPipeline:
    def __init__(self):
        self.qdrant = QdrantClient(url="http://localhost:6333")
        self.openai = OpenAI()
        self.collection_name = "diagram_rag"

    def create_collection(self):
        self.qdrant.recreate_collection(
            collection_name=self.collection_name,
            vectors_config=VectorParams(
                size=1536,
                distance=Distance.COSINE
            )
        )

    def process_diagram(
        self,
        image_path: str,
        diagram_type: str = "auto",
        domain: str = "general"
    ) -> DiagramDocument:
        """Complete processing pipeline."""
        diagram_id = hashlib.md5(image_path.encode()).hexdigest()

        print("1. Main extraction...")
        extraction = extract_diagram_info(image_path, diagram_type, self.openai)

        print("2. OCR...")
        ocr_result = extract_diagram_text(image_path)

        print("3. Mermaid code generation...")
        mermaid = diagram_to_code(image_path, "mermaid", self.openai)

        print("4. Component and relationship extraction...")
        components_prompt = f"""
From this diagram description, extract:
1. List of components/elements (JSON array)
2. List of relationships (JSON array of objects with from, to, type)
3. Mentioned technologies (JSON array)
4. Keywords (JSON array)

Description: {extraction['extraction']}
OCR Text: {ocr_result['full_text']}

JSON format:
{{
    "components": [],
    "relationships": [],
    "technologies": [],
    "keywords": []
}}
"""

        response = self.openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": components_prompt}],
            response_format={"type": "json_object"}
        )
        structured = json.loads(response.choices[0].message.content)

        return DiagramDocument(
            diagram_id=diagram_id,
            source_file=image_path,
            diagram_type=extraction['diagram_type'],
            description=extraction['extraction'],
            extracted_text=ocr_result['full_text'],
            components=structured.get('components', []),
            relationships=structured.get('relationships', []),
            hierarchy=None,
            mermaid_code=mermaid,
            plantuml_code=None,
            domain=domain,
            technologies=structured.get('technologies', []),
            keywords=structured.get('keywords', [])
        )

    def index_diagram(self, doc: DiagramDocument):
        """Index a diagram."""
        chunks = doc.to_searchable_chunks()
        points = []

        for i, chunk in enumerate(chunks):
            # Embedding
            response = self.openai.embeddings.create(
                model="text-embedding-3-small",
                input=chunk["content"]
            )
            embedding = response.data[0].embedding

            # Deterministic point ID: Python's built-in hash() is salted
            # per process, so re-indexing would create duplicate points
            point_id = int.from_bytes(
                hashlib.sha256(f"{doc.diagram_id}_{i}".encode()).digest()[:8],
                "big"
            ) % (2**63)

            point = PointStruct(
                id=point_id,
                vector=embedding,
                payload={
                    **chunk["metadata"],
                    "content": chunk["content"],
                    "chunk_type": chunk["chunk_type"],
                    "source_file": doc.source_file,
                    "mermaid_code": doc.mermaid_code
                }
            )
            points.append(point)

        self.qdrant.upsert(
            collection_name=self.collection_name,
            points=points
        )
        print(f"Indexed {len(points)} chunks")
```
Search and Generation
Search with Visual Context
```python
def search_diagrams(
    query: str,
    pipeline: DiagramRAGPipeline,
    limit: int = 5,
    filter_type: str = None
) -> List[dict]:
    """Search indexed diagrams."""
    response = pipeline.openai.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_embedding = response.data[0].embedding

    filter_conditions = None
    if filter_type:
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        filter_conditions = Filter(
            must=[FieldCondition(key="diagram_type", match=MatchValue(value=filter_type))]
        )

    results = pipeline.qdrant.search(
        collection_name=pipeline.collection_name,
        query_vector=query_embedding,
        query_filter=filter_conditions,
        limit=limit
    )

    return [
        {
            "source": r.payload["source_file"],
            "diagram_type": r.payload.get("diagram_type"),
            "content": r.payload["content"][:300],
            "mermaid": r.payload.get("mermaid_code"),
            "score": r.score
        }
        for r in results
    ]
```
Response Generation with Diagram
```python
def answer_with_diagrams(
    query: str,
    retrieved: List[dict],
    client: OpenAI
) -> str:
    """Generate a response including relevant diagrams."""
    context = "\n\n".join([
        f"**Diagram: {r['source']}** (type: {r['diagram_type']})\n{r['content']}"
        for r in retrieved
    ])

    # Include Mermaid code if available
    mermaid_codes = [r['mermaid'] for r in retrieved if r.get('mermaid')]

    prompt = f"""You are a technical assistant that answers questions using diagrams as source.

Available diagrams:
{context}

Question: {query}

Instructions:
1. Base your answer on the provided diagrams
2. If relevant, include Mermaid code so users can reproduce
3. Explain relationships between components
4. Cite sources [Diagram: name]"""

    if mermaid_codes:
        prompt += "\n\nAvailable Mermaid codes:\n" + "\n".join(mermaid_codes[:2])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1500
    )

    return response.choices[0].message.content
```
Specific Use Cases
IT Architecture
```python
def analyze_architecture_diagram(image_path: str, client: OpenAI) -> dict:
    """Specialized analysis for IT architectures."""
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = """Analyze this system architecture in detail:

1. **Services/Applications**: List with their role
2. **Databases**: Types (SQL, NoSQL, cache)
3. **Communication**: Protocols (REST, gRPC, MQ)
4. **Infrastructure**: Cloud, on-premise, containers
5. **Security**: Visible firewalls, auth, encryption
6. **Scalability**: Load balancers, replicas

Structured technical format."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_base64}",
                               "detail": "high"}}
            ]
        }],
        max_tokens=2000
    )

    return {"analysis": response.choices[0].message.content}
```
Electrical/P&ID Schemas
```python
ELECTRICAL_SYMBOLS = {
    "resistor": "Resistor",
    "capacitor": "Capacitor",
    "inductor": "Inductor",
    "diode": "Diode",
    "transistor": "Transistor",
    "ground": "Ground",
    "battery": "Battery/Power Supply"
}

def analyze_electrical_diagram(image_path: str, client: OpenAI) -> dict:
    """Specialized analysis for electrical schemas."""
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    symbols_context = "\n".join([f"- {k}: {v}" for k, v in ELECTRICAL_SYMBOLS.items()])

    prompt = f"""Analyze this electrical/electronic schema:

Common symbols:
{symbols_context}

Provide:
1. **Components**: List with values if visible (R1=10k, C1=100uF)
2. **Connections**: How components are linked
3. **Functional blocks**: Power supply, amplification, filtering
4. **Signals**: Identifiable inputs/outputs
5. **Overall function**: What does this circuit do?

Technical format."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_base64}",
                               "detail": "high"}}
            ]
        }],
        max_tokens=2000
    )

    return {"analysis": response.choices[0].message.content}
```
Costs and Performance
Costs per Diagram
| Operation | Cost | Notes |
|---|---|---|
| GPT-4o extraction | $0.02-0.05 | Depends on complexity |
| OCR (local) | $0 | Tesseract |
| Code generation | $0.01-0.03 | Mermaid/PlantUML |
| Embedding | $0.0001 | text-embedding-3-small |
| Total | ~$0.05-0.10 | Per diagram |
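For budgeting a batch, the ranges above can be turned into a rough estimator. A sketch where the defaults are midpoints of those ranges, not measured values; substitute your own observed averages:

```python
def estimate_indexing_cost(
    n_diagrams: int,
    avg_extraction: float = 0.035,  # midpoint of $0.02-0.05 (GPT-4o extraction)
    avg_codegen: float = 0.02,      # midpoint of $0.01-0.03 (Mermaid/PlantUML)
    avg_embedding: float = 0.0001,  # text-embedding-3-small
) -> float:
    """Rough USD budget for indexing a batch of diagrams (OCR is local, $0)."""
    per_diagram = avg_extraction + avg_codegen + avg_embedding
    return round(n_diagrams * per_diagram, 2)
```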
Accuracy by Type
| Diagram Type | Extraction Accuracy | Code Accuracy |
|---|---|---|
| Simple flowchart | 95% | 90% |
| IT Architecture | 85% | 75% |
| UML Classes | 80% | 70% |
| Electrical schema | 70% | 50% |
| Infographic | 90% | N/A |
Integration with Ailog
Ailog supports diagram indexing:
- Upload: PNG, JPG, SVG, PDF
- Auto detection: Diagram type
- Smart extraction: Components and relationships
- Generated code: Reproducible Mermaid