About Chart Analysis AI

Hybrid AI system for extracting structured data from chart images in Vietnamese academic papers

Chart Analysis AI is a five-stage pipeline that processes PDF documents and chart images to extract structured data, answer questions, and generate analytical insights. It combines object detection (YOLO), vision-language models (DePlot, PaddleOCR-VL, Vintern), and six fine-tuned small language models (SLMs, 0.5B to 7B parameters), with Gemini as a cloud fallback.

The system was designed for Vietnamese academic papers and supports bilingual (English/Vietnamese) chart analysis. A key finding: feeding DePlot-extracted table data to the SLMs improves accuracy by 3.0 percentage points on average, a larger effect than scaling model size from 0.5B to 7B parameters.
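The local-model-first design with a cloud fallback can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Answer` type, the `route` function, the confidence threshold, and the stand-in backends are hypothetical and do not reflect the project's actual router or any real model API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    confidence: float  # assumed 0.0-1.0 score reported by the backend
    backend: str


# A backend takes a question plus optional extracted table text.
Backend = Callable[[str, Optional[str]], Answer]


def route(question: str, table: Optional[str], local: Backend, cloud: Backend,
          min_confidence: float = 0.6) -> Answer:
    """Try the local SLM first; use the cloud model on failure or low confidence."""
    try:
        answer = local(question, table)
        if answer.confidence >= min_confidence:
            return answer
    except RuntimeError:
        pass  # e.g. the local model failed to load; fall through to the cloud
    return cloud(question, table)


# Stand-in backends for demonstration only (no real model calls).
def fake_slm(question: str, table: Optional[str]) -> Answer:
    confidence = 0.9 if table else 0.3
    return Answer("42", confidence, "local-slm")


def fake_cloud(question: str, table: Optional[str]) -> Answer:
    return Answer("42", 0.95, "cloud")


with_table = route("What is the peak value?", "Year | Value\n2020 | 42", fake_slm, fake_cloud)
without_table = route("What is the peak value?", None, fake_slm, fake_cloud)
print(with_table.backend, without_table.backend)
```

In this sketch the cloud backend is only called when the local answer is missing or below the threshold, which keeps most requests on local hardware.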

Pipeline Architecture

Stage	Purpose	Components
Ingestion	Parse PDF/DOCX/images into clean page images	PyMuPDF, Pillow
Detection	Locate and crop chart regions from pages	YOLOv8-M (93.5% mAP@50)
Extraction	Classify chart type and extract structured table data	EfficientNet-B0 + DePlot VLM
Reasoning	Refine data, answer questions, generate descriptions	AI Router (7 SLM + 3 Gemini + Vintern)
Reporting	Generate insights, format output as JSON/Markdown/CSV	Template engine + insight rules
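The five stages above run in a fixed order, each consuming what the previous one produced. A minimal sketch of that flow follows; the `Context` class, `make_stage` helper, and placeholder values are hypothetical and stand in for the real models and parsers, which are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """Shared state passed from stage to stage."""
    source: str
    data: Dict[str, object] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Context], Context]


def make_stage(name: str, key: str, value: object) -> Stage:
    """Build a placeholder stage that records one output under `key`."""
    def run(ctx: Context) -> Context:
        ctx.data[key] = value
        ctx.trace.append(name)
        return ctx
    return run


# Stage order follows the pipeline table; outputs are dummy values.
PIPELINE: List[Stage] = [
    make_stage("ingestion", "pages", ["page-1.png"]),
    make_stage("detection", "charts", ["chart-1.png"]),
    make_stage("extraction", "table", "Year | Value"),
    make_stage("reasoning", "answer", "placeholder answer"),
    make_stage("reporting", "report", "{}"),
]


def run_pipeline(source: str, stages: List[Stage] = PIPELINE) -> Context:
    ctx = Context(source=source)
    for stage in stages:
        ctx = stage(ctx)
    return ctx


result = run_pipeline("paper.pdf")
print(" -> ".join(result.trace))
```

Passing a single context object through the stages keeps each stage independently replaceable, for example swapping the extraction model without touching reporting.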

SLM Model Screening

Six models were screened in two rounds: Round 1 (text-only input) and Round 2 (with DePlot-extracted table data). Each model was fine-tuned on 4,000 bilingual samples using QLoRA with 4-bit quantization.

Model	R1 (no DePlot)	R2 (+ DePlot)	Delta	VRAM
Llama-3.2-3B (winner)	83.3%	86.0%	+2.7%	5.5 GB
Qwen-2.5-7B	82.7%	85.7%	+2.9%	10.6 GB
Llama-3.2-1B	82.3%	85.1%	+2.8%	3.6 GB
Qwen-2.5-3B	81.7%	84.9%	+3.1%	5.3 GB
Qwen-2.5-1.5B	81.1%	84.3%	+3.2%	4.0 GB
Qwen-2.5-0.5B	80.1%	83.6%	+3.5%	2.8 GB

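Key finding: DePlot adds 3.0 percentage points of accuracy on average across all six models. In the table above, that is more than the 2.6-point Round 1 gap between Qwen-2.5-0.5B and Qwen-2.5-7B, and close to the 3.2-point spread between the weakest and strongest models. Data quality matters more than model size.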
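The comparison can be checked directly against the screening table. The sketch below copies the reported accuracies and deltas and recomputes three figures: the average DePlot gain, the gap between the smallest and largest Qwen models, and the Round 1 spread between the weakest and strongest models. The reported Delta column is used as-is, since deltas recomputed from the rounded accuracies can differ by 0.1.

```python
from statistics import mean

# (model, Round 1 accuracy %, Round 2 accuracy %, reported delta), from the table
ROWS = [
    ("Llama-3.2-3B", 83.3, 86.0, 2.7),
    ("Qwen-2.5-7B", 82.7, 85.7, 2.9),
    ("Llama-3.2-1B", 82.3, 85.1, 2.8),
    ("Qwen-2.5-3B", 81.7, 84.9, 3.1),
    ("Qwen-2.5-1.5B", 81.1, 84.3, 3.2),
    ("Qwen-2.5-0.5B", 80.1, 83.6, 3.5),
]

accuracy = {model: (r1, r2) for model, r1, r2, _ in ROWS}

# Average gain from adding DePlot tables, using the reported Delta column.
avg_delta = mean(delta for *_, delta in ROWS)

# Gain from scaling Qwen-2.5 from 0.5B to 7B, in each round.
size_gap_r1 = accuracy["Qwen-2.5-7B"][0] - accuracy["Qwen-2.5-0.5B"][0]
size_gap_r2 = accuracy["Qwen-2.5-7B"][1] - accuracy["Qwen-2.5-0.5B"][1]

# Spread between the weakest and strongest model in Round 1.
spread_r1 = max(r1 for r1, _ in accuracy.values()) - min(r1 for r1, _ in accuracy.values())

print(f"avg DePlot delta:      {avg_delta:.1f}")
print(f"0.5B -> 7B gap (R1):   {size_gap_r1:.1f}")
print(f"0.5B -> 7B gap (R2):   {size_gap_r2:.1f}")
print(f"best-worst spread (R1): {spread_r1:.1f}")
```

With the table's numbers this gives an average delta of 3.0, a 0.5B-to-7B gap of 2.6 (Round 1) and 2.1 (Round 2), and a Round 1 spread of 3.2 points.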

Performance Metrics

Metric	Value	Notes
Chart Detection	93.5%	YOLOv8-M mAP@50, 3-class
Classification	97.54%	EfficientNet-B0, 3-class
Best SLM Accuracy	86.0%	Llama-3.2-3B + QLoRA + DePlot
DePlot Impact	+3.0%	Average across all 6 models
Training Samples	60,613	Bilingual EN:VI 50:50
Chart Dataset	32,364	Classified academic charts
DePlot Extractions	7,332	Charts extracted, 0 errors
Inference Speed	3-10 s	RTX 3060, 64x faster than Vintern
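DePlot emits each extracted chart as a linearized text table, typically with ` | ` between cells and a `<0x0A>` token between rows. The sketch below shows one way such output could be turned into records and CSV for reporting. The function names and the sample string are hypothetical, and the handling of a leading `TITLE` row is an assumption about the output shape rather than the project's actual parser.

```python
import csv
import io
import json
from typing import List, Optional, Tuple


def parse_linearized_table(text: str) -> Tuple[Optional[str], List[str], List[List[str]]]:
    """Split a DePlot-style linearized table into (title, header, body rows)."""
    lines = [ln.strip() for ln in text.replace("<0x0A>", "\n").splitlines()]
    rows = [[cell.strip() for cell in ln.split("|")] for ln in lines if ln]
    title = None
    # Assumption: an optional first row "TITLE | ..." carries the chart title.
    if rows and rows[0][0].upper() == "TITLE":
        title = " | ".join(rows[0][1:]) or None
        rows = rows[1:]
    if not rows:
        raise ValueError("no header row found")
    return title, rows[0], rows[1:]


def to_records(header: List[str], body: List[List[str]]) -> List[dict]:
    """One dict per data row, keyed by column name."""
    return [dict(zip(header, row)) for row in body]


def to_csv(header: List[str], body: List[List[str]]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return buffer.getvalue()


# Made-up sample, not taken from the project's dataset.
sample = "TITLE | Enrollment by year <0x0A> Year | Students <0x0A> 2021 | 120 <0x0A> 2022 | 150"
title, header, body = parse_linearized_table(sample)
records = to_records(header, body)
print(title)
print(json.dumps(records))
print(to_csv(header, body), end="")
```

Cell values stay as strings here; converting them to numbers is left out because the appropriate rules depend on the chart type.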

Technology Stack

Layer	Technologies
Core Pipeline	Python 3.11, PyMuPDF, OmegaConf, Pydantic v2
Detection & Classification	YOLOv8-M, EfficientNet-B0, ResNet-18
Extraction (VLM)	DePlot (Pix2Struct), PaddleOCR-VL, MatCha
Reasoning (AI)	Llama-3.2 (1B/3B), Qwen-2.5 (0.5B-7B), Vintern-1B, Gemini 3.x
Training	QLoRA (4-bit), Vertex AI A100, WandB, Hugging Face
Backend	FastAPI, SSE Streaming, Docker + NVIDIA GPU
Frontend	Next.js 16, React 19, Tailwind CSS 4, shadcn/ui
Deployment	Docker Compose, Cloud Run (L4 GPU), GCS
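The backend lists SSE streaming, which suits a multi-stage pipeline because the client can show progress while later stages are still running. The sketch below formats Server-Sent Events using only the standard library. The event names (`progress`, `done`) and payload fields are hypothetical, not the project's actual API. In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Iterator, Optional

STAGES = ["ingestion", "detection", "extraction", "reasoning", "reporting"]


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Event: optional event name, one data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps without indent escapes newlines, so the payload fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def progress_stream() -> Iterator[str]:
    """Yield one progress event per stage, then a final completion event."""
    for step, name in enumerate(STAGES, start=1):
        yield format_sse({"stage": name, "step": step, "total": len(STAGES)}, event="progress")
    yield format_sse({"status": "ok"}, event="done")


events = list(progress_stream())
print(events[0], end="")
```

Each event ends with a blank line, which is what tells the browser's `EventSource` that the event is complete.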

Project Information

Project: Chart Analysis AI v8.0
University: FPT University
Semester: Spring 2026 (Capstone)
Developer: thatlq (ielx)
Collaborator: XHoang04 (Vintern model)
Training Infra: Google Cloud Vertex AI (A100 SPOT)
Local Dev: RTX 3060 Laptop (6GB VRAM)
Source: github.com/thatlq1812/chart_analysis_ai_v3