About Chart Analysis AI

Hybrid AI system for extracting structured data from chart images in Vietnamese academic papers

Chart Analysis AI is a five-stage pipeline that processes PDF documents and chart images to extract structured data, answer questions, and generate analytical insights. It combines object detection (YOLO), vision-language models (DePlot, PaddleOCR-VL, Vintern), and six fine-tuned small language models (SLMs, 0.5B to 7B parameters), with Gemini as a cloud AI fallback.
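The local-model-first, cloud-fallback arrangement described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `route`, `local_slm`, `cloud_fallback`, and the `Answer` type are hypothetical names, the two backends are stand-ins that call no real model, and the project's actual router may work differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Answer:
    text: str
    backend: str  # name of the backend that produced the answer


# A backend takes (question, table_text) and returns an answer string,
# or None when it cannot answer. This signature is an assumption.
Backend = Callable[[str, str], Optional[str]]


def route(question: str, table_text: str,
          backends: Sequence[tuple[str, Backend]]) -> Answer:
    """Try each backend in order and return the first usable answer."""
    for name, backend in backends:
        try:
            result = backend(question, table_text)
        except Exception:
            # Treat any backend failure as "try the next one".
            continue
        if result:
            return Answer(text=result, backend=name)
    raise RuntimeError("no backend produced an answer")


def local_slm(question: str, table_text: str) -> Optional[str]:
    # Stand-in for a fine-tuned local model; gives up on empty tables.
    if not table_text.strip():
        return None
    return f"local answer to: {question}"


def cloud_fallback(question: str, table_text: str) -> Optional[str]:
    # Stand-in for a cloud model call; always answers in this sketch.
    return f"cloud answer to: {question}"


# Order matters: the local model is tried before the cloud fallback.
BACKENDS = [("local_slm", local_slm), ("cloud_fallback", cloud_fallback)]
```

Calling `route("What is the maximum?", "", BACKENDS)` falls through to the cloud stand-in because the local stand-in declines an empty table. Keeping the backends in an ordered list means the fallback policy is data rather than control flow.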

The system was designed for Vietnamese academic papers and supports bilingual (English/Vietnamese) chart analysis. A key finding: DePlot-extracted table data improves SLM accuracy by 3.0 percentage points on average, a larger effect than scaling Qwen-2.5 from 0.5B to 7B parameters.

Pipeline Architecture
S1 Ingestion: Parse PDF/DOCX/images into clean page images. Components: PyMuPDF, Pillow.
S2 Detection: Locate and crop chart regions from pages. Components: YOLOv8-M (93.5% mAP@50).
S3 Extraction: Classify chart type and extract structured table data. Components: EfficientNet-B0 + DePlot VLM.
S4 Reasoning: Refine data, answer questions, generate descriptions. Components: AI Router (7 SLM + 3 Gemini + Vintern).
S5 Reporting: Generate insights, format output as JSON/Markdown/CSV. Components: Template engine + insight rules.
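One way to picture the five stages is as functions that each take and return a shared state object. The sketch below is a minimal illustration, not the project's code: `PipelineState`, the `s1_`...`s5_` functions, and `run_pipeline` are hypothetical names, and every stage body is a placeholder that fabricates toy data instead of calling PyMuPDF, YOLO, DePlot, or a language model.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Data handed from one stage to the next (hypothetical fields)."""
    source: str
    pages: list = field(default_factory=list)
    charts: list = field(default_factory=list)
    tables: list = field(default_factory=list)
    answers: list = field(default_factory=list)
    report: str = ""


Stage = Callable[[PipelineState], PipelineState]


def s1_ingest(state: PipelineState) -> PipelineState:
    # Placeholder: a real stage would render the document to page images.
    state.pages = [f"{state.source}:page-1"]
    return state


def s2_detect(state: PipelineState) -> PipelineState:
    # Placeholder: a real stage would run a detector and crop chart regions.
    state.charts = [f"{page}:chart-1" for page in state.pages]
    return state


def s3_extract(state: PipelineState) -> PipelineState:
    # Placeholder: a real stage would classify the chart and extract a table.
    state.tables = [
        {"chart": chart, "type": "bar", "rows": [["2020", 5], ["2021", 7]]}
        for chart in state.charts
    ]
    return state


def s4_reason(state: PipelineState) -> PipelineState:
    # Placeholder: a real stage would query a language model about the table.
    for table in state.tables:
        top = max(table["rows"], key=lambda row: row[1])
        state.answers.append(f"Highest value is {top[1]} at {top[0]}")
    return state


def s5_report(state: PipelineState) -> PipelineState:
    # Serialize the collected results; JSON is one of the listed formats.
    state.report = json.dumps(
        {"source": state.source, "tables": state.tables,
         "answers": state.answers},
        indent=2,
    )
    return state


STAGES = [s1_ingest, s2_detect, s3_extract, s4_reason, s5_report]


def run_pipeline(source: str, stages=STAGES) -> PipelineState:
    """Run the stages in order, threading one state object through them."""
    state = PipelineState(source=source)
    for stage in stages:
        state = stage(state)
    return state
```

With this shape, `run_pipeline("paper.pdf").report` holds a JSON string, and a stage can be swapped or skipped by passing a different `stages` list.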
SLM Model Screening

Six models were screened in two rounds: Round 1 (text-only) and Round 2 (with DePlot extraction). The models were trained on 4,000 bilingual samples using QLoRA with 4-bit quantization.

Model                  R1 (no DePlot)  R2 (+ DePlot)  Delta  VRAM
Llama-3.2-3B (winner)  83.3%           86.0%          +2.7%  5.5 GB
Qwen-2.5-7B            82.7%           85.7%          +2.9%  10.6 GB
Llama-3.2-1B           82.3%           85.1%          +2.8%  3.6 GB
Qwen-2.5-3B            81.7%           84.9%          +3.1%  5.3 GB
Qwen-2.5-1.5B          81.1%           84.3%          +3.2%  4.0 GB
Qwen-2.5-0.5B          80.1%           83.6%          +3.5%  2.8 GB

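Key finding: DePlot adds 3.0 percentage points of accuracy on average, and every one of the six models improves. That is a bigger impact than scaling Qwen-2.5 from 0.5B to 7B parameters (2.6 points in Round 1, 2.1 points in Round 2), and close to the 3.2-point spread between the best and worst models in Round 1. Input data quality matters more than model size.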
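The comparison can be checked from the table's own figures. The short script below copies the accuracy values from the screening table and computes the average DePlot gain, the gap between the 0.5B and 7B Qwen-2.5 models in each round, and the best-to-worst spread in Round 1. The function names are illustrative. The reported deltas for Qwen-2.5-7B and Qwen-2.5-3B differ by 0.1 from the difference of the rounded accuracies, presumably because they were computed before rounding.

```python
# Figures copied from the screening table: (R1 %, R2 %, reported delta).
RESULTS = {
    "Llama-3.2-3B": (83.3, 86.0, 2.7),
    "Qwen-2.5-7B": (82.7, 85.7, 2.9),
    "Llama-3.2-1B": (82.3, 85.1, 2.8),
    "Qwen-2.5-3B": (81.7, 84.9, 3.1),
    "Qwen-2.5-1.5B": (81.1, 84.3, 3.2),
    "Qwen-2.5-0.5B": (80.1, 83.6, 3.5),
}


def mean_deplot_gain(results):
    """Average of the reported Round 2 minus Round 1 deltas."""
    return sum(delta for _, _, delta in results.values()) / len(results)


def gap(results, model_a, model_b, round_index):
    """Accuracy of model_a minus model_b in one round (0 = R1, 1 = R2)."""
    return results[model_a][round_index] - results[model_b][round_index]


def spread(results, round_index):
    """Best minus worst accuracy in one round (0 = R1, 1 = R2)."""
    scores = [row[round_index] for row in results.values()]
    return max(scores) - min(scores)


if __name__ == "__main__":
    print("mean DePlot gain:", round(mean_deplot_gain(RESULTS), 2))
    print("Qwen 0.5B -> 7B, R1:",
          round(gap(RESULTS, "Qwen-2.5-7B", "Qwen-2.5-0.5B", 0), 1))
    print("Qwen 0.5B -> 7B, R2:",
          round(gap(RESULTS, "Qwen-2.5-7B", "Qwen-2.5-0.5B", 1), 1))
    print("best-worst spread, R1:", round(spread(RESULTS, 0), 1))
```

On these numbers the mean gain is about 3.03 points, the Qwen-2.5 0.5B-to-7B gap is 2.6 points in Round 1 and 2.1 in Round 2, and the Round 1 spread across all six models is 3.2 points.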

Performance Metrics
Chart Detection: 93.5% (YOLOv8-M mAP@50, 3-class)
Classification: 97.54% (EfficientNet-B0, 3-class)
Best SLM Accuracy: 86.0% (Llama-3.2-3B + QLoRA + DePlot)
DePlot Impact: +3.0% (average across all 6 models)
Training Samples: 60,613 (bilingual, EN:VI 50:50)
Chart Dataset: 32,364 (classified academic charts)
DePlot Extractions: 7,332 (charts extracted, 0 errors)
Inference Speed: 3-10 s (RTX 3060, 64x faster than Vintern)
Technology Stack

Core Pipeline: Python 3.11, PyMuPDF, OmegaConf, Pydantic v2

Detection & Classification: YOLOv8-M, EfficientNet-B0, ResNet-18

Extraction (VLM): DePlot (Pix2Struct), PaddleOCR-VL, MatCha

Reasoning (AI): Llama-3.2 (1B/3B), Qwen-2.5 (0.5B-7B), Vintern-1B, Gemini 3.x

Training: QLoRA (4-bit), Vertex AI A100, WandB, Hugging Face

Backend: FastAPI, SSE Streaming, Docker + NVIDIA GPU

Frontend: Next.js 16, React 19, Tailwind CSS 4, shadcn/ui

Deployment: Docker Compose, Cloud Run (L4 GPU), GCS
Project Information
Project: Chart Analysis AI v8.0
University: FPT University
Semester: Spring 2026 (Capstone)
Developer: thatlq (ielx)
Collaborator: XHoang04 (Vintern model)
Training Infra: Google Cloud Vertex AI (A100 SPOT)
Local Dev: RTX 3060 Laptop (6GB VRAM)
Source: github.com/thatlq1812/chart_analysis_ai_v3