SketchVLM: Vision language models can annotate images to explain thoughts and guide users

About

When answering questions about images, humans naturally point, label, and draw to explain their reasoning. In contrast, modern vision-language models (VLMs) such as Gemini-3-Pro and GPT-5 only respond with text, which can be difficult for users to verify. We present SketchVLM, a training-free, model-agnostic framework that enables VLMs to produce non-destructive, editable SVG overlays on the input image to visually explain their answers. Across seven benchmarks spanning visual reasoning (maze navigation, ball-drop trajectory prediction, and object counting) and drawing (part labeling, connecting-the-dots, and drawing shapes around objects), SketchVLM improves visual reasoning task accuracy by up to +28.5 percentage points and annotation quality by up to 1.48x relative to image-editing and fine-tuned sketching baselines, while also producing annotations that are more faithful to the model's stated answer. We find that single-turn generation already achieves strong accuracy and annotation quality, and multi-turn generation opens up further opportunities for human-AI collaboration. An interactive demo and code are at https://sketchvlm.github.io/.
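As a rough illustration of what a non-destructive, editable SVG overlay can look like, the sketch below wraps an untouched image reference and a separate annotation layer in one SVG document. It is not the SketchVLM implementation: the function name `build_overlay`, the `annotations` group id, the file name `maze.png`, and all coordinates are hypothetical, and the abstract does not specify the overlay schema the framework actually uses.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_overlay(image_href, width, height, annotations):
    """Return an SVG string: the original image plus a separate annotation layer.

    `annotations` is a list of (tag, attributes) pairs, e.g. ("circle", {...}).
    The image is only referenced, never modified, so the overlay stays editable.
    """
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": str(width), "height": str(height),
         "viewBox": f"0 0 {width} {height}"},
    )
    # Bottom layer: the input image, referenced by path (hypothetical file).
    ET.SubElement(
        svg, f"{{{SVG_NS}}}image",
        {"href": image_href, "width": str(width), "height": str(height)},
    )
    # Top layer: annotation shapes that can be edited or removed independently.
    layer = ET.SubElement(svg, f"{{{SVG_NS}}}g", {"id": "annotations"})
    for tag, attrs in annotations:
        ET.SubElement(layer, f"{{{SVG_NS}}}{tag}",
                      {key: str(value) for key, value in attrs.items()})
    return ET.tostring(svg, encoding="unicode")


# Hypothetical example: a path through a maze and a circle around the goal.
overlay = build_overlay("maze.png", 640, 480, [
    ("polyline", {"points": "20,20 20,200 300,200",
                  "fill": "none", "stroke": "red", "stroke-width": 3}),
    ("circle", {"cx": 300, "cy": 200, "r": 8,
                "fill": "none", "stroke": "blue"}),
])
print(overlay)
```

Because the annotations live in their own group, a user or a later model turn could delete or adjust individual shapes without touching the underlying image.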

Brandon Collins, Logan Bolton, Hung Huy Nguyen, Mohammad Reza Taesiri, Trung Bui, Anh Totti Nguyen • 2026

Related benchmarks

Task | Dataset | Result | Rank
Counting | Counting | Accuracy 95.9 | 7
Annotation Quality | VPCT Ball Drop and Maze Navigation | VPCT Score 3.12 | 5
Annotation-text Alignment | VPCT Ball Drop Maze Navigation | VPCT Score 100 | 5
Ball Drop | Ball Drop | Human Score 3.79 | 5
Drawing Quality Assessment | SketchVLM Drawing Task Suite VPCT, Ball Drop, Maze, Counting | VPCT Score 3.12 | 5
Maze Navigation | Maze Navigation (Invalid) | Human Score 4.13 | 5
Maze Navigation | Maze Navigation (val) | Human Score 4.45 | 5
Visual Reasoning | VPCT Ball Drop Maze and Counting | VPCT Accuracy 96 | 5
VPCT | VPCT | Human Score 4.56 | 5
