
Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models

About

Vision-Language Models (VLMs) are becoming the cornerstone of high-level reasoning for robotic automation, enabling robots to parse natural language commands and perceive their environments. However, their susceptibility to hallucinations introduces critical failures in decision-making, posing significant safety and reliability risks in physical deployments. This challenge is exacerbated by the open-ended nature of real-world tasks, where questions vary vastly in difficulty and modality, demanding robust and adaptable reasoning strategies. To tackle this, we propose the Pseudocode-guided Structured Reasoning framework (PStar), which adaptively selects structured pseudocode reasoning paths to help VLMs perform flexible, step-by-step reasoning. We first design a set of abstract reasoning functions and formulate a structured pseudocode library to represent modular reasoning strategies. Crucially, we design a Difficulty Feature Vector (DFV) that allows the model to assess question complexity and adaptively choose appropriate reasoning strategies, which enhances robustness and interpretability. Extensive experiments demonstrate that PStar significantly reduces hallucination rates, achieving state-of-the-art scores of 87.1% on POPE and 68.0% on MMStar, outperforming even GPT-4V. By providing a validated mechanism to reduce visual-language errors, PStar offers a critical step toward deploying more trustworthy and deterministic VLMs for real-world automated systems, where such errors can lead to catastrophic outcomes.
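
The adaptive selection described above can be illustrated with a minimal sketch. The abstract does not say which features make up the DFV, how the vector is mapped to a strategy, or what the abstract reasoning functions are, so everything below is an assumption made for illustration: the strategy names, the step strings, the `max_difficulty` thresholds, and the weighted-mean scoring are hypothetical and are not PStar's actual design.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Strategy:
    """One entry of a hypothetical pseudocode library."""
    name: str
    steps: List[str]        # pseudocode steps; each names an assumed reasoning function
    max_difficulty: float   # assumed upper bound on the difficulty this path handles


# Hypothetical library, ordered from the cheapest path to the most thorough one.
LIBRARY = [
    Strategy("direct", ["answer(question, image)"], 0.3),
    Strategy(
        "locate_then_answer",
        ["locate(objects)", "describe(regions)", "answer(question, regions)"],
        0.7,
    ),
    Strategy(
        "decompose_and_verify",
        ["decompose(question)", "locate(objects)",
         "answer(sub_questions)", "verify(answer, image)"],
        1.0,
    ),
]


def difficulty_score(dfv: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a difficulty feature vector into one score in [0, 1].

    A weighted mean is an assumption; the source does not specify the mapping.
    """
    if len(dfv) != len(weights) or not dfv:
        raise ValueError("dfv and weights must be non-empty and the same length")
    score = sum(f * w for f, w in zip(dfv, weights)) / sum(weights)
    return min(1.0, max(0.0, score))


def select_strategy(dfv: Sequence[float], weights: Sequence[float],
                    library: Sequence[Strategy] = LIBRARY) -> Strategy:
    """Pick the first (cheapest) strategy whose threshold covers the score."""
    score = difficulty_score(dfv, weights)
    for strategy in library:
        if score <= strategy.max_difficulty:
            return strategy
    return library[-1]  # fall back to the most thorough path


if __name__ == "__main__":
    chosen = select_strategy([0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
    print(chosen.name)
    for step in chosen.steps:
        print("  ", step)
```

Ordering the library from cheap to thorough means easy questions take a short reasoning path and harder ones are routed to paths with explicit decomposition and verification steps, which is one plausible reading of "adaptively choose appropriate reasoning strategies".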

Weicong Ni, Tianbao Jiang, Linlin Wang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Reasoning | MMStar | Accuracy | 68 | 78 |
| Hallucination and Visual Reasoning Evaluation | HallusionBench | Accuracy (aACC) | 81.8 | 40 |
| General Multimodal Performance | POPE, HallusionBench, MMStar Average | Overall Score | 69.3 | 11 |
| Multimodal Capability Evaluation | MMStar | CP Score | 76.8 | 11 |
| Object Hallucination Detection | POPE | Accuracy | 88.7 | 11 |
| Open-ended Question Answering | OKVQA | LVM Evaluation Score | 71.6 | 6 |