Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models

About

Vision-Language Models (VLMs) are becoming the cornerstone of high-level reasoning for robotic automation, enabling robots to parse natural language commands and perceive their environments. However, their susceptibility to hallucinations introduces critical failures in decision-making, posing significant safety and reliability risks in physical deployments. This challenge is exacerbated by the open-ended nature of real-world tasks, where questions vary vastly in difficulty and modality, demanding robust and adaptable reasoning strategies. To tackle this, we propose the Pseudocode-guided Structured Reasoning framework (PStar), which adaptively selects structured pseudocode reasoning paths to help VLMs perform flexible, step-by-step reasoning. We first design a set of abstract reasoning functions and formulate a structured pseudocode library to represent modular reasoning strategies. Crucially, we design a Difficulty Feature Vector (DFV) that allows the model to assess question complexity and adaptively choose appropriate reasoning strategies, enhancing robustness and interpretability. Extensive experiments demonstrate that PStar significantly reduces hallucination rates, achieving state-of-the-art scores of 87.1% on POPE and 68.0% on MMStar, outperforming even GPT-4V. By providing a validated mechanism to reduce visual-language errors, PStar offers a critical step toward deploying more trustworthy and deterministic VLMs for real-world automated systems, where such errors can lead to catastrophic outcomes.

Weicong Ni, Tianbao Jiang, Linlin Wang • 2026

Related benchmarks

Task	Dataset	Result
Multimodal Reasoning	MMStar	Accuracy: 68	102
Hallucination and Visual Reasoning Evaluation	HallusionBench	Accuracy (aACC): 81.8	61
Multimodal Capability Evaluation	MMStar	Overall Score: 68	31
General Multimodal Performance	POPE, HallusionBench, MMStar Average	Overall Score: 69.3	11
Object Hallucination Detection	POPE	Accuracy: 88.7	11
Open-ended Question Answering	OKVQA	LVM Evaluation Score: 71.6	6

