MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning

About

Vision-language models (VLMs) achieve strong performance on general image understanding but struggle to think with medical images, especially when performing multi-step reasoning through iterative visual interaction. Medical VLMs often rely on static visual embeddings and single-pass inference, which prevents them from re-examining, verifying, or refining visual evidence during reasoning. While tool-integrated reasoning offers a promising path forward, open-source VLMs lack the training infrastructure to learn effective tool selection, invocation, and coordination in multimodal medical reasoning. We introduce MedVistaGym, a scalable and interactive training environment that incentivizes tool-integrated visual reasoning for medical image analysis. MedVistaGym equips VLMs to determine when and which tools to invoke, localize task-relevant image regions, and integrate evidence from one or more sub-images into interleaved multimodal reasoning within a unified, executable interface for agentic training. Using MedVistaGym, we train MedVistaGym-R1 to interleave tool use with agentic reasoning through trajectory sampling and end-to-end reinforcement learning. Across six medical VQA benchmarks, MedVistaGym-R1-8B exceeds comparably sized tool-augmented baselines by 19.10% to 24.21%, demonstrating that structured agentic training, not tool access alone, unlocks effective tool-integrated reasoning for medical image analysis.
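The abstract describes a loop in which the model decides whether to call a tool, localizes a region, receives the resulting sub-image as new evidence, and eventually answers; sampled trajectories are then scored for reinforcement learning. The summary does not specify MedVistaGym's actual interface, tool set, or reward, so the sketch below is a minimal illustration under stated assumptions. The names `ToolCall`, `Answer`, `crop`, `rollout`, `reward`, and `scripted_policy` are all hypothetical. A scripted stub stands in for the VLM policy, images are plain lists of grayscale pixels, the only tool is a bounding-box crop, and the reward is an assumed exact-match outcome reward. The RL update itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple, Union

Image = List[List[int]]               # grayscale image as rows of pixel intensities
BBox = Tuple[int, int, int, int]      # (x0, y0, x1, y1), half-open pixel box


@dataclass
class ToolCall:                       # hypothetical action: invoke a named tool
    name: str
    bbox: BBox


@dataclass
class Answer:                         # hypothetical action: emit the final answer
    text: str


Action = Union[ToolCall, Answer]


def crop(image: Image, bbox: BBox) -> Image:
    """Hypothetical zoom-in tool: return the sub-image inside bbox, clamped to bounds."""
    x0, y0, x1, y1 = bbox
    h, w = len(image), len(image[0])
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    return [row[x0:x1] for row in image[y0:y1]]


TOOLS: Dict[str, Callable[[Image, BBox], Image]] = {"crop": crop}


@dataclass
class Trajectory:
    observations: list = field(default_factory=list)   # interleaved text and image evidence
    actions: List[Action] = field(default_factory=list)
    answer: Optional[str] = None


def rollout(policy: Callable[[list], Action], image: Image, question: str,
            max_turns: int = 4) -> Trajectory:
    """Sample one trajectory: alternate policy actions with tool outputs until an answer."""
    traj = Trajectory(observations=[question, image])
    for _ in range(max_turns):
        action = policy(traj.observations)
        traj.actions.append(action)
        if isinstance(action, Answer):
            traj.answer = action.text
            break
        tool = TOOLS.get(action.name)
        if tool is None:
            traj.observations.append(f"error: unknown tool {action.name}")
            continue
        traj.observations.append(tool(image, action.bbox))  # sub-image becomes new evidence
    return traj


def reward(traj: Trajectory, gold: str) -> float:
    """Assumed outcome reward: 1.0 for a case-insensitive exact match, else 0.0."""
    if traj.answer is None:
        return 0.0
    return 1.0 if traj.answer.strip().lower() == gold.strip().lower() else 0.0


def scripted_policy(obs: list) -> Action:
    """Stand-in for the VLM: zoom into the upper-left region once, then answer from it."""
    if len(obs) == 2:                 # only the question and full image so far
        return ToolCall("crop", (0, 0, 2, 2))
    pixels = [p for row in obs[-1] for p in row]
    return Answer("abnormal" if sum(pixels) / len(pixels) > 128 else "normal")


image = [[200, 210, 10, 10],
         [220, 190, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
traj = rollout(scripted_policy, image, "Is the upper-left region abnormal?")
print(traj.answer, reward(traj, "abnormal"))
```

In this toy run the policy makes one crop call, sees a bright 2x2 region (mean intensity 205), and answers "abnormal", earning reward 1.0. Keeping tools in a name-to-function registry mirrors the idea of a unified, executable interface: adding another tool would mean registering one more callable without changing the rollout loop.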

Meng Lu, Yuxing Lu, Yuchen Zhuang, Megan Mullins, Yang Xie, Guanghua Xiao, Charles Fleming, Wenqi Shi, Xuan Wang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Medical Visual Question Answering	Slake	Accuracy	81.36	247
Medical Visual Question Answering	VQA-RAD	Accuracy	70.75	228
Medical Visual Question Answering	PMC-VQA	Accuracy	58	103
Medical Visual Question Answering	PathVQA	Overall Accuracy	69	92
Medical Visual Question Answering	MicroVQA	Overall Accuracy	43	48
Medical Visual Question Answering	MMMU Health & Medicine (test)	Accuracy	42.9	39
Medical Visual Question Answering	MMMU H&M	Accuracy	0.5643	25
