BCER Agent: Reliable Long-Horizon MRI Workflow Execution via Compilation, Artifact Binding, and Bounded Local Recovery

About

Many recent medical VLM and agent studies are benchmarked on 2D images or comparatively short tool-calling exchanges, whereas real MRI analysis typically demands long, interdependent pipelines that operate on 3D/4D volumetric data. Under these conditions, reactive tool-calling agents are prone to cascading breakdowns triggered by faulty intermediate references, mismatched tool arguments, and limited control over cross-step dependencies. To address this, we introduce BCER (Brain-Cerebellum-Extremity-Reflector), a controller architecture aimed at dependable long-horizon MRI workflow execution. BCER decouples high-level planning from execution and provides bounded local recovery. We assess BCER on a multi-organ MRI benchmark covering brain, prostate, and cardiac tasks with both short- and long-chain workflows, using matched task contracts across controller variants and several backbone models. Relative to reactive baselines, BCER yields consistent improvements in end-to-end execution, with the most pronounced gains observed on long-chain workflows. BCER additionally enables auditability by maintaining explicit links between final outputs and intermediate artifacts and measurements. Code and benchmark are released at https://github.com/Albertlongzi/BCER.
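The three ideas named in the abstract (compiling a plan before running it, binding each step to explicit artifacts, and retrying a failed step locally within a fixed budget) can be illustrated with a small sketch. This is not the authors' implementation: `Step`, `compile_plan`, `execute`, and `max_retries` are hypothetical names chosen for illustration, and the toy "denoise" and "measure" functions stand in for real MRI tools.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class Step:
    """One workflow step, bound to named artifacts (hypothetical structure)."""
    name: str
    fn: Callable[..., Any]
    inputs: List[str]  # artifact IDs this step consumes
    output: str        # artifact ID this step produces


def compile_plan(steps: List[Step], initial: Set[str]) -> List[Step]:
    """Reject the plan before execution if any input reference cannot resolve."""
    known = set(initial)
    for s in steps:
        missing = [i for i in s.inputs if i not in known]
        if missing:
            raise ValueError(f"step {s.name!r} references unknown artifacts: {missing}")
        known.add(s.output)
    return list(steps)


def execute(
    steps: List[Step], artifacts: Dict[str, Any], max_retries: int = 2
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Run steps in order; retry only the failing step, at most max_retries times."""
    store = dict(artifacts)
    provenance: Dict[str, Dict[str, Any]] = {}
    for s in steps:
        last_err = None
        for attempt in range(max_retries + 1):
            try:
                store[s.output] = s.fn(*(store[i] for i in s.inputs))
                # Record which step and inputs produced this artifact.
                provenance[s.output] = {
                    "step": s.name,
                    "inputs": list(s.inputs),
                    "attempts": attempt + 1,
                }
                break
            except Exception as err:  # recovery stays local to this step
                last_err = err
        else:
            raise RuntimeError(
                f"step {s.name!r} failed after {max_retries + 1} attempts"
            ) from last_err
    return store, provenance


# Toy usage: the first call to the "denoise" step fails, the retry succeeds.
calls = {"n": 0}


def flaky_denoise(x):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return [v * 0.5 for v in x]


plan = compile_plan(
    [
        Step("denoise", flaky_denoise, ["raw"], "denoised"),
        Step("measure", sum, ["denoised"], "total"),
    ],
    initial={"raw"},
)
store, provenance = execute(plan, {"raw": [2.0, 4.0]})
```

In this sketch, a bad reference is caught by `compile_plan` before any tool runs, a transient failure costs one extra attempt on the affected step only, and `provenance` links each output back to the step and input artifacts that produced it.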

Ziyang Long, Xinqi Li, Junzhou Chen, Yifan Gao, Debiao Li, Hsin-Jung Yang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Cardiac report	MRI Workflows	SR	93	4
Prostate report	MRI Workflows	SR	99	4
Super-Resolution	MRI Workflows	SR Score (%)	100	4
Total Overall Performance	MRI Workflows	Success Rate (SR)	99	4
Denoise	MRI Workflows	Success Rate (SR)	100	4
Segmentation	MRI Workflows	SR	100	4
Brain grading	MRI Workflows	SR	100	4
Registration	MRI Workflows	Success Rate (SR)	100	4
Reconstruction	MRI Workflows	SR	100	4
