BCER Agent: Reliable Long-Horizon MRI Workflow Execution via Compilation, Artifact Binding, and Bounded Local Recovery
About
Many recent medical VLM and agent studies are benchmarked on 2D images or comparatively short tool-calling exchanges, whereas real MRI analysis typically demands long, interdependent pipelines that operate on 3D/4D volumetric data. Under these conditions, reactive tool-calling agents are prone to cascading breakdowns triggered by faulty intermediate references, mismatched tool arguments, and limited control over cross-step dependencies. To address this, we introduce BCER (Brain-Cerebellum-Extremity-Reflector), a controller architecture aimed at dependable long-horizon MRI workflow execution. BCER decouples high-level planning from execution and provides bounded local recovery. We assess BCER on a multi-organ MRI benchmark covering brain, prostate, and cardiac tasks with both short- and long-chain workflows, using matched task contracts across controller variants and several backbone models. Relative to reactive baselines, BCER yields consistent improvements in end-to-end execution, with the most pronounced gains observed on long-chain workflows. BCER additionally enables auditability by maintaining explicit links between final outputs and intermediate artifacts and measurements. Code and benchmark are released at https://github.com/Albertlongzi/BCER.
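The separation of planning from execution, together with artifact binding and bounded local recovery, can be illustrated with a small sketch. This is not the released BCER implementation. Every name here (`Step`, `RunResult`, `validate_plan`, `execute`, `max_retries`) is hypothetical, the "tools" are toy functions over Python lists rather than MRI volumes, and recovery is reduced to a simple bounded retry of the failing step. The sketch assumes a plan is a fixed, ordered list of steps whose inputs are bound to named artifacts before anything runs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    name: str                  # id of the artifact this step produces
    tool: Callable[..., Any]   # hypothetical tool function
    inputs: Dict[str, str]     # tool argument name -> id of the bound artifact


@dataclass
class RunResult:
    artifacts: Dict[str, Any] = field(default_factory=dict)
    provenance: Dict[str, List[str]] = field(default_factory=dict)
    attempts: Dict[str, int] = field(default_factory=dict)


def validate_plan(steps: List[Step], initial: Dict[str, Any]) -> None:
    """Reject a plan whose steps reference artifacts that cannot exist yet."""
    known = set(initial)
    for step in steps:
        missing = [a for a in step.inputs.values() if a not in known]
        if missing:
            raise ValueError(f"step {step.name!r} references unknown artifacts: {missing}")
        known.add(step.name)


def execute(steps: List[Step], initial: Dict[str, Any], max_retries: int = 2) -> RunResult:
    """Run a validated plan; retry each failing step at most max_retries times."""
    validate_plan(steps, initial)
    result = RunResult(artifacts=dict(initial))
    for step in steps:
        # Resolve bound artifact ids to concrete values just before the call.
        kwargs = {arg: result.artifacts[aid] for arg, aid in step.inputs.items()}
        last_error = None
        for attempt in range(1, max_retries + 2):
            result.attempts[step.name] = attempt
            try:
                result.artifacts[step.name] = step.tool(**kwargs)
                last_error = None
                break
            except Exception as exc:  # recovery stays local to this step
                last_error = exc
        if last_error is not None:
            raise RuntimeError(
                f"step {step.name!r} failed after {max_retries + 1} attempts"
            ) from last_error
        # Record which artifacts this output was derived from (audit trail).
        result.provenance[step.name] = list(step.inputs.values())
    return result


# Toy usage: a denoiser that fails once, then a threshold "segmentation".
_calls = {"denoise": 0}


def flaky_denoise(volume):
    _calls["denoise"] += 1
    if _calls["denoise"] == 1:
        raise RuntimeError("transient failure")
    return [max(v, 0) for v in volume]


def segment(volume):
    return [1 if v > 1 else 0 for v in volume]


def count_voxels(mask):
    return sum(mask)


plan = [
    Step("denoised", flaky_denoise, {"volume": "raw"}),
    Step("mask", segment, {"volume": "denoised"}),
    Step("voxel_count", count_voxels, {"mask": "mask"}),
]
run = execute(plan, {"raw": [-1, 0, 2, 3]})
```

In this toy run the first step succeeds on its second attempt without touching the rest of the plan, and `run.provenance` links each output back to the artifacts it consumed. A plan that refers to an artifact no earlier step produces is rejected by `validate_plan` before any tool is called, which is the kind of faulty intermediate reference the abstract identifies as a source of cascading failures.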
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Cardiac report | MRI Workflows | SR | 93 | 4 |
| Prostate report | MRI Workflows | SR | 99 | 4 |
| Super-Resolution | MRI Workflows | SR Score (%) | 100 | 4 |
| Total Overall Performance | MRI Workflows | Success Rate (SR) | 99 | 4 |
| Denoise | MRI Workflows | Success Rate (SR) | 100 | 4 |
| Segmentation | MRI Workflows | SR | 100 | 4 |
| Brain grading | MRI Workflows | SR | 100 | 4 |
| Registration | MRI Workflows | Success Rate (SR) | 100 | 4 |
| Reconstruction | MRI Workflows | SR | 100 | 4 |