BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine

About

Translational medicine turns underspecified development goals into evidence synthesis that must combine literature, trials, patents, and quantitative multi-omics analysis while preserving identifiers, uncertainty, and retrievable provenance. General-purpose foundation models and off-the-shelf tool-augmented or multi-agent systems are not built for this: they tend to produce single-shot answers or run open-endedly, and they fall short on the auditable, scenario-specific workflows that heterogeneous biomedical sources demand. This paper introduces Ingenix BioResearcher, a scenario-guided multi-agent system that maps queries to versioned research playbooks, delegates work to specialized subagents over 30+ tools and machine-learning endpoints, combines structured database access with sandboxed code for genome-scale analyses, and applies claim-level multi-model reconciliation before editorial assembly. We evaluate BioResearcher on unit-level capabilities, open-ended biomedical reasoning, and end-to-end clinical discovery. It leads all evaluated baselines on 109 single-step tests (83.49% pass rate; 0.892 average score), achieves strong biomedical benchmark performance (89.33% on BixBench-Verified-50 and the top mean score of 0.758 on BaisBench Scientific Discovery), and leads a 30-query clinical end-to-end benchmark with the highest positive hit rate (74.7% ± 3.3%) and negative clear rate (96.8% ± 0.2%). These results show broad, competitive performance across unit-level, open-ended, and end-to-end clinical evaluations.
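The abstract does not say how claim-level multi-model reconciliation works. As a rough illustration only, the sketch below assumes each model emits a set of normalized claims and keeps the claims supported by more than a quorum fraction of models, setting the rest aside as disputed for review. `Claim`, `reconcile`, and the `quorum` threshold are hypothetical names, and majority voting is a stand-in technique, not the method described in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Claim:
    # Hypothetical normalized claim: an entity identifier plus an assertion string.
    subject_id: str
    assertion: str


def reconcile(
    model_outputs: dict[str, set[Claim]], quorum: float = 0.5
) -> tuple[list[Claim], list[tuple[Claim, list[str]]]]:
    """Split claims into accepted (support fraction > quorum) and disputed ones.

    Assumption: every model's output counts equally (simple majority vote).
    """
    # Record which models asserted each claim.
    support: dict[Claim, set[str]] = defaultdict(set)
    for model, claims in model_outputs.items():
        for claim in claims:
            support[claim].add(model)

    n_models = len(model_outputs)
    accepted: list[Claim] = []
    disputed: list[tuple[Claim, list[str]]] = []
    for claim, models in support.items():
        if len(models) / n_models > quorum:
            accepted.append(claim)
        else:
            # Keep the supporting models so a reviewer can trace the disagreement.
            disputed.append((claim, sorted(models)))

    # Sort so the result does not depend on set iteration order.
    return sorted(accepted), sorted(disputed, key=lambda item: item[0])
```

For example, with three models where `Claim("GENE_A", "x")` is asserted by all three and `Claim("GENE_C", "z")` only by `model_b`, the first claim is accepted and the second is returned as disputed together with `["model_b"]`. A real system would also need claim normalization (matching paraphrases and identifiers), which this sketch assumes has already happened.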

Remigiusz Kinas, Joanna Krawczyk, Rafał Powalski, Przemysław Pietrzak, Agnieszka Kowalewska, Krzysztof Kolmus, Maciej Sypetkowski, Łukasz Smoliński, Tomasz Jetka • 2026

Related benchmarks

Task	Dataset	Result
Biological Reasoning	Single-Step 109-question (test)	L1 Accuracy: 84.85	8
Clinical End-to-End Performance	Translational medicine scenario family (30 queries)	Positive Rate: 74.7	4
Quantitative reasoning and autonomous analysis	BixBench-Verified-50 (full set)	Accuracy: 89.33	3
Scientific Discovery	BaisBench Scientific Discovery (BAIS-SD)	Mean SSD: 75.8	3
Quantitative reasoning and autonomous analysis	BixBench Human Verified-50	Accuracy: 81.82	3

