BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine

About

Translational medicine turns underspecified development goals into evidence synthesis that must combine literature, trials, patents, and quantitative multi-omics analysis while preserving identifiers, uncertainty, and retrievable provenance. General-purpose foundation models and off-the-shelf tool-augmented or multi-agent systems are not built for this: they tend to produce single-shot answers or run open-endedly, and fall short on the auditable, scenario-specific workflows that heterogeneous biomedical sources demand. This paper introduces Ingenix BioResearcher, a scenario-guided multi-agent system that maps queries to versioned research playbooks, delegates to specialized subagents over 30+ tools and machine-learning endpoints, mixes structured database access with sandboxed code for genome-scale analyses, and applies claim-level multi-model reconciliation before editorial assembly. We evaluate BioResearcher across unit-level capabilities, open-ended biomedical reasoning, and end-to-end clinical discovery. It leads evaluated baselines on 109 single-step tests (83.49% pass rate; 0.892 average score), achieves strong biomedical benchmark performance (89.33% on BixBench-Verified-50 and the top 0.758 mean score on BaisBench Scientific Discovery), and leads on a 30-query clinical end-to-end benchmark with the highest positive hit rate (74.7% ± 3.3%) and negative clear rate (96.8% ± 0.2%). These results show broad, competitive performance across unit-level, open-ended, and end-to-end clinical evaluations.

Remigiusz Kinas, Joanna Krawczyk, Rafał Powalski, Przemysław Pietrzak, Agnieszka Kowalewska, Krzysztof Kolmus, Maciej Sypetkowski, Łukasz Smoliński, Tomasz Jetka • 2026

Related benchmarks

Task | Dataset | Result | Rank
Biological Reasoning | Single-Step 109-question (test) | L1 Accuracy: 84.85 | 8
Clinical End-to-End Performance | Translational medicine scenario family 30 queries | Positive Rate: 74.7 | 4
Quantitative reasoning and autonomous analysis | BixBench-Verified-50 Full set | Accuracy: 89.33 | 3
Scientific Discovery | BaisBench Scientific Discovery (BAIS-SD) | Mean SSD: 75.8 | 3
Quantitative reasoning and autonomous analysis | BixBench Human Verified-50 | Accuracy: 81.82 | 3