
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

About

The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool combination becomes a high-dimensional optimization problem. Existing approaches often rely on a single model or fixed tool-calling logic, and so fail to exploit the performance variation across heterogeneous model-tool pairs. In this paper, we present ATLAS (Adaptive Tool-LLM Alignment and Synergistic Invocation), a dual-path framework for dynamic tool usage in cross-domain complex reasoning. Its two paths are: (1) training-free cluster-based routing, which exploits empirical priors for domain-specific alignment, and (2) RL-based multi-step routing, which explores autonomous trajectories for out-of-distribution generalization. Extensive experiments across 15 benchmarks show that our method outperforms closed-source models such as GPT-4o and surpasses existing routing methods on both in-distribution (+10.1%) and out-of-distribution (+13.1%) tasks. Our framework also shows significant gains in visual reasoning by orchestrating specialized multi-modal tools.
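
The abstract does not spell out how the training-free cluster-based path works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes queries arrive as embedding vectors, that cluster centroids are already available, and that "empirical priors" means the per-cluster accuracy observed for each model-tool pair on past queries. All names (`build_priors`, `route`, `model_a`, `calculator`, and so on) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest(centroids, emb):
    """Return the index of the centroid closest to the embedding."""
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], emb))


def build_priors(history, centroids):
    """Estimate per-cluster accuracy for each (model, tool) pair.

    history: iterable of (embedding, (model, tool), correct) records.
    Returns {cluster_index: {(model, tool): accuracy}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [hits, total]
    for emb, pair, correct in history:
        cell = counts[nearest(centroids, emb)][pair]
        cell[0] += int(correct)
        cell[1] += 1
    return {
        cluster: {pair: hits / total for pair, (hits, total) in pairs.items()}
        for cluster, pairs in counts.items()
    }


def route(emb, centroids, priors, default):
    """Pick the pair with the best observed accuracy in the query's cluster."""
    table = priors.get(nearest(centroids, emb))
    if not table:  # no history for this cluster: fall back to a default pair
        return default
    return max(table, key=table.get)


# Toy data (hypothetical): two clusters and two model-tool pairs.
centroids = [(0.0, 0.0), (10.0, 10.0)]
history = [
    ((0.5, 0.2), ("model_a", "calculator"), True),
    ((0.1, 0.9), ("model_a", "calculator"), True),
    ((0.3, 0.3), ("model_b", "search"), False),
    ((9.5, 10.1), ("model_b", "search"), True),
    ((10.2, 9.7), ("model_a", "calculator"), False),
]
default = ("model_a", "calculator")
priors = build_priors(history, centroids)

print(route((1.0, 1.0), centroids, priors, default))
print(route((9.0, 9.0), centroids, priors, default))
```

No training is involved in this sketch: routing reduces to a nearest-centroid lookup followed by an argmax over stored accuracies. The RL-based multi-step path described in the abstract is not covered here.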

Jinyang Wu, Guocheng Zhai, Ruihan Jin, Jiahao Yuan, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao • 2026

Related benchmarks

Task                   | Dataset                             | Result                   | Rank
Mathematical Reasoning | AMC                                 | Accuracy 82.5            | 151
Mathematical Reasoning | AIME 24                             | AIME 24 Accuracy 43.3    | 84
Question Answering     | WebQuestions (WebQs)                | Accuracy 53.6            | 67
Code                   | HumanEval                           | HumanEval Accuracy 91.5  | 50
Coding                 | MBPP                                | Accuracy 83.6            | 31
Logical reasoning      | LogiQA-2                            | Accuracy 66.8            | 30
Science                | GPQA                                | Accuracy 46.4            | 25
Knowledge Evaluation   | Natural Questions (NQ) (Evaluation) | Accuracy 44.1            | 22
Arithmetic             | Calc                                | Accuracy 83.3            | 16
Math Reasoning         | AIME25                              | AIME25 Accuracy 40       | 16
