
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

About

While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows, such as LLM training, remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training lifecycle. By orchestrating collaboration between two core modules, the Researcher and the Executor, the system performs requirement analysis, open-domain literature and data research, formulation of training strategies, preparation of data recipes, and model training and evaluation. The multi-round experimental process is modeled as a search tree, enabling the system to efficiently plan exploration paths, reuse historical results, and distill high-level insights from iterative trials. To evaluate the capability of automated LLM training, we construct FT-Bench, a benchmark comprising 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks. Experimental results demonstrate that the TREX agent consistently optimizes model performance on target tasks.

Zerun Ma, Guoqiang Wang, Xinchen Xie, Yicheng Chen, He Du, Bowen Li, Yanan Sun, Wenran Liu, Kai Chen, Yining Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Autonomous LLM Fine-tuning | ACI-BENCH | Rouge-1 | 50.2 | 4 |
| Autonomous LLM Fine-tuning | TOMG-Bench | Validation Score | 68.1 | 4 |
| Autonomous LLM Fine-tuning | oMeBench | oMeScore | 48.4 | 4 |
| Autonomous LLM Fine-tuning | HOC | Macro-F1 | 89.7 | 4 |
| Autonomous LLM Fine-tuning | SST-2 | Accuracy | 97.2 | 4 |
| Autonomous LLM Fine-tuning | CS-Bench | Accuracy | 58.1 | 4 |
| Autonomous LLM Fine-tuning | OpenFinData | Accuracy | 69.9 | 4 |
| Autonomous LLM Fine-tuning | EconlogicQA | Accuracy | 45.4 | 4 |
| Autonomous LLM Fine-tuning | GTA | Accuracy | 65.2 | 4 |
| Autonomous LLM Fine-tuning | LawBench | Hybrid Score | 42.1 | 4 |