AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

About

While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabilities of edge-scale models largely underexplored. In this paper, we present the first systematic study on training agentic models at the 4B-parameter scale. We identify three primary bottlenecks hindering the performance of edge-scale models: catastrophic forgetting during Supervised Fine-Tuning (SFT), sensitivity to reward signal noise during Reinforcement Learning (RL), and reasoning degradation caused by redundant information in long-context scenarios. To address these issues, we propose AgentCPM-Explore, a compact 4B agent model with high knowledge density and strong exploration capability. We introduce a holistic training framework featuring parameter-space model fusion, reward signal denoising, and contextual information refinement. Through deep exploration, AgentCPM-Explore achieves state-of-the-art (SOTA) performance among 4B-class models, matches or surpasses 8B-class SOTA models on four benchmarks, and even outperforms larger-scale models such as Claude-4.5-Sonnet or DeepSeek-v3.2 on five benchmarks. Notably, AgentCPM-Explore achieves 97.09% accuracy on GAIA text-based tasks under pass@64. These results provide compelling evidence that the bottleneck for edge-scale models is not their inherent capability ceiling, but rather their inference stability. Based on our well-established training framework, AgentCPM-Explore effectively unlocks the significant, yet previously underestimated, potential of edge-scale models.
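For readers unfamiliar with the pass@64 figure quoted above, the following is a minimal sketch of how a pass@k score is commonly computed. It assumes the standard definition (a task counts as solved if at least one of k sampled attempts is correct) and uses the widely used combinatorial estimator; the abstract does not state which exact procedure the authors used. The function names `pass_at_k` and `benchmark_pass_at_k` and the example counts are hypothetical, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: total attempts sampled for the task
    c: number of those attempts that were correct
    k: attempt budget being scored (k <= n)

    Returns the probability that a random subset of k attempts,
    drawn without replacement from the n, contains at least one
    correct attempt.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any subset of
        # size k must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average per-task pass@k over a benchmark (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data: three tasks, 64 attempts each, with 0, 1, and 40
# correct attempts respectively (illustrative numbers only).
score = benchmark_pass_at_k([0, 1, 40], n=64, k=64)
print(f"pass@64 = {score:.4f}")
```

With k equal to n, the estimator reduces to checking whether any attempt succeeded, which is why a single correct attempt out of 64 already yields a per-task score of 1.0. This illustrates the abstract's distinction between a model's capability ceiling (high pass@64) and its inference stability (single-attempt accuracy).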

Haotian Chen, Xin Cong, Shengda Fan, Yuyang Fu, Ziqin Gong, Yaxi Lu, Yishan Li, Boye Niu, Chengjun Pan, Zijun Song, Huadong Wang, Yesai Wu, Yueying Wu, Zihao Xie, Yukun Yan, Zhong Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reasoning | HLE | Accuracy (HLE Reasoning): 19.1 | 63 |
| Deep Research | Browsecomp | Score: 24.1 | 47 |
| Web Navigation Question Answering | WebWalker QA | Accuracy: 68.1 | 23 |
| Deep Research | xBench-DS-2505 | Score: 70 | 22 |
| Deep Information Search and Synthesis | xbench DeepSearch | Score: 70 | 22 |
| Complex Reasoning | GAIA text | Accuracy: 63.9 | 19 |
| Agent Capability Evaluation | SEAL 0 | -- | 19 |
| Agentic Search | Xbench DeepSearch 2505 | Accuracy: 70 | 18 |
| Web Browsing Competition (Chinese) | Browse Comp ZH | Score: 29.1 | 18 |
| Deep Research | GAIA Text-Only | Score: 63.9 | 17 |
Showing 10 of 20 rows
