AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

About

While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabilities of edge-scale models largely underexplored. In this paper, we present the first systematic study of training agentic models at the 4B-parameter scale. We identify three primary bottlenecks that limit edge-scale models: catastrophic forgetting during Supervised Fine-Tuning (SFT), sensitivity to noisy reward signals during Reinforcement Learning (RL), and reasoning degradation caused by redundant information in long-context scenarios. To address these issues, we propose AgentCPM-Explore, a compact 4B agent model with high knowledge density and strong exploration capability. We introduce a holistic training framework that combines parameter-space model fusion, reward signal denoising, and contextual information refinement. Through deep exploration, AgentCPM-Explore achieves state-of-the-art (SOTA) performance among 4B-class models, matches or surpasses 8B-class SOTA models on four benchmarks, and even outperforms larger models such as Claude-4.5-Sonnet and DeepSeek-v3.2 on five benchmarks. Notably, AgentCPM-Explore achieves 97.09% accuracy on the text-based GAIA tasks under pass@64. These results provide compelling evidence that the main bottleneck for edge-scale models is not their inherent capability ceiling but their inference stability. Built on this training framework, AgentCPM-Explore unlocks the significant, yet previously underestimated, potential of edge-scale models.
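The abstract argues that high pass@64 accuracy points to inference stability as the main bottleneck, rather than capability. The short Python sketch below shows how pass@k separates "can solve at least sometimes" from "solves reliably." It assumes the widely used unbiased pass@k estimator from the HumanEval/Codex evaluation. The abstract does not say which estimator was used. When exactly k = n attempts are sampled, this estimator reduces to the fraction of tasks with at least one correct attempt. The function names and the three-task example are hypothetical and illustrative only. They are not data from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task with n sampled attempts, c of them correct.

    Assumption: the standard estimator 1 - C(n - c, k) / C(n, k); the paper's exact
    evaluation procedure is not specified in the abstract.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if k > n:
        raise ValueError("pass@k is undefined when k exceeds the number of attempts")
    # If fewer than k attempts are wrong, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (attempts, correct_attempts)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical example: three tasks, 64 attempts each, with 64, 5, and 0 correct.
tasks = [(64, 64), (64, 5), (64, 0)]
print(f"pass@1  = {mean_pass_at_k(tasks, 1):.4f}")   # low: the second task is rarely solved
print(f"pass@64 = {mean_pass_at_k(tasks, 64):.4f}")  # high: it is solved at least once
```

In this toy case, pass@1 is about 0.36 but pass@64 is about 0.67. The second task is within the model's reach but is not solved reliably. This is the kind of gap the abstract attributes to inference stability rather than to a capability ceiling.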

Haotian Chen, Xin Cong, Shengda Fan, Yuyang Fu, Ziqin Gong, Yaxi Lu, Yishan Li, Boye Niu, Chengjun Pan, Zijun Song, Huadong Wang, Yesai Wu, Yueying Wu, Zihao Xie, Yukun Yan, Zhong Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Reasoning	HLE	Accuracy (HLE Reasoning)	19.1	63
Deep Research	Browsecomp	Score	24.1	47
Web Navigation Question Answering	WebWalker QA	Accuracy	68.1	23
Deep Research	xBench-DS-2505	Score	70	22
Deep Information Search and Synthesis	xbench DeepSearch	Score	70	22
Complex Reasoning	GAIA text	Accuracy	63.9	19
Agent Capability Evaluation	SEAL 0	--	--	19
Agentic Search	Xbench DeepSearch 2505	Accuracy	70	18
Web Browsing Competition (Chinese)	Browse Comp ZH	Score	29.1	18
Deep Research	GAIA Text-Only	Score	63.9	17

Showing 10 of 20 rows
