Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
About
Large Language Models (LLMs) are increasingly applied to software engineering (SWE), with SWE-bench as a key benchmark. Existing solutions fall into two camps: SWE-Agent frameworks built on multi-turn interaction, and workflow-based Agentless methods built on single-turn, verifiable steps. We argue these paradigms are not mutually exclusive: reasoning-intensive Agentless training induces skill priors (localization, code editing, and self-reflection) that enable efficient and effective SWE-Agent adaptation. In this work, we first curate an Agentless training recipe and present Kimi-Dev, an open-source SWE LLM achieving 60.4% on SWE-bench Verified, the best among workflow-based approaches. With additional SFT adaptation on 5k publicly available trajectories, Kimi-Dev powers SWE-Agents to 48.6% pass@1, on par with Claude 3.5 Sonnet (241022 version). These results show that structured skill priors from Agentless training can bridge workflow-based and agentic frameworks, yielding transferable coding agents.
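Since Kimi-Dev is released as open weights, it can be run with the standard `transformers` chat interface. Below is a minimal inference sketch; the repository id `moonshotai/Kimi-Dev-72B` and the bug-fixing prompt are illustrative assumptions, not details confirmed by this card, so substitute the checkpoint and task you actually use.

```python
# Minimal inference sketch for an open-weight SWE LLM via transformers.
# Assumption: the checkpoint is hosted under an id like "moonshotai/Kimi-Dev-72B"
# and provides a standard chat template; adjust to your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-Dev-72B"  # hypothetical/illustrative id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A localization + repair style prompt, in the spirit of the skill priors
# described above (localization, code editing, self-reflection).
messages = [
    {
        "role": "user",
        "content": "Locate the bug in this function and propose a patch:\n"
                   "def mean(xs):\n    return sum(xs) / len(xs) + 1\n",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```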
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Competitive Programming | LiveCodeBench Pro 25Q2 | Easy Score | 90.2 | 33 |
| Competitive Programming | LiveCodeBench Pro 25Q1 | Easy Score | 88.5 | 33 |
| Competitive Programming | Codeforces 2501 - 2507 | Elo | 2330 | 32 |
| Software Engineering | SWE-bench Verified | Success Rate | 48.6 | 29 |
| Competitive Programming | LiveCodeBench 2408 - 2505 v6 | Pass@1 | 85 | 15 |