Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
About
The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms -- from static imitation to incentive-driven decision making. However, this transition is significantly impeded by the lack of scalable infrastructure capable of constructing high-quality interaction signals for effective policy learning. To address this, we introduce a comprehensive method designed to systematically scale the diversity and complexity of interactive environments. Our method realizes this scaling by addressing three orthogonal dimensions: (1) Complexity: NexAU, a flexible agent framework that supports building complex agent hierarchies via simple configurations; (2) Diversity: NexA4A, which automatically generates diverse agent hierarchies from natural language to cover a virtually unlimited range of domains; and (3) Fidelity: NexGAP, which bridges the simulation-reality gap by integrating dynamic real-world environments for grounded trajectory synthesis. We train Nex-N1 on the diverse and complex interactive environments established by this infrastructure. Empirical results on benchmarks such as SWE-bench and tau2 demonstrate that Nex-N1 consistently outperforms SOTA open-source models and achieves competitive performance against frontier proprietary models on complex agentic tasks. We open-source the Nex ecosystem and model weights to facilitate further research.
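To make the phrase "building complex agent hierarchies via simple configurations" concrete, the sketch below shows one generic way a declarative config could be expanded into a nested agent tree. It is an illustration only, assuming hypothetical `Agent` and `build_agent` names and config keys; the abstract does not specify NexAU's actual configuration schema or API.

```python
# Minimal, hypothetical sketch of a configuration-driven agent hierarchy.
# All names (Agent, build_agent, the config keys) are illustrative and are
# NOT NexAU's actual API, which is not described in the abstract above.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A node in an agent hierarchy: a name, the tools it may call,
    and the sub-agents it can delegate work to."""
    name: str
    tools: List[str] = field(default_factory=list)
    sub_agents: List["Agent"] = field(default_factory=list)

    def describe(self, indent: int = 0) -> str:
        # Render the hierarchy as an indented tree for inspection.
        line = " " * indent + f"{self.name} (tools: {', '.join(self.tools) or 'none'})"
        return "\n".join([line] + [a.describe(indent + 2) for a in self.sub_agents])


def build_agent(config: dict) -> Agent:
    """Recursively turn a nested dict config into an Agent tree."""
    return Agent(
        name=config["name"],
        tools=config.get("tools", []),
        sub_agents=[build_agent(c) for c in config.get("sub_agents", [])],
    )


# A "simple configuration" describing a two-level hierarchy: a planner that
# delegates to a coder and a tester, each with its own tool set.
config = {
    "name": "planner",
    "tools": ["task_decomposition"],
    "sub_agents": [
        {"name": "coder", "tools": ["file_edit", "shell"]},
        {"name": "tester", "tools": ["shell", "test_runner"]},
    ],
}

if __name__ == "__main__":
    print(build_agent(config).describe())
```

The point of the sketch is only that the hierarchy lives entirely in data: scaling environment complexity then amounts to generating deeper or wider configs, which is the role the abstract assigns to NexA4A.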
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Automated Software Engineering | SWE-bench Verified | Resolved Rate | 20.3 | 39 |
| Terminal Agentic Trajectory Generation | TerminalBench 2.0 | Score | 31.8 | 29 |
| Terminal Agentic Trajectory Generation | TerminalBench 1.0 | Score | 31.56 | 23 |
| Agentic Coding | SWE-bench Verified | Percentage Resolved | 70.6 | 19 |
| Functional correctness for backend applications | Baxbench | Functional Correctness | 59.7 | 14 |
| General AI Assistant Tasks | GAIA 2 | Score | 29.5 | 14 |
| Function Calling | BFCL v4 | Score | 65.3 | 13 |
| Agentic Coding | Project dev (test) | Tau^2 | 0.802 | 13 |
| End-to-end terminal tasks | Terminal-Bench 2 | Score | 31.8 | 13 |