OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

About

While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workflows and generalization in novel domains. These limitations stem from a lack of granular control over historical visual context curation and the absence of visual-aware tutorial retrieval. To bridge these gaps, we introduce OS-Symphony, a holistic framework that comprises an Orchestrator coordinating two key innovations for robust automation: (1) a Reflection-Memory Agent that utilizes milestone-driven long-term memory to enable trajectory-level self-correction, effectively mitigating visual context loss in long-horizon tasks; (2) Versatile Tool Agents featuring a Multimodal Searcher that adopts a SeeAct paradigm to navigate a browser-based sandbox to synthesize live, visually aligned tutorials, thereby resolving fidelity issues in unseen scenarios. Experimental results demonstrate that OS-Symphony delivers substantial performance gains across varying model scales, establishing new state-of-the-art results on three online benchmarks, notably achieving 65.84% on OSWorld.
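To make the idea of milestone-driven memory more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the names `Step`, `MilestoneMemory`, `visual_context`, and `looks_stuck`, the `is_milestone` flag, and the repeated-action heuristic are all assumptions made for illustration. It only shows one plausible way to keep screenshots for milestone steps plus a short recent window, and to flag a trajectory that appears stuck.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    index: int
    action: str          # e.g. "click(save)"; the action format is hypothetical
    screenshot_id: str   # stand-in for the screenshot captured after the action
    is_milestone: bool   # hypothetical flag: this step completed a subgoal


@dataclass
class MilestoneMemory:
    """Keeps all steps, but exposes screenshots only for milestones and recent steps."""
    recent_window: int = 2
    steps: list = field(default_factory=list)

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def visual_context(self) -> list:
        # Screenshots to re-show to the model: milestones plus the last few steps.
        recent = self.steps[-self.recent_window:] if self.recent_window else []
        recent_ids = {s.index for s in recent}
        return [s.screenshot_id for s in self.steps
                if s.is_milestone or s.index in recent_ids]

    def looks_stuck(self, repeats: int = 3) -> bool:
        # Crude trajectory-level check: the same action issued `repeats` times in a row.
        tail = [s.action for s in self.steps[-repeats:]]
        return len(tail) == repeats and len(set(tail)) == 1


# Usage: six steps, with step 1 marked as a milestone.
memory = MilestoneMemory()
actions = ["open(file_menu)", "click(save_as)", "type(report.pdf)",
           "click(save)", "click(save)", "click(save)"]
for i, action in enumerate(actions):
    memory.add(Step(i, action, f"shot_{i}", is_milestone=(i == 1)))

print(memory.visual_context())  # ['shot_1', 'shot_4', 'shot_5']
print(memory.looks_stuck())     # True
```

In this sketch, an orchestrator could call `looks_stuck()` after each step and hand control to a reflection routine when it returns `True`. How the actual framework detects milestones and triggers self-correction is not specified in the abstract.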

Bowen Yang, Kaiming Jin, Zhenyu Wu, Zhaoyang Liu, Qiushi Sun, Zehao Li, JingJing Xie, Zhoumianze Liu, Fangzhi Xu, Kanzhi Cheng, Qingyun Li, Yian Wang, Yu Qiao, Zun Wang, Zichen Ding • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| GUI Navigation and Action | OS World (test) | Success Rate (OS) | 78.26 | 26 |
| OS GUI Agentic Task Execution | OSWorld 361 tasks (Verified) | Average Success Rate | 65.84 | 21 |
| GUI Automation | WindowsAgentArena | Success Rate (Office) | 54.76 | 11 |
| Operating System Task Automation | MacOSArena | Single App Score | 32.14 | 9 |
| Operating System Agent Control | WindowsAgentArena | Success Rate | 0.635 | 8 |
