MetaForge: A Self-Evolving Multimodal Agent that Retrieves, Adapts, and Forges Tools On Demand

About

Multimodal agents have achieved notable progress on complex reasoning tasks through tool use, yet remain limited by two issues: statically predefined tool inventories fail to generalize to unseen scenarios, and indiscriminate tool invocation incurs redundant cost and noise-induced errors. We propose MetaForge, a multimodal agent framework that learns when to invoke tools and how to evolve its toolset on demand. MetaForge factorizes agentic behavior into four coupled stages: Decide (judging whether tool use is warranted), Retrieve (selecting suitable tools), Adapt (grounding tool parameters in task context), and Forge (synthesizing new skills online and recycling them into the tool library for reuse), forming a closed judge-retrieve-adapt-forge-recycle loop. A unified orchestration policy enables the agent to choose among answering directly, reusing existing tools, or forging new ones. We jointly optimize invocation necessity, retrieval accuracy, execution effectiveness, and forged-skill reusability via reinforcement learning, with an explicit invocation-cost penalty discouraging redundant calls. Across 12 benchmarks, MetaForge consistently surpasses 16 baselines in accuracy, efficiency, and generalization, validating a paradigm shift from static tool inventories to on-demand self-evolution.
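The control flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the `Agent` class, the `FORGEABLE` table (a stand-in for a model that synthesizes new skills), the rule that an empty skill name means "answer directly", and the `reward` function with its `cost` weight are all hypothetical names and assumptions introduced here. In MetaForge these decisions are made by a learned orchestration policy, not by hand-written rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tool = Callable[[str], str]

# Hypothetical stand-in for a skill-synthesizing model: the skills it can "forge".
FORGEABLE: Dict[str, Tool] = {
    "reverse": lambda s: s[::-1],
    "count_words": lambda s: str(len(s.split())),
}


@dataclass
class Agent:
    library: Dict[str, Tool] = field(default_factory=dict)  # evolving tool library
    trace: List[str] = field(default_factory=list)          # actions taken so far

    def decide(self, skill: str) -> bool:
        # Decide: assumed rule, an empty skill name means no tool is warranted.
        return skill != ""

    def run(self, question: str, skill: str = "") -> str:
        if not self.decide(skill):
            self.trace.append("direct")
            return f"direct answer to: {question}"
        tool = self.library.get(skill)       # Retrieve: look for an existing tool
        if tool is None:
            tool = FORGEABLE[skill]          # Forge: synthesize a new skill
            self.library[skill] = tool       # Recycle: store it for later reuse
            self.trace.append(f"forge:{skill}")
        else:
            self.trace.append(f"reuse:{skill}")
        argument = question.strip()          # Adapt: ground parameters in the task
        return tool(argument)


def reward(correct: bool, tool_calls: int, cost: float = 0.1) -> float:
    # Assumed reward shape: task success minus a per-call invocation penalty.
    return (1.0 if correct else 0.0) - cost * tool_calls


agent = Agent()
first = agent.run("  hello world ", "count_words")   # forged on first use
second = agent.run("one two three", "count_words")   # reused from the library
direct = agent.run("What is 2 + 2?")                 # answered without a tool
```

In this sketch the first `count_words` call forges the skill and the second reuses it, so the library grows only when retrieval fails. The per-call penalty in `reward` makes a correct direct answer score higher than a correct answer that needed tool calls, which mirrors the stated goal of discouraging redundant invocation.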

Shouang Wei, Houcheng Min, Xinpeng Dong, Xin Lin, Sen Cui, Bo Jiang, Zhongxiang Dai, Kun Kuang, Guandong Xu, Fei Wu, Min Zhang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Science Question Answering | ScienceQA | -- | 791 |
| Mathematical Reasoning | MathVista | Score 79.78 | 474 |
| Diagram Question Answering | AI2D | AI2D Accuracy 89.6 | 387 |
| Counting | TallyQA | Accuracy 81.17 | 67 |
| Chart Question Answering | ChartQA | Accuracy 90.49 | 59 |
| OCR-based Visual Question Answering | OCRVQA | Mean Accuracy 86.51 | 50 |
| Document Visual Question Answering | DocVQA v1.0 (test) | -- | 49 |
| Tool Use | VerlTool IID Tools | Att.190 | 11 |
| Tool Use | VerlTool OOD Tools | Attribute12 | 11 |
| Visual Question Answering | MapQA | Accuracy 89.14 | 9 |
Showing 10 of 14 rows
