A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge

About

The emergence of large language models (LLMs) has catalyzed a paradigm shift in autonomous agent development, enabling systems capable of reasoning, planning, and executing complex multi-step tasks. However, existing agent frameworks often suffer from architectural rigidity, vendor lock-in, and prohibitive complexity that impedes rapid prototyping and deployment. This paper presents AgentForge, a lightweight, open-source Python framework designed to democratize the construction of LLM-driven autonomous agents through a principled modular architecture. AgentForge introduces three key innovations: (1) a composable skill abstraction that enables fine-grained task decomposition with formally defined input-output contracts, (2) a unified LLM backend interface supporting seamless switching between cloud-based APIs and local inference engines, and (3) a declarative YAML-based configuration system that separates agent logic from implementation details. We formalize the skill composition mechanism as a directed acyclic graph (DAG) and prove its expressiveness for representing arbitrary sequential and parallel task workflows. Comprehensive experimental evaluation across four benchmark scenarios demonstrates that AgentForge achieves competitive task completion rates while reducing development time by 62% compared to LangChain and 78% compared to direct API integration. Latency measurements confirm sub-100ms orchestration overhead, rendering the framework suitable for real-time applications. The modular design facilitates extension: we demonstrate the integration of six built-in skills and provide comprehensive documentation for custom skill development. AgentForge addresses a critical gap in the LLM agent ecosystem by providing researchers and practitioners with a production-ready foundation for constructing, evaluating, and deploying autonomous agents without sacrificing flexibility or performance.
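
The abstract describes skills with input-output contracts composed as a DAG. The sketch below illustrates that general idea only; it is not AgentForge's API. The `Skill` dataclass, the `execute` function, the `fake_llm` stand-in backend, and all skill names are hypothetical, and the ordering relies on the standard-library `graphlib` module (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill: a function with a declared input/output contract."""
    name: str
    inputs: Tuple[str, ...]  # state keys the skill reads
    output: str              # state key the skill writes
    run: Callable[..., Any]


def execute(skills: Dict[str, Skill], state: Dict[str, Any]) -> Dict[str, Any]:
    """Run skills in dependency order derived from their contracts."""
    # A skill depends on whichever skill produces one of its inputs.
    producers = {s.output: s.name for s in skills.values()}
    graph = {
        s.name: {producers[k] for k in s.inputs if k in producers}
        for s in skills.values()
    }
    state = dict(state)  # do not mutate the caller's dictionary
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        skill = skills[name]
        missing = [k for k in skill.inputs if k not in state]
        if missing:
            raise KeyError(f"{name}: missing inputs {missing}")
        state[skill.output] = skill.run(*(state[k] for k in skill.inputs))
    return state


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM backend (assumption: any str -> str callable)."""
    return f"[summary of {len(prompt.split())} words]"


skills = {
    "fetch": Skill("fetch", ("query",), "docs",
                   lambda q: [f"{q} article A", f"{q} article B"]),
    "count": Skill("count", ("docs",), "n_docs", len),
    "summarize": Skill("summarize", ("docs",), "summary",
                       lambda d: fake_llm(" ".join(d))),
    "report": Skill("report", ("summary", "n_docs"), "report",
                    lambda s, n: f"{n} docs: {s}"),
}

result = execute(skills, {"query": "llm agents"})
print(result["report"])
```

In this sketch `count` and `summarize` both depend only on `fetch`, so they form a parallel branch that joins at `report`; a real scheduler could run such branches concurrently, whereas this version runs them one after another in topological order. Swapping `fake_llm` for another callable with the same signature loosely mirrors the backend-switching idea.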

Akbar Anbar Jafari, Cagri Ozcinar, Gholamreza Anbarjafari • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Content Generation	T4 Content 1.0 (test)	Task Completion Rate	93.8	4
Data Analysis	T2 1.0 (test)	Task Completion Rate	91.2	4
News Aggregation	T1 News 1.0 (test)	Task Completion Rate	87.3	4
Research Assistant	T3 Research 1.0 (test)	Task Completion Rate	85.5	4
Task T1	T1	Token Usage (Input + Output)	2.91e+3	4
Task T2	T2	Total Tokens Used	1.99e+3	4
Task T3	T3	Token Usage (Input + Output)	2.29e+3	4
Task T4	T4	Token Usage (Total)	3.51e+3	4
