FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments

About

Large Language Models (LLMs) are increasingly deployed as the decision-making core of autonomous agents that can effect change in external environments. Yet in conversational benchmarks that simulate real-world, customer-centric issue resolution, these agents frequently fail because incorrect decisions cascade through the interaction. The problem is particularly pronounced for open-source LLMs with smaller parameter counts, limited context windows, and constrained inference budgets, all of which increase error accumulation in agentic settings. To tackle these challenges, we present the Failure-Aware Meta-Agentic (FAMA) framework. FAMA operates in two stages: first, it analyzes failure trajectories from baseline agents to identify the most prevalent errors; second, it employs an orchestration mechanism that activates a minimal subset of specialized agents tailored to those failures, which inject targeted context for the tool-use agent before its decision-making step. Experiments across open-source LLMs demonstrate performance gains of up to 27% across evaluation modes over standard baselines. These results highlight that targeted curation of context through specialized agents, aimed at common failures, is a valuable design principle for building reliable multi-turn tool-use LLM agents in simulated real-world conversational scenarios.
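The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the failure labels, the `Trajectory` type, the trigger-based activation rule, and every function name below are hypothetical and chosen only to show the general shape of "find the most prevalent failures, then activate only the specialists that address them".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Trajectory:
    """One baseline-agent run (hypothetical structure)."""
    succeeded: bool
    failure_label: Optional[str] = None  # assumed to come from an analysis step


def most_prevalent_failures(trajectories: list[Trajectory], top_k: int = 2) -> list[str]:
    """Stage 1 (sketch): count failure labels over failed runs, keep the top_k."""
    counts = Counter(
        t.failure_label
        for t in trajectories
        if not t.succeeded and t.failure_label is not None
    )
    return [label for label, _ in counts.most_common(top_k)]


@dataclass
class Specialist:
    """A specialized agent, reduced here to two plain functions (assumption)."""
    should_activate: Callable[[str], bool]  # trigger on the conversation so far
    advise: Callable[[str], str]            # produces a short context note


def build_injected_context(
    conversation: str,
    specialists: dict[str, Specialist],
    target_failures: list[str],
) -> str:
    """Stage 2 (sketch): activate only specialists that target a prevalent
    failure and whose trigger fires, then join their notes into the context
    placed in front of the tool-use agent before it decides."""
    notes = [
        specialists[label].advise(conversation)
        for label in target_failures
        if label in specialists and specialists[label].should_activate(conversation)
    ]
    return "\n".join(notes)


# Toy data: three failure types with different frequencies, one successful run.
runs = (
    [Trajectory(False, "policy_violation")] * 3
    + [Trajectory(False, "wrong_tool_args")] * 2
    + [Trajectory(False, "premature_stop")]
    + [Trajectory(True)]
)

specialists = {
    "policy_violation": Specialist(
        should_activate=lambda conv: "refund" in conv,
        advise=lambda conv: "Check the refund policy before acting.",
    ),
    "wrong_tool_args": Specialist(
        should_activate=lambda conv: "order" in conv,
        advise=lambda conv: "Confirm the order ID before calling a tool.",
    ),
}

targets = most_prevalent_failures(runs, top_k=2)
context = build_injected_context("User asks for a refund.", specialists, targets)
```

In this toy run only the `policy_violation` specialist fires, so the injected context stays short; keeping the active set small is the point of the "minimal subset" wording in the abstract, since the framework targets models with limited context windows.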

Amir Saeidi, Venkatesh Mishra, Souradeep Mukhopadhyay, Gaowen Liu, Ali Payani, Jayanth Srinivasa, Chitta Baral • 2026

Related benchmarks

Task	Dataset	Result
LLM Agent Evaluation	Tau-bench retail	Pass@1 44.17	38
Multi-turn agent task	ACEBench multi-turn (test)	Process Accuracy 70.2	31
Agent Task Completion	τ-Bench Retail	--	31
LLM Agent Evaluation	Tau-bench airline	Pass@4 26.7	29
Agentic Task Performance	τ-Telehealth	Pass^1 Rate 45	16
Agentic Task Performance	τ-Telecom	Pass@1 Success Rate 52	16
Tool-Use Agent Evaluation	τ-Bench Airline	Pass@1 29.2	12
Agent Task Completion	τ-Bench Airline	--	8
Tool-Use Agent Evaluation	τ-Bench Retail	Pass@1 Success Rate 44.173	6
Tool-Use Agent Evaluation	τ-Bench Retail (test)	Pass@4 Success Rate 16.3	6

Showing 10 of 11 rows
