ExpeL: LLM Agents Are Experiential Learners
About
Research on applying large language models (LLMs) to decision-making tasks has surged recently, driven by the extensive world knowledge embedded in LLMs. While there is a growing demand to tailor LLMs for custom decision-making tasks, finetuning them for specific tasks is resource-intensive and may diminish the model's generalization capabilities. Moreover, state-of-the-art language models like GPT-4 and Claude are primarily accessible through API calls, with their parametric weights remaining proprietary and unavailable to the public. This scenario emphasizes the growing need for new methodologies that allow learning from agent experiences without requiring parametric updates. To address these problems, we introduce the Experiential Learning (ExpeL) agent. Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks. At inference, the agent recalls its extracted insights and past experiences to make informed decisions. Our empirical results highlight the robust learning efficacy of the ExpeL agent, indicating a consistent enhancement in its performance as it accumulates experiences. We further explore the emerging capabilities and transfer learning potential of the ExpeL agent through qualitative observations and additional experiments.
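The gather, extract, and recall loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Experience`, `ExperienceMemory`, `recall`, and `build_prompt` are hypothetical, word-overlap similarity stands in for whatever retriever a real system would use, and insights are added by hand here, whereas the abstract describes the agent extracting them in natural language from its own experiences.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str          # natural-language task description
    trajectory: str    # what the agent did, recorded as text
    success: bool      # whether the attempt solved the task


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; an assumed stand-in for a real retriever."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


@dataclass
class ExperienceMemory:
    experiences: list[Experience] = field(default_factory=list)
    insights: list[str] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.experiences.append(exp)

    def recall(self, task: str, k: int = 2) -> list[Experience]:
        """Return the k successful experiences most similar to the new task."""
        successes = [e for e in self.experiences if e.success]
        ranked = sorted(successes, key=lambda e: jaccard(e.task, task), reverse=True)
        return ranked[:k]

    def build_prompt(self, task: str, k: int = 2) -> str:
        """Assemble insights and recalled experiences into an inference prompt."""
        lines = ["Insights:"]
        lines += [f"- {s}" for s in self.insights]
        lines.append("Past successful experiences:")
        for e in self.recall(task, k):
            lines.append(f"Task: {e.task}\n{e.trajectory}")
        lines.append(f"New task: {task}")
        return "\n".join(lines)


# Hypothetical training experiences (illustrative only, not from the paper).
memory = ExperienceMemory()
memory.add(Experience("put a clean mug on the desk",
                      "find mug -> clean at sink -> place on desk", True))
memory.add(Experience("heat an egg and put it on the table",
                      "find egg -> microwave -> place on table", True))
memory.add(Experience("put a clean plate on the desk",
                      "searched drawers, ran out of steps", False))
memory.insights.append("Check the sink before searching drawers for items to clean.")

# At inference, no weights change: only the prompt is enriched.
prompt = memory.build_prompt("put a clean cup on the desk", k=1)
```

The design point the sketch tries to capture is that all learning lives in the memory and the prompt, so the underlying model can stay behind an API with no parametric updates.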
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Financial Question Answering | FiQA | Accuracy | 71.8 | 85 |
| Interactive Decision-making | AlfWorld | PICK | 21 | 52 |
| Web navigation and task completion | WebArena (test) | Average Task Completion | 21.8 | 42 |
| Interactive web-based shopping tasks | Webshop | Score | 30.9 | 28 |
| Interactive environment task success | ALFWorld (test) | Overall Success Rate | 59 | 20 |
| Optimization Modeling | Optimization Modeling Datasets LogiOR OptiBench (out-of-distribution) | LogiOR Score | 47.8 | 10 |
| Optimization Modeling | Optimization Modeling Datasets NLP4LP, NL4OPT, IndustryOR, MAMO (test) | NLP4LP Score | 79.5 | 10 |
| Interactive Instruction Following | ALFWorld (train) | Success Rate | 68 | 9 |
| Interactive Instruction Following | ALFWorld OOD | Success Rate | 74 | 9 |
| Strategic game playing | Mastermind Hard | Average Return | 0.305 | 9 |