Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
About
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.
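As a rough illustration of the task-sequencing idea described above, the sketch below trains a tabular Q-learning agent on a hypothetical 1-D chain. It orders the tasks from easy to hard by how far the start state sits from the goal, and carries the Q-table unchanged from each task into the next. This is not a method taken from the survey. The chain environment, the start-distance schedule, and every hyperparameter (`N`, `alpha`, `gamma`, `epsilon`, episode counts) are illustrative assumptions.

```python
import random

N = 20  # hypothetical chain length; state N - 1 is the goal (assumption for illustration)


def step(state, action):
    """Toy deterministic chain: action 1 moves right, action 0 moves left (walls at both ends)."""
    nxt = min(state + 1, N - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N - 1
    return nxt, (1.0 if done else 0.0), done


def train_task(q, start, rng, episodes=200, alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=500):
    """Run epsilon-greedy Q-learning on one task (one start state), updating q in place."""
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])  # greedy, random tie-break
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break


def run_curriculum(distances, seed=0):
    """Train on tasks ordered easy -> hard; the Q-table is transferred unchanged between tasks."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N)]
    for d in distances:
        train_task(q, start=N - 1 - d, rng=rng)  # each task starts d steps from the goal
    return q


def greedy_steps(q, start=0, limit=2 * N):
    """Follow the greedy policy from `start`; return the number of steps to the goal, or None."""
    state = start
    for t in range(1, limit + 1):
        action = max((0, 1), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        if done:
            return t
    return None


q = run_curriculum([1, 2, 4, 8, 12, 16, N - 1])
print(greedy_steps(q))
```

Because every task here shares one underlying MDP and differs only in its start state, the transferred Q-values remain valid, and each new task mainly has to learn the few states between its start and the region already solved. Running `train_task` directly from state 0 with no earlier tasks gives a simple from-scratch baseline to compare against.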
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH500 (test) | -- | -- | 895 |
| Mathematical Reasoning | AIME 2024 (test) | -- | -- | 209 |
| Mathematical Reasoning | OlympiadBench (test) | -- | -- | 40 |
| Mathematical Reasoning | AMC 2023 (test) | Avg@8 Success Rate | 68.7 | 12 |
| Mathematical Reasoning | GSM8K | Training Steps to Target Accuracy | 160 | 12 |
| Mathematical Reasoning | MATH500 | Training Steps to Target Accuracy | 160 | 12 |
| Mathematical Reasoning | OlympiadBench | Training Steps to Target Accuracy | 180 | 12 |
| Mathematical Reasoning | GSM8K (test) | Accuracy (Avg@8) | 90.9 | 12 |
| Mathematical Reasoning | Aggregate AIME, AMC, GSM8K, MATH, MNV, OLPD | Avg. Training Steps to Target Accuracy | 180 | 12 |
| Mathematical Reasoning | AMC23 | Training Steps | 140 | 12 |