Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
About
We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.
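To make the idea concrete, below is a minimal sketch of what a task-conditioned meta-critic could look like in the supervised setting, assuming PyTorch. The `MetaCritic` class, the layer sizes, and the task embedding `z` (e.g. produced by some task encoder) are illustrative assumptions for this sketch, not the paper's exact architecture: the critic takes the input features, the actor's prediction, and a task embedding, and outputs a scalar that the actor minimises in place of a hand-designed loss.

```python
import torch
import torch.nn as nn

class MetaCritic(nn.Module):
    """Task-conditioned critic: scores an actor's output for a given task.

    In the supervised case this acts as a learned, task-parametrised loss
    generator: the scalar it outputs replaces a fixed loss such as
    cross-entropy, and the actor is trained to minimise it.
    (Illustrative sketch; dimensions and layers are assumptions.)
    """
    def __init__(self, feat_dim, pred_dim, task_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + pred_dim + task_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features, prediction, task_embedding):
        # Concatenate input, actor output, and task embedding; emit a scalar "loss".
        return self.net(torch.cat([features, prediction, task_embedding], dim=-1))


# Hypothetical usage: the actor minimises the critic's score; the critic itself
# would be meta-trained across tasks (that outer loop is not shown here).
actor = nn.Linear(10, 3)                         # toy actor for a 3-way task
critic = MetaCritic(feat_dim=10, pred_dim=3, task_dim=8)

x = torch.randn(32, 10)                          # batch of inputs
z = torch.randn(32, 8)                           # task embedding (assumed given)
loss = critic(x, actor(x), z).mean()             # learned loss in place of a fixed one
loss.backward()                                  # gradients flow back into the actor
```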
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mountain Car | Mountain Car, held-out domains with random mountain heights (test) | Avg. Failure Rate | 5 | 6 |
| Reinforcement Learning | Cart-Pole, OpenAI Gym (3 held-out domains: variable pole length and cart mass) | Return | 144.2 | 6 |
| Reinforcement Learning | Cart-Pole Domain Generalization - Pole Length, OpenAI Gym (3 held-out domains) | Average Reward | 97.39 | 6 |