Interaction-Grounded Learning

About

Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vector, takes an action, and then observes a multidimensional feedback vector. This multidimensional feedback vector has no explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.
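The interaction protocol described above (observe a context, take an action, receive a feedback vector with no explicit reward) can be illustrated with a toy simulation. This is a minimal sketch, not the authors' implementation. The class `ToyIGLEnvironment`, the helper `run_interaction`, the hidden linear scoring rule, and the choice to generate feedback from the latent reward plus Gaussian noise are all hypothetical and introduced only for illustration.

```python
import random


class ToyIGLEnvironment:
    """Hypothetical toy environment: the latent reward is never shown to the learner."""

    def __init__(self, n_actions=3, context_dim=4, feedback_dim=5, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        self.context_dim = context_dim
        self.feedback_dim = feedback_dim
        # Assumption: a hidden linear score per action decides which action is "correct".
        self._weights = [
            [self.rng.gauss(0, 1) for _ in range(context_dim)]
            for _ in range(n_actions)
        ]
        self._context = None

    def observe_context(self):
        """Sample and return a multidimensional context vector."""
        self._context = [self.rng.gauss(0, 1) for _ in range(self.context_dim)]
        return list(self._context)

    def step(self, action):
        """Return a multidimensional feedback vector; no reward is returned."""
        if self._context is None:
            raise RuntimeError("call observe_context() before step()")
        scores = [
            sum(w * x for w, x in zip(ws, self._context)) for ws in self._weights
        ]
        best_action = scores.index(max(scores))
        # The latent reward stays inside the environment.
        latent_reward = 1.0 if action == best_action else 0.0
        # Assumption: feedback is the latent reward plus noise in every coordinate.
        return [latent_reward + self.rng.gauss(0, 0.1) for _ in range(self.feedback_dim)]


def run_interaction(env, policy, n_rounds):
    """Collect (context, action, feedback) triples; the learner sees nothing else."""
    log = []
    for _ in range(n_rounds):
        context = env.observe_context()
        action = policy(context)
        feedback = env.step(action)
        log.append((context, action, feedback))
    return log


env = ToyIGLEnvironment()
policy_rng = random.Random(1)
# A uniformly random policy, used here only to generate interaction data.
log = run_interaction(env, lambda x: policy_rng.randrange(env.n_actions), n_rounds=100)
```

The resulting `log` holds only contexts, actions, and feedback vectors. A learner in this setting would have to infer from the feedback vectors alone which interactions were successful.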

Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad • 2021

Related benchmarks

Task | Dataset | Result | Rank
Policy learning from action-inclusive feedback | OpenML K ≥ 3 | Policy Accuracy 15.65 | 3
Policy learning from action-inclusive feedback | OpenML (K ≥ 3, N ≥ 70,000) | Policy Accuracy 11.91 | 3
Interaction-Grounded Learning | Simulated BCI Action-Inclusive Feedback, 1% noise (test) | Policy Accuracy 32.6 | 2
Interaction-Grounded Learning | Simulated BCI Action-Inclusive Feedback, 5% noise (test) | Policy Accuracy 33.75 | 2
Interaction-Grounded Learning | Simulated BCI Action-Inclusive Feedback, 10% noise (test) | Policy Accuracy 33.33 | 2
