
Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

About

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of the policy gradient estimate. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation over trajectories outside the buffer. To make MAPO efficient, we propose: (1) memory weight clipping to accelerate and stabilize training; (2) systematic exploration to discover high-reward trajectories; (3) distributed sampling from inside and outside of the memory buffer to scale up training. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with sparse rewards. We evaluate MAPO on weakly supervised program synthesis from natural language (semantic parsing). On the WikiTableQuestions benchmark, we improve the state of the art by 2.6%, achieving an accuracy of 46.3%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our source code is available at https://github.com/crazydonkey200/neural-symbolic-machines
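
The weighted-sum form of the objective described above can be checked on a toy problem. The sketch below is not taken from the MAPO source code. It assumes a hypothetical policy over four complete trajectories with made-up probabilities and 0/1 rewards, takes the weight on the in-buffer term to be the total probability the policy assigns to buffer trajectories, and confirms that the two-term sum equals the plain expected return. All names (POLICY, REWARD, BUFFER, expected_return, decomposed_return) are illustrative.

```python
from fractions import Fraction

# Hypothetical toy setup (not from the paper): four complete trajectories,
# the policy's probability for each, and a deterministic 0/1 reward.
POLICY = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}
REWARD = {"a": 0, "b": 1, "c": 0, "d": 1}

# Assumed memory buffer: the high-reward trajectories found so far.
BUFFER = {"b", "d"}


def expected_return(policy, reward):
    """Plain expected return: sum over all trajectories of prob * reward."""
    return sum(p * reward[t] for t, p in policy.items())


def decomposed_return(policy, reward, buffer):
    """Expected return as a weighted sum of an in-buffer and an out-of-buffer term.

    Returns (pi_b, exp_in, exp_out, total), where pi_b is the probability mass
    on buffer trajectories and exp_in / exp_out are expectations under the
    policy renormalized inside / outside the buffer.
    """
    pi_b = sum(p for t, p in policy.items() if t in buffer)
    mass_in = sum(p * reward[t] for t, p in policy.items() if t in buffer)
    mass_out = sum(p * reward[t] for t, p in policy.items() if t not in buffer)
    # Guard the degenerate cases where one side has zero probability mass.
    exp_in = mass_in / pi_b if pi_b > 0 else Fraction(0)
    exp_out = mass_out / (1 - pi_b) if pi_b < 1 else Fraction(0)
    total = pi_b * exp_in + (1 - pi_b) * exp_out
    return pi_b, exp_in, exp_out, total


if __name__ == "__main__":
    pi_b, exp_in, exp_out, total = decomposed_return(POLICY, REWARD, BUFFER)
    print("buffer weight:", pi_b)
    print("in-buffer expectation:", exp_in)
    print("out-of-buffer expectation:", exp_out)
    print("weighted sum:", total, "direct:", expected_return(POLICY, REWARD))
```

Exact fractions are used so the two sides can be compared with equality rather than a floating-point tolerance. In this toy case the in-buffer term can be computed by enumerating the buffer, so only the out-of-buffer term would need sampling in a real setting.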

Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc Le, Ni Lao • 2018

Related benchmarks

Task                            Dataset                     Result                            Rank
Table Question Answering        WikiTQ (test)               -                                 92
Table Question Answering        WikiTableQuestions (test)   Accuracy 46.3                     86
Table Question Answering        WikiSQL (test)              Accuracy 74.2                     55
Semantic Parsing                WikiSQL (test)              -                                 27
Table-based Question Answering  WIKITABLEQUESTIONS (dev)    Accuracy 42.3                     25
Table Question Answering        WIKISQL WEAK (test)         Denotation Accuracy 72.4          20
Math Word Problem Solving       Math23K                     Accuracy 0.208                    19
Table Question Answering        WIKISQL WEAK (dev)          Denotation Accuracy 71.8          19
Table Question Answering        WikiTQ (dev)                Denotation Acc 42.7               18
Semantic Parsing                WikiTableQuestions (test)   Execution Accuracy (Best) 50.2    17
Showing 10 of 16 rows
