ExPT: Synthetic Pretraining for Few-Shot Experimental Design
About
Experimental design is a fundamental problem in many science and engineering fields. In this problem, sample efficiency is crucial due to the time, money, and safety costs of real-world design evaluations. Existing approaches either rely on active data collection or access to large, labeled datasets of past experiments, making them impractical in many real-world scenarios. In this work, we address the more challenging yet realistic setting of few-shot experimental design, where only a few labeled data points of input designs and their corresponding values are available. We approach this problem as a conditional generation task, where a model conditions on a few labeled examples and the desired output to generate an optimal input design. To this end, we introduce Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that employs a novel combination of synthetic pretraining with in-context learning. In ExPT, we only assume knowledge of a finite collection of unlabeled data points from the input domain and pretrain a transformer neural network to optimize diverse synthetic functions defined over this domain. Unsupervised pretraining allows ExPT to adapt to any design task at test time in an in-context fashion by conditioning on a few labeled data points from the target task and generating the candidate optima. We evaluate ExPT on few-shot experimental design in challenging domains and demonstrate its superior generality and performance compared to existing methods. The source code is available at https://github.com/tung-nd/ExPT.git.
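The sketch below illustrates the general idea of synthetic pretraining plus in-context conditioning described above; it is not the authors' implementation. Assumptions not taken from the paper: synthetic functions are random MLPs over the unlabeled input pool, the model is a plain `TransformerEncoder` named `InContextDesigner`, and the candidate design is predicted with a simple MSE loss rather than ExPT's generative objective.

```python
# Hedged sketch: pretrain a transformer on synthetic functions over an unlabeled
# input pool, then condition on a few labeled (x, y) pairs plus a desired target
# value to generate a candidate design. Names and sizes are hypothetical.
import torch
import torch.nn as nn

DIM_X, D_MODEL, CTX = 8, 64, 16  # hypothetical design dimension, model width, context size

class InContextDesigner(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_pair = nn.Linear(DIM_X + 1, D_MODEL)   # embed labeled (x, y) context pairs
        self.embed_query = nn.Linear(1, D_MODEL)           # embed the desired target value y*
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, DIM_X)              # decode a candidate design x*

    def forward(self, ctx_x, ctx_y, target_y):
        pairs = self.embed_pair(torch.cat([ctx_x, ctx_y], dim=-1))  # (B, CTX, D)
        query = self.embed_query(target_y).unsqueeze(1)             # (B, 1, D)
        h = self.encoder(torch.cat([pairs, query], dim=1))          # attend over context + query
        return self.head(h[:, -1])                                  # read the design off the query token

def sample_synthetic_function():
    # Assumption: a cheap stand-in for the paper's family of synthetic functions.
    f = nn.Sequential(nn.Linear(DIM_X, 32), nn.Tanh(), nn.Linear(32, 1))
    for p in f.parameters():
        p.requires_grad_(False)
    return f

def pretrain(model, x_pool, steps=1000, batch=32):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        f = sample_synthetic_function()
        idx = torch.randint(len(x_pool), (batch, CTX + 1))
        x = x_pool[idx]                       # draw inputs from the unlabeled pool only
        y = f(x)                              # label them with the synthetic function
        ctx_x, ctx_y = x[:, :CTX], y[:, :CTX]
        tgt_x, tgt_y = x[:, CTX], y[:, CTX]   # held-out pair: predict its x from its y
        loss = ((model(ctx_x, ctx_y, tgt_y) - tgt_x) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

# Few-shot use at test time: condition on a handful of labeled pairs from the target
# task and ask for a design whose value exceeds anything observed so far.
model = InContextDesigner()
pretrain(model, torch.randn(4096, DIM_X), steps=10)
few_x, few_y = torch.randn(1, CTX, DIM_X), torch.randn(1, CTX, 1)
candidate = model(few_x, few_y, few_y.max(dim=1).values + 1.0)
```

In this sketch the pretraining objective (recover a held-out input from its value and the context) is only one plausible instantiation of the conditional-generation framing in the abstract; the design choice that carries over is that all supervision comes from synthetic functions, so the model never needs labeled data from the downstream task until inference.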
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Model-Based Optimization | GFP | 90th Percentile Oracle Score | 3.74 | 17 |
| Offline Model-Based Optimization | TF Bind 8 | 90th Percentile Oracle Score | 48 | 17 |
| Offline Model-Based Optimization | ChEMBL | 90th Percentile Oracle Score | 0.62 | 17 |
| Offline Model-Based Optimization | D'Kitty | 90th Percentile Oracle Score | 0.61 | 17 |
| Offline Model-Based Optimization | UTR | 90th Percentile Oracle Score | 6.7 | 17 |
| Model-Based Optimization | Design-Bench 2022 (test) | TF-Bind-8 Score | 0.927 | 16 |
| Offline Model-Based Optimization | Branin | 90th Percentile Oracle Score | -23.1 | 16 |
| Offline Model-Based Optimization | LogP | 90th Percentile Oracle Score | -16.7 | 16 |
| Model-Based Optimization | Design-Bench | LogP | -15.9 | 16 |
| Offline Model-Based Optimization | Warfarin | 90th Percentile Oracle Score | -40 | 15 |