Unlocking LLM Creativity in Science through Analogical Reasoning
About
Autonomous science promises to augment scientific discovery, particularly in complex fields like biomedicine. However, this requires AI systems that can consistently generate novel and diverse solutions to open-ended problems. We evaluate LLMs on the task of open-ended solution generation and quantify their tendency toward mode collapse, in which generations converge on a small, low-diversity set of solutions. To mitigate this mode collapse, we introduce analogical reasoning (AR) as a new approach to solution generation. AR generates analogies to cross-domain problems based on shared relational structure, then uses those analogies to search for novel solutions. Compared to baselines, AR produces significantly more diverse generations (improving solution diversity metrics by 90-173%), generates novel solutions over 50% of the time (compared to as little as 1.6% for baselines), and produces high-quality analogies. To validate the real-world feasibility of AR, we implement AR-generated solutions across four biomedical problems, yielding consistent quantitative gains. AR-generated approaches achieve a nearly 13-fold improvement on distributional metrics for perturbation effect prediction, outperform all baselines on AUPRC when predicting cell-cell communication, infer brain region interactions with a high Spearman correlation ($\rho$=0.729) to published methods, and establish state-of-the-art performance on two datasets for oligonucleotide property prediction. The novel and diverse solutions produced by AR can be used to augment the search space of existing solution generation methods.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Oligonucleotide property prediction | OligoGym TLR7 (Random split) | PCC 0.79 | 4 |
| Oligonucleotide property prediction | OligoGym Ichihara (Random split) | PCC 0.6 | 4 |
| Oligonucleotide property prediction | OligoGym TLR7 Nucleobase split | PCC 0.79 | 4 |
| Oligonucleotide property prediction | OligoGym siRNAmod (Nucleobase) | PCC 0.53 | 4 |
| Oligonucleotide property prediction | OligoGym Ichihara Nucleobase | PCC 0.57 | 4 |
| Oligonucleotide property prediction | OligoGym TLR8 (Random split) | PCC 0.62 | 4 |
| Oligonucleotide property prediction | OligoGym Cytotox LNA (Random split) | PCC 0.81 | 4 |
| Oligonucleotide property prediction | OligoGym siRNAmod (Random split) | PCC 0.67 | 4 |
| Oligonucleotide property prediction | OligoGym Shmushkovich (Random split) | PCC 0.37 | 4 |
| Oligonucleotide property prediction | OligoGym TLR8 Nucleobase | PCC 0.66 | 4 |
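
The results above are reported as PCC, which we read as the Pearson correlation coefficient between predicted and measured values, and the abstract also cites a Spearman correlation ($\rho$). As a minimal sketch of how these two statistics are computed, not the authors' evaluation code, the following uses only the Python standard library. The function names `pearson`, `average_ranks`, and `spearman` are illustrative and not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson correlation measures linear agreement, while Spearman correlation depends only on rank order, so a monotonic but nonlinear relationship (for example, `y = x**2` on positive `x`) gives a Spearman value of 1 even when the Pearson value is below 1.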