
Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

About

Our world is ambiguous and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes, the process itself is ambiguous, such as in the case of RNA folding, where the same nucleotide sequence can fold into different structures. This suggests that a predictive model should have similar probabilistic characteristics to match the data it models. Therefore, we propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer, to accommodate ambiguities and data distributions. We show the benefits of our approach (1) on a synthetic task that captures the ability to learn a hidden data distribution, (2) with state-of-the-art results in RNA folding that reveal advantages on highly ambiguous data, and (3) demonstrating its generative capabilities on property-based molecule design by implicitly learning the underlying distributions and outperforming existing work.

Jörg K. H. Franke, Frederic Runge, Frank Hutter • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| RNA structure prediction | TSsameSeq (test) | Avg Min Hamming Distance: 23.09 | 25 |
| Multi-property conditional molecule generation | GuacaMol | Unique Molecules: 430.8 | 16 |
| RNA folding | TS0 | F1 Score: 62.5 | 7 |
| RNA folding | TSsameStruc | F1 Score: 93.2 | 7 |
| Conditional Molecule Generation | GuacaMol | Validity: 98.1 | 2 |
