Molecular Graph Convolutions: Moving Beyond Fingerprints
About
Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.), which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.
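The abstract describes graph convolutions only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the paper's implementation. Each atom carries a feature vector, and each unordered atom pair carries a feature vector (here a made-up `[bonded, distance]` encoding). One layer updates every atom from its own features plus the sum of its pair features, and updates every pair from its own features plus both orderings of its two endpoint atoms. Summing over partners and over both orderings makes the result independent of how the atoms are numbered. The class `AtomPairLayer`, the helpers, the random weights, and the sum readout are all hypothetical; the paper's own modules combine these terms differently.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
PairKey = Tuple[int, int]  # unordered atom pair stored as (i, j) with i < j


def matvec(w: Matrix, v: Vector) -> Vector:
    # w has shape (out_dim, in_dim); returns w @ v
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def vadd(*vs: Vector) -> Vector:
    # Element-wise sum of equally sized vectors
    return [sum(xs) for xs in zip(*vs)]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def init_matrix(rng: random.Random, out_dim: int, in_dim: int) -> Matrix:
    bound = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-bound, bound) for _ in range(in_dim)] for _ in range(out_dim)]


def key(i: int, j: int) -> PairKey:
    return (i, j) if i < j else (j, i)


class AtomPairLayer:
    """Hypothetical, simplified atom/pair update (not the paper's exact module)."""

    def __init__(self, atom_dim: int, pair_dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_aa = init_matrix(rng, hidden, atom_dim)      # atom -> atom
        self.w_pa = init_matrix(rng, hidden, pair_dim)      # pair -> atom
        self.w_ap = init_matrix(rng, hidden, 2 * atom_dim)  # (atom, atom) -> pair
        self.w_pp = init_matrix(rng, hidden, pair_dim)      # pair -> pair

    def __call__(self, atoms: List[Vector], pairs: Dict[PairKey, Vector]):
        n = len(atoms)
        new_atoms = []
        for a in range(n):
            # Summing over every partner b makes the atom update order-invariant.
            pair_sum = vadd(*(pairs[key(a, b)] for b in range(n) if b != a))
            new_atoms.append(relu(vadd(matvec(self.w_aa, atoms[a]),
                                       matvec(self.w_pa, pair_sum))))
        new_pairs = {}
        for (a, b), p in pairs.items():
            # atoms[a] + atoms[b] is list concatenation; using both orderings
            # keeps the pair update symmetric in its two atoms.
            ab = matvec(self.w_ap, atoms[a] + atoms[b])
            ba = matvec(self.w_ap, atoms[b] + atoms[a])
            new_pairs[(a, b)] = relu(vadd(matvec(self.w_pp, p), ab, ba))
        return new_atoms, new_pairs


def readout(atoms: List[Vector]) -> Vector:
    # Sum over atoms: a simple order-invariant stand-in for a molecule-level reduction.
    return vadd(*atoms)


# Toy 3-atom molecule: atom features are one-hot element flags (C, O) and pair
# features are [bonded, distance]; all values are invented for illustration.
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pairs = {(0, 1): [1.0, 1.5], (1, 2): [1.0, 1.4], (0, 2): [0.0, 2.4]}

layer = AtomPairLayer(atom_dim=2, pair_dim=2, hidden=4, seed=42)
h_atoms, h_pairs = layer(atoms, pairs)
mol_vec = readout(h_atoms)
print(mol_vec)
```

Stacking layers only requires that the next layer's `atom_dim` and `pair_dim` match the previous layer's `hidden` size. Renumbering the atoms (and re-keying the pairs accordingly) leaves `mol_vec` unchanged up to floating-point rounding, which is the property that lets the model learn directly from the graph rather than from a fixed fingerprint ordering.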
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular property prediction | QM9 (test) | mu | 0.7 | 229 |
| Molecular property prediction | QM9 | Cv | 0.084 | 80 |
| Molecular property prediction | BBBP | ROC AUC | 0.671 | 48 |
| Molecular property prediction | ClinTox | ROC AUC | 83.2 | 47 |
| Molecular Property Prediction (Regression) | ESOL | RMSE | 0.61 | 36 |
| Regression | FreeSolv | RMSE | 1.22 | 33 |
| Molecular property prediction | QM9 out-of-sample (test) | MAE (mu) | 0.101 | 31 |
| Molecular property prediction | Tox21 | ROC AUC | 82 | 29 |
| Atomization energy prediction | QM7 (10-fold cross validation) | MAE | 59.6 | 27 |
| Molecule property prediction | Overall | Top-1 Count | 4 | 8 |