N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

About

Machine learning techniques have recently been adopted in various applications in medicine, biology, chemistry, and material engineering. An important task is to predict the properties of molecules, which serves as the main subroutine in many downstream applications such as virtual screening and drug design. Despite the increasing interest, the key challenge is to construct proper representations of molecules for learning algorithms. This paper introduces the N-gram graph, a simple unsupervised representation for molecules. The method first embeds the vertices in the molecule graph. It then constructs a compact representation for the graph by assembling the vertex embeddings in short walks in the graph, which we show is equivalent to a simple graph neural network that needs no training. The representations can thus be efficiently computed and then used with supervised learning methods for prediction. Experiments on 60 tasks from 10 benchmark datasets demonstrate its advantages over both popular graph neural networks and traditional representation methods. This is complemented by theoretical analysis showing its strong representation and prediction power.
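The walk-based assembly described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract says only that vertex embeddings are assembled along short walks, so combining the embeddings along a walk by element-wise product, summing over all walks of each length, and concatenating the per-length sums are assumptions made here for illustration. The function name `ngram_graph_embedding` and its parameters are hypothetical, and the vertex embeddings are taken as given rather than learned.

```python
import numpy as np


def ngram_graph_embedding(adjacency, vertex_embeddings, max_walk_len):
    """Training-free graph representation built from walks of 1..max_walk_len vertices.

    Assumptions (not confirmed by the abstract): embeddings along a walk are
    combined by element-wise product, walks of the same length are summed,
    and the per-length sums are concatenated.
    """
    A = np.asarray(adjacency, dtype=float)          # (num_vertices, num_vertices)
    F = np.asarray(vertex_embeddings, dtype=float)  # (num_vertices, dim)

    # walk_state[i] = sum, over all walks ending at vertex i, of the
    # element-wise product of the embeddings of the vertices on the walk.
    walk_state = F.copy()
    parts = [walk_state.sum(axis=0)]

    for _ in range(max_walk_len - 1):
        # Extend every walk by one step along an edge, then multiply in the
        # embedding of the new end vertex. Each undirected walk is counted
        # once per direction.
        walk_state = (A @ walk_state) * F
        parts.append(walk_state.sum(axis=0))

    return np.concatenate(parts)                    # (max_walk_len * dim,)


# Toy example: a 3-vertex path graph 0 - 1 - 2 with 1-dimensional embeddings.
path_adjacency = [[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]]
toy_embeddings = [[1.0], [2.0], [3.0]]
toy_representation = ngram_graph_embedding(path_adjacency, toy_embeddings, 3)
```

Because the loop uses only matrix products and element-wise multiplication, it has the shape of a message-passing network with fixed weights, which is consistent with the abstract's remark that no training is needed. The resulting vector could then be passed to any supervised learner.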

Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang • 2018

Related benchmarks

Task | Dataset | Result | Rank
Molecular Property Classification | MoleculeNet BBBP | ROC AUC 69.7 | 41
Molecular Property Classification | MoleculeNet BACE | ROC AUC 79.1 | 36
Molecular Property Classification | MoleculeNet ClinTox | ROC-AUC 87.5 | 27
Molecular Property Classification | MoleculeNet SIDER | ROC-AUC 0.668 | 21
Molecular Property Prediction (Classification) | MoleculeNet (test) | BBBP 69.7 | 20
Regression | MoleculeNet LIPO | RMSE 0.812 | 19
Molecular property prediction | MoleculeNet Regression | -- | 15
Regression | MoleculeNet BACE (test) | RMSE 1.318 | 14
Atomization energy prediction | QM7 (10-fold cross validation) | MAE 125.6 | 13
Classification | MoleculeNet random scaffold | BBBP 91.2 | 11
Showing 10 of 34 rows
