Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction

About

Modern SMILES-based chemical language models obtain strong MoleculeNet performance by treating SMILES as generic text and compensating with multi-million-molecule self-supervised pretraining. We ask: when a domain carries structural priors as rich as chemistry's, does it warrant a domain-native transformer rather than a generic one rescued by scale? We answer affirmatively with GM-Net (Geometric Measure Network), a transformer family in which every module is replaced by a sphere-native counterpart, and instantiate it as Chem-GMNet. Three blocks follow: SH-Embedding (tokens as learnable directions on $S^{k-1}$ lifted through a Gegenbauer feature map); DualSKA (a per-head fusion of a linear-time gated Sphere-Flow recurrence whose persistent state we prove is the truncated multipole expansion of the input distribution, and a softmax Sphere-Kernel branch over the same Schoenberg-valid kernel); and SH-FFN (sphere projection $\to$ Gegenbauer lift $\to$ moment readout). On canonical DeepChem scaffold splits, against same-shape ChemBERTa-2 baselines under the chemberta3-faithful protocol: (i) random-initialised, Chem-GMNet wins on 7 of 10 MoleculeNet endpoints at $\sim 35\%$ fewer parameters; (ii) pretrained on the same 10M-SMILES ZINC corpus as ChemBERTa-2 MLM-10M, it matches or beats the public release on 6 of 8 shared endpoints (5/7 excluding a known ClinTox release anomaly). A $(k,L)$ ablation shows that increasing the sphere dimension from $k=8$ to $k=10$ at fixed $L=3$ lowers ESOL RMSE to $0.938$ when trained from scratch, beating pretrained ChemBERTa-2 MLM-10M on this endpoint without any pretraining at all.
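To make the phrase "Schoenberg-valid kernel" concrete, here is a minimal sketch of the classical construction it appears to refer to: a zonal kernel on $S^{k-1}$ written as a non-negative combination of Gegenbauer polynomials $C_l^{(\alpha)}$ with $\alpha = (k-2)/2$, truncated at degree $L$. By Schoenberg's theorem such a kernel is positive definite on the sphere. This is not the authors' implementation; the function names (`gegenbauer`, `zonal_kernel`), the uniform weights, and the random inputs are all assumptions made for illustration, and the abstract does not specify the actual feature map or weights used in Chem-GMNet.

```python
import math
import random


def gegenbauer(l_max, alpha, t):
    """Return [C_0^alpha(t), ..., C_{l_max}^alpha(t)] via the three-term recurrence."""
    vals = [1.0]
    if l_max >= 1:
        vals.append(2.0 * alpha * t)
    for n in range(2, l_max + 1):
        # n C_n = 2 t (n + alpha - 1) C_{n-1} - (n + 2 alpha - 2) C_{n-2}
        nxt = (2.0 * t * (n + alpha - 1.0) * vals[n - 1]
               - (n + 2.0 * alpha - 2.0) * vals[n - 2]) / n
        vals.append(nxt)
    return vals


def unit(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zonal_kernel(x, y, k, weights):
    """K(x, y) = sum_l weights[l] * C_l^alpha(<x, y>) for unit vectors on S^(k-1).

    Hypothetical helper: weights must be non-negative for positive
    definiteness (Schoenberg); len(weights) - 1 plays the role of L.
    """
    alpha = (k - 2) / 2.0  # Gegenbauer index matching S^(k-1); requires k >= 3
    t = sum(a * b for a, b in zip(x, y))
    t = max(-1.0, min(1.0, t))  # guard against rounding outside [-1, 1]
    polys = gegenbauer(len(weights) - 1, alpha, t)
    return sum(w * c for w, c in zip(weights, polys))


# Illustrative setting borrowed from the ablation: k = 8, L = 3.
# The uniform weights are an assumption, not a value from the paper.
K_DIM, WEIGHTS = 8, [1.0, 1.0, 1.0, 1.0]

rng = random.Random(0)
points = [unit([rng.gauss(0.0, 1.0) for _ in range(K_DIM)]) for _ in range(6)]
gram = [[zonal_kernel(p, q, K_DIM, WEIGHTS) for q in points] for p in points]

# Positive definiteness implies c^T G c >= 0 for any coefficient vector c.
coeffs = [rng.uniform(-1.0, 1.0) for _ in points]
quad_form = sum(coeffs[i] * coeffs[j] * gram[i][j]
                for i in range(len(points)) for j in range(len(points)))
print("quadratic form:", quad_form)
```

The recurrence avoids any dependency beyond the standard library; with SciPy available, `scipy.special.eval_gegenbauer` could replace the hand-written `gegenbauer` helper.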

Deepak Warrier, Raja Sekhar Pappala • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecular property prediction | MoleculeNet BBBP (scaffold) | ROC AUC | 69.8 | 142 |
| Classification | MoleculeNet BBBP (test) | ROC AUC | 0.722 | 59 |
| Molecular property prediction | MoleculeNet ClinTox (scaffold) | ROC-AUC | 0.983 | 47 |
| Molecular Property Classification | MoleculeNet BACE | ROC AUC | 77.3 | 47 |
| Molecular Property Classification | BACE (MoleculeNet) scaffold (test) | ROC-AUC | 0.773 | 44 |
| Molecular Property Classification | MoleculeNet ClinTox | ROC-AUC | 99.5 | 39 |
| Regression | MoleculeNet LIPO | RMSE | 0.932 | 30 |
| Regression | MoleculeNet BACE (test) | RMSE | 1.103 | 25 |
| Regression | MoleculeNet ESOL | RMSE | 0.938 | 19 |
| Regression | MoleculeNet Clearance | RMSE | 49.36 | 16 |
Showing 10 of 16 rows
