Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction
About
Modern SMILES-based chemical language models achieve strong MoleculeNet performance by treating SMILES as generic text and compensating with self-supervised pretraining on millions of molecules. We ask: when a domain carries structural priors as rich as chemistry's, does it warrant a domain-native transformer rather than a generic one rescued by scale? We answer affirmatively with **GM-Net** (Geometric Measure Network), a transformer family in which every module is replaced by a sphere-native counterpart, and instantiate it for chemistry as **Chem-GMNet**. The architecture has three blocks: SH-Embedding (tokens as learnable directions on $S^{k-1}$, lifted through a Gegenbauer feature map); DualSKA (a per-head fusion of two branches over the same Schoenberg-valid kernel: a linear-time gated Sphere-Flow recurrence, whose persistent state we prove is the truncated multipole expansion of the input distribution, and a softmax Sphere-Kernel branch); and SH-FFN (sphere projection $\to$ Gegenbauer lift $\to$ moment readout). On canonical DeepChem scaffold splits, against same-shape ChemBERTa-2 baselines under the chemberta3-faithful protocol: (i) trained from random initialisation, Chem-GMNet wins on 7 of 10 MoleculeNet endpoints with $\sim\!35\%$ fewer parameters; (ii) pretrained on the same 10M-SMILES ZINC corpus as ChemBERTa-2 MLM-10M, it matches or beats the public release on 6 of 8 shared endpoints (5 of 7 excluding a known ClinTox release anomaly). A $(k,L)$ ablation shows that increasing the sphere dimension from $k\!=\!8$ to $k\!=\!10$ at fixed $L\!=\!3$ lowers ESOL RMSE to $0.938$ when trained from scratch, beating the pretrained ChemBERTa-2 MLM-10M on this endpoint without any pretraining.
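Both the Gegenbauer lift and the "Schoenberg-valid" kernel rest on Schoenberg's theorem: for $k \ge 3$, a continuous function of $\langle x, y\rangle$ is a positive semidefinite kernel on $S^{k-1}$ exactly when it is a non-negative series of Gegenbauer polynomials $C_l^{(\alpha)}$ with $\alpha = (k-2)/2$. The following is a minimal NumPy sketch, assuming the kernel is such a series truncated at degree $L$; the function names `gegenbauer` and `sphere_kernel` and the coefficient values are hypothetical illustrations, not the Chem-GMNet implementation.

```python
import numpy as np


def gegenbauer(n: int, alpha: float, t: np.ndarray) -> np.ndarray:
    """Evaluate C_n^{(alpha)}(t) with the three-term recurrence
    n C_n = 2t(n + alpha - 1) C_{n-1} - (n + 2 alpha - 2) C_{n-2}."""
    t = np.asarray(t, dtype=float)
    c_prev = np.ones_like(t)            # C_0 = 1
    if n == 0:
        return c_prev
    c_curr = 2.0 * alpha * t            # C_1 = 2 alpha t
    for m in range(2, n + 1):
        c_prev, c_curr = c_curr, (
            2.0 * t * (m + alpha - 1) * c_curr - (m + 2 * alpha - 2) * c_prev
        ) / m
    return c_curr


def sphere_kernel(X: np.ndarray, Y: np.ndarray, coeffs) -> np.ndarray:
    """Hypothetical kernel K(x, y) = sum_l a_l C_l^{(alpha)}(<x, y>), alpha = (k - 2) / 2.

    By Schoenberg's theorem this is positive semidefinite on S^{k-1} (k >= 3)
    whenever every a_l >= 0. Rows of X and Y are first projected onto the sphere."""
    k = X.shape[1]
    if k < 3:
        raise ValueError("this sketch assumes k >= 3 so that alpha > 0")
    if any(a < 0 for a in coeffs):
        raise ValueError("Schoenberg validity needs non-negative coefficients")
    alpha = (k - 2) / 2.0
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    t = np.clip(Xn @ Yn.T, -1.0, 1.0)   # cosine similarities in [-1, 1]
    return sum(a * gegenbauer(l, alpha, t) for l, a in enumerate(coeffs))


rng = np.random.default_rng(0)
tokens = rng.normal(size=(32, 8))       # 32 hypothetical token directions, k = 8
G = sphere_kernel(tokens, tokens, coeffs=[1.0, 0.5, 0.25, 0.125])  # degrees l = 0..3
print(np.linalg.eigvalsh(G).min() >= -1e-8)   # Gram matrix should be PSD
```

Making the coefficients non-negative by construction (for example through a softplus of free parameters) is one way a learnable kernel could stay Schoenberg-valid throughout training; that parameterisation is an assumption here.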
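The claim that the Sphere-Flow state is a truncated multipole expansion relies on the same identity that makes linear attention run in linear time: if the kernel factors as $K(q, x) = \langle \phi(q), \phi(x)\rangle$, a gated running sum of $\phi(x_s)$ lets every step read out a decayed kernel sum over its whole prefix. Below is a minimal NumPy sketch under loudly stated assumptions. `tensor_lift` is a hypothetical stand-in for the Gegenbauer lift that uses raw tensor powers, giving $K(q,x) = \sum_{l \le L} \langle q, x\rangle^l$ (positive semidefinite, but not the paper's kernel); the gates are scalar and given; and the softmax branch and per-head fusion are omitted. With all gates equal to 1, the state after $T$ steps is $T$ times the empirical moments $\mathbb{E}[x^{\otimes l}]$ for $l \le L$, and the trace-free (harmonic) part of each moment is the corresponding multipole moment.

```python
import numpy as np


def tensor_lift(x: np.ndarray, L: int) -> np.ndarray:
    """Hypothetical lift phi(x) = [1, x, x(x)x, ..., x^{(x)L}] (flattened) of the
    sphere-projected x, so that <phi(q), phi(x)> = sum_{l<=L} <q, x>^l."""
    x = x / np.linalg.norm(x)            # project onto S^{k-1}
    feats, t = [np.ones(1)], np.ones(1)
    for _ in range(L):
        t = np.outer(t, x).ravel()       # next tensor power, flattened
        feats.append(t)
    return np.concatenate(feats)


def sphere_flow(Q: np.ndarray, X: np.ndarray, gates: np.ndarray, L: int) -> np.ndarray:
    """Linear-time gated recurrence S_t = g_t * S_{t-1} + phi(x_t); y_t = <phi(q_t), S_t>.
    S_t is a decayed sum of the prefix's tensor moments up to degree L."""
    S = np.zeros_like(tensor_lift(X[0], L))
    out = []
    for q, x, g in zip(Q, X, gates):
        S = g * S + tensor_lift(x, L)
        out.append(tensor_lift(q, L) @ S)
    return np.array(out)


def quadratic_reference(Q: np.ndarray, X: np.ndarray, gates: np.ndarray, L: int) -> np.ndarray:
    """Same outputs in O(T^2): y_t = sum_{s<=t} (prod_{r=s+1..t} g_r) K(q_t, x_s)."""
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    K = sum((Qn @ Xn.T) ** l for l in range(L + 1))
    T = len(X)
    y = np.zeros(T)
    for t in range(T):
        for s in range(t + 1):
            y[t] += np.prod(gates[s + 1:t + 1]) * K[t, s]   # empty product = 1
    return y


rng = np.random.default_rng(1)
T, k, L = 12, 4, 3
Q, X = rng.normal(size=(T, k)), rng.normal(size=(T, k))
gates = rng.uniform(0.8, 1.0, size=T)    # hypothetical per-step forget gates
y_fast = sphere_flow(Q, X, gates, L)
y_slow = quadratic_reference(Q, X, gates, L)
print(np.allclose(y_fast, y_slow))
```

The recurrence touches each token once, so its cost grows linearly with sequence length, while the reference loop is quadratic; the two agree because the kernel factorises through a finite feature map.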
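To put the $(k, L)$ ablation in perspective: if the lift spans every spherical harmonic up to degree $L$ (an assumption; the exact feature map is not specified here), its width grows quickly with $k$. The space of degree-$l$ harmonics on $S^{k-1}$ has dimension $N(k,l) = \binom{l+k-1}{k-1} - \binom{l+k-3}{k-1}$, so moving from $k=8$ to $k=10$ at $L=3$ takes the lift from 156 to 275 dimensions. A short sketch of the arithmetic, with hypothetical helper names:

```python
from math import comb


def harmonic_dim(k: int, l: int) -> int:
    """Number of linearly independent degree-l spherical harmonics on S^{k-1}."""
    if l == 0:
        return 1
    # math.comb(n, r) returns 0 when r > n, which covers small l.
    return comb(l + k - 1, k - 1) - comb(l + k - 3, k - 1)


def lift_dim(k: int, L: int) -> int:
    """Total dimension of all harmonics of degree <= L on S^{k-1}."""
    return sum(harmonic_dim(k, l) for l in range(L + 1))


for k in (8, 10):
    per_degree = [harmonic_dim(k, l) for l in range(4)]
    print(f"k={k}, L=3: per-degree {per_degree}, total {lift_dim(k, 3)}")
# k=8, L=3: per-degree [1, 8, 35, 112], total 156
# k=10, L=3: per-degree [1, 10, 54, 210], total 275
```

For $k=3$ the formula reduces to the familiar $2l+1$ harmonics per degree, so the total up to degree $L$ is $(L+1)^2$.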
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecular property prediction | MoleculeNet BBBP (scaffold) | ROC-AUC | 69.8 | 142 |
| Classification | MoleculeNet BBBP (test) | ROC-AUC | 0.722 | 59 |
| Molecular property prediction | MoleculeNet ClinTox (scaffold) | ROC-AUC | 0.983 | 47 |
| Molecular Property Classification | MoleculeNet BACE | ROC-AUC | 77.3 | 47 |
| Molecular Property Classification | BACE (MoleculeNet) scaffold (test) | ROC-AUC | 0.773 | 44 |
| Molecular Property Classification | MoleculeNet ClinTox | ROC-AUC | 99.5 | 39 |
| Regression | MoleculeNet LIPO | RMSE | 0.932 | 30 |
| Regression | MoleculeNet BACE (test) | RMSE | 1.103 | 25 |
| Regression | MoleculeNet ESOL | RMSE | 0.938 | 19 |
| Regression | MoleculeNet Clearance | RMSE | 49.36 | 16 |