A Study of Adaptive Modeling Towards Robust Generalization
About
Large language models (LLMs) increasingly support reasoning over biomolecular structures, but most existing approaches remain modality-specific and rely on either sequence-style encodings or fixed-length connector tokens for structural inputs. These designs can under-expose explicit geometric cues and impose rigid fusion bottlenecks, leading to over-compression and poor token allocation as structural complexity grows. We present a unified all-atom framework that grounds language reasoning in geometric information while adaptively scaling structural tokens. The method first constructs variable-size structural patches on molecular graphs using an instruction-conditioned gating policy, enabling complexity-aware allocation of query tokens. It then refines the resulting patch tokens via cross-attention with modality embeddings and injects geometry-informed tokens into the language model to improve structure grounding and reduce structural hallucinations. Across diverse all-atom benchmarks, the proposed approach yields consistent gains in heterogeneous structure-grounded reasoning. An anonymized implementation is provided in the supplementary material.
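The two mechanisms the abstract names — an instruction-conditioned gate that allocates a variable number of structural query tokens, and cross-attention that refines those tokens against atom-level embeddings — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: every function name, shape, and the 0.5 gate threshold are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_structure_tokens(atom_embs, instr_emb, query_bank, gate_w, thresh=0.5):
    """Hypothetical sketch of the two stages described above:
    1) an instruction-conditioned gate decides how many query tokens to
       spend on this input (complexity-aware allocation);
    2) the selected queries are refined by single-head cross-attention over
       per-atom structure embeddings, yielding geometry-informed tokens
       that would be injected into the language model.
    All names and shapes are illustrative, not the paper's actual API."""
    max_q, dim = query_bank.shape
    # Gate: one sigmoid activation per candidate query token, conditioned
    # on the instruction embedding.
    gate = 1.0 / (1.0 + np.exp(-(gate_w @ instr_emb)))        # (max_q,)
    k = max(1, int((gate > thresh).sum()))                    # token budget
    q = query_bank[:k]                                        # (k, dim)
    # Cross-attention: queries attend to the atom embeddings.
    attn = softmax(q @ atom_embs.T / np.sqrt(dim), axis=-1)   # (k, n_atoms)
    return attn @ atom_embs                                   # (k, dim)

rng = np.random.default_rng(0)
dim, n_atoms, max_q = 16, 30, 8
tokens = adaptive_structure_tokens(
    rng.normal(size=(n_atoms, dim)),   # per-atom structure embeddings
    rng.normal(size=dim),              # instruction embedding
    rng.normal(size=(max_q, dim)),     # learnable query bank
    rng.normal(size=(max_q, dim)))     # gate weights
```

Because the token count `k` depends on the gated instruction embedding rather than being fixed, a more complex structure (or a more demanding instruction) can receive more query tokens, which is the fusion-bottleneck problem the abstract targets.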
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Forward Reaction Prediction | Mol-Instructions | -- | -- | 24 |
| Reagent Prediction | Mol-Instructions | -- | -- | 24 |
| Retrosynthesis | Mol-Instructions | -- | -- | 24 |
| Molecule Captioning | Mol-Instructions | ROUGE-L | 0.766 | 17 |
| Multimodal Reasoning | GEO-AT Molecule | METEOR | 0.415 | 17 |
| Multimodal Reasoning | GEO-AT Protein | METEOR | 41.7 | 17 |
| Multimodal Reasoning | GEO-AT DNA | METEOR | 52.9 | 17 |
| Multimodal Reasoning | GEO-AT RNA | METEOR | 0.491 | 17 |
| Entity Recognition | Mol-Instructions | F1 Score | 78 | 13 |
| Interaction Extraction | Mol-Instructions | F1 Score | 27 | 13 |