Scaling-Aware Adapter for Structure-Grounded LLM Reasoning
About
Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query connectors. Such architectures either omit the geometric grounding needed to mitigate structural hallucinations, or impose inflexible modality-fusion bottlenecks that both over-compress structural tokens and allocate them poorly, which impedes generalized all-atom reasoning. We introduce Cuttlefish, a unified multimodal LLM that grounds language reasoning in geometric cues while scaling modality tokens with structural complexity. First, Scaling-Aware Patching uses an instruction-conditioned gating mechanism to generate variable-size patches over structural graphs, adaptively scaling the query token budget with structural complexity to mitigate fixed-length connector bottlenecks. Second, the Geometry Grounding Adapter refines these adaptive tokens via cross-attention to modality embeddings and injects the resulting modality tokens into the LLM, exposing explicit geometric cues to reduce structural hallucination. Experiments across interdisciplinary all-atom benchmarks demonstrate that Cuttlefish achieves superior performance in heterogeneous structure-grounded reasoning. Code: github.com/zihao-jing/Cuttlefish.
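To make the two components in the abstract concrete, here is a minimal NumPy sketch of the general idea: a token budget that grows with structure size, followed by cross-attention from patch queries to node embeddings. It is an illustration under stated assumptions, not the Cuttlefish implementation. The names `token_budget`, `instruction_gate`, `cross_attend`, and `structure_tokens`, the budget rule, the contiguous mean-pooled patches, and the projection-free single-head attention are all hypothetical simplifications; the released code should be consulted for the actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def instruction_gate(instruction, node_emb):
    """Hypothetical gate in (0, 1): sigmoid of the mean scaled
    similarity between an instruction vector and the node embeddings."""
    d = node_emb.shape[-1]
    score = (node_emb @ instruction / np.sqrt(d)).mean()
    return float(1.0 / (1.0 + np.exp(-score)))

def token_budget(num_nodes, gate, min_tokens=4, max_tokens=64):
    """Hypothetical budget rule: tokens scale with node count and gate,
    clipped to [min_tokens, max_tokens] and never above num_nodes."""
    k = int(np.clip(np.ceil(gate * num_nodes), min_tokens, max_tokens))
    return min(k, num_nodes)

def cross_attend(queries, node_emb):
    """Single-head scaled dot-product cross-attention.
    Learned projections are omitted to keep the sketch short."""
    d = queries.shape[-1]
    weights = softmax(queries @ node_emb.T / np.sqrt(d), axis=-1)
    return weights @ node_emb

def structure_tokens(node_emb, gate):
    """Pool nodes into k contiguous patches (an assumption; the paper
    patches over the structural graph), then refine by cross-attention."""
    k = token_budget(node_emb.shape[0], gate)
    patches = np.array_split(node_emb, k, axis=0)
    queries = np.stack([p.mean(axis=0) for p in patches])
    return cross_attend(queries, node_emb)

rng = np.random.default_rng(0)
dim = 16
small = rng.normal(size=(10, dim))    # e.g. a small molecule
large = rng.normal(size=(200, dim))   # e.g. a protein fragment

tokens_small = structure_tokens(small, gate=0.5)
tokens_large = structure_tokens(large, gate=0.5)
print(tokens_small.shape, tokens_large.shape)
```

With the same gate value, the larger structure receives more tokens until the cap is reached, which is the behavior a fixed-length connector cannot provide. In a full pipeline the gate would come from something like `instruction_gate` applied to an encoded instruction, and the resulting tokens would be projected into the LLM's embedding space.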
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Forward reaction prediction | Mol-Instructions | -- | -- | 30 |
| Reagent Prediction | Mol-Instructions | -- | -- | 30 |
| Retrosynthesis | Mol-Instructions | -- | -- | 30 |
| Molecule Captioning | Mol-Instructions | ROUGE-L | 0.766 | 17 |
| Multimodal Reasoning | GEO-AT Molecule | METEOR | 0.415 | 17 |
| Multimodal Reasoning | GEO-AT Protein | METEOR | 41.7 | 17 |
| Multimodal Reasoning | GEO-AT DNA | METEOR | 52.9 | 17 |
| Multimodal Reasoning | GEO-AT RNA | METEOR | 0.491 | 17 |
| Entity recognition | Mol-Instructions | F1 Score | 78 | 13 |
| Interaction Extraction | Mol-Instructions | F1 Score | 27 | 13 |