
Scaling-Aware Adapter for Structure-Grounded LLM Reasoning

About

Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query connectors. Such architectures either omit the geometric grounding needed to mitigate structural hallucinations, or impose inflexible modality-fusion bottlenecks that both over-compress and poorly allocate structural tokens, which impedes generalized all-atom reasoning. We introduce Cuttlefish, a unified multimodal LLM that grounds language reasoning in geometric cues while scaling modality tokens with structural complexity. First, Scaling-Aware Patching uses an instruction-conditioned gating mechanism to generate variable-size patches over structural graphs, adaptively scaling the query token budget with structural complexity to mitigate fixed-length connector bottlenecks. Second, the Geometry Grounding Adapter refines these adaptive tokens via cross-attention to modality embeddings and injects the resulting modality tokens into the LLM, exposing explicit geometric cues to reduce structural hallucination. Experiments across interdisciplinary all-atom benchmarks demonstrate that Cuttlefish achieves superior performance in heterogeneous structure-grounded reasoning. Code: github.com/zihao-jing/Cuttlefish.
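
To make the two ideas in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical size-based rule (`token_budget`) in place of the paper's instruction-conditioned gating, and a single-head cross-attention without learned projections (`cross_attend`) in place of the Geometry Grounding Adapter. All function names, parameters, and dimensions are illustrative assumptions.

```python
import numpy as np

def token_budget(num_nodes, min_tokens=4, max_tokens=64, nodes_per_token=8):
    """Hypothetical rule: one query token per `nodes_per_token` nodes, clamped.

    This stands in for the paper's instruction-conditioned gating, which is
    not reproduced here.
    """
    raw = int(np.ceil(num_nodes / nodes_per_token))
    return int(np.clip(raw, min_tokens, max_tokens))

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, node_embeddings):
    """Single-head cross-attention with no learned projections (assumption)."""
    d = queries.shape[-1]
    scores = queries @ node_embeddings.T / np.sqrt(d)   # (k, n)
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    return weights @ node_embeddings, weights           # (k, d), (k, n)

rng = np.random.default_rng(0)
dim = 16
for num_nodes in (20, 400):
    nodes = rng.normal(size=(num_nodes, dim))   # stand-in structure embeddings
    k = token_budget(num_nodes)                 # budget grows with structure size
    queries = rng.normal(size=(k, dim))         # stand-in query tokens
    tokens, weights = cross_attend(queries, nodes)
    print(num_nodes, "nodes ->", tokens.shape[0], "modality tokens")
```

The point of the sketch is only the contrast with a fixed-length connector: a larger structure receives more query tokens, and each token is a weighted summary of the structure embeddings it attends to.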

Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Yi Li, Yan Sun, Boyu Wang, Pingzhao Hu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Forward reaction prediction | Mol-Instructions | -- | 30 |
| Reagent Prediction | Mol-Instructions | -- | 30 |
| Retrosynthesis | Mol-Instructions | -- | 30 |
| Molecule Captioning | Mol-Instructions | ROUGE-L 0.766 | 17 |
| Multimodal Reasoning | GEO-AT Molecule | METEOR 0.415 | 17 |
| Multimodal Reasoning | GEO-AT Protein | METEOR 41.7 | 17 |
| Multimodal Reasoning | GEO-AT DNA | METEOR 52.9 | 17 |
| Multimodal Reasoning | GEO-AT RNA | METEOR 0.491 | 17 |
| Entity recognition | Mol-Instructions | F1 Score 78 | 13 |
| Interaction Extraction | Mol-Instructions | F1 Score 27 | 13 |
Showing 10 of 23 rows
