Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

About

Research in AI4Science has shown promise in many scientific applications, including polymer design. However, current LLMs are ineffective in this problem space because (i) most models lack polymer-specific knowledge, and (ii) existing aligned models have limited coverage of the knowledge and capabilities relevant to polymer design. To address this, we introduce PolyBench, a large-scale training and test benchmark dataset of more than 125K polymer design-related tasks. It draws on a knowledge base of more than 13 million data points obtained from experimental and synthetic data sources to ensure broad coverage of polymers and their properties. For effective alignment using PolyBench, we introduce a knowledge-augmented reasoning distillation method that augments this dataset with structured chain-of-thought (CoT) reasoning. Furthermore, tasks in PolyBench are organized from simple to complex analytical reasoning problems, enabling generalization tests and diagnostic probes across the problem space. Experiments show that small- and mid-sized language models (SLMs) with 7B to 32B parameters, trained on PolyBench, outperform similar-sized models and remain competitive with closed-source frontier LLMs on PolyBench's test dataset, while also demonstrating performance gains on external polymer benchmarks. The dataset and associated code are available at https://github.com/StonyBrookNLP/PolyBench.

Dikshya Mohanty, Mohammad Saqib Hasan, Syed Mostofa Monsur, Size Zheng, Benjamin Hsiao, Niranjan Balasubramanian • 2026

Related benchmarks

Task	Dataset	Result
Advanced Property Reasoning	PolyBench (test)	RgL 0.39	19
Design & Synthesis	PolyBench (test)	Similarity Score 82	19
Polymer Concepts	PolyBench 1.0 (test)	RgL Score 0.33	19
Property Prediction	PolyBench (test)	r 0.98	19
Structural Understanding	PolyBench 1.0 (test)	EM 87	19
Property Comparison & Ranking	PolyBench (test)	MCQ Accuracy 65	19
Polymer Science Response Quality Evaluation	External Polymer Benchmarks Blk, ChemD, Llml 1.0 (test)	Blk Score 4.93	13
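
The table names several metrics without defining them. As a rough illustration, the sketch below shows how three of them are commonly computed. It assumes that "EM" means exact match reported as a percentage, that "r" is the Pearson correlation coefficient, and that "RgL" abbreviates ROUGE-L (F1 over the longest common subsequence of whitespace tokens). None of these readings is confirmed by the listing, the function names are hypothetical, and the PolyBench repository may normalize or tokenize differently.

```python
import math
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions equal to their reference.

    Assumption: comparison ignores case and surrounding whitespace.
    """
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and reference property values."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over lowercased whitespace tokens (a simplified tokenizer)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a free-text answer that shares four of five tokens in order with an equally long reference scores a ROUGE-L F1 of 0.8, which gives a sense of scale for the RgL values reported above.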
