
Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

About

Research in AI4Science has shown promise in many scientific applications, including polymer design. However, current LLMs prove ineffective in this problem space because (i) most models lack polymer-specific knowledge, and (ii) existing aligned models lack coverage of the knowledge and capabilities relevant to polymer design. To address this, we introduce PolyBench, a large-scale training and test benchmark dataset of more than 125K polymer design related tasks. It leverages a knowledge base of 13M+ data points obtained from experimental and synthetic sources to ensure broad coverage of polymers and their properties. For effective alignment using PolyBench, we introduce a knowledge-augmented reasoning distillation method that augments this dataset with structured chain-of-thought (CoT) reasoning. Furthermore, tasks in PolyBench are organized from simple to complex analytical reasoning problems, enabling generalization tests and diagnostic probes across the problem space. Experiments show that small language models (SLMs) of 7B to 14B parameters trained on PolyBench data outperform similarly sized models, and even closed-source frontier LLMs, on the PolyBench test set, while also demonstrating gains on other polymer benchmarks.

Dikshya Mohanty, Mohammad Saqib Hasan, Syed Mostofa Monsur, Size Zheng, Benjamin Hsiao, Niranjan Balasubramanian • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Advanced Property Reasoning | PolyBench (test) | RgL: 0.39 | 19 |
| Design & Synthesis | PolyBench (test) | Similarity Score: 82 | 19 |
| Polymer Concepts | PolyBench 1.0 (test) | RgL Score: 0.33 | 19 |
| Property Prediction | PolyBench (test) | r: 0.98 | 19 |
| Structural Understanding | PolyBench 1.0 (test) | EM: 87 | 19 |
| Property Comparison & Ranking | PolyBench (test) | MCQ Accuracy: 65 | 19 |
| Polymer Science Response Quality Evaluation | External Polymer Benchmarks Blk, ChemD, Llml 1.0 (test) | Blk Score: 4.93 | 13 |
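
The page does not say how the metrics listed above are computed. As a rough illustration only, the sketch below assumes that "EM" means exact match reported as a percentage and that "r" means the Pearson correlation coefficient between predicted and reference property values. The function names `exact_match` and `pearson_r` are hypothetical and are not taken from the PolyBench code.

```python
from math import sqrt


def exact_match(predictions, references):
    """Percentage of predictions identical to their reference.

    Assumption: only surrounding whitespace is ignored. Case is kept,
    because case can be meaningful in chemical notations such as SMILES.
    """
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

Under the same assumption, an MCQ accuracy could be obtained by calling `exact_match` on predicted and reference option letters. The actual benchmark may normalize answers differently.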