
Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs

About

Large language models (LLMs) have demonstrated strong knowledge reserves and task-solving capabilities, but they still suffer from severe hallucination, which hinders their practical application. Although scientific theories and rules can efficiently direct the behavior of human practitioners, LLMs do not yet sufficiently exploit such highly condensed knowledge through training or prompting. To address this issue, we propose SciDC, an LLM generation method that integrates subject-specific knowledge as strong constraints. By using strong LLMs to automatically convert flexible knowledge into multi-layered, standardized rules, we build an extensible framework that effectively constrains model generation on domain tasks. Experiments on scientific tasks, including industrial formulation design, clinical tumor diagnosis, and retrosynthesis planning, consistently demonstrate the effectiveness of our method, which achieves a 12% accuracy improvement on average over vanilla generation. We further discuss the potential of LLMs for automatically and inductively summarizing highly condensed knowledge, looking ahead to practical solutions for accelerating the overall scientific research process. All code for this paper is available at https://github.com/Maotian-Ma/SciDC.
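The rule-constrained generation idea in the abstract can be sketched as a rejection loop: candidates are sampled and only accepted once every domain rule passes. This is a minimal illustrative sketch, not the paper's actual implementation; all function names, the stub "model", and the toy formulation rule are assumptions for illustration.

```python
# Minimal sketch of rule-constrained generation in the spirit of SciDC.
# All names here are illustrative assumptions, not the paper's actual API.
from typing import Callable

Rule = Callable[[str], bool]  # a rule returns True if the output satisfies it

def constrained_generate(generate: Callable[[int], str],
                         rules: list[Rule],
                         max_attempts: int = 3) -> str:
    """Sample candidates until every rule passes; fall back to the last one."""
    candidate = ""
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        if all(rule(candidate) for rule in rules):
            return candidate
    return candidate  # no valid candidate found within the attempt budget

# Toy domain rule for a formulation-design task: fractions must total 100%.
def fractions_sum_to_100(output: str) -> bool:
    nums = [float(tok.rstrip("%")) for tok in output.split()
            if tok.rstrip("%").replace(".", "", 1).isdigit()]
    return abs(sum(nums) - 100.0) < 1e-6

# Stub "model" cycling through fixed candidates (a real system calls an LLM).
candidates = ["60% 50%", "70% 30%"]
result = constrained_generate(lambda i: candidates[i % len(candidates)],
                              [fractions_sum_to_100])
print(result)  # the second candidate satisfies the rule
```

In a real system the rules would be the multi-layered, standardized rules distilled by a strong LLM, and rejection could be replaced by token-level constraint enforcement during decoding.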

Maotian Ma, Zheni Zeng, Zhenghao Liu, Yukun Yan • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Legal Reasoning | LegalBench Hearsay | Accuracy | 86.46 | 16
Formulation QA | Formulation QA (Standard) | Accuracy | 56.7 | 12
Retrosynthesis | Retrosynthesis | Validity | 100 | 8
Tumor diagnosis | Tumor diagnosis | Validity | 100 | 8
Formulation design | Formulation design | Validity | 75.5 | 8
Formulation QA | Formulation QA (OOD) | Accuracy | 38.3 | 6
Constraint Code Generation | Tumor Diagnosis Human Evaluation Samples | Correctness | 5 | 3
