
Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

About

Large language models (LLMs) have demonstrated exceptional performance in reasoning tasks such as mathematics and coding, matching or surpassing human capabilities. However, these reasoning abilities face significant challenges in specialized domains. Taking Go as an example: although AlphaGo established the high performance ceiling of AI systems in Go, mainstream LLMs still struggle to reach even beginner-level proficiency, let alone reason about the game in natural language. This gap between general-purpose LLMs and domain experts significantly limits the application of LLMs to a wider range of domain-specific tasks. In this work, we aim to bridge the divide between LLMs' general reasoning capabilities and expert knowledge in domain-specific tasks. We perform mixed fine-tuning on structured Go expertise and general long chain-of-thought (CoT) reasoning data as a cold start, followed by reinforcement learning to integrate expert Go knowledge with general reasoning capabilities. Through this methodology, we present LoGos, a powerful LLM that not only maintains outstanding general reasoning abilities but also plays Go in natural language, demonstrating effective strategic reasoning and accurate next-move prediction. LoGos achieves performance comparable to human professional players, substantially surpassing all existing LLMs. Through this work, we aim to contribute insights on applying general LLM reasoning capabilities to specialized domains. We will release the first large-scale Go dataset for LLM training, the first LLM Go evaluation benchmark, and the first general LLM that reaches human professional-level performance in Go at: https://github.com/Entarochuan/LoGos.
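The "cold start" stage described above mixes structured Go expertise with general long-CoT reasoning data before fine-tuning. The following is a minimal sketch of such dataset blending; the mixing ratio, sampling strategy, and function names are illustrative assumptions, not the paper's actual recipe.

```python
import random

# Sketch of cold-start data mixing: blend domain-expert examples (structured
# Go knowledge) with general long-CoT reasoning data. The 30/70 default ratio
# is an assumption for illustration only.
def mix_cold_start(expert, general, expert_ratio=0.3, total=1000, seed=0):
    """Return a shuffled list with ~expert_ratio of samples drawn from `expert`."""
    rng = random.Random(seed)
    n_expert = int(total * expert_ratio)
    n_general = total - n_expert
    # Sample with replacement so small pools can still fill their quota.
    batch = rng.choices(expert, k=n_expert) + rng.choices(general, k=n_general)
    rng.shuffle(batch)
    return batch

go_data = [{"domain": "go", "text": f"go example {i}"} for i in range(50)]
cot_data = [{"domain": "cot", "text": f"cot example {i}"} for i in range(50)]
mixed = mix_cold_start(go_data, cot_data, expert_ratio=0.3, total=100)
print(len(mixed), sum(1 for ex in mixed if ex["domain"] == "go"))  # 100 30
```

Sampling with replacement keeps the blend ratio fixed even when the expert pool is much smaller than the general pool, which is typically the case for narrow domains like Go.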

Yichuan Ma, Linyang Li, Yongkang Chen, Peiji Li, Jiasheng Ye, Qipeng Guo, Dahua Lin, Kai Chen • 2026

Related benchmarks

Task                          Dataset          Result               Rank
Mathematical Problem Solving  MATH             Accuracy: 96.5       166
Code Generation               LiveCodeBench    Average Score: 50.9  68
Mathematical Problem Solving  AIME             AIME Score: 56.7     35
General Reasoning             BBEH             --                   19
Playing Go                    KataGo-Bench 1K  Accuracy: 88.6       15
General Reasoning             KOR-Bench        Score: 74.8          11
Scientific Reasoning          GPQA Diamond     Score: 63.6          11
