
Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

About

Large language models (LLMs) have demonstrated exceptional performance in reasoning tasks such as mathematics and coding, matching or surpassing human capabilities. However, these reasoning abilities face significant challenges in specialized domains. Taking Go as an example: although AlphaGo established the high performance ceiling of AI systems in Go, mainstream LLMs still struggle to reach even beginner-level proficiency, let alone reason about the game in natural language. This gap between general-purpose LLMs and domain experts significantly limits the application of LLMs to a wider range of domain-specific tasks. In this work, we aim to bridge the divide between LLMs' general reasoning capabilities and expert knowledge in domain-specific tasks. We perform mixed fine-tuning on structured Go expertise and general long chain-of-thought (CoT) reasoning data as a cold start, followed by reinforcement learning to integrate expert Go knowledge with general reasoning capabilities. Through this methodology, we present LoGos, a powerful LLM that not only maintains outstanding general reasoning abilities but also plays Go in natural language, demonstrating effective strategic reasoning and accurate next-move prediction. LoGos achieves performance comparable to human professional players, substantially surpassing all existing LLMs. Through this work, we aim to contribute insights on applying general LLM reasoning capabilities to specialized domains. We will release the first large-scale Go dataset for LLM training, the first LLM Go evaluation benchmark, and the first general LLM that reaches human professional-level performance in Go at: https://github.com/Entarochuan/LoGos.
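The "cold start" stage described above mixes structured Go expertise with general long-CoT reasoning data before fine-tuning. The following is a minimal sketch of such dataset blending; the mixing ratio, sampling strategy, and function names are illustrative assumptions, not the paper's actual recipe.

```python
import random

# Sketch of cold-start data mixing: blend domain-expert examples (structured
# Go knowledge) with general long-CoT reasoning data. The 30/70 default ratio
# is an assumption for illustration only.
def mix_cold_start(expert, general, expert_ratio=0.3, total=1000, seed=0):
    """Return a shuffled list with ~expert_ratio of samples drawn from `expert`."""
    rng = random.Random(seed)
    n_expert = int(total * expert_ratio)
    n_general = total - n_expert
    # Sample with replacement so small pools can still fill their quota.
    batch = rng.choices(expert, k=n_expert) + rng.choices(general, k=n_general)
    rng.shuffle(batch)
    return batch

go_data = [{"domain": "go", "text": f"go example {i}"} for i in range(50)]
cot_data = [{"domain": "cot", "text": f"cot example {i}"} for i in range(50)]
mixed = mix_cold_start(go_data, cot_data, expert_ratio=0.3, total=100)
print(len(mixed), sum(1 for ex in mixed if ex["domain"] == "go"))  # 100 30
```

Sampling with replacement keeps the blend ratio fixed even when the expert pool is much smaller than the general pool, which is typically the case for narrow domains like Go.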

Yichuan Ma, Linyang Li, Yongkang Chen, Peiji Li, Jiasheng Ye, Qipeng Guo, Dahua Lin, Kai Chen • 2026

Related benchmarks

Task                          Dataset          Result               Rank
Mathematical Problem Solving  MATH             Accuracy: 96.5       166
Code Generation               LiveCodeBench    Average Score: 50.9  68
Mathematical Problem Solving  AIME             AIME Score: 56.7     35
General Reasoning             BBEH             --                   19
Playing Go                    KataGo-Bench 1K  Accuracy: 88.6       15
General Reasoning             KOR-Bench        Score: 74.8          11
Scientific Reasoning          GPQA Diamond     Score: 63.6          11
