MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

About

Recent advances have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems cannot offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental to effective Human-AI collaboration, as it empowers designers to refine their creations through external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics that connect rules to gameplay without an explicit game engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms the latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.
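The abstract describes a pipeline in which a rulebook and a player persona lead, through explicit Mechanics-Dynamics-Aesthetics reasoning, to a review. It does not specify a data format. The sketch below is a hypothetical illustration of how one such persona-conditioned, MDA-structured training example could be laid out. All names (`MDATrace`, `PlaytestExample`, `build_prompt`, `build_target`) and the prompt wording are invented for illustration and are not the authors' schema.

```python
from dataclasses import dataclass


@dataclass
class MDATrace:
    """Hypothetical container for one Mechanics-Dynamics-Aesthetics chain."""
    mechanics: str   # rule-level facts taken from the rulebook
    dynamics: str    # gameplay behaviour those rules are expected to produce
    aesthetics: str  # the resulting experience for a given player type


@dataclass
class PlaytestExample:
    """Hypothetical record: rulebook + persona -> MDA reasoning -> review."""
    game: str
    rulebook: str
    persona: str
    mda: MDATrace
    review: str
    rating: float


def build_prompt(ex: PlaytestExample) -> str:
    """Assemble the model input from the rulebook and the persona only."""
    return (
        f"Game: {ex.game}\n"
        f"Rulebook:\n{ex.rulebook}\n"
        f"Persona: {ex.persona}\n"
        "Reason from mechanics to dynamics to aesthetics, then write a review."
    )


def build_target(ex: PlaytestExample) -> str:
    """Assemble the supervised target: the MDA chain followed by the review."""
    return (
        f"Mechanics: {ex.mda.mechanics}\n"
        f"Dynamics: {ex.mda.dynamics}\n"
        f"Aesthetics: {ex.mda.aesthetics}\n"
        f"Review: {ex.review}\n"
        f"Rating: {ex.rating:.1f}"
    )


# Invented toy example, for illustration only.
example = PlaytestExample(
    game="Toy Game",
    rulebook="Each turn, draft one card and pass the rest to the left.",
    persona="casual player who prefers short, low-conflict games",
    mda=MDATrace(
        mechanics="simultaneous card drafting",
        dynamics="short turns with little downtime",
        aesthetics="relaxed and easy to teach",
    ),
    review="Quick to learn and never drags.",
    rating=7.0,
)
print(build_prompt(example))
print(build_target(example))
```

Placing the MDA chain in the target rather than the prompt reflects the abstract's idea that the model should learn to produce the rule-to-experience reasoning itself. Whether MeepleLM is trained this way is not stated here.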

Zizhen Li, Chuanhao Li, Yibin Wang, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Yifei Huang, Kaipeng Zhang• 2026

Related benchmarks

Task	Dataset	Metric	Result
Human Pairwise Comparison	Gaming Content (Familiar Games), N=60 samples	Win rate (%)	83.3
Human Pairwise Comparison	Gaming Content (Unfamiliar Games), N=60 samples	Win rate (%)	86.7
Opinion Recovery	Board Game Playtesting Dataset	Op-Rec	69.77
Preference Alignment	Board Game Playtesting Dataset	MAE	0.6576
Review Generation	Board Game Playtesting Dataset	Factuality	98.86
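Two of the metrics above have standard definitions that can be checked by hand. The sketch below assumes that win rate is the share of the N=60 pairwise comparisons in which MeepleLM's output was preferred, with no ties, and that MAE is the mean absolute difference between predicted and observed ratings. The exact protocol and rating scale are not given on this page, and the function names are hypothetical. Under these assumptions, the reported 83.3 and 86.7 are consistent with 50 and 52 wins out of 60.

```python
def win_rate(judgments: list[bool]) -> float:
    """Percentage of pairwise comparisons in which the model's output was preferred."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return 100.0 * sum(judgments) / len(judgments)


def mean_absolute_error(predicted: list[float], observed: list[float]) -> float:
    """Mean absolute difference between predicted and observed ratings."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# With N = 60 strict (tie-free) comparisons:
familiar = [True] * 50 + [False] * 10    # 50 wins out of 60
unfamiliar = [True] * 52 + [False] * 8   # 52 wins out of 60
print(f"{win_rate(familiar):.1f}")    # 83.3
print(f"{win_rate(unfamiliar):.1f}")  # 86.7

# Toy ratings (invented) to show the MAE computation.
print(mean_absolute_error([7.0, 6.5], [7.5, 6.0]))  # 0.5
```

If ties were scored as half a win, other win/tie combinations could produce the same percentages, so the counts above are one consistent reading rather than the reported breakdown.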


