Do Vision Language Models Understand Human Engagement in Games?

About

Inferring human engagement from gameplay video is important for game design and player-experience research, yet it remains unclear whether vision-language models (VLMs) can infer such latent psychological states from visual cues alone. Using the GameVibe Few-Shot dataset across nine first-person shooter games, we evaluate three VLMs under six prompting strategies, including zero-shot prediction, theory-guided prompts grounded in Flow, GameFlow, Self-Determination Theory, and MDA, and retrieval-augmented prompting. We consider both pointwise engagement prediction and pairwise prediction of engagement change between consecutive windows. Results show that zero-shot VLM predictions are generally weak and often fail to outperform simple per-game majority-class baselines. Memory- or retrieval-augmented prompting improves pointwise prediction in some settings, whereas pairwise prediction remains consistently difficult across strategies. Theory-guided prompting alone does not reliably help and can instead reinforce surface-level shortcuts. These findings suggest a perception-understanding gap in current VLMs: although they can recognize visible gameplay cues, they still struggle to robustly infer human engagement across games.

Ziyi Wang, Qizan Guo, Rishitosh Singh, Xiyang Hu • 2026
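
The abstract compares VLM predictions against per-game majority-class baselines and distinguishes pointwise from pairwise prediction. The Python sketch below illustrates one plausible reading of both ideas; it is not the authors' code. The function names, the (game, label) record layout, and the "increase" / "decrease" / "same" label scheme are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def per_game_majority_baseline(train, test):
    """Accuracy of always predicting each game's most frequent training label.

    `train` and `test` are iterables of (game, label) pairs (assumed layout).
    Returns a dict mapping game -> accuracy on that game's test items.
    """
    label_counts = defaultdict(Counter)
    for game, label in train:
        label_counts[game][label] += 1

    # Most frequent label per game; ties go to the label seen first.
    majority = {g: c.most_common(1)[0][0] for g, c in label_counts.items()}

    hits, totals = defaultdict(int), defaultdict(int)
    for game, label in test:
        if game not in majority:
            continue  # no training data for this game, so no baseline
        totals[game] += 1
        hits[game] += int(label == majority[game])

    return {g: hits[g] / totals[g] for g in totals}


def pairwise_changes(values):
    """Turn per-window engagement values into change labels between
    consecutive windows (hypothetical three-way label scheme)."""
    labels = []
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            labels.append("increase")
        elif curr < prev:
            labels.append("decrease")
        else:
            labels.append("same")
    return labels
```

A model's per-game accuracy is only informative when read against the matching baseline value, since a game with a skewed label distribution can yield a high majority-class accuracy on its own.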

Related benchmarks

Task                             Dataset         Metric    Result  Rank
Pairwise engagement prediction   Borderlands 3   Accuracy  87.5    15
Pairwise engagement prediction   CS:GO Office    Accuracy  76.9    15
Pairwise engagement prediction   Blitz Brigade   Accuracy  0.714   15
Pairwise engagement prediction   Corridor 7      Accuracy  73      15
Pairwise engagement prediction   Battlefield 42  Accuracy  67.2    15
Pairwise engagement prediction   Apex Legends    Accuracy  71.4    15
Pairwise engagement prediction   CSGO 19         Accuracy  70      15
Pairwise engagement prediction   CSGO 18         Accuracy  53.6    15
Pairwise engagement prediction   CS 1.6          Accuracy  66.7    15
Pointwise Engagement Prediction  Borderlands 3   Accuracy  84.2    15
Showing 10 of 18 rows
