Multi-Agent Strategic Reasoning on Tic-Tac-Toe (In-domain)
Top result: MAGE, Success Rate 67.2 (Mar 4, 2026)
Evaluation Results
Method      Category            Date      Success Rate
MAGE        Meta-RL             2026.03   67.2
LAMER       Meta-RL             2026.03   60.2
GiGPO       RL                  2026.03   41.4
Reflexion   In-Context Le...    2026.03   24.2
GRPO        RL                  2026.03   21.9
ReAct       In-Context Le...    2026.03   3.9
Memento     Memory-based        2026.03   3.1
A-MEM       Memory-based        2026.03   1.6