Agentic Reasoning on MineSweeper (test)
[Figure: Success Rate chart. Top result: RETROAGENT, 48.2 (Mar 9, 2026).]
Evaluation Results
Method                 Evaluation Protocol   Links     Success Rate
RETROAGENT             RL...                 2026.03   48.2
RETROAGENT             RL...                 2026.03   47.9
GiGPO                  Fi...                 2026.03   41.1
GRPO w/ EMPG           Fi...                 2026.03   40.1
GRPO                   Fi...                 2026.03   39.3
LAMER                  Fi...                 2026.03   33.3
RLOO                   Fi...                 2026.03   32.8
EvolveR                Fi...                 2026.03   11.7
Reflexion              Pr...                 2026.03   7.4
ReAct                  Pr...                 2026.03   7
MemRL                  Fi...                 2026.03   7
Qwen-2.5-7B-Instruct   Ze...                 2026.03   6.5