Logical Reasoning on GeoShape BBEH
[Chart: Accuracy over time. Top result: MemAPO, 43 (Mar 23, 2026).]
Evaluation Results
| Method | Backbone | Date | Accuracy |
|---|---|---|---|
| MemAPO | GPT-4o-mini | 2026.03 | 43 |
| RaR | GPT-4o-mini | 2026.03 | 30 |
| SPO | Qwen3-8B | 2026.03 | 25 |
| Step-Back | GPT-4o-mini | 2026.03 | 25 |
| MemAPO | Qwen3-8B | 2026.03 | 24 |
| SPO | GPT-4o-mini | 2026.03 | 24 |
| TextGrad | GPT-4o-mini | 2026.03 | 23 |
| OPRO | GPT-4o-mini | 2026.03 | 18 |
| CoT | GPT-4o-mini | 2026.03 | 17 |
| TextGrad | Qwen3-8B | 2026.03 | 14 |
| ProTeGi | GPT-4o-mini | 2026.03 | 14 |
| IO | GPT-4o-mini | 2026.03 | 13 |
| PromptBreeder | GPT-4o-mini | 2026.03 | 13 |
| PromptBreeder | Qwen3-8B | 2026.03 | 10 |
| OPRO | Qwen3-8B | 2026.03 | 9 |
| Step-Back | Qwen3-8B | 2026.03 | 8 |
| ProTeGi | Qwen3-8B | 2026.03 | 7 |
| IO | Qwen3-8B | 2026.03 | 6 |
| CoT | Qwen3-8B | 2026.03 | 6 |
| RaR | Qwen3-8B | 2026.03 | 3 |