
Accuracy on BBH (BIG-Bench Hard) General Reasoning

Top result: SABA (93.2 accuracy)

Updated 1mo ago

Evaluation Results

Method (date)	Links	Accuracy
SABA 2026.04		93.2
Base 2026.05		92
COSE 2026.05		91.85
COSE 2026.05		91.51
COSE 2026.05		91.29
SELF-DISC. 2026.04		91
MAE 2026.05		90.8
S^2R 2026.04		90.7
GoT 2026.04		90.7
AZR 2026.05		90.6
R-Zero 2026.05		90.3
SC(k=5) 2026.04		89.6
CAP-CoT 2026.04		89.1
GPT-4-0613 2026.05		89.1
CRITIC 2026.04		88
MAE 2026.05		88
CAP-CoT 2026.04		87.9
CAP-CoT 2026.04		87.8
Self-Refine 2026.04		87.3
AoT 2026.04		87.2
CAP-CoT 2026.04		87.1
GoT 2026.04		87
ECON 2026.04		86.9
GoT 2026.04		86.2
MAD 2026.04		86.2
AoT 2026.04		86.1
CoT 2026.04		86
ECON 2026.04		86
AoT 2026.04		86
ToT 2026.04		85.9
InfiGFusion 2025.05		85.62
ECON 2026.04		85.5
ToT 2026.04		85.5
AoT 2026.04		85.4
MAD 2026.04		85.2
GoT 2026.04		84.9
MAD 2026.04		84.8
CoT-SC 2026.04		84.5
ECON 2026.04		84.4
ToT 2026.04		84.4
GoT 2026.04		84.1
ToT 2026.04		83.7
CoT-SC 2026.04		83.6
MAD 2026.04		83.5
FoT 2026.04		83.5
CoT-SC 2026.04		83.4
FuseChat 2025.05		83.37
CoT-SC 2026.04		83.2
TDA-RC 2026.03		82.9
MiniLogit 2025.05		82.68
FoT 2026.04		82.6
TDA-RC 2026.03		82.5
PromptAgent 2026.04		82.5
FoT 2026.04		82.4
FoT 2026.04		82.3
TDA-RC 2026.03		82.2
CCoT 2026.04		81.9
Self-Refine 2026.04		81.6
Mistral-Small 2025.05		81.59
PromptAgent 2026.04		81.2
InfiFusion 2025.05		80.94
Instruction Induction 2026.03		80.8
Instruction Induction 2026.03		80.5
CCoT 2026.04		80.5
HoT 2026.03		80.4
PromptAgent 2026.04		80.4
Self-Refine 2026.04		80.4
CCoT 2026.04		80.2
CCoT 2026.04		80.2
HoT 2026.03		80.1
Instruction Induction 2026.03		80.1
Role / Persona Prompting 2026.03		80.1
PromptAgent 2026.04		80.1
HoT 2026.03		80
Self-Refine 2026.04		80
Role / Persona Prompting 2026.03		79.9
Self-Refine 2026.04		79.8
CoT 2026.04		79.8
Prompt Canvas 2026.03		79.7
Prompt Canvas 2026.03		79.6
Role / Persona Prompting 2026.03		79.5
Prompt Canvas 2026.03		79.2
Direct 2026.04		78.7
CoT 2026.04		78.5
CoT 2026.04		78.3
CoT 2026.04		78.1
FuseLLM 2025.05		77.62
Qwen2.5-Instruct 2025.05		77.59
AFlow 2026.04		77.5
AFlow 2026.04		76.4
AFlow 2026.04		76
AFlow 2026.04		75.7
Qwen2.5-Coder 2025.05		75.4
AP 2026.04		74.2
Analogical Prompting 2026.03		72.8
AP 2026.04		72.8
Analogical Prompting 2026.03		72.5
AP 2026.04		72.5
Analogical Prompting 2026.03		72.2
AP 2026.04		72.2

Showing 100 of 190 rows