Reasoning on Combined Reasoning Benchmarks

59.29 Overall Accuracy

OpenSIR

Updated 1mo ago

Evaluation Results

Method	Links	Overall Accuracy
OpenSIR 2025.11		59.29
GRPOmath 2025.11		56.18
Base 2025.11		55.89
GRPOgsm8k 2025.11		55.15
Absolute Zero 2025.11		54.53
R-Zero 2025.11		53.96
WIST 2026.03		49.4
SPICE 2026.03		48.6
R-Zero 2026.03		47.5
Base Model 2026.03		46.2
OpenSIR 2025.11		40.56
GRPOmath 2025.11		37.27
GRPOgsm8k 2025.11		36.97
Base 2025.11		36.31
R-Zero 2025.11		36.25
Absolute Zero 2025.11		35.66