
Reasoning on Reasoning Benchmark Suite Aggregate

Average Score: 59.44

gpt-oss-puzzle-88B

[Chart: score over time, Apr 15, 2025 to Feb 12, 2026]
Updated 1mo ago

Evaluation Results

Date      Score
2026.02   59.44
2026.02   59.2
2026.02   58.67
2026.02   58.19
2025.04   58.12
2025.04   57.82
2025.04   57.72
2025.04   57.49
2025.04   56.95
2025.04   56.89
2025.12   56.7
2026.02   56.64
2025.12   56
2025.04   55.66
2025.12   55.4
2025.04   55.2
2025.12   55.1
2026.02   54.93
2025.12   54.3
2025.12   53.7
2026.02   53.66
2025.04   53.3
2026.02   52.89
2025.12   52.8
2025.12   51.4
2025.12   50.7
2026.02   50.61
2025.12   49.9
2025.12   49.9
2026.02   48.38
2025.12   48.2
2025.12   46.9
2026.02   45.41
2025.04   44.75
2026.02   44.71
2025.12   41.9
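As a quick way to read the evaluation results above, the Python sketch below groups the entries by evaluation date and reports how many entries each date has and its best score. The scores are copied from the list; the names `ROWS` and `summarize_by_date` are hypothetical, chosen for illustration, and the sketch makes no assumption about how the aggregate score itself is computed.

```python
from collections import defaultdict

# (evaluation date, score) pairs copied from the Evaluation Results list.
# Only date and score are used here; no method names are included.
ROWS = [
    ("2026.02", 59.44), ("2026.02", 59.2), ("2026.02", 58.67), ("2026.02", 58.19),
    ("2025.04", 58.12), ("2025.04", 57.82), ("2025.04", 57.72), ("2025.04", 57.49),
    ("2025.04", 56.95), ("2025.04", 56.89), ("2025.12", 56.7), ("2026.02", 56.64),
    ("2025.12", 56.0), ("2025.04", 55.66), ("2025.12", 55.4), ("2025.04", 55.2),
    ("2025.12", 55.1), ("2026.02", 54.93), ("2025.12", 54.3), ("2025.12", 53.7),
    ("2026.02", 53.66), ("2025.04", 53.3), ("2026.02", 52.89), ("2025.12", 52.8),
    ("2025.12", 51.4), ("2025.12", 50.7), ("2026.02", 50.61), ("2025.12", 49.9),
    ("2025.12", 49.9), ("2026.02", 48.38), ("2025.12", 48.2), ("2025.12", 46.9),
    ("2026.02", 45.41), ("2025.04", 44.75), ("2026.02", 44.71), ("2025.12", 41.9),
]


def summarize_by_date(rows):
    """Return {date: (entry_count, best_score)} for each evaluation date."""
    counts = defaultdict(int)
    best = {}
    for date, score in rows:
        counts[date] += 1
        # Keep the highest score seen so far for this date.
        best[date] = max(score, best.get(date, score))
    return {date: (counts[date], best[date]) for date in counts}


if __name__ == "__main__":
    # "YYYY.MM" strings sort chronologically, so a plain sort orders by date.
    for date, (n, top) in sorted(summarize_by_date(ROWS).items()):
        print(f"{date}: {n} entries, best {top:.2f}")
```

Run as a script, this prints one line per date. The 2026.02 batch has 12 entries with a best score of 59.44, which matches the headline average score for gpt-oss-puzzle-88B. The 2025.04 batch has 10 entries with a best of 58.12. The 2025.12 batch has 14 entries with a best of 56.70.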