Reasoning quality evaluation on STRATEGYQA
[Chart: Somers' D over time; MarODE(αβ) at 0.2735 on Mar 2, 2026]
Updated 1mo ago
Evaluation Results
Method                        Date      Somers' D
MarODE(αβ)                    2026.03   0.2735
ReCEval                       2026.03   0.2561
MarODE                        2026.03   0.2256
MarODE_QUALITY(β)             2026.03   0.2213
MarODE(βγ)                    2026.03   0.1952
MarODE_COHERENCE(α)           2026.03   0.1254
MarODE(αγ)                    2026.03   0.116
ROSCOE_MEAN                   2026.03   0.0887
MarODE_EVIDENCE(γ)            2026.03   0.0744
ROSCOE-SA                     2026.03   0.0629
ROSCOE-LI                     2026.03   0.0604
Local_and_Global_Coherence    2026.03   0.0489
ROSCOE-LC                     2026.03   0.0156
ROSCOE-SS                     2026.03   0.0125
LLM_as_a_Judge                2026.03   0.0045
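
For readers unfamiliar with the metric, here is a minimal sketch of how Somers' D can be computed from paired observations. It uses the standard asymmetric definition: (concordant pairs minus discordant pairs) divided by the number of pairs not tied on the independent variable. The page does not state which variable the benchmark treats as independent or how ties are handled, so the argument order, the function name `somers_d`, and the sample `labels` and `scores` below are all assumptions made for illustration, not the benchmark's actual procedure or data.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D of y with respect to x.

    Computed as (concordant - discordant) / (pairs not tied on x).
    Treating x as the independent variable is an assumption here.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = untied_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x are excluded from the denominator
        untied_x += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means the pair is tied on y only: it stays in the
        # denominator but counts as neither concordant nor discordant
    if untied_x == 0:
        raise ValueError("all pairs are tied on x")
    return (concordant - discordant) / untied_x


# Hypothetical data: correctness labels and metric scores for five reasoning chains.
labels = [0, 0, 1, 1, 1]
scores = [0.2, 0.5, 0.4, 0.7, 0.9]
print(somers_d(labels, scores))
```

With these made-up inputs, five of the six label-untied pairs are concordant and one is discordant, so the result is 4/6, roughly 0.667. A value near 0, like the lower rows of the table, would mean the scores order the chains little better than chance.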