Reasoning quality evaluation on PROOFWRITER
[Chart: Somers' D. MarODE: 0.339 (Mar 2, 2026)]
Evaluation Results
Method                        Date      Somers' D
MarODE                        2026.03    0.339
MarODE(αβ)                    2026.03    0.3386
MarODE_QUALITY(β)             2026.03    0.24
MarODE(βγ)                    2026.03    0.2382
MarODE(αγ)                    2026.03    0.2157
MarODE_COHERENCE(α)           2026.03    0.2141
ROSCOE_MEAN                   2026.03    0.2115
ROSCOE-SA                     2026.03    0.1703
ROSCOE-SS                     2026.03    0.1544
MarODE_EVIDENCE(γ)            2026.03    0.1369
ROSCOE-LC                     2026.03    0.0971
ROSCOE-LI                     2026.03    0.0539
LLM_as_a_Judge                2026.03    0
Local_and_Global_Coherence    2026.03   -0.0067
ReCEval                       2026.03   -0.0218
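
For readers unfamiliar with the metric, the following is a minimal sketch of how Somers' D can be computed for a single method. It is not the benchmark's own evaluation code. It assumes the asymmetric form D(Y|X), where X is a reference label and Y is the method's score; the page does not state which variable the leaderboard treats as independent. The function name `somers_d` and the sample data are hypothetical.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D of y with respect to x.

    Computed as (concordant - discordant) / (pairs not tied on x).
    Pairs tied on y but not on x stay in the denominator.
    """
    concordant = discordant = untied_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x are excluded entirely
        untied_x += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if untied_x == 0:
        raise ValueError("x must contain at least two distinct values")
    return (concordant - discordant) / untied_x


# Hypothetical data: 1 = valid reasoning chain, 0 = flawed chain.
labels = [1, 1, 0, 0, 1, 0]
# Hypothetical scores assigned by an evaluation method.
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]

d = somers_d(labels, scores)
print(round(d, 4))
```

In this toy example there are 9 pairs that differ in label, of which 7 are concordant and 2 discordant, so D = 5/9 ≈ 0.5556. A value of 1 means the scores rank every valid chain above every flawed one, 0 means no association, and negative values mean the ranking is inverted.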