Expert-Iteration RLVR on MedQA, HEAD-QA, ARC-C, and CaseHOLD
[Chart: Pathwise Clean Score. CSA: 4, May 18, 2026.]
Updated 13d ago
Evaluation Results
| Method | Method details | Date | Pathwise Clean Score | Non-Refusing Score | Both Success Score | Min Answer Rate |
|---|---|---|---|---|---|---|
| CSA | replications per cell=... | 2026.05 | 4 | 4 | 4 | 52 |
| CRC | replications per cell=... | 2026.05 | 4 | 0 | 0 | 2.8 |
| LTT | replications per cell=... | 2026.05 | 4 | 1 | 1 | 14.2 |
| ConfFact | replications per cell=... | 2026.05 | 3 | 1 | 1 | 25.5 |
| Fixed-Threshold | replications per cell=... | 2026.05 | 2 | 4 | 2 | 66.1 |
| NEX-Conf | replications per cell=... | 2026.05 | 1 | 4 | 1 | 79.8 |
| ACI | replications per cell=... | 2026.05 | 1 | 4 | 1 | 90.7 |
| SAOCP | replications per cell=... | 2026.05 | 0 | 4 | 0 | 90.3 |
| Naive-Tuning | replications per cell=... | 2026.05 | 0 | 4 | 0 | 87.2 |
| Always-Act | replications per cell=... | 2026.05 | 0 | 4 | 0 | 100 |