LLM-as-a-judge on HH-RLHF
[Chart: Coverage and Success by date (May 14, 2026). Top Coverage: 81.3, Heuristic Selection.]
Updated 16d ago
Evaluation Results
Method               Setting                     Date     Coverage  Success
-------------------  --------------------------  -------  --------  -------
Heuristic Selection  Cascade=M → L → O, Tar...   2026.05  81.3      0
Heuristic Selection  Cascade=L → Q → O, Tar...   2026.05  72.4      0
CSE + Predictive     Cascade=M → L → O, Tar...   2026.05  72.3      0
CSE + Predictive     Cascade=L → Q → O, Tar...   2026.05  58.6      0
CSE + Ours           Cascade=M → L → O, Tar...   2026.05  47.2      42.8
CSE + Random         Cascade=M → L → O, Tar...   2026.05  45.4      29.8
CSE + Simulated      Cascade=M → L → O, Tar...   2026.05  42.5      37
CSE + Vanilla        Cascade=M → L → O, Tar...   2026.05  41.3      33.4
CSE + Ours           Cascade=L → Q → O, Tar...   2026.05  36.9      51.6
CSE + Random         Cascade=L → Q → O, Tar...   2026.05  34.2      35.7
CSE + Vanilla        Cascade=L → Q → O, Tar...   2026.05  32.7      37.9
CSE + Simulated      Cascade=L → Q → O, Tar...   2026.05  31        46.9