LLM-as-a-judge on AlpacaEval
[Chart: Coverage by date (May 14, 2026). Highlighted entry: Heuristic Selection, Coverage 78.3. Selectable metrics: Coverage, Success Rate.]
Evaluation Results
| Method | Configuration | Date | Coverage | Success Rate |
|---|---|---|---|---|
| Heuristic Selection | Cascade=M → L → O, Tar... | 2026.05 | 78.3 | 0 |
| Heuristic Selection | Cascade=L → Q → O, Tar... | 2026.05 | 74.5 | 0 |
| CSE + Predictive | Cascade=M → L → O, Tar... | 2026.05 | 54.5 | 64.7 |
| CSE + Ours | Cascade=M → L → O, Tar... | 2026.05 | 54.5 | 82.1 |
| CSE + Simulated | Cascade=M → L → O, Tar... | 2026.05 | 52.6 | 72.8 |
| CSE + Vanilla | Cascade=M → L → O, Tar... | 2026.05 | 52 | 75.7 |
| CSE + Random | Cascade=M → L → O, Tar... | 2026.05 | 48.7 | 74.6 |
| CSE + Ours | Cascade=L → Q → O, Tar... | 2026.05 | 38.4 | 94.8 |
| CSE + Predictive | Cascade=L → Q → O, Tar... | 2026.05 | 37.6 | 86.9 |
| CSE + Vanilla | Cascade=L → Q → O, Tar... | 2026.05 | 34.6 | 93.4 |
| CSE + Simulated | Cascade=L → Q → O, Tar... | 2026.05 | 34.1 | 90.8 |
| CSE + Random | Cascade=L → Q → O, Tar... | 2026.05 | 33.2 | 92 |