LLM-as-a-judge on TL;DR
Top Coverage: 82.6 (Heuristic Selection), May 14, 2026
Chart series: Coverage, Success
Updated 16d ago
Evaluation Results
| Method | Configuration | Date | Coverage | Success |
|---|---|---|---|---|
| Heuristic Selection | Cascade=M → L → O, Tar... | 2026.05 | 82.6 | 12.7 |
| Heuristic Selection | Cascade=L → Q → O, Tar... | 2026.05 | 71.2 | 31.8 |
| CSE + Simulated | Cascade=M → L → O, Tar... | 2026.05 | 58.9 | 59 |
| CSE + Ours | Cascade=M → L → O, Tar... | 2026.05 | 58.9 | 64.3 |
| CSE + Vanilla | Cascade=M → L → O, Tar... | 2026.05 | 57.3 | 60.5 |
| CSE + Predictive | Cascade=M → L → O, Tar... | 2026.05 | 55.4 | 59.6 |
| CSE + Random | Cascade=M → L → O, Tar... | 2026.05 | 52.9 | 54.7 |
| CSE + Ours | Cascade=L → Q → O, Tar... | 2026.05 | 47.6 | 79.2 |
| CSE + Simulated | Cascade=L → Q → O, Tar... | 2026.05 | 45.8 | 73.6 |
| CSE + Predictive | Cascade=L → Q → O, Tar... | 2026.05 | 43.8 | 74.9 |
| CSE + Vanilla | Cascade=L → Q → O, Tar... | 2026.05 | 43.5 | 75.4 |
| CSE + Random | Cascade=L → Q → O, Tar... | 2026.05 | 41.5 | 73.1 |