Model Ranking on TruthfulQA LLM-Judge (test)
[Chart: Kendall's Tau by date. Highlighted point: Adaptive Multi-Model Ranking, 0.49, Jan 20, 2026. Selectable metrics: Kendall's Tau, Items Evaluated, Usage Percentage.]
Evaluation Results

Method                        Ranking Strategy  Date     Kendall's Tau  Items Evaluated  Usage Percentage
Adaptive Multi-Model Ranking  Adaptive          2026.01  0.49           93               2.9
Baseline                      Baseline          2026.01  0.4            -                -