
Model Ranking on TruthfulQA LLM-Judge (test)

Kendall's Tau: 0.49

Adaptive Multi-Model Ranking

[Chart: Kendall's Tau; y-axis ticks 0.3964, 0.4207, 0.445, 0.4693; x-axis date Jan 20, 2026]
Updated 1mo ago

Evaluation Results

Method, Links
2026.01
0.49932.9
2026.01
0.4--