LLM Performance Estimation on TruthfulQA (test)

Best MAE (%): 1.027 (SparseEval)

[Chart: MAE (%) of submitted methods, Feb 8, 2026]

Evaluation Results

Date     MAE (%)  Second metric (unlabeled)
2026.02  1.027    0.931
2026.02  1.083    0.922
2026.02  1.243    0.911
2026.02  1.577    0.895
2026.02  1.589    0.886
2026.02  1.718    0.874
2026.02  1.733    0.891
2026.02  1.756    0.878
2026.02  1.758    0.885
2026.02  1.808    0.847
2026.02  1.928    0.863
2026.02  1.958    0.87
2026.02  1.973    0.836
2026.02  2.105    0.855
2026.02  2.25     0.823
2026.02  2.443    0.838
2026.02  2.554    0.847
2026.02  3.032    0.771
2026.02  3.215    0.803
2026.02  4.452    0.712
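The page does not define how MAE (%) is computed. A common reading for performance estimation is the mean, over the evaluated models, of the absolute gap between a method's estimated TruthfulQA score and each model's actual score on the full test set, in percentage points. The sketch below assumes that definition; the function `mae_percent` and the sample accuracies are hypothetical and are not taken from the leaderboard.

```python
from typing import Sequence


def mae_percent(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error in percentage points (assumed definition).

    `estimated` and `actual` are accuracies in [0, 1] for the same models,
    listed in the same order.
    """
    if not estimated or len(estimated) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    # Sum of absolute gaps between estimated and actual accuracy per model.
    total = sum(abs(e - a) for e, a in zip(estimated, actual))
    # Average over models, then scale from a fraction to percentage points.
    return 100.0 * total / len(estimated)


# Hypothetical example: three models, estimated vs. full-test-set accuracy.
est = [0.52, 0.61, 0.47]
act = [0.50, 0.62, 0.45]
print(round(mae_percent(est, act), 3))  # → 1.667
```

Under this assumed definition, an MAE of 1.027 would mean the estimates were off by about one percentage point per model on average.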