Correctness Prediction on TriviaQA

0.999 AUROC

Self-consistency (10 samples)
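The headline method scores each question by self-consistency over 10 sampled answers. The page does not spell out the scoring rule, so this sketch assumes one common variant. It samples the answers, normalizes them, and takes the majority answer. The share of samples that agree with that majority answer serves as the confidence that it is correct. The names `normalize` and `self_consistency`, and the normalization steps (lowercasing, stripping punctuation and English articles), are illustrative assumptions rather than the leaderboard's own code.

```python
from collections import Counter
import re
import string


def normalize(answer: str) -> str:
    """Assumed TriviaQA-style normalization: lowercase, drop punctuation
    and English articles, collapse whitespace."""
    text = answer.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def self_consistency(samples: list[str]) -> tuple[str, float]:
    """Return the majority (normalized) answer and the fraction of samples
    that agree with it; that fraction is used as the confidence score."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


# Hypothetical 10 samples for one question: 7 agree on "Paris".
samples = ["Paris"] * 7 + ["Lyon"] * 2 + ["Marseille"]
print(self_consistency(samples))  # ('paris', 0.7)
```

With 10 samples the confidence can only take the values 0.1, 0.2, ..., 1.0, so many questions share the same score. Ties like these are one reason the choice of sample count can affect the AUROC values reported for this method.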

[Chart: AUROC over time, Aug 2025 to May 2026]

Evaluation Results

Date      AUROC    Other metric
2026.04   0.999    -
2026.05   0.8649   -
2026.05   0.8591   -
2026.05   0.8591   -
2026.05   0.8589   -
2026.05   0.8565   -
2026.05   0.8523   -
2026.05   0.8522   -
2025.09   0.852    -
2026.05   0.8472   -
2026.05   0.8467   -
2026.05   0.8461   -
2025.09   0.846    -
2026.05   0.8453   -
2026.05   0.8453   -
2026.05   0.8413   -
2026.05   0.8402   -
2026.05   0.8356   -
2026.05   0.8352   -
2026.05   0.8312   -
2026.05   0.8311   -
2025.09   0.826    -
2026.05   0.8244   -
2026.05   0.8237   -
2026.05   0.8202   -
2026.05   0.8188   -
2025.08   0.818    -
2026.05   0.818    -
2026.05   0.8169   -
2026.05   0.8152   -
2026.05   0.8139   -
2025.08   0.812    -
2026.05   0.8101   -
2025.08   0.81     -
2025.09   0.807    -
2026.05   0.8064   -
2025.08   0.806    -
2025.09   0.804    -
2026.05   0.804    -
2026.05   0.8037   -
2026.05   0.8      -
2025.09   0.796    -
2025.08   0.796    -
2026.05   0.7907   -
2025.09   0.79     -
2025.09   0.789    -
2025.08   0.786    -
2026.05   0.786    -
2025.08   0.783    -
2026.05   0.7821   -
2025.08   0.781    -
2026.04   0.778    0.626
2026.04   0.776    0.635
2026.04   0.775    0.632
2025.08   0.774    -
2025.08   0.774    -
2026.04   0.774    -
2025.08   0.773    -
2026.05   0.7664   -
2025.08   0.765    -
2026.04   0.765    0.625
2026.05   0.7641   -
2026.04   0.764    0.624
2026.05   0.763    -
2026.05   0.7626   -
2026.05   0.7624   -
2026.05   0.7616   -
2025.09   0.759    -
2025.09   0.758    -
2026.04   0.757    0.653
2026.04   0.757    0.62
2025.08   0.754    -
2025.08   0.751    -
2025.08   0.751    -
2025.08   0.739    -
2026.04   0.738    0.625
2026.04   0.737    0.612
2025.08   0.736    -
2025.09   0.735    -
2025.09   0.734    -
2026.04   0.733    0.595
2026.04   0.726    0.718
2026.05   0.7256   -
2026.05   0.7247   -
2026.04   0.724    0.591
2026.04   0.724    0.626
2026.04   0.721    0.627
2026.04   0.717    0.624
2026.04   0.71     0.634
2026.04   0.709    0.633
2026.04   0.701    -
2026.04   0.691    0.751
2026.03   0.69     -
2025.08   0.683    -
2026.04   0.679    0.782
2025.08   0.672    -
2026.04   0.664    0.741
2026.04   0.659    0.658
2025.09   0.643    -
2025.08   0.631    -
Showing 100 of 113 rows
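Each AUROC value measures how well a method's confidence scores separate correct answers from incorrect ones. It is the probability that a randomly chosen correct answer receives a higher confidence than a randomly chosen incorrect one. A score of 0.5 is chance level and 1.0 is perfect separation. The page does not say how its AUROC is computed. This minimal sketch therefore uses the standard pairwise definition, with ties counted as one half. For binary labels that gives the same value as scikit-learn's `roc_auc_score`. The function name `auroc` and the example data are illustrative.

```python
def auroc(confidences: list[float], correct: list[bool]) -> float:
    """Pairwise AUROC: fraction of (correct, incorrect) pairs in which the
    correct answer has the higher confidence; tied pairs count as 0.5."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one correct answer (0.4) is outranked by an incorrect one (0.6).
print(auroc([0.9, 0.4, 0.6, 0.2], [True, True, False, False]))  # 0.75
```

The double loop costs O(n_correct × n_incorrect), which is fine for illustration. For a full benchmark split, a sort-based rank computation gives the same result in O(n log n).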