Confidence Estimation on Internal Verification Suite GPT-4.1-mini (test)

Q1 Attribution Present (AUROC): 0.912

Logprob

[Chart: Q1 Attribution Present (AUROC) by date; May 11, 2026]
Updated 21d ago

Evaluation Results

Method  Links
2026.05  0.912  0.716  0.706  0.84  0.905  0.995  0.602
2026.05  0.851  0.632  -  0.61  -  -  0.735
2026.05  0.806  0.825  0.915  0.877  0.905  1  0.459