Confidence Estimation on Internal Verification Suite GPT-4.1-mini (test)
[Chart: Q1 Attribution Present (AUROC) by date. Highlighted point: Logprob, 0.912, May 11, 2026.]
Updated 21d ago
Evaluation Results
| Method | Base Model | Date | Q1 Attribution Present (AUROC) | Q2 Attribution Accuracy (AUROC) | Q4 Job Detail Accuracy (AUROC) | Q5 Skill Relevancy (AUROC) | Q7 Logical Plausibility (AUROC) | Q8 Grammar (AUROC) | Q11 Overpromising (AUROC) |
|---|---|---|---|---|---|---|---|---|---|
| Logprob | GPT-4.1-mini | 2026.05 | 0.912 | 0.716 | 0.706 | 0.84 | 0.905 | 0.995 | 0.602 |
| Verbalized | GPT-4.1-mini | 2026.05 | 0.851 | 0.632 | - | 0.61 | - | - | 0.735 |
| VERDI CV | GPT-4.1-min... | 2026.05 | 0.806 | 0.825 | 0.915 | 0.877 | 0.905 | 1 | 0.459 |