Truthfulness and Calibration Evaluation on Cross-LVLM Pooled Average (GQA, POPE, etc.)
[Chart not reproduced: ECE by method. Best result shown: BICR, ECE 7.1 (May 11, 2026). Selectable metrics: ECE, Brier Score, Accuracy, F1 Score, AUCPR, AUROC.]
Updated 21d ago
Evaluation Results
| Method | Details | Date | ECE | Brier Score | Accuracy | F1 Score | AUCPR | AUROC |
|---|---|---|---|---|---|---|---|---|
| BICR | Aggregation=Cross-LVLM... | 2026.05 | 7.1 | 18.4 | 71.5 | 76.9 | 87.5 | 78.6 |
| II | Aggregation=Cross-LVLM... | 2026.05 | 8.3 | 19.9 | 69 | 78.4 | 84.3 | 74.8 |
| PIK | Aggregation=Cross-LVLM... | 2026.05 | 9.3 | 19.4 | 69.8 | 78.3 | 86.3 | 76.6 |
| SAPLMA | Aggregation=Cross-LVLM... | 2026.05 | 12.2 | 21.3 | 69.5 | 79 | 81.6 | 72.9 |
| CCPS | Aggregation=Cross-LVLM... | 2026.05 | 15.3 | 27.1 | 63.8 | 72.1 | 72.8 | 63.1 |
| PE | Aggregation=Cross-LVLM... | 2026.05 | 18.5 | 26.1 | 63 | 77.1 | 69.3 | 60 |
| Self-Probing | Aggregation=Cross-LVLM... | 2026.05 | 27.5 | 29.8 | 64.5 | 77.5 | 77.8 | 65.6 |
| P(True) | Aggregation=Cross-LVLM... | 2026.05 | 38 | 39.8 | 52.1 | 58.6 | 72.8 | 55.7 |
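
For readers unfamiliar with the two calibration columns, the following is a minimal sketch of how ECE and the Brier score are commonly computed from per-answer confidence scores and 0/1 correctness labels. It is not taken from this benchmark: the function names, the choice of 10 equal-width bins, and the idea that the table reports these values multiplied by 100 are all assumptions made for illustration.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between predicted confidence and the 0/1 correctness label."""
    n = len(confidences)
    return sum((p - y) ** 2 for p, y in zip(confidences, correct)) / n


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,  # assumption: the benchmark's bin count is not stated
) -> float:
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy data, not benchmark data: four answers with confidences and correctness.
    conf = [0.95, 0.85, 0.35, 0.65]
    labels = [1, 1, 0, 0]
    # Scaled by 100 on the assumption that the table reports percentages.
    print(f"ECE:   {100 * expected_calibration_error(conf, labels):.1f}")
    print(f"Brier: {100 * brier_score(conf, labels):.1f}")
```

Lower is better for both quantities, which matches the ordering of the table above, where the method with the lowest ECE is listed first.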