Summarization on FeedSum (test)
[Chart: ECE (Instance). Highlighted entry: QAB, 0.022, Apr 19, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Setting | Date | ECE (Instance) | GECE (Instance) | Brier Score (Instance) | Calibration Slope (Instance) | Accuracy (Instance) | ECE (Average) | GECE (Average) | Brier Score (Average) | Calibration Slope (Average) | Mean Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| QAB | Indicator=Gain | 2026.04 | 0.022 | 0.028 | 0.005 | 0.147 | -0.043 | 0.033 | 0.023 | 0.007 | 0.612 | 0.093 |
| GIRB | Indicator=Gain | 2026.04 | 0.027 | 0.03 | 0.008 | 0.161 | 0.017 | 0.035 | 0.026 | 0.008 | 0.594 | 0.101 |
| None | Indicator=Wins | 2026.04 | 1 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0.56 |
| QAB | Indicator=Wins | 2026.04 | 1 | 2 | 2 | 3 | 0 | 4 | 1 | 4 | 3 | 2.22 |
| GIRB | Indicator=Wins | 2026.04 | 6 | 5 | 5 | 3 | 3 | 6 | 6 | 6 | 3 | 4.78 |