
Multi-evidence Aggregation on Multi-evidence Aggregation Dataset (test)

0.036 ECE

Baseline (Uniform Avg)

[Chart: y-axis ticks 0.03, 0.0705, 0.111, 0.1515; x-axis tick Mar 13, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.03  0.036  85    -       -      50  -
2026.03  0.058  95.4  0.0085  0.002  12  0.013
2026.03  0.186  92    -       0.002  12  0.013