
Multiple-choice Question Answering on LongBench v2 (val)

Top Overall Accuracy: 37.8 (o1-mini)

[Chart: Overall Accuracy by date (Sep 10, 2024)]

Evaluation Results

Method   Date     Overall  Easy  Hard  Short  Medium  Long  Links
o1-mini  2024.09  37.8     38.9  37.1  48.6   33.3    28.6  14
         2024.09  34.4     38.0  32.2  41.7   30.7    29.6  15
         2024.09  31.8     33.3  30.9  37.8   28.4    28.7  -
         2024.09  31.6     32.3  31.2  41.1   27.4    24.1  16
         2024.09  31.2     31.3  31.2  39.4   27.0    25.9  -
         2024.09  31.0     32.8  29.9  38.3   27.9    25.0  17
         2024.09  30.8     33.9  28.9  37.8   27.4    25.9  18
         2024.09  30.2     30.7  29.9  33.9   29.8    25.0  19
         2024.09  30.0     30.7  29.6  35.0   27.9    25.9  20
         2024.09  30.0     30.7  29.6  40.6   24.2    24.1  21
         2024.09  29.8     34.4  27.0  36.7   27.0    24.1  22
         2024.09  29.3     31.1  28.2  31.8   28.6    26.2  23
         2024.09  28.2     30.2  26.0  36.8   24.2    21.5  -
         2024.09  27.8     30.2  26.4  36.7   23.7    21.3  24
         2024.09  26.6     29.7  24.8  37.8   19.5    22.2  25