
Generative sense-making QA on LongBench

Comprehensiveness: 0.6573

HGMEM

[Chart: Comprehensiveness; y-axis ticks 0.562972, 0.587461, 0.61195, 0.636439; Dec 30, 2025]
Updated 3mo ago

Evaluation Results

Date      Comprehensiveness   (unlabeled score)
2025.12   0.6573              0.6974
2025.12   0.6418              0.6651
2025.12   0.6362              0.6598
2025.12   0.6218              0.6582
2025.12   0.6162              0.642
2025.12   0.6155              0.6337
2025.12   0.6145              0.6356
2025.12   0.6141              0.6225
2025.12   0.6082              0.6273
2025.12   0.6078              0.6216
2025.12   0.6074              0.6128
2025.12   0.6039              0.6402
2025.12   0.5892              0.6127
2025.12   0.5666              0.608