Narrative Reasoning on WebQA (test)

0.623 BLEURT

LogicAgent

[Chart: BLEURT over time, Feb 7, 2026]
Updated 1mo ago

Evaluation Results

Date      BLEURT
2026.02   0.623
2026.02   0.613
2026.02   0.612
2026.02   0.608
2026.02   0.605
2026.02   0.605
2026.02   0.603
2026.02   0.599
2026.02   0.594
2026.02   0.587
2026.02   0.585
2026.02   0.58
2026.02   0.58
2026.02   0.563