
Knowledge-Intensive Reasoning on HotpotQA (F1 score)

F1 Score: 0.654

Llama3.1-8B + ARPO

[Chart: F1 score, Dec 11, 2025]
Updated 2d ago

Evaluation Results

Date      F1 score
2025.12   0.654
2025.12   0.59
2025.12   0.588
2025.12   0.585
2025.12   0.578
2025.12   0.577
2025.12   0.571
2025.12   0.566
2025.12   0.565
2025.12   0.559
2025.12   0.551
2025.12   0.548
2025.12   0.485
2025.12   0.243
2025.12   0.154
2025.12   0.148
2025.12   0.122
2025.12   0.097