Multi-hop Reasoning and Fact-checking on FRAMES

90.6 Average @3

Tongyi-DeepResearch-30B

Nov 14, 2025
Updated 1mo ago

Evaluation Results

Date      Score
2025.11   90.6
2025.11   87.1
2025.11   85.4
2025.11   85
2025.11   84
2025.11   83.7
2025.11   82.8
2025.11   80.7
2025.11   80.6
2025.11   80.2
2025.11   78.8
2025.11   75.7
2025.11   58.1