Long-form generation on FreshWiki
Top result: Agentic Reasoning, 54.1 ROUGE-1 (Feb 7, 2025)
[Chart: benchmark scores by date; selectable metrics: ROUGE-1, ROUGE-L, Entity Recall]
Updated 3d ago
Evaluation Results
| Method            | Backbone    | Date    | ROUGE-1 | ROUGE-L | Entity Recall |
|-------------------|-------------|---------|---------|---------|---------------|
| Agentic Reasoning | DeepSeek-R1 | 2025.02 | 54.1    | 19.62   | 18.77         |
| STORM             | DeepSeek-R1 | 2025.02 | 47.93   | 17.42   | 15.43         |
| Search-O1         | DeepSeek-R1 | 2025.02 | 41.56   | 16.08   | 12.88         |
| RAgent            | DeepSeek-R1 | 2025.02 | 30.04   | 14.21   | 9.08          |
| RAG               | DeepSeek-R1 | 2025.02 | 29.14   | 14.23   | 8.84          |
| Direct Gen        | DeepSeek-R1 | 2025.02 | 27.32   | 13.13   | 6.11          |