Report writing on Task 3 Single-domain
Chart: LLM-as-Judge Score and Human Expert Score by submission date (highlighted: IoDResearch, LLM-as-Judge Score 8.31, Oct 2, 2025)
Evaluation Results
Method                           Date     LLM-as-Judge Score  Human Expert Score
IoDResearch (multi-agent=true)   2025.10  8.31                7.01
DeepSearcher                     2025.10  8.13                6.77
IoDResearch (multi-agent=false)  2025.10  8.03                6.56
Light RAG                        2025.10  7.95                6.53
Zero-shot LLM (zero-shot=true)   2025.10  7.61                5.65
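As a quick check on how the two scoring columns relate, the Python sketch below transcribes the five result rows and compares them. The variable and helper names (`rows`, `ranking`, `gaps`) are illustrative only and not part of any benchmark tooling. With these numbers, both metrics order the methods identically, and the LLM judge scores every method higher than the human experts do.

```python
# Scores transcribed by hand from the Evaluation Results (Task 3 Single-domain).
# Each row: (method, LLM-as-Judge Score, Human Expert Score).
rows = [
    ("IoDResearch (multi-agent=true)", 8.31, 7.01),
    ("DeepSearcher", 8.13, 6.77),
    ("IoDResearch (multi-agent=false)", 8.03, 6.56),
    ("Light RAG", 7.95, 6.53),
    ("Zero-shot LLM (zero-shot=true)", 7.61, 5.65),
]


def ranking(pairs):
    """Return method names sorted from highest to lowest score."""
    return [name for name, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]


# Method order under each metric.
judge_order = ranking([(name, judge) for name, judge, _ in rows])
human_order = ranking([(name, human) for name, _, human in rows])

# Per-method gap between the automatic judge and the human experts,
# rounded to two decimals to match the precision of the table.
gaps = {name: round(judge - human, 2) for name, judge, human in rows}
mean_gap = round(sum(gaps.values()) / len(gaps), 3)

if __name__ == "__main__":
    print("Same ranking under both metrics:", judge_order == human_order)
    for name, gap in gaps.items():
        print(f"{name:33s} judge - human = {gap:.2f}")
    print("Mean gap:", mean_gap)
```

The gaps range from 1.30 (IoDResearch, multi-agent=true) to 1.96 (Zero-shot LLM). So while the two metrics agree on the ordering, the absolute scores are not interchangeable.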