Report writing on Task 3 Cross-domain
[Chart: LLM-as-Judge Score and Human Expert Score. Top entry: IoDResearch, LLM-as-Judge Score 8.23 (Oct 2, 2025).]
Updated 1mo ago
Evaluation Results
Method                            Links     LLM-as-Judge Score   Human Expert Score
IoDResearch (multi-agent=true)    2025.10   8.23                 6.45
DeepSearcher                      2025.10   8.08                 6.02
IoDResearch (multi-agent=false)   2025.10   7.92                 5.94
Light RAG                         2025.10   7.86                 5.88
Zero-shot LLM (zero-shot=true)    2025.10   7.45                 5.23