Human Evaluation on DeepResearch Bench 20 reports (sampled)
[Chart: PTAH, Readability (Win/Tie Rate), May 28, 2026. Selectable metrics: Readability, Usability, Information Acquisition Efficiency, Overall Preference (all Win/Tie Rate).]
Updated 5d ago
Evaluation Results
| Method | Evaluator | Date | Readability (Win/Tie Rate) | Usability (Win/Tie Rate) | Information Acquisition Efficiency (Win/Tie Rate) | Overall Preference (Win/Tie Rate) |
|---|---|---|---|---|---|---|
| PTAH | General U2 | 2026.05 | 95 | 90 | 95 | 95 |
| PTAH | General U1 | 2026.05 | 90 | 95 | 100 | 100 |
| PTAH | Average | 2026.05 | 88.75 | 88.75 | 96.25 | 95 |
| PTAH | Expert E1 | 2026.05 | 85 | 90 | 95 | 95 |
| PTAH | Expert E2 | 2026.05 | 85 | 80 | 95 | 90 |
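The Average row appears to be the unweighted mean of the four evaluator rows (General U1, General U2, Expert E1, Expert E2). The page does not state this, so treat it as an assumption, though it does reproduce the listed values. A minimal sketch of that check follows; the names `scores` and `average_row` are illustrative, not part of the benchmark.

```python
from statistics import mean

METRICS = [
    "Readability",
    "Usability",
    "Information Acquisition Efficiency",
    "Overall Preference",
]

# Win/tie rates (%) per evaluator, copied from the evaluator rows above.
scores = {
    "General U1": [90, 95, 100, 100],
    "General U2": [95, 90, 95, 95],
    "Expert E1": [85, 90, 95, 95],
    "Expert E2": [85, 80, 95, 90],
}

def average_row(per_evaluator):
    """Return the per-metric unweighted mean across evaluators (assumed aggregation)."""
    columns = zip(*per_evaluator.values())
    return [mean(column) for column in columns]

averages = average_row(scores)
for name, value in zip(METRICS, averages):
    print(f"{name}: {value}")
```

Under this assumption the computed means match the Average row (88.75, 88.75, 96.25, 95).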