Human Evaluation on Human Evaluation Evil Players
[Chart: Contributed Success over time. Plotted point: GRAIL Agent, 3.78 (Jun 21, 2025). Metrics available: Contributed Success, Helpful Comments.]
Evaluation Results
Method            Sample size (n)   Date      Contributed Success   Helpful Comments
GRAIL Agent       30                2025.06   3.78                  3.88
Human Player      28                2025.06   3.71                  3.57
Reasoning Agent   30                2025.06   3.03                  2.95