Non-Agentic Performance Evaluation on Persuade (test)
Top result: Gemini 2.5 Pro, Mean Score 53.2 (Mar 5, 2026)
[Chart: Mean Score, Std Dev, Minimum Score, Maximum Score]
Updated 1mo ago
Evaluation Results
Method             Date     Mean Score  Std Dev  Minimum Score  Maximum Score
Gemini 2.5 Pro     2026.03  53.2        11.1     40             70
Llama 4 Maverick   2026.03  52.62       15.05    30             70.97
GPT-4o             2026.03  48.43       16.61    30             77.42
Claude Sonnet 4.5  2026.03  37.26       17.18    20             60