Persuasion on Anthropic Persuasion Dataset (Persuadee: Phi-4, OOD)
[Chart: Agreement Shift by date. Highlighted point: LLaMa3.2 + ToMAP, 32.42, May 29, 2025.]
Evaluation Results
Method             Size    Date      Agreement Shift
LLaMa3.2 + ToMAP   3B      2025.05   32.42
DS-R1              671B    2025.05   30.61
LLaMa3.2 + RL      3B      2025.05   30.27
Gemma-3            27B     2025.05   29.91
GPT-4o             N/A     2025.05   26.56
Qwen2.5            3B      2025.05   23.24
Qwen2.5 + ToMAP    3B      2025.05   22.79
Qwen2.5 + SFT      3B      2025.05   22.35
LLaMa3.1           70B     2025.05   20.5
LLaMa3.2 + SFT     3B      2025.05   20.03
LLaMa3.2           3B      2025.05   18.75
Qwen2.5 + RL       3B      2025.05   16.84