Persona Induction on AOL (test)
[Chart: Coherence. Highlighted point: pi_theta (Ours), 71.8, Apr 28, 2026. Selectable metrics: Coherence, Alignment, Truthfulness, Final Score.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Coherence | Alignment | Truthfulness | Final Score |
|---|---|---|---|---|---|---|
| pi_theta (Ours) | Backbone=Gemma3-27B | 2026.04 | 71.8 | 94.4 | 79.1 | 81.7 |
| GPT-oss-120b | Evaluation protocol=pr... | 2026.04 | 61 | 95.5 | 65.4 | 73.9 |
| GPT-5.1 | Evaluation protocol=pr... | 2026.04 | 60.5 | 94.2 | 60.7 | 71.8 |
| PersonaXs | Backbone=Gemma3-27B | 2026.04 | 59.7 | 86.5 | 59.9 | 68.6 |
| PersonaXr | Backbone=Gemma3-27B | 2026.04 | 59.4 | 71.5 | 42.8 | 57.8 |
| Qwen3-80b | Evaluation protocol=pr... | 2026.04 | 57.3 | 83.5 | 54.6 | 65.1 |
| Claude-4.5 | Evaluation protocol=pr... | 2026.04 | 56.3 | 95.9 | 63.5 | 71.8 |