
Persona

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LLM Steering | Persona | Average Trait Score: 81.75 | 18 |
| Watch duration prediction | Persona B | SMAPE: 0.86 | 4 |
| Watch duration prediction | Persona A | SMAPE: 0.617 | 4 |
| Stylized Dialogue | 148-query style-1 persona (test) | Context Score: 4.257 | 3 |
| Dialogue Quality Evaluation | Persona High Info | BF1 (qt, at): 0.62 | 1 |
| Dialogue Quality Evaluation | Persona Med. Info | BF1 (qt, at): 61 | 1 |
| Dialogue Quality Evaluation | Persona Low Info | BF1 (qt, at): 61 | 1 |
| Dialogue Quality Evaluation | Persona Sing. Inst. | BF1 (qt, at): 0.58 | 1 |