Personalized Response Generation on Real-world failure cases from large-scale commercial PA
[Chart: Macro Accuracy over time. Best result: 73.4, RP-Reasoner, Jan 23, 2026.]
Metrics: Macro Accuracy, Micro Accuracy, Error Severity (Judge)
Updated 1mo ago
Evaluation Results
| Method | Method tag | Links | Macro Accuracy | Micro Accuracy | Error Severity (Judge) |
|---|---|---|---|---|---|
| RP-Reasoner | Reasoning=Rational Per... | 2026.01 | 73.4 | 83.4 | 1.07 |
| Vanilla | Prompting=Vanilla | 2026.01 | 48.2 | 54.8 | 1.899 |
| CoT | Prompting=Chain-of-Tho... | 2026.01 | 34.2 | 49.7 | 2.216 |
| Reminder | Prompting=Reminder | 2026.01 | 12.6 | 24.1 | 3.271 |
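
The results report both Macro Accuracy and Micro Accuracy, but the page does not define them. The sketch below shows how the two are commonly computed. It assumes that micro accuracy pools all examples and that macro accuracy is the unweighted mean of per-category accuracies. The category labels, the `records` data, and both function names are hypothetical and are not taken from the benchmark.

```python
from collections import defaultdict


def micro_accuracy(records):
    """Fraction of all examples that are correct, pooled across categories."""
    return sum(correct for _, correct in records) / len(records)


def macro_accuracy(records):
    """Unweighted mean of per-category accuracies (assumed definition)."""
    per_category = defaultdict(list)
    for category, correct in records:
        per_category[category].append(correct)
    category_scores = [sum(v) / len(v) for v in per_category.values()]
    return sum(category_scores) / len(category_scores)


# Hypothetical toy data as (category, is_correct) pairs; not benchmark data.
records = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", True),
]

print(f"micro: {micro_accuracy(records):.3f}")
print(f"macro: {macro_accuracy(records):.3f}")
```

Under this assumed definition, micro accuracy exceeds macro accuracy when larger categories are answered correctly more often than smaller ones, which may be why the two columns differ for every method listed.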