Counseling Dialogue Generation on CounselBench Eval (test)
Best Overall Score: 4.29 (Llama3-G2C, Apr 22, 2026)
Metrics: Overall Score, Empathy Score, Specificity Score, Medical Advice Accuracy Score, Factual Consistency Score, Toxicity Score
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Overall Score | Empathy Score | Specificity Score | Medical Advice Accuracy Score | Factual Consistency Score | Toxicity Score |
|---|---|---|---|---|---|---|---|---|
| Llama3-G2C | Training=fine-tuned, G... | 2026.04 | 4.29 | 4.26 | 4.3 | 4 | 3.98 | 1 |
| Llama3-MAG | Training=fine-tuned | 2026.04 | 4.12 | 4.09 | 4.12 | 8 | 3.98 | 1 |
| Llama3-SQP | Training=fine-tuned | 2026.04 | 4.07 | 4.06 | 4.08 | 7 | 3.98 | 1 |
| CAMEL | Training=fine-tuned | 2026.04 | 3.9 | 3.94 | 3.91 | 4 | 3.97 | 1 |