Response Generation Quality on General Response Quality Set
[Chart: Quality Score by date. Plotted point: DCR, 51.8, Feb 10, 2026.]
Evaluation Results
| Method   | Base Model   | Date    | Quality Score |
|----------|--------------|---------|---------------|
| DCR      | Qwen2.5-1.5B | 2026.02 | 51.8          |
| STL-aug  | Qwen2.5-1.5B | 2026.02 | 50.1          |
| STL      | Qwen2.5-1.5B | 2026.02 | 50            |
| STL      | Qwen2.5-7B   | 2026.02 | 50            |
| STL      | LLaMA-3-8B   | 2026.02 | 50            |
| STL-aug  | Qwen2.5-7B   | 2026.02 | 49.9          |
| STL-aug  | LLaMA-3-8B   | 2026.02 | 49.7          |
| SCANS    | Qwen2.5-1.5B | 2026.02 | 47            |
| Surgical | LLaMA-3-8B   | 2026.02 | 46.2          |
| DCR      | LLaMA-3-8B   | 2026.02 | 46            |
| DCR      | Qwen2.5-7B   | 2026.02 | 45.8          |
| SCANS    | Qwen2.5-7B   | 2026.02 | 45.5          |
| SCANS    | LLaMA-3-8B   | 2026.02 | 45.5          |
| Surgical | Qwen2.5-1.5B | 2026.02 | 40.2          |
| Surgical | Qwen2.5-7B   | 2026.02 | 35.7          |