Moral Reasoning on UNIMORAL
Top result: Qwen2.5-7B-Instruct, Acc (mean) 67.9 (Dec 2, 2025)
Metrics: Acc (mean), Acc Var, Consistency, Output Tokens Var
Updated 3mo ago
Evaluation Results
| Method | Setting | Date | Acc (mean) | Acc Var | Consistency | Output Tokens Var |
|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | Optimized | 2025.12 | 67.9 | 0.003 | 40.5 | 20,826.8 |
| Gemma-3-12B-Instruct | Random | 2025.12 | 66.9 | 0.005 | 49.1 | 75,661.4 |
| Gemma-3-12B-Instruct | Optimized | 2025.12 | 64 | 0.0018 | 49.8 | 49,311.87 |
| Llama-3.1-8B-Instruct | Optimized | 2025.12 | 58.9 | 0.004 | 47.3 | 13,603.5 |
| Qwen2.5-7B-Instruct | Random | 2025.12 | 57.7 | 0.011 | 29.5 | 354,881.31 |
| Llama-3.1-8B-Instruct | Random | 2025.12 | 55.9 | 0.024 | 24 | 171,978.54 |
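
Each model in the results above appears under both the Optimized and Random settings, so one way to read them is as a per-model accuracy gap. The following is a minimal sketch of that arithmetic, not part of the benchmark itself; the dictionary layout and the helper name `setting_gap` are assumptions made for illustration, and the numbers are the Acc (mean) values copied from the results.

```python
# Acc (mean) values copied from the results above, keyed by (model, setting).
# The data layout and helper name are illustrative assumptions.
acc_mean = {
    ("Qwen2.5-7B-Instruct", "Optimized"): 67.9,
    ("Qwen2.5-7B-Instruct", "Random"): 57.7,
    ("Gemma-3-12B-Instruct", "Optimized"): 64.0,
    ("Gemma-3-12B-Instruct", "Random"): 66.9,
    ("Llama-3.1-8B-Instruct", "Optimized"): 58.9,
    ("Llama-3.1-8B-Instruct", "Random"): 55.9,
}


def setting_gap(scores):
    """Return Optimized minus Random Acc (mean) per model, rounded to 1 decimal."""
    models = sorted({model for model, _ in scores})
    return {
        m: round(scores[(m, "Optimized")] - scores[(m, "Random")], 1)
        for m in models
    }


gaps = setting_gap(acc_mean)
for model, gap in gaps.items():
    # Signed formatting makes it obvious when Random outperforms Optimized.
    print(f"{model}: {gap:+.1f}")
```

On these figures the gap is +10.2 points for Qwen2.5-7B-Instruct and +3.0 for Llama-3.1-8B-Instruct, while Gemma-3-12B-Instruct scores 2.9 points lower under Optimized than under Random.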