Model Utility Maintenance on RWKU Utility Set 1.0
[Chart: Generation Score over time. Shown: Before, 66.1 (Dec 19, 2025). Other metrics: Reasoning Score, Truthfulness Score, Factuality Score, Fluency Score.]
Evaluation Results
Method  Base Model      Date     Generation Score  Reasoning Score  Truthfulness Score  Factuality Score  Fluency Score
Before  Llama3.1-In...  2025.12  66.1              45.2             39.5                55.3              694
CAE     Llama3.1-In...  2025.12  66                45.1             39.6                55.2              695
Before  Llama3-Inst...  2025.12  65.7              42.3             36.8                53.5              705.8
CAE     Llama3-Inst...  2025.12  65.4              38.7             40.4                52.1              708