Model Utility Maintenance on RWKU Utility Set 1.0
[Chart: Generation Score. Highlighted entry: Before, 66.1 (Dec 19, 2025). Metric tabs: Generation Score, Reasoning Score, Truthfulness Score, Factuality Score, Fluency Score.]
Updated 1mo ago
Evaluation Results
| Method | Setting | Date | Generation Score | Reasoning Score | Truthfulness Score | Factuality Score | Fluency Score |
|---|---|---|---|---|---|---|---|
| Before | Base Model=Llama3.1-In... | 2025.12 | 66.1 | 45.2 | 39.5 | 55.3 | 694 |
| CAE | Base Model=Llama3.1-In... | 2025.12 | 66 | 45.1 | 39.6 | 55.2 | 695 |
| Before | Base Model=Llama3-Inst... | 2025.12 | 65.7 | 42.3 | 36.8 | 53.5 | 705.8 |
| CAE | Base Model=Llama3-Inst... | 2025.12 | 65.4 | 38.7 | 40.4 | 52.1 | 708 |
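
Because the benchmark is about utility maintenance, it can help to look at how each score changes from the "Before" row to the "CAE" row for the same base model. The following is a minimal sketch of that arithmetic, using only the scores listed above. The names `METRICS`, `RESULTS`, `score_deltas`, and `deltas_by_model` are hypothetical helpers introduced for illustration, the base model names are kept truncated as they appear on the page, and the sketch makes no assumption about which direction is better for any metric.

```python
# Scores copied from the Evaluation Results listing above.
# Metric labels are shortened here for illustration only.
METRICS = ["Generation", "Reasoning", "Truthfulness", "Factuality", "Fluency"]

# Base model names are truncated on the page and are left as-is.
RESULTS = {
    "Llama3.1-In...": {
        "Before": [66.1, 45.2, 39.5, 55.3, 694.0],
        "CAE": [66.0, 45.1, 39.6, 55.2, 695.0],
    },
    "Llama3-Inst...": {
        "Before": [65.7, 42.3, 36.8, 53.5, 705.8],
        "CAE": [65.4, 38.7, 40.4, 52.1, 708.0],
    },
}


def score_deltas(before, after):
    """Return (after - before) per metric, rounded to one decimal place."""
    return [round(a - b, 1) for b, a in zip(before, after)]


def deltas_by_model(results):
    """Map each base model to a {metric: CAE - Before} dictionary."""
    return {
        model: dict(zip(METRICS, score_deltas(rows["Before"], rows["CAE"])))
        for model, rows in results.items()
    }


if __name__ == "__main__":
    for model, deltas in deltas_by_model(RESULTS).items():
        print(model, deltas)
```

Rounding to one decimal place matches the precision of the reported scores and avoids floating-point noise in the differences. Fluency Score is reported on a different scale from the other four metrics, so its deltas are not directly comparable with theirs.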