PersonalLLM

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reward Modeling | PersonalLLM Near Uniform (α=0.1) Overall | Accuracy 96.6 | 7 |
| Reward Modeling | PersonalLLM Near Uniform (α=0.1) Unseen | Accuracy 96.4 | 7 |
| Reward Modeling | PersonalLLM Near Uniform (α=0.1) Seen | Accuracy 96.8 | 7 |
| Reward Modeling | PersonalLLM Moderately Diverse (α=0.01) Overall | Accuracy 95.1 | 7 |
| Reward Modeling | PersonalLLM Moderately Diverse (α=0.01) Unseen | Accuracy 94.7 | 7 |
| Reward Modeling | PersonalLLM Moderately Diverse (α=0.01) Seen | Accuracy 95.5 | 7 |
| Reward Modeling | PersonalLLM Very Diverse (α=0.001) Overall | Accuracy 95.3 | 7 |
| Reward Modeling | PersonalLLM Very Diverse (α=0.001) Unseen | Accuracy 95.1 | 7 |
| Reward Modeling | PersonalLLM Very Diverse (α=0.001) Seen | Accuracy 95.6 | 7 |