
UltraMedical

Benchmarks

Task Name                    Dataset Name             SOTA Result       Trend
MT-Bench                     UltraMedical Preference  MT Score: 6.9     28
AlpacaEval 2.0               UltraMedical Preference  LC: 17.7          28
Pairwise Preference Ranking  UltraMedical             Q-Score: 0.8195   5
Reward Modeling              UltraMedical (test)      Easy Score: 95.8  5