Human Preference Evaluation on Oral Argument Simulation (Evaluation set)

72 Wins

Gemini-2.5-Pro

22.0835.044860.96
Mar 5, 2026
Updated 1mo ago

Evaluation Results

Method | Links
2026.03
72 | 31 | 25 | 55.6 | 15.8
2026.03
66 | 37 | 34 | 54.6 | 9.9
2026.03
62 | 46 | 31 | 51 | 8.6
2026.03
62 | 41 | 26 | 49.3 | 15.1
46 | 55 | 33 | 41.1 | 11.8
2026.03
45 | 52 | 32 | 40.1 | 15.1
2026.03
42 | 58 | 24 | 35.5 | 18.4
2026.03
36 | 60 | 28 | 32.9 | 18.4
2026.03
24 | 75 | 17 | 21.4 | 23.7