
LLM-as-a-Judge on JudgeBench

Accuracy: 84.19

DeepSeek-V3

[Chart: accuracy over time, Jan 7, 2026 to Mar 22, 2026; y-axis ticks from 33.0324 to 72.8763]
Updated 25d ago

Evaluation Results

Method  Links
84.19  -  -
2026.01
83.87  -  -
2026.01
82.42  -  -
80.48  -  -
2026.01
79.75  -  -
2026.01
79.45  -  -
2026.01
74  -  -
2026.03
62.2  68.4  36.9
2026.03
61.4  74.2  45.1
2026.01
60.4  -  -
2026.03
60.1  70  45.3
2026.03
59.4  75.2  43.4
2026.03
57.1  58.3  32.4
2026.03
56.8  63.5  33.1
2026.03
56.2  64  32.5
2026.03
55.4  62.1  29.7
2026.03
54.3  59.6  37.4
2026.03
53.3  59.2  30.4
2026.03
52.9  61  28.9
2026.03
52.2  26.1  14.9
2026.03
51.2  48.8  29.8
2026.03
50.4  62.6  34.8
2026.03
50.2  23  10.9
2026.03
49.7  56.4  31.3
2026.03
49.3  15.7  7.1
2026.03
49.1  16.2  7.2
2026.03
48.2  56.1  28.2
2026.03
43.9  45.5  16.5
2026.03
35  34.8  6.1