
LLM-judge evaluation on AXBENCH

Top result: 92.5 (Concept Score)


[Chart: results by date (Feb 2, 2026)]

Evaluation Results

Date | Concept Score | Second metric (unlabeled)
--- | --- | ---
2026.02 | 92.5 | 81.125
2026.02 | 92.5 | 82.375
2026.02 | 90.625 | 82.5
2026.02 | 88.75 | 81.81
2026.02 | 88.75 | 82.06
2026.02 | 88.125 | 75.94
2026.02 | 86.875 | 82.625
2026.02 | 86.875 | 81.81
2026.02 | 86.875 | 82.06
2026.02 | 85 | 72.435
2026.02 | 85 | 77.75
2026.02 | 85 | 71.625
2026.02 | 84.375 | 70.815
2026.02 | 83.125 | 72.69
2026.02 | 78.75 | 79
2026.02 | 76.875 | 56.435
2026.02 | 74.375 | 65.875
2026.02 | 74.375 | 70.065
2026.02 | 58.125 | 52.75
2026.02 | 47.5 | 49.75
2026.02 | 23.75 | 24.75
2026.02 | 22.5 | 23.565