LLM-as-a-Judge on BigGen-Bench (test)

0.312 Pearson Correlation

LLaDA (FS)

[Chart: Pearson Correlation; Apr 4, 2026]
Updated 11d ago

Evaluation Results

Method  Links
2026.04  0.312  0.548  0.263  3.73
2026.04  0.259  0.525  0.21  3.78
2026.04  0.205  0.406  0.168  7.14
2026.04  -0.007  0.078  0.068  7.4