Pointwise evaluation on BIGGEN

0.584 Spearman Corr

Human-crafted (existing) Rubrics

May 28, 2026

Evaluation Results

Date      Spearman Corr   (unlabeled)
2026.05   0.584           0.598
2026.05   0.583           0.609
2026.05   0.552           0.594
-         0.51            0.521
-         0.49            0.504
2026.05   0.477           0.496
2026.05   0.477           0.496
2026.05   0.476           0.494
2026.05   0.461           0.453
2026.05   0.454           0.489
2026.05   0.446           0.462
2026.05   0.446           0.462
-         0.445           0.452
-         0.441           0.461
2026.05   0.44            0.455
-         0.44            0.447
-         0.439           0.459
2026.05   0.436           0.45
2026.05   0.431           0.426
2026.05   0.424           0.455
2026.05   0.42            0.434
2026.05   0.42            0.434
2026.05   0.384           0.382
2026.05   0.378           0.361
2026.05   0.366           0.401
2026.05   0.343           0.392
2026.05   0.337           0.353
2026.05   0.332           0.383
2026.05   0.325           0.336
2026.05   0.318           0.361
2026.05   0.318           0.361
2026.05   0.28            0.307