
Correlation with Human Majority Vote on Novelty-Bench Human Majority Vote

Spearman Correlation (All): 0.88
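For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Spearman rank correlation between human majority votes and an automatic metric's scores could be computed. It is not the benchmark's actual evaluation code: the function names `average_ranks` and `spearman`, and the sample data, are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the full run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when an input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: 1 = human majority judged the pair "novel", 0 = not.
human_votes = [1, 0, 1, 1, 0, 0]
# Hypothetical scores from an automatic metric for the same items.
metric_scores = [0.9, 0.2, 0.7, 0.8, 0.4, 0.1]

rho = spearman(human_votes, metric_scores)
print(f"Spearman correlation: {rho:.2f}")
```

Ranking with averaged ties matters here because majority votes are often binary, so many items share the same value.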

GPT-4o

Sep 25, 2025
Updated 1mo ago

Evaluation Results

Method  Links  Date     Scores
               2025.09  0.88  1     1     1     0.83  0.64  0.83  0.85
               2025.09  0.88  1     1     1     0.83  0.74  0.71  0.85
               2025.09  0.83  1     1     1     1     0.74  0.41  0.85
               2025.09  0.8   0.43  0.82  0.59  0.66  0.77  0.66  0.77
               2025.09  0.64  0.25  0.51  0.44  0.46  0.7   0.54  0.7
               2025.09  0.59  0.37  0.13  0.45  0.69  0.59  0.62  0.73
               2025.09  0.26  0.06  0.62  0.45  0.62  0.73  0.39  0.49