Pitch Evaluation Dataset

P-Value: 0.0014

Best Frontier (Gemini 3.1 Pro)

[Chart omitted. Date shown: Mar 17, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.03  0.0014  -      39.13  20.87  -      -      -      10.173
2026.03  0.0014  -      -      -      -      39.13  20.87  10.173
2026.03  0.002   -      -      -      -      40.78  21.36  9.587
2026.03  0.0029  -      40.78  20.39  -      -      -      8.889
2026.03  0.0073  -      -      -      -      41.57  21.35  7.2
2026.03  0.0104  -      41.57  20.22  -      -      -      6.568
2026.03  1.74    -      31.05  29.78  -      -      -      -
2026.03  1.74    -      -      -      -      31.05  29.78  -
2026.03  -       60.83  -      -      -      -      -      -
2026.03  -       60     -      -      -      -      -      -
2026.03  -       61.8   -      -      -      -      -      -
2026.03  -       61.17  -      -      -      -      -      -
2026.03  -       -      -      -      60.83  -      -      -
2026.03  -       -      -      -      60     -      -      -
2026.03  -       -      -      -      62.92  -      -      -
2026.03  -       -      -      -      62.14  -      -      -