Benchmark Subset Selection on LAM Evaluation Benchmark (40 tasks)

0.977 Pearson Correlation

Combined Embedding

Apr 20, 2026
Updated 1mo ago

Evaluation Results

Method  Links
2026.04  0.977  ---
2026.04  0.964  ---
2026.04  0.963  ---
2026.04  0.96  ---
2026.04  0.959  ---
2026.04  0.952  ---
2026.04  0.943  ---
2026.04  0.94  ---
2026.04  0.94  ---
2026.04  0.937  ---
2026.04  0.936  ---
2026.04  0.934  ---
2026.04  0.924  ---
2026.04  0.921  ---
2026.04  0.919  ---
2026.04  0.916  ---
2026.04  0.907  ---
2026.04  0.904  ---
2026.04  0.894  ---
2026.04  0.887  ---
2026.04  0.884  ---
2026.04  0.878  ---
2026.04  0.877  ---
2026.04  0.87  ---
2026.04  0.864  ---
2026.04  0.863  ---
2026.04  0.856  ---
2026.04  0.856  ---
2026.04  0.831  ---
2026.04  0.811  ---
2026.04  0.804  ---
2026.04  0.803  ---
2026.04  0.797  ---
2026.04  0.791  ---
2026.04  0.784  ---
2026.04  0.781  ---
2026.04  0.778  ---
2026.04  0.761  ---
2026.04  0.756  ---
2026.04  0.736  ---
2026.04  0.734  ---
2026.04  0.719  ---
2026.04  0.718  ---
2026.04  0.716  ---
2026.04  0.698  ---
2026.04  0.676  ---
2026.04  0.676  ---
2026.04  0.672  ---
2026.04  0.656  ---
2026.04  0.651  ---
2026.04  0.628  ---
2026.04  0.627  ---
2026.04  0.619  ---
2026.04  0.608  ---
2026.04  0.559  ---
2026.04  0.544  ---
2026.04  0.525  ---
2026.04  0.486  ---
2026.04  0.466  ---
2026.04  0.445  ---
2026.04  -0.89183164
2026.04  -0.854119300
2026.04  -0.86699300
2026.04  -0.742--
2026.04  -0.90271157
2026.04  -0.89281156
2026.04  -0.92740155
2026.04  -0.85660350
2026.04  -0.8592250
2026.04  -0.9433267