
General Capability on OLMES benchmarks

Average Score: 51.4

Full-IT

[Chart: scores by date, y-axis ticks 44.016 to 49.767; x-axis date May 7, 2026]
Updated 26d ago

Evaluation Results

Date      Average   Per-benchmark scores
2026.05   51.4      44.9   57.8   74.6   46.6   36.1
2026.05   51.3      45.7   57     75.2   46.6   34.9
2026.05   51.2      46     58     75.4   45.3   33.9
2026.05   50.2      44     56.1   73.8   45.2   35
2026.05   49.9      45.1   56.1   75.5   42.9   32.3
2026.05   49.1      41.7   54.8   75.6   43.1   34.3
2026.05   48.6      42.3   55.6   73.6   42.8   32
2026.05   45.4      43     44     71     40.6   29.9
2026.05   44.3      39.3   45.6   74.1   36.1   29.1