
LLM Evaluation on Curated Population (MATH-500, MMLU-Redux, SimpleQA)

Accuracy: 82.57

gemini-2.5-pro

[Chart: Accuracy; date shown: Feb 3, 2026]
Updated 1mo ago

Evaluation Results

Method   Accuracy
2026.02  82.57     68.51  76.69  90.07  71.23  84.37  79.08  72.75  62.51  86.91  95.34
2026.02  81.92     68.7   79.48  89.29  71.62  83.39  77.55  72.54  66.89  84.89  95.2
2026.02  81.51     68.52  79.19  89.05  71.79  83.22  77.04  72.46  66.28  84.49  95.32
2026.02  81.14     67.7   78.4   88.81  70.58  83.04  77.06  72.02  64.8   84.79  95.16
2026.02  76.69     59.63  70.05  83.78  65.51  78     73.58  64.31  59.25  79.61  91.65
2026.02  67.39     49.95  56.67  69.83  60.86  65.62  67     52.33  51.41  67.91  73.96
2026.02  66.81     51.42  58.43  68.6   61.61  64.59  66.5   52.59  55.37  67.17  71.8
2026.02  66.08     43.66  53.21  71.73  54.42  65.86  66.67  47.88  42.7   70.54  80.82
-        65.01     44.42  50.47  68.17  55.86  63.69  66.17  46.83  42.8   67.72  75.19
2026.02  64.45     48.32  51.62  65.43  58.13  62.09  65.77  50.34  47.76  65.98  67.09
-        48.85     25.98  41.34  49.4   47.23  42.09  45.72  26.11  40.77  44.52  51.87
-        46.62     23.44  29.14  45.4   45.4   38.53  48.58  23.42  35.19  46.76  42.06
2026.02  43.8      20.61  28.78  41.59  42.97  35.8   44.66  20.32  33.34  41.66  38.85
-        43.64     18.3   31.57  42.37  43.6   34.25  42.77  17.82  36.61  40.36  40.94
-        38.79     17.25  26.77  36.67  39.48  29.95  39.21  17.57  33.67  36.62  32.45