Coding Ability on MBPP (test)

Accuracy: 51.58

Alpaca-GPT4

[Chart: Accuracy (axis ticks 44.7056, 46.4903, 48.275, 50.0597), Mar 13, 2026]
Updated 1 month ago

Evaluation Results

Method    Links

2026.03   51.58   36.03       -
2026.03   49.74   37.06    2.88
2026.03   49.47   37.16    3.15
2026.03   48.41   37.7     4.65
2026.03   47.88   36.71    1.89
2026.03   47.62   37.2     3.24
2026.03   47.47   37.18    3.2
2026.03   47.35   35.69   -0.94
2026.03   47.08   36.47    1.23
2026.03   46.56   36.17    0.39
2026.03   46.08   35.18   -2.34
2026.03   44.97   35.68   -0.98