
Automated Probing on MBPP

0.38 Error Rate (%)

PAIR

Updated 1mo ago

Evaluation Results

Method	Links	Error Rate (%)	
PAIR 2026.02		0.38	94.85
AutoDetect 2026.02		0.46	-
PROBELLM 2026.02		0.9	-