
Automated Probing on MMLU

Error Rate (%): 46 (PAIR)


Evaluation Results

Method	Date	Error Rate (%)	Other metric (unlabeled in source)
PAIR	2026.02	46	94.87
AutoDetect	2026.02	67	-
PROBELLM	2026.02	86	-
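The page does not define how the error rates are computed. The sketch below assumes the usual reading: error rate is the percentage of probe questions the target model answers incorrectly. The names `ProbeResult`, `error_rate_pct`, and `rank_by_error_rate` are hypothetical and are not taken from PAIR, AutoDetect, or PROBELLM. The sketch reuses the three reported values (46, 67, 86) and sorts methods by error rate without implying which direction is better.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Hypothetical container for one leaderboard row."""
    method: str
    error_rate_pct: float  # reported Error Rate (%)


def error_rate_pct(num_wrong: int, num_probes: int) -> float:
    """Assumed definition: share of probe questions answered incorrectly, in percent."""
    if num_probes <= 0:
        raise ValueError("num_probes must be positive")
    return 100.0 * num_wrong / num_probes


# Error rates as listed on this page.
RESULTS = [
    ProbeResult("PAIR", 46.0),
    ProbeResult("AutoDetect", 67.0),
    ProbeResult("PROBELLM", 86.0),
]


def rank_by_error_rate(results: list[ProbeResult]) -> list[ProbeResult]:
    """Sort methods from highest to lowest reported error rate."""
    return sorted(results, key=lambda r: r.error_rate_pct, reverse=True)


if __name__ == "__main__":
    for r in rank_by_error_rate(RESULTS):
        print(f"{r.method}\t{r.error_rate_pct:.0f}")
```

For example, under this assumed definition a method whose probes trip up the model on 23 of 50 questions would score `error_rate_pct(23, 50) == 46.0`.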