
MCQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Answer-position balance evaluation | MCQ | MCQ TV: 0.061 | 34 |
| Distractor Generation | MCQ (test) | P@1: 22.39 | 17 |
| Series Comparison | MCQ2 | Accuracy: 67 | 15 |
| Analogical Reasoning | MCQ | Accuracy: 46 | 14 |
| Distractor Generation | MCQ | P@1: 30.5 | 12 |
| Detection | MCQ | Detection Score: 71.6 | 5 |
| Prevention | MCQ | gpt-5.1 Score: 99.3 | 5 |
| Distractor Generation | MCQ dataset | Relevance: 4.45 | 5 |