
MCQ

Benchmarks

Task Name               Dataset Name   SOTA Result             Trend
Distractor Generation   MCQ (test)     P@1: 22.39              17
Analogical Reasoning    MCQ            Accuracy: 46            14
Detection               MCQ            Detection Score: 71.6   5
Prevention              MCQ            gpt-5.1 Score: 99.3     5
Distractor Generation   MCQ dataset    Relevance: 4.45         5