
Multilingual Question Answering on mGPQA

Accuracy: 32.9

self-cons (ours)

Updated 1mo ago

Evaluation Results

Method	Date	Links	Accuracy
self-cons (ours)	2026.05		32.9
S1	2026.05		32.1
LIDR	2026.05		29.4
self-cons (ours)	2026.05		28.9
base	2026.05		28.2
S1	2026.05		25.7
self-cons (ours)	2026.05		24.9
base	2026.05		24.6
S1	2026.05		24.5
LIDR	2026.05		23.5
base	2026.05		22.8
LIDR	2026.05		18.8