Linguistically Diverse Reasoning on ProverQA

Top method: AR, Accuracy (Easy) 94

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy (Easy)	Accuracy (Medium)	Accuracy (Hard)
AR 2025.10		94	91.4	69.6
AR 2025.10		92.8	91	70.8
GPT-4o 2025.10		81	65.4	46.4
Llama3.1 70B it 2025.10		74.8	58.8	41
Gemma2 27B it 2025.10		74.8	69	46.8
DeepSeek-R1-8B 2025.10		65.6	58.6	44.2
Llama3.1 8B 2025.10		43.6	33.6	36.8
Gemma2 9B 2025.10		39.4	29.8	25.8