Scientific Reasoning on TheoremQA (test)

48.4 Accuracy

GPT-4-Turbo-0409

[Chart: accuracy over time; data point dated May 6, 2024]
Updated 4d ago

Evaluation Results

Date      Accuracy
2024.05   48.4
2024.05   34.9
2024.05   34.1
2024.05   32.5
2024.05   32.2
2024.05   32.2
2024.05   30.4
2024.05   29.2
2024.05   29