
Knowledge Reasoning on GPQA Diamond

47.1 Accuracy (avg@8)

DeepSeek-R1-Distill-Qwen-7B

[Chart: y-axis ticks 36.388, 39.169, 41.95, 44.731; x-axis tick Dec 18, 2025]
Updated 4d ago

Evaluation Results

Date      Accuracy (avg@8)
2025.12   47.1
2025.12   46.4
2025.12   43.2
2025.12   40.1
2025.12   36.8