Scientific Reasoning on GPQA Diamond (pass@1)

Top result: SPLA, 69.5 pass@1


Evaluation Results

Method	Date (YYYY.MM)	pass@1 (%)
SPLA	2026.01	69.5
SPA	2026.01	69.2
InfLLM-v2	2026.01	68.7
Dense Attention	2026.01	68.5
NSA	2026.01	59.6
Continual LUFFY	2025.10	49.0
On-Policy (Continual)	2025.10	47.0
ExGRPO (Continual)	2025.10	42.4
GPG-Zero	2025.10	40.4
LUFFY	2025.10	39.9
On-Policy	2025.10	37.4
ExGRPO	2025.10	37.4
Qwen-Instruct	2025.10	24.7
SFT	2025.10	24.7
RePO-Zero	2025.10	24.2
SFT+RL	2025.10	24.2
Oat-Zero	2025.10	23.7
PRIME-Zero	2025.10	18.2
Qwen-Base	2025.10	11.1
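Every score in the table is a pass@1 accuracy in percent. The page does not say how many samples each entry drew per question. With one sample per question, pass@1 is simply the share of questions answered correctly. With n samples per question, c of them correct, the standard unbiased pass@k estimator 1 - C(n-c, k) / C(n, k) reduces to c/n for k = 1. The following is a minimal Python sketch that assumes per-question counts (n, c) are available. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical and are not part of any leaderboard tooling.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the n samples contains at least one correct sample.
        return 1.0
    # 1 minus the probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-question estimates and report the result as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)
```

Consider four hypothetical questions with (n, c) = (4, 4), (4, 2), (4, 0), (4, 1). Their per-question pass@1 values are 1.0, 0.5, 0.0 and 0.25, so `benchmark_pass_at_k` returns 43.75.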