Expert-level Reasoning on GPQA Diamond (Pass@1)
Chart: Pass@1 Score over time (Aug 2025 to May 2026). Current best: 50.51 (openPangu-Embedded KD).
Updated 13d ago
Evaluation Results
| Method | Setting | Date | Pass@1 Score |
|---|---|---|---|
| openPangu-Embedded KD | Total Params=1B, Tra... | 2026.05 | 50.51 |
| openPangu-Embedded RL | Total Params=1B, Tra... | 2026.05 | 48.48 |
| openPangu-Embedded SFT | Total Params=1B, Tra... | 2026.05 | 43.43 |
| Full Attention | | 2025.08 | 38.9 |
| Quest | relative budget (b)=0.15 | 2025.08 | 33.6 |
| RetroAttention | relative budget (b)=0.... | 2025.08 | 33.6 |
| Qwen2.5 | Total Params=1.5B | 2026.05 | 29.8 |
| Llama3.2 | Total Params=1B | 2026.05 | 29.29 |
| Qwen3 | Total Params=1.7B | 2026.05 | 28.6 |
| MiniCPM4 | Total Params=0.5B | 2026.05 | 28.28 |
| Qwen3 | Total Params=0.6B | 2026.05 | 22.9 |
| StreamingLLM | | 2025.08 | 19.7 |
| Gemma3 | Total Params=1B | 2026.05 | 19.2 |
| TOVA | | 2025.08 | 0 |
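Pass@1 is the expected fraction of questions a model answers correctly with a single sampled attempt, reported here as a percentage. The page does not say how each source sampled or averaged, so the following is only a minimal sketch under that assumption: it uses the unbiased pass@k estimator common in LLM evaluation, which reduces to c/n per question at k = 1, and then averages over questions. The function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one question: n samples drawn, c of them correct.

    Computes 1 - C(n - c, k) / C(n, k); at k = 1 this equals c / n.
    """
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over questions, as a percentage.

    `results` holds one (n_samples, n_correct) pair per question
    (a hypothetical input format chosen for this sketch).
    """
    return 100.0 * mean(pass_at_k(n, c, 1) for n, c in results)


# Hypothetical single-sample run: 100 of 198 questions answered correctly.
single = [(1, 1)] * 100 + [(1, 0)] * 98
print(round(benchmark_pass_at_1(single), 2))

# Hypothetical multi-sample run: 4 samples per question on 3 questions.
print(benchmark_pass_at_1([(4, 2), (4, 4), (4, 0)]))
```

With one sample per question the score is simply the share of questions answered correctly; drawing several samples and averaging c/n gives a lower-variance estimate of the same quantity.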