Science Reasoning on TAL-SCQ5K CN
[Chart: Pass@1 (y-axis). Best result: MIG, 64 Pass@1, Feb 1, 2026. Available metrics: Pass@1, Pass@8, Delta Pass@1.]
Updated 3d ago
Evaluation Results
Method      Setting                     Date     Pass@1  Pass@8  Delta Pass@1
MIG         Evaluation Mode=In-Dom...   2026.02  64      92      11
GRPO        Evaluation Mode=In-Dom...   2026.02  53      85      -
Base Model  Evaluation Mode=In-Dom...   2026.02  49      77      -