Graduate-level Reasoning on SuperGPQA
[Chart: Accuracy by date. Highlighted point: RealSafe, 33.6 (Sep 29, 2025).]
Updated 3mo ago
Evaluation Results
Method    Configuration              Date     Accuracy
RealSafe  backbone=DS-8B, alignm...  2025.09  33.6
Base      backbone=DS-8B             2025.09  33.5
IPO       backbone=DS-8B             2025.09  31.5