Reasoning on Theorem (test)
Best AUROC: 87.3 (TRACED)
[Chart: AUROC and p-value over time, Mar 11, 2026]
Evaluation Results
Method | Configuration             | Date    | AUROC | p-value
TRACED | Base Model=DeepSeek-R1    | 2026.03 | 87.3  | 0.008
SAPLMA | Base Model=DeepSeek-R1... | 2026.03 | 85.2  | -