First-error detection on ProcessBench
Figure: Accuracy over time (best result: Teacher, 68.7, May 13, 2026)
Evaluation Results
| Method | Tags | Date | Accuracy |
|---|---|---|---|
| Teacher | deployment=non-deployable | 2026.05 | 68.7 |
| TL-Entropy | | 2026.05 | 46.3 |
| LLM-Check | mechanism=attention | 2026.05 | 43.8 |
| TL-Perplexity | | 2026.05 | 43.2 |
| Student | deployment=deployable | 2026.05 | 34.4 |
| Linear Probe | | 2026.05 | 24.4 |
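As a rough illustration of what an Accuracy score for first-error detection can measure, here is a minimal sketch. It assumes each example is labeled with the index of its first erroneous step, or -1 when every step is correct (the labeling convention used by ProcessBench), and that a prediction counts only when it matches the label exactly. This page does not state its exact scoring rule, so the helper `first_error_accuracy` and the exact-match choice are illustrative assumptions.

```python
from typing import Sequence


def first_error_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Hypothetical helper: percentage of exact first-error matches.

    Labels are step indices of the first error; -1 means "no error".
    Exact-match scoring is an assumption, not this leaderboard's stated rule.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    # Count examples where the predicted index equals the gold index.
    hits = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * hits / len(gold)


# Three examples: first error at step 2, no error, first error at step 0.
gold = [2, -1, 0]
pred = [2, -1, 1]  # the last prediction flags the wrong step
print(round(first_error_accuracy(pred, gold), 1))  # 66.7
```

Under this scoring, flagging an error at any step other than the first one counts as a miss, just like missing the error entirely.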