| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Dafny Benchmark Overall Vericoding (aggregate) | Model union | Pass Rate 82.2 | 4 | 2d ago | |
| APPS Vericoding-derived (test) | Model union | Pass Rate 83 | 4 | 2d ago | |
| Dafny Hard Subset Quality-filtered (val) | Multi-turn RLVR | Pass Rate 31.1 | 2 | 2d ago | |
| APPS Dafny-derived (val) | Initial RLVR | Pass Rate 53.9 | 1 | 2d ago | |