Dafny Program Verification on HumanEvalDafny (test)
[Chart: Verification Success Rate (NoDiff). Highlighted result: SEVerA, 97 (Mar 26, 2026). Selectable metrics: Verification Success Rate (NoDiff), Verification Success Rate, Violation Rate, Execution Time (s).]
Updated 23d ago
Evaluation Results
| Method | Details | Date | Verification Success Rate (NoDiff) | Verification Success Rate | Violation Rate | Execution Time (s) |
|---|---|---|---|---|---|---|
| SEVerA | constraints=NoDiff beh... | 2026.03 | 97 | 97 | 0 | 18.2 |
| DafnyBench baseline | | 2026.03 | 86.9 | 87.9 | 4 | 16.1 |
| SEVerA (w/o constraints) | constraints=None | 2026.03 | 84.8 | 88.9 | 5.1 | 15.7 |
| LLM (Claude Sonnet 4.5) | Model=Claude Sonnet 4.5 | 2026.03 | 73.7 | 76.8 | 8.1 | 9.8 |