Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Selene: Pioneering Automated Proof in Software Verification

About

Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-to-end proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors.

Lichen Zhang, Shuai Lu, Nan Duan• 2024

Related benchmarks

TaskDatasetResultRank
Automated Theorem ProvingseL4 (val)
Proof Success Rate6.1
6
Automated Theorem ProvingseL4 (test)
Proof Success Rate7
6
Automated Theorem ProvingseL4 hard (test)
Proof Success Rate3.3
6
Automated Theorem ProvingseL4
Proof Success Rate5.6
6
Automated Theorem ProvingseL4 proof corpus (full library)
Proof Lines Count197
5
Showing 5 of 5 rows

Other info

Follow for update