Automated Proof Generation for Rust Code via Self-Evolution

Part of the International Conference on Learning Representations 2025 (ICLR 2025)


Authors

Tianyu Chen, Shuai Lu, Shan Lu, Yeyun Gong, Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Hao Yu, Nan Duan, Peng Cheng, Fan Yang, Shuvendu Lahiri, Tao Xie, Lidong Zhou

Abstract

Ensuring correctness is crucial for code generation. Formal verification offers a definitive assurance of correctness, but demands substantial human effort in proof construction and hence raises a pressing need for automation. The primary obstacle lies in the severe lack of data: there are far fewer proofs than code snippets for Large Language Models (LLMs) to train upon. In this paper, we introduce SAFE, a framework that overcomes the lack of human-written proofs to enable automated proof generation for Rust code. SAFE establishes a self-evolving cycle in which data synthesis and fine-tuning collaborate to enhance model capability, leveraging the definitive power of a symbolic verifier to tell correct proofs from incorrect ones. SAFE also re-purposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models, empowering them to fix incorrect proofs based on the verifier's feedback. SAFE demonstrates superior efficiency and precision compared to GPT-4o. Through tens of thousands of synthesized proofs and the self-debugging mechanism, we improve the capability of open-source models, initially unacquainted with formal verification, to automatically write proofs for Rust code. This advancement leads to a substantial improvement in performance, achieving a 52.52% accuracy rate on a benchmark crafted by human experts, a significant leap over GPT-4o's 14.39%.
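
At a high level, the abstract describes a loop in which synthesized proofs are filtered by a symbolic verifier, verified proofs feed fine-tuning, and rejected proofs are reused as self-debugging training data. The Python sketch below illustrates that cycle only conceptually; `model.generate_proof`, `model.repair_proof`, `verifier.check`, and `model.fine_tune` are hypothetical placeholder interfaces, not the paper's actual API or implementation.

```python
# Conceptual sketch of a SAFE-style self-evolution loop (not the authors'
# implementation). `model` stands in for an LLM fine-tuning stack and
# `verifier` for a symbolic verifier of Rust proofs; all methods below are
# assumed placeholder interfaces.

def self_evolve(model, verifier, rust_functions, rounds=3):
    correct_proofs = []    # verifier-approved (code, proof) pairs for fine-tuning
    repair_examples = []   # (code, wrong proof, error, fixed proof) for self-debugging
    for _ in range(rounds):
        for code in rust_functions:
            proof = model.generate_proof(code)        # data synthesis
            ok, error = verifier.check(code, proof)   # symbolic verifier as the oracle
            if ok:
                correct_proofs.append((code, proof))
            else:
                # re-purpose the incorrect proof: ask the model to repair it
                # using the verifier's feedback, keeping repairs that verify
                fixed = model.repair_proof(code, proof, error)
                if verifier.check(code, fixed)[0]:
                    repair_examples.append((code, proof, error, fixed))
        # fine-tune on verifier-approved data so capability grows each round
        model = model.fine_tune(proof_data=correct_proofs,
                                debug_data=repair_examples)
    return model
```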