From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Posted by the Big Sleep team Introduction In our previous post, Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models, we introduced our framework for large-language-model-assisted vulnerability research and demonstrated its potential by improving the state-of-the-art performance on Meta's CyberSecEval2 benchmarks. Since then, Naptime has evolved into Big Sleep, a collaboration between Google Project Zero and Google DeepMind. Today, we're excited to share the first real-world vulnerability discovered by the Big Sleep agent : an exploitable stack buffer underflow in SQLite, a widely used open source database engine. We discovered the vulnerability and reported it to the developers in [...]