Security Blog

The latest news and insights from Google on security and safety on the Internet

Scaling security with AI: from detection to solution

31 de janeiro de 2024

Dongge Liu and Oliver Chang, Google Open Source Security Team, Jan Nowakowski and Jan Keller, Machine Learning for Security Team

The AI world moves fast, so we’ve been hard at work keeping security apace with recent advancements. One of our approaches, in alignment with Google’s Secure AI Framework (SAIF), is using AI itself to automate and streamline routine and manual security tasks, including fixing security bugs. Last year we wrote about our experiences using LLMs to expand vulnerability testing coverage, and we’re excited to share some updates.

Today, we’re releasing our fuzzing framework as a free, open source resource that researchers and developers can use to improve fuzzing’s bug-finding abilities. We’ll also show you how we’re using AI to speed up the bug patching process. By sharing these experiences, we hope to spark new ideas and drive innovation for a stronger ecosystem security.

Update: AI-powered vulnerability discovery

Last August, we announced our framework to automate manual aspects of fuzz testing (“fuzzing”) that often hindered open source maintainers from fuzzing their projects effectively. We used LLMs to write project-specific code to boost fuzzing coverage and find more vulnerabilities. Our initial results on a subset of projects in our free OSS-Fuzz service were very promising, with code coverage increased by 30% in one example. Since then, we’ve expanded our experiments to more than 300 OSS-Fuzz C/C++ projects, resulting in significant coverage gains across many of the project codebases. We’ve also improved our prompt generation and build pipelines, which has increased code line coverage by up to 29% in 160 projects.

How does that translate to tangible security improvements? So far, the expanded fuzzing coverage offered by LLM-generated improvements allowed OSS-Fuzz to discover two new vulnerabilities in cJSON and libplist, two widely used projects that had already been fuzzed for years. As always, we reported the vulnerabilities to the project maintainers for patching. Without the completely LLM-generated code, these two vulnerabilities could have remained undiscovered and unfixed indefinitely.

And more: AI-powered vulnerability fixing

Fuzzing is fantastic for finding bugs, but for security to improve, those bugs also need to be patched. It’s long been an industry-wide struggle to find the engineering hours needed to patch open bugs at the pace that they are uncovered, and triaging and fixing bugs is a significant manual toll on project maintainers. With continued improvements in using LLMs to find more bugs, we need to keep pace in creating similarly automated solutions to help fix those bugs. We recently announced an experiment doing exactly that: building an automated pipeline that intakes vulnerabilities (such as those caught by fuzzing), and prompts LLMs to generate fixes and test them before selecting the best for human review.

This AI-powered patching approach resolved 15% of the targeted bugs, leading to significant time savings for engineers. The potential of this technology should apply to most or all categories throughout the software development process. We’re optimistic that this research marks a promising step towards harnessing AI to help ensure more secure and reliable software.

Try it out

Since we’ve now open sourced our framework to automate manual aspects of fuzzing, any researcher or developer can experiment with their own prompts to test the effectiveness of fuzz targets generated by LLMs (including Google’s VertexAI or their own fine-tuned models) and measure the results against OSS-Fuzz C/C++ projects. We also hope to encourage research collaborations and to continue seeing other work inspired by our approach, such as Rust fuzz target generation.

If you’re interested in using LLMs to patch bugs, be sure to read our paper on building an AI-powered patching pipeline. You’ll find a summary of our own experiences, some unexpected data about LLM’s abilities to patch different types of bugs, and guidance for building pipelines in your own organizations.