In partnership with NVIDIA and HiddenLayer, as part of the Open Source Security Foundation, we are now launching the first stable version of our model signing library. Using digital signatures such as those from Sigstore, users can verify that the model used by an application is exactly the model its developers created. In this blog post, we explain why this release matters from Google’s point of view.
With the advent of LLMs, the ML field has entered an era of rapid evolution. Remarkable progress has led to weekly launches of applications that incorporate ML models for tasks ranging from customer support and software development to security-critical work.
However, this has also opened the door to a new wave of security threats. Model and data poisoning, prompt injection, prompt leaking, and prompt evasion are just a few of the risks that have recently made the news. Risks in the ML supply chain garner less attention: since models are an uninspectable collection of weights (sometimes bundled with arbitrary code), an attacker can tamper with them and significantly harm everyone who uses them. Users, developers, and practitioners need to ask an important question during their risk assessment: “can I trust this model?”
Since its launch, Google’s Secure AI Framework (SAIF) has produced guidance and technical solutions for building AI applications that users can trust. A first step toward trusting a model is letting users verify its integrity and provenance through cryptographic signing, which guards against tampering at every step from training to usage.
The ML supply chain
To understand the need for the model signing project, let’s look at how ML-powered applications are developed, with an eye to where malicious tampering can occur.
Applications that use advanced AI models are typically developed in at least three stages. First, a large foundation model is trained on large datasets. Next, a separate ML team fine-tunes the model so that it performs well on application-specific tasks. Finally, this fine-tuned model is embedded into an application.
The three steps involved in building an application that uses large language models.
These three stages are usually handled by different teams, potentially even different companies, since each stage requires specialized expertise. To pass models from one stage to the next, practitioners use model hubs, which are repositories for storing models. Kaggle and Hugging Face are popular public options, although internal model hubs can also be used.
This separation into stages creates multiple opportunities for a malicious user (or an external threat actor who has compromised internal infrastructure) to tamper with the model. Tampering can range from a slight alteration of the weights that control model behavior to injecting architectural backdoors: completely new model behaviors and capabilities that are triggered only by specific inputs. An attacker can also exploit the serialization format to embed arbitrary code in the model as saved on disk, so that loading the model executes that code. Our whitepaper on AI supply chain integrity goes into more detail on how popular model serialization libraries could be exploited. The following diagram summarizes the risks across the ML supply chain for developing a single model, as discussed in the whitepaper.
The supply chain diagram for building a single model, illustrating some supply chain risks (oval labels) and where model signing can defend against them (check marks)
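To make the serialization risk concrete, here is a minimal sketch using Python's `pickle`, the format that PyTorch's default `torch.save` checkpoints are built on. A pickled object can name any importable callable to be invoked at load time, and the standard-library `pickletools` module can list those callables without executing anything. `Payload` and `imported_globals` are hypothetical names used only for illustration, and the payload here is a harmless `print`.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical object that runs a callable when it is unpickled."""

    def __reduce__(self):
        # pickle.loads() would call print(...) here; an attacker could name any callable.
        return (print, ("code ran during unpickling",))


def imported_globals(data: bytes) -> list[str]:
    """List the callables a pickle imports, without loading (executing) it."""
    found: list[str] = []
    strings: list[str] = []  # string arguments seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode the import as "module name".
            found.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as two strings first.
            # Simplification: names fetched back from the memo are not tracked.
            found.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


blob = pickle.dumps(Payload(), protocol=4)
print(imported_globals(blob))  # ['builtins.print']
```

Calling `pickle.loads(blob)` would execute the `print`. Scanning like this is only a heuristic; signing instead ensures that the file being loaded is the one its producer published.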
The diagram shows several places where the model could be compromised. Most of these attacks could be prevented by signing the model during training and verifying its integrity before every use: the signature must be verified when the model is uploaded to a model hub, when it is selected for deployment into an application (embedded or via remote APIs), and when it is used as an intermediary in another training run. Assuming the training infrastructure is trustworthy and not compromised, this approach guarantees that every model user can trust that the model has not been tampered with since training.
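The "sign once, verify at every hand-off" rule can be sketched as follows. As a simplifying assumption, the signature is reduced to the SHA-256 digest the producer vouches for; in a real deployment that digest would be covered by a signature (for example from Sigstore) whose authenticity is checked first. `sign_at_training` and `verify_before_use` are hypothetical names, not the model signing library's API, and `hmac.compare_digest` is used only for a constant-time comparison, not as a signature.

```python
import hashlib
import hmac


def sign_at_training(weights: bytes) -> str:
    """Hypothetical stand-in: record the digest that a real signature would cover."""
    return hashlib.sha256(weights).hexdigest()


def verify_before_use(stage: str, weights: bytes, signed_digest: str) -> None:
    """Refuse to use the model at this stage unless it matches the signed digest."""
    actual = hashlib.sha256(weights).hexdigest()
    if not hmac.compare_digest(actual, signed_digest):
        raise ValueError(f"{stage}: model does not match its signature")


weights = bytes(range(256)) * 4  # placeholder for serialized model weights
signed = sign_at_training(weights)

# Every hand-off in the supply chain checks the same signed digest.
for stage in ("hub upload", "deployment", "fine-tuning input"):
    verify_before_use(stage, weights, signed)

# A single flipped bit (e.g. tampering on the model hub) is caught downstream.
tampered = bytearray(weights)
tampered[0] ^= 0x01
try:
    verify_before_use("deployment", bytes(tampered), signed)
except ValueError as err:
    print(err)  # deployment: model does not match its signature
```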
Sigstore for ML models
Model signing is inspired by code signing, a critical step in traditional software development. A signed binary artifact helps users identify its producer and makes tampering after publication detectable. The average developer, however, does not want to manage signing keys or rotate them after a compromise.
Sigstore addresses these challenges with a collection of tools and services that make code signing secure and easy. By binding an OpenID Connect token to a workload or developer identity, Sigstore removes the need to manage or rotate long-lived secrets. Signing is also transparent: signatures are recorded in a public transparency log, so anyone can audit signatures over malicious artifacts. This rules out split-view attacks, so every user gets the exact same model. These features are why we recommend Sigstore’s signing mechanism as the default approach for signing ML models.
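Sigstore's public transparency log (Rekor) is built on a Merkle tree, similar in design to Certificate Transparency (RFC 6962): the log publishes a single root hash that commits to every entry. The sketch below uses the RFC 6962 tree hash to show why a split view is detectable: if the log showed one user a different entry, that user's root would differ from everyone else's, and comparing roots exposes the discrepancy. It deliberately omits the signed tree heads and inclusion proofs a real client verifies, and the log entries are made up.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 domain-separates leaves with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + entry).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    """RFC 6962 Merkle Tree Hash over an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:  # k = largest power of two strictly less than n
        k *= 2
    # Interior nodes use a 0x01 prefix.
    return hashlib.sha256(
        b"\x01" + merkle_root(entries[:k]) + merkle_root(entries[k:])
    ).digest()


# Hypothetical log entries: signatures over three published models.
honest_view = [b"sig(model-a)", b"sig(model-b)", b"sig(model-c)"]
# A malicious log shows one victim a different entry for model-b.
split_view = [b"sig(model-a)", b"sig(model-b-backdoored)", b"sig(model-c)"]

print(merkle_root(honest_view) != merkle_root(split_view))  # True: the views disagree
```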
Today the OSS community is releasing v1.0, the stable version of our model signing library, as a Python package that supports both Sigstore and traditional signing methods. The library is designed to handle the sheer scale of ML models (which are usually much larger than traditional software components) and can sign models stored as a directory tree. The package provides CLI utilities so that users can sign individual models and verify their signatures. It can also be used as a library, which we plan to incorporate directly into model hub upload flows as well as into ML frameworks.
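Under our own assumptions, the sketch below illustrates how a directory-tree model can be reduced to something small enough to sign: each file is hashed in fixed-size chunks (so multi-gigabyte weight files never need to fit in memory), and the resulting per-file manifest is hashed again into one digest. It uses only the standard library; `build_manifest`, `manifest_digest`, and `changed_files` are hypothetical helpers, not the model signing package's API or its on-disk format.

```python
import hashlib
import tempfile
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def file_digest(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(model_dir: Path) -> dict[str, str]:
    """Map each file's path (relative, POSIX-style) to its SHA-256 digest."""
    return {
        p.relative_to(model_dir).as_posix(): file_digest(p)
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }


def manifest_digest(manifest: dict[str, str]) -> str:
    """Single digest over the sorted manifest; this is what would get signed."""
    h = hashlib.sha256()
    for name in sorted(manifest):
        h.update(f"{name}\0{manifest[name]}\n".encode())
    return h.hexdigest()


def changed_files(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Files that were added, removed, or modified since signing."""
    names = expected.keys() | actual.keys()
    return sorted(n for n in names if expected.get(n) != actual.get(n))


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp)
    (model / "weights").mkdir()
    (model / "config.json").write_text('{"layers": 2}')
    (model / "weights" / "shard-0.bin").write_bytes(b"\x00" * 4096)

    signed_manifest = build_manifest(model)
    signed_root = manifest_digest(signed_manifest)

    # Tamper with one shard after signing.
    (model / "weights" / "shard-0.bin").write_bytes(b"\x01" + b"\x00" * 4095)
    current = build_manifest(model)

    print(manifest_digest(current) == signed_root)  # False
    print(changed_files(signed_manifest, current))  # ['weights/shard-0.bin']
```

Keeping the per-file manifest, rather than only one hash over the concatenated files, means a failed verification can also report which file changed.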
Future goals
We view model signing as the foundation of trust in the ML ecosystem. We envision extending this approach to datasets and other ML-related artifacts. We then plan to build on top of signatures toward fully tamper-proof metadata records that both humans and machines can read. This could automate a significant fraction of the incident response work needed after a compromise in the ML world. Ideally, an ML developer would not need to change any training code; the framework itself would handle model signing and verification transparently.
If you are interested in the future of this project, join the project’s OpenSSF meetings. To shape the future of tamper-proof ML, join the Coalition for Secure AI (CoSAI), where we plan to build the entire trust ecosystem together with the open source community. In collaboration with multiple industry partners, we are starting a special interest group under CoSAI to define the future of ML signing, including tamper-proof ML metadata such as model cards and evaluation results.