Google Online Security Blog: 2025

How Pixel and Android are bringing a new level of trust to your images with C2PA Content Credentials

September 10, 2025

Posted by Eric Lynch, Senior Product Manager, Android Security, and Sherif Hanna, Group Product Manager, Google C2PA Core At Made by Google 2025, we announced that the new Google Pixel 10 phones will support C2PA Content Credentials in Pixel Camera and Google Photos. This announcement represents a series of steps towards greater digital media transparency:

The Pixel 10 lineup is the first to have Content Credentials built in across every photo created by Pixel Camera.

The Pixel Camera app achieved Assurance Level 2, the highest security rating currently defined by the C2PA Conformance Program. Assurance Level 2 for a mobile app is currently only possible on the Android platform.

A private-by-design approach to C2PA certificate management, where no image or group of images can be related to one another or the person who created them.

Pixel 10 phones support on-device trusted time-stamps, which ensures images captured with your native camera app can be trusted after the certificate expires, even if they were captured when your device was offline.

These capabilities are powered by Google Tensor G5, Titan M2 security chip, the advanced hardware-backed security features of the Android platform, and Pixel engineering expertise. In this post, we’ll break down our architectural blueprint for bringing a new level of trust to digital media, and how developers can apply this model to their own apps on Android. A New Approach to Content Credentials Generative AI can help us all to be more creative, productive, and innovative. But it can be hard to tell the difference between content that’s been AI-generated, and content created without AI. The ability to verify the source and history—or provenance—of digital content is more important than ever. Content Credentials convey a rich set of information about how media such as images, videos, or audio files were made, protected by the same digital signature technology that has secured online transactions and mobile apps for decades. It empowers users to identify AI-generated (or altered) content, helping to foster transparency and trust in generative AI. It can be complemented by watermarking technologies such as SynthID. Content Credentials are an industry standard backed by a broad coalition of leading companies for securely conveying the origin and history of media files. The standard is developed by the Coalition for Content Provenance and Authenticity (C2PA), of which Google is a steering committee member. The traditional approach to classifying digital image content has focused on categorizing content as “AI” vs. “not AI”. This has been the basis for many legislative efforts, which have required the labeling of synthetic media. This traditional approach has drawbacks, as described in Chapter 5 of this seminal report by Google. Research shows that if only synthetic content is labeled as “AI”, then users falsely believe unlabeled content is “not AI”, a phenomenon called “the implied truth effect”. This is why Google is taking a different approach to applying C2PA Content Credentials. Instead of categorizing digital content into a simplistic “AI” vs. “not AI”, Pixel 10 takes the first steps toward implementing our vision of categorizing digital content as either i) media that comes with verifiable proof of how it was made or ii) media that doesn't.

Pixel Camera attaches Content Credentials to any JPEG photo capture, with the appropriate description as defined by the Content Credentials specification for each capture mode.

Google Photos attaches Content Credentials to JPEG images that already have Content Credentials and are edited using AI or non-AI tools, and also to any images that are edited using AI tools. It will validate and display Content Credentials under a new section in the About panel, if the JPEG image being viewed contains this data. Learn more about it in Google Photos Help.

Given the broad range of scenarios in which Content Credentials are attached by these apps, we designed our C2PA implementation architecture from the onset to be:

Secure from silicon to applications

Verifiable, not personally identifiable

Useable offline

Secure from Silicon to Applications Good actors in the C2PA ecosystem are motivated to ensure that provenance data is trustworthy. C2PA Certification Authorities (CAs), such as Google, are incentivized to only issue certificates to genuine instances of apps from trusted developers in order to prevent bad actors from undermining the system. Similarly, app developers want to protect their C2PA claim signing keys from unauthorized use. And of course, users want assurance that the media files they rely on come from where they claim. For these reasons, the C2PA defined the Conformance Program. The Pixel Camera application on the Pixel 10 lineup has achieved Assurance Level 2, the highest security rating currently defined by the C2PA Conformance Program. This was made possible by a strong set of hardware-backed technologies, including Tensor G5 and the certified Titan M2 security chip, along with Android’s hardware-backed security APIs. Only mobile apps running on devices that have the necessary silicon features and Android APIs can be designed to achieve this assurance level. We are working with C2PA to help define future assurance levels that will push protections even deeper into hardware. Achieving Assurance Level 2 requires verifiable, difficult-to-forge evidence. Google has built an end-to-end system on Pixel 10 devices that verifies several key attributes. However, the security of any claim is fundamentally dependent on the integrity of the application and the OS, an integrity that relies on both being kept current with the latest security patches.

Hardware Trust: Android Key Attestation in Pixel 10 is built on support for Device Identifier Composition Engine (DICE) by Tensor, and Remote Key Provisioning (RKP) to establish a trust chain from the moment the device starts up to the OS, stamping out the most common forms of abuse on Android.

Genuine Device and Software: Aided by the hardware trust described above, Android Key Attestation allows Google C2PA Certification Authorities (CAs) to verify that they are communicating with a genuine physical device. It also allows them to verify the device has booted securely into a Play Protect Certified version of Android, and verify how recently the operating system, bootloader, and system software and firmware were patched for security vulnerabilities.

Genuine Application: Hardware-backed Android Key Attestation certificates include the package name and signing certificates associated with the app that requested the generation of the C2PA signing key, allowing Google C2PA CAs to check that the app requesting C2PA claim signing certificates is a trusted, registered app.

Tamper-Resistant Key Storage: On Pixel, C2PA claim signing keys are generated and stored using Android StrongBox in the Titan M2 security chip. Titan M2 is Common Criteria PP.0084 AVA_VAN.5 certified, meaning that it is strongly resistant to extracting or tampering with the cryptographic keys stored in it. Android Key Attestation allows Google C2PA CAs to verify that private keys were indeed created inside this hardware-protected vault before issuing certificates for their public key counterparts.

The C2PA Conformance Program requires verifiable artifacts backed by a hardware Root of Trust, which Android provides through features like Key Attestation. This means Android developers can leverage these same tools to build apps that meet this standard for their users. Privacy Built on a Foundation of Trust: Verifiable, Not Personally Identifiable The robust security stack we described is the foundation of privacy. But Google takes steps further to ensure your privacy even as you use Content Credentials, which required solving two additional challenges: Challenge 1: Server-side Processing of Certificate Requests. Google’s C2PA Certification Authorities must certify new cryptographic keys generated on-device. To prevent fraud, these certificate enrollment requests need to be authenticated. A more common approach would require user accounts for authentication, but this would create a server-side record linking a user's identity to their C2PA certificates—a privacy trade-off we were unwilling to make. Our Solution: Anonymous, Hardware-Backed Attestation. We solve this with Android Key Attestation, which allows Google CAs to verify what is being used (a genuine app on a secure device) without ever knowing who is using it (the user). Our CAs also enforce a strict no-logging policy for information like IP addresses that could tie a certificate back to a user. Challenge 2: The Risk of Traceability Through Key Reuse. A significant privacy risk in any provenance system is traceability. If the same device or app-specific cryptographic key is used to sign multiple photos, those images can be linked by comparing the key. An adversary could potentially connect a photo someone posts publicly under their real name with a photo they post anonymously, deanonymizing the creator.

Our Solution: Unique Certificates. We eliminate this threat with a maximally private approach. Each key and certificate is used to sign exactly one image. No two images ever share the same public key, a "One-and-Done" Certificate Management Strategy, making it cryptographically impossible to link them. This engineering investment in user privacy is designed to set a clear standard for the industry. Overall, you can use Content Credentials on Pixel 10 without fear that another person or Google could use it to link any of your images to you or one another. Ready to Use When You Are - Even Offline Implementations of Content Credentials use trusted time-stamps to ensure the credentials can be validated even after the certificate used to produce them expires. Obtaining these trusted time-stamps typically requires connectivity to a Time-Stamping Authority (TSA) server. But what happens if the device is offline? This is not a far-fetched scenario. Imagine you’ve captured a stunning photo of a remote waterfall. The image has Content Credentials that prove that it was captured by a camera, but the cryptographic certificate used to produce them will eventually expire. Without a time-stamp, that proof could become untrusted, and you're too far from a cell signal, which is required to receive one. To solve this, Pixel developed an on-device, offline TSA. Powered by the security features of Tensor, Pixel maintains a trusted clock in a secure environment, completely isolated from the user-controlled one in Android. The clock is synchronized regularly from a trusted source while the device is online, and is maintained even after the device goes offline (as long as the phone remains powered on). This allows your device to generate its own cryptographically-signed time-stamps the moment you press the shutter—no connection required. It ensures the story behind your photo remains verifiable and trusted after its certificate expires, whether you took it in your living room or at the top of a mountain. Building a More Trustworthy Ecosystem, Together C2PA Content Credentials are not the sole solution for identifying the provenance of digital media. They are, however, a tangible step toward more media transparency and trust as we continue to unlock more human creativity with AI. In our initial implementation of Content Credentials on the Android platform and Pixel 10 lineup, we prioritized a higher standard of privacy, security, and usability. We invite other implementers of Content Credentials to evaluate our approach and leverage these same foundational hardware and software security primitives. The full potential of these technologies can only be realized through widespread ecosystem adoption. We look forward to adding Content Credentials across more Google products in the near future.

Android’s pKVM Becomes First Globally Certified Software to Achieve Prestigious SESIP Level 5 Security Certification

August 12, 2025

Posted by Dave Kleidermacher, VP Engineering, Android Security & Privacy Today marks a watershed moment and new benchmark for open-source security and the future of consumer electronics. Google is proud to announce that protected KVM (pKVM), the hypervisor that powers the Android Virtualization Framework, has officially achieved SESIP Level 5 certification. This makes pKVM the first software security system designed for large-scale deployment in consumer electronics to meet this assurance bar.
Supporting Next-Gen Android Features The implications for the future of secure mobile technology are profound. With this level of security assurance, Android is now positioned to securely support the next generation of high-criticality isolated workloads. This includes vital features, such as on-device AI workloads that can operate on ultra-personalized data, with the highest assurances of privacy and integrity. This certification required a hands-on evaluation by Dekra, a globally recognized cybersecurity certification lab, which conducted an evaluation against the TrustCB SESIP scheme, compliant to EN-17927. Achieving Security Evaluation Standard for IoT Platforms (SESIP) Level 5 is a landmark because it incorporates AVA_VAN.5, the highest level of vulnerability analysis and penetration testing under the ISO 15408 (Common Criteria) standard. A system certified to this level has been evaluated to be resistant to highly skilled, knowledgeable, well-motivated, and well-funded attackers who may have insider knowledge and access. This certification is the cornerstone of the next-generation of Android’s multi-layered security strategy. Many of the TEEs (Trusted Execution Environments) used in the industry have not been formally certified or have only achieved lower levels of security assurance. This inconsistency creates a challenge for developers looking to build highly critical applications that require a robust and verifiable level of security. The certified pKVM changes this paradigm entirely. It provides a single, open-source, and exceptionally high-quality firmware base that all device manufacturers can build upon. Looking ahead, Android device manufacturers will be required to use isolation technology that meets this same level of security for various security operations that the device relies on. Protected KVM ensures that every user can benefit from a consistent, transparent, and verifiably secure foundation.
A Collaborative Effort This achievement represents just one important aspect of the immense, multi-year dedication from the Linux and KVM developer communities and multiple engineering teams at Google developing pKVM and AVF. We look forward to seeing the open-source community and Android ecosystem continue to build on this foundation, delivering a new era of high-assurance mobile technology for users.

Introducing OSS Rebuild: Open Source, Rebuilt to Last

July 21, 2025

Posted by Matthew Suozzo, Google Open Source Security Team (GOSST)

Today we're excited to announce OSS Rebuild, a new project to strengthen trust in open source package ecosystems by reproducing upstream artifacts. As supply chain attacks continue to target widely-used dependencies, OSS Rebuild gives security teams powerful data to avoid compromise without burden on upstream maintainers.

The project comprises:

Automation to derive declarative build definitions for existing PyPI (Python), npm (JS/TS), and Crates.io (Rust) packages.
SLSA Provenance for thousands of packages across our supported ecosystems, meeting SLSA Build Level 3 requirements with no publisher intervention.
Build observability and verification tools that security teams can integrate into their existing vulnerability management workflows.
Infrastructure definitions to allow organizations to easily run their own instances of OSS Rebuild to rebuild, generate, sign, and distribute provenance.

Challenges

Open source software has become the foundation of our digital world. From critical infrastructure to everyday applications, OSS components now account for 77% of modern applications. With an estimated value exceeding $12 trillion, open source software has never been more integral to the global economy.

Yet this very ubiquity makes open source an attractive target: Recent high-profile supply chain attacks have demonstrated sophisticated methods for compromising widely-used packages. Each incident erodes trust in open ecosystems, creating hesitation among both contributors and consumers.

The security community has responded with initiatives like OpenSSF Scorecard, pypi's Trusted Publishers, and npm's native SLSA support. However, there is no panacea: Each effort targets a certain aspect of the problem, often making tradeoffs like shifting work onto publishers and maintainers.

Our Aim

Our aim with OSS Rebuild is to empower the security community to deeply understand and control their supply chains by making package consumption as transparent as using a source repository. Our rebuild platform unlocks this transparency by utilizing a declarative build process, build instrumentation, and network monitoring capabilities which, within the SLSA Build framework, produces fine-grained, durable, trustworthy security metadata.

Building on the hosted infrastructure model that we pioneered with OSS Fuzz for memory issue detection, OSS Rebuild similarly seeks to use hosted resources to address security challenges in open source, this time aimed at securing the software supply chain.

Our vision extends beyond any single ecosystem: We are committed to bringing supply chain transparency and security to all open source software development. Our initial support for the PyPI (Python), npm (JS/TS), and Crates.io (Rust) package registries—providing rebuild provenance for many of their most popular packages—is just the beginning of our journey.

How OSS Rebuild Works

Through automation and heuristics, we determine a prospective build definition for a target package and rebuild it. We semantically compare the result with the existing upstream artifact, normalizing each one to remove instabilities that cause bit-for-bit comparisons to fail (e.g. archive compression). Once we reproduce the package, we publish the build definition and outcome via SLSA Provenance. This attestation allows consumers to reliably verify a package's origin within the source history, understand and repeat its build process, and customize the build from a known-functional baseline (or maybe even use it to generate more detailed SBOMs).

With OSS Rebuild's existing automation for PyPI, npm, and Crates.io, most packages obtain protection effortlessly without user or maintainer intervention. Where automation isn't currently able to fully reproduce the package, we offer manual build specification so the whole community benefits from individual contributions.

And we are also excited at the potential for AI to help reproduce packages: Build and release processes are often described in natural language documentation which, while difficult to utilize with discrete logic, is increasingly useful to language models. Our initial experiments have demonstrated the approach's viability in automating exploration and testing, with limited human intervention, even in the most complex builds.

Our Capabilities

OSS Rebuild helps detect several classes of supply chain compromise:

Unsubmitted Source Code - When published packages contain code not present in the public source repository, OSS Rebuild will not attest to the artifact.

Real world attack: solana/webjs (2024)

Build Environment Compromise - By creating standardized, minimal build environments with comprehensive monitoring, OSS Rebuild can detect suspicious build activity or avoid exposure to compromised components altogether.

Real world attack: tj-actions/changed-files (2025)

Stealthy Backdoors - Even sophisticated backdoors like xz often exhibit anomalous behavioral patterns during builds. OSS Rebuild's dynamic analysis capabilities can detect unusual execution paths or suspicious operations that are otherwise impractical to identify through manual review.

Real world attack: xz-utils (2024)

For enterprises and security professionals, OSS Rebuild can...

Enhance metadata without changing registries by enriching data for upstream packages. No need to maintain custom registries or migrate to a new package ecosystem.
Augment SBOMs by adding detailed build observability information to existing Software Bills of Materials, creating a more complete security picture.
Accelerate vulnerability response by providing a path to vendor, patch, and re-host upstream packages using our verifiable build definitions.

For publishers and maintainers of open source packages, OSS Rebuild can...

Strengthen package trust by providing consumers with independent verification of the packages' build integrity, regardless of the sophistication of the original build.
Retrofit historical packages' integrity with high-quality build attestations, regardless of whether build attestations were present or supported at the time of publication.
Reduce CI security-sensitivity allowing publishers to focus on core development work. CI platforms tend to have complex authorization and execution models and by performing separate rebuilds, the CI environment no longer needs to be load-bearing for your packages' security.

Check it out!

The easiest (but not only!) way to access OSS Rebuild attestations is to use the provided Go-based command-line interface. It can be compiled and installed easily:

$ go install github.com/google/oss-rebuild/cmd/oss-rebuild@latest

You can fetch OSS Rebuild's SLSA Provenance:

$ oss-rebuild get cratesio syn 2.0.39

..or explore the rebuilt versions of a particular package:

$ oss-rebuild list pypi absl-py

..or even rebuild the package for yourself:

$ oss-rebuild get npm lodash 4.17.20 --output=dockerfile | \

docker run $(docker buildx build -q -)

Join Us in Helping Secure Open Source

OSS Rebuild is not just about fixing problems; it's about empowering end-users to make open source ecosystems more secure and transparent through collective action. If you're a developer, enterprise, or security researcher interested in OSS security, we invite you to follow along and get involved!

Check out the code, share your ideas, and voice your feedback at github.com/google/oss-rebuild.
Explore the data and contribute to improving support for your critical ecosystems and packages.
Learn more about SLSA Provenance at slsa.dev

Advancing Protection in Chrome on Android

July 8, 2025

Posted by David Adrian, Javier Castro & Peter Kotwicz, Chrome Security Team Android recently announced Advanced Protection, which extends Google’s Advanced Protection Program to a device-level security setting for Android users that need heightened security—such as journalists, elected officials, and public figures. Advanced Protection gives you the ability to activate Google’s strongest security for mobile devices, providing greater peace of mind that you’re better protected against the most sophisticated threats. Advanced Protection acts as a single control point for at-risk users on Android that enables important security settings across applications, including many of your favorite Google apps, including Chrome. In this post, we’d like to do a deep dive into the Chrome features that are integrated with Advanced Protection, and how enterprises and users outside of Advanced Protection can leverage them. Android Advanced Protection integrates with Chrome on Android in three main ways:

Enables the “Always Use Secure Connections” setting for both public and private sites, so that users are protected from attackers reading confidential data or injecting malicious content into insecure plaintext HTTP connections. Insecure HTTP represents less than 1% of page loads for Chrome on Android.

Enables full Site Isolation on mobile devices with 4GB+ RAM, so that potentially malicious sites are never loaded in the same process as legitimate websites. Desktop Chrome clients already have full Site Isolation.

Reduces attack surface by disabling Javascript optimizations, so that Chrome has a smaller attack surface and is harder to exploit.

Let’s take a look at all three, learn what they do, and how they can be controlled outside of Advanced Protection. Always Use Secure Connections “Always Use Secure Connections” (also known as HTTPS-First Mode in blog posts and HTTPS-Only Mode in the enterprise policy) is a Chrome setting that forces HTTPS wherever possible, and asks for explicit permission from you before connecting to a site insecurely. There may be attackers attempting to interpose on connections on any network, whether that network is a coffee shop, airport, or an Internet backbone. This setting protects users from these attackers reading confidential data and injecting malicious content into otherwise innocuous webpages. This is particularly useful for Advanced Protection users, since in 2023, plaintext HTTP was used as an exploitation vector during the Egyptian election. Beyond Advanced Protection, we previously posted about how our goal is to eventually enable “Always Use Secure Connections” by default for all Chrome users. As we work towards this goal, in the last two years we have quietly been enabling it in more places beyond Advanced Protection, to help protect more users in risky situations, while limiting the number of warnings users might click through:

We added a new variant of the setting that only warns on public sites, and doesn’t warn on local networks or single-label hostnames (e.g. 192.168.0.1, shortlink/, 10.0.0.1). These names often cannot be issued a publicly-trusted HTTPS certificate. This variant protects against most threats—accessing a public website insecurely—but still allows for users to access local sites, which may be on a more trusted network, without seeing a warning.

We’ve automatically enabled “Always Use Secure Connections” for public sites in Incognito Mode for the last year, since Chrome 127 in June 2024.

We automatically prevent downgrades from HTTPS to plaintext HTTP on sites that Chrome knows you typically access over HTTPS (a heuristic version of the HSTS header), since Chrome 133 in January 2025.

Always Use Secure Connections has two modes—warn on insecure public sites, and warn on any insecure site. anyHTTPSOnlyModeHTTPAllowlist

Full Site Isolation

Site Isolation is a security feature in Chrome that isolates each website into its own rendering OS process. This means that different websites, even if loaded in a single tab of the same browser window, are kept completely separate from each other in memory. This isolation prevents a malicious website from accessing data or code from another website, even if that malicious website manages to exploit a vulnerability in Chrome’s renderer—a second bug to escape the renderer sandbox is required to access other sites. Site isolation improves security, but requires extra memory to have one process per site. Chrome Desktop isolates all sites by default. However, Android is particularly sensitive to memory usage, so for mobile Android form factors, when Advanced Protection is off, Chrome will only isolate a site if a user logs into that site, or if the user submits a form on that site. On Android devices with 4GB+ RAM in Advanced Protection (and on all desktop clients), Chrome will isolate all sites. Full Site Isolation significantly reduces the risk of cross-site data leakage for Advanced Protection users.

JavaScript Optimizations and Security

Advanced Protection reduces the attack surface of Chrome by disabling the higher-level optimizing Javascript compilers inside V8. V8 is Chrome’s high-performance Javascript and WebAssembly engine. The optimizing compilers in V8 make certain websites run faster, however they historically also have been a source of known exploitation of Chrome. Of all the patched security bugs in V8 with known exploitation, disabling the optimizers would have mitigated ~50%. However, the optimizers are why Chrome scores the highest on industry-wide benchmarks such as Speedometer. Disabling the optimizers blocks a large class of exploits, at the cost of causing performance issues for some websites.

Javascript optimizers can be disabled outside of Advanced Protection Mode via the “Javascript optimization & security” Site Setting. The Site Setting also enables users to disable/enable Javascript optimizers on a per-site basis. Disabling these optimizing compilers is not limited to Advanced Protection. Since Chrome 133, we’ve exposed this as a Site Setting that allows users to enable or disable the higher-level optimizing compilers on a per-site basis, as well as change the default.

Settings -> Privacy and Security -> Javascript optimization and security

This setting can be controlled by the DefaultJavaScriptOptimizerSetting enterprise policy, alongside JavaScriptOptimizerAllowedForSites and JavaScriptOptimizerBlockedForSites for managing the allowlist and denylist. Enterprises can use this policy to block access to the optimizer, while still allowlisting¹ the SaaS vendors their employees use on a daily basis. It’s available on Android and desktop platforms

Chrome aims for the default configuration to be secure for all its users, and we’re continuing to raise the bar for V8 security in the default configuration by rolling out the V8 sandbox.

Protecting All Users

Billions of people use Chrome and Android, and not all of them have the same risk profile. Less sophisticated attacks by commodity malware can be very lucrative for attackers when done at scale, but so can sophisticated attacks on targeted users. This means that we cannot expect the security tradeoffs we make for the default configuration of Chrome to be suitable for everyone.

Advanced Protection, and the security settings associated with it, are a way for users with varying risk profiles to tailor Chrome to their security needs, either as an individual at-risk user. Enterprises with a fleet of managed Chrome installations can also enable the underlying settings now. Advanced Protection is available on Android 16 in Chrome 137+.

We additionally recommend at-risk users join the Advanced Protection Program with their Google accounts, which will require the account to use phishing-resistant multi-factor authentication methods and enable Advanced Protection on any of the user’s Android devices. We also recommend users enable automatic updates and always keep their Android phones and web browsers up to date.

Notes

Allowlisting only works on platforms capable of full site isolation—any desktop platform and Android devices with 2GB+ RAM. This is because internally allowlisting is dependent on origin isolation. ↩

Mitigating prompt injection attacks with a layered defense strategy

June 13, 2025

Posted by Google GenAI Security Team

With the rapid adoption of generative AI, a new wave of threats is emerging across the industry with the aim of manipulating the AI systems themselves. One such emerging attack vector is indirect prompt injections. Unlike direct prompt injections, where an attacker directly inputs malicious commands into a prompt, indirect prompt injections involve hidden malicious instructions within external data sources. These may include emails, documents, or calendar invites that instruct AI to exfiltrate user data or execute other rogue actions. As more governments, businesses, and individuals adopt generative AI to get more done, this subtle yet potentially potent attack becomes increasingly pertinent across the industry, demanding immediate attention and robust security measures.

At Google, our teams have a longstanding precedent of investing in a defense-in-depth strategy, including robust evaluation, threat analysis, AI security best practices, AI red-teaming, adversarial training, and model hardening for generative AI tools. This approach enables safer adoption of Gemini in Google Workspace and the Gemini app (we refer to both in this blog as “Gemini” for simplicity). Below we describe our prompt injection mitigation product strategy based on extensive research, development, and deployment of improved security mitigations.

A layered security approach

Google has taken a layered security approach introducing security measures designed for each stage of the prompt lifecycle. From Gemini 2.5 model hardening, to purpose-built machine learning (ML) models detecting malicious instructions, to system-level safeguards, we are meaningfully elevating the difficulty, expense, and complexity faced by an attacker. This approach compels adversaries to resort to methods that are either more easily identified or demand greater resources.

Our model training with adversarial data significantly enhanced our defenses against indirect prompt injection attacks in Gemini 2.5 models (technical details). This inherent model resilience is augmented with additional defenses that we built directly into Gemini, including:

Prompt injection content classifiers
Security thought reinforcement
Markdown sanitization and suspicious URL redaction
User confirmation framework
End-user security mitigation notifications

This layered approach to our security strategy strengthens the overall security framework for Gemini – throughout the prompt lifecycle and across diverse attack techniques.

1. Prompt injection content classifiers

Through collaboration with leading AI security researchers via Google's AI Vulnerability Reward Program (VRP), we've curated one of the world’s most advanced catalogs of generative AI vulnerabilities and adversarial data. Utilizing this resource, we built and are in the process of rolling out proprietary machine learning models that can detect malicious prompts and instructions within various formats, such as emails and files, drawing from real-world examples. Consequently, when users query Workspace data with Gemini, the content classifiers filter out harmful data containing malicious instructions, helping to ensure a secure end-to-end user experience by retaining only safe content. For example, if a user receives an email in Gmail that includes malicious instructions, our content classifiers help to detect and disregard malicious instructions, then generate a safe response for the user. This is in addition to built-in defenses in Gmail that automatically block more than 99.9% of spam, phishing attempts, and malware.

A diagram of Gemini’s actions based on the detection of the malicious instructions by content classifiers.

2. Security thought reinforcement

This technique adds targeted security instructions surrounding the prompt content to remind the large language model (LLM) to perform the user-directed task and ignore any adversarial instructions that could be present in the content. With this approach, we steer the LLM to stay focused on the task and ignore harmful or malicious requests added by a threat actor to execute indirect prompt injection attacks.

A diagram of Gemini’s actions based on additional protection provided by the security thought reinforcement technique.

3. Markdown sanitization and suspicious URL redaction

Our markdown sanitizer identifies external image URLs and will not render them, making the “EchoLeak” 0-click image rendering exfiltration vulnerability not applicable to Gemini. From there, a key protection against prompt injection and data exfiltration attacks occurs at the URL level. With external data containing dynamic URLs, users may encounter unknown risks as these URLs may be designed for indirect prompt injections and data exfiltration attacks. Malicious instructions executed on a user's behalf may also generate harmful URLs. With Gemini, our defense system includes suspicious URL detection based on Google Safe Browsing to differentiate between safe and unsafe links, providing a secure experience by helping to prevent URL-based attacks. For example, if a document contains malicious URLs and a user is summarizing the content with Gemini, the suspicious URLs will be redacted in Gemini’s response.

Gemini in Gmail provides a summary of an email thread. In the summary, there is an unsafe URL. That URL is redacted in the response and is replaced with the text “suspicious link removed”.

4. User confirmation framework

Gemini also features a contextual user confirmation system. This framework enables Gemini to require user confirmation for certain actions, also known as “Human-In-The-Loop” (HITL), using these responses to bolster security and streamline the user experience. For example, potentially risky operations like deleting a calendar event may trigger an explicit user confirmation request, thereby helping to prevent undetected or immediate execution of the operation.

The Gemini app with instructions to delete all events on Saturday. Gemini responds with the events found on Google Calendar and asks the user to confirm this action.

5. End-user security mitigation notifications

A key aspect to keeping our users safe is sharing details on attacks that we’ve stopped so users can watch out for similar attacks in the future. To that end, when security issues are mitigated with our built-in defenses, end users are provided with contextual information allowing them to learn more via dedicated help center articles. For example, if Gemini summarizes a file containing malicious instructions and one of Google’s prompt injection defenses mitigates the situation, a security notification with a “Learn more” link will be displayed for the user. Users are encouraged to become more familiar with our prompt injection defenses by reading the Help Center article.

Gemini in Docs with instructions to provide a summary of a file. Suspicious content was detected and a response was not provided. There is a yellow security notification banner for the user and a statement that Gemini’s response has been removed, with a “Learn more” link to a relevant Help Center article.

Moving forward

Our comprehensive prompt injection security strategy strengthens the overall security framework for Gemini. Beyond the techniques described above, it also involves rigorous testing through manual and automated red teams, generative AI security BugSWAT events, strong security standards like our Secure AI Framework (SAIF), and partnerships with both external researchers via the Google AI Vulnerability Reward Program (VRP) and industry peers via the Coalition for Secure AI (CoSAI). Our commitment to trust includes collaboration with the security community to responsibly disclose AI security vulnerabilities, share our latest threat intelligence on ways we see bad actors trying to leverage AI, and offering insights into our work to build stronger prompt injection defenses.

Working closely with industry partners is crucial to building stronger protections for all of our users. To that end, we’re fortunate to have strong collaborative partnerships with numerous researchers, such as Ben Nassi (Confidentiality), Stav Cohen (Technion), and Or Yair (SafeBreach), as well as other AI Security researchers participating in our BugSWAT events and AI VRP program. We appreciate the work of these researchers and others in the community to help us red team and refine our defenses.

We continue working to make upcoming Gemini models inherently more resilient and add additional prompt injection defenses directly into Gemini later this year. To learn more about Google’s progress and research on generative AI threat actors, attack techniques, and vulnerabilities, take a look at the following resources: