Google is continuously advancing the security of Pixel devices. We have been focusing on hardening the cellular baseband modem against exploitation. Recognizing the risks associated with the complex modem firmware, Pixel 9 shipped with mitigations against a range of memory-safety vulnerabilities. For Pixel 10, Google is advancing its proactive security measures further. Following our previous discussion on "Deploying Rust in Existing Firmware Codebases", this post shares a concrete application: integrating a memory-safe Rust DNS (Domain Name System) parser into the modem firmware. The new Rust-based DNS parser significantly reduces our security risk by mitigating an entire class of vulnerabilities in a risky area, while also laying the foundation for broader adoption of memory-safe code in other areas.
Here we share our experience working on it, in the hope of inspiring broader use of memory-safe languages in low-level environments.
In recent years, we have seen increasing interest in the cellular modem from attackers and security researchers. For example, Google's Project Zero gained remote code execution on Pixel modems over the Internet. The Pixel modem contains tens of megabytes of executable code. Given the complexity and remote attack surface of the modem, other critical memory safety vulnerabilities may remain in the predominantly memory-unsafe firmware code.
The DNS protocol is most commonly known in the context of browsers finding websites. With the evolution of cellular technology, modern cellular communications have migrated to digital data networks; consequently, even basic operations such as call forwarding rely on DNS services.
DNS is a complex protocol and requires parsing of untrusted data, which can lead to vulnerabilities, particularly when implemented in a memory-unsafe language (example: CVE-2024-27227). Implementing the DNS parser in Rust decreases the attack surface associated with memory unsafety.
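To illustrate why this matters, consider reading the fixed 12-byte DNS header from untrusted bytes. The sketch below is ours (not the hickory-proto API): every read is bounds-checked, so a truncated packet produces an error instead of an out-of-bounds access, which is the class of bug Rust rules out by construction.

```rust
// Illustrative sketch: parse the fixed 12-byte DNS header with
// bounds-checked reads, so truncated input fails cleanly.
#[derive(Debug, PartialEq)]
struct DnsHeader {
    id: u16,
    flags: u16,
    qd_count: u16,
    an_count: u16,
}

// Read a big-endian u16; returns None instead of reading out of bounds.
fn read_u16(buf: &[u8], at: usize) -> Option<u16> {
    Some(u16::from_be_bytes([*buf.get(at)?, *buf.get(at + 1)?]))
}

fn parse_header(packet: &[u8]) -> Option<DnsHeader> {
    Some(DnsHeader {
        id: read_u16(packet, 0)?,
        flags: read_u16(packet, 2)?,
        qd_count: read_u16(packet, 4)?,
        an_count: read_u16(packet, 6)?,
    })
}

fn main() {
    // A response header: id 0x1234, QR bit set, one question, one answer.
    let packet = [0x12, 0x34, 0x80, 0x00, 0x00, 0x01, 0x00, 0x01, 0, 0, 0, 0];
    let header = parse_header(&packet).unwrap();
    assert_eq!(header.id, 0x1234);
    assert_eq!(header.an_count, 1);
    // A truncated packet is rejected rather than read out of bounds.
    assert!(parse_header(&packet[..5]).is_none());
    println!("ok");
}
```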
DNS already has a level of support in the open-source Rust community. We evaluated multiple open source crates that implement DNS. Based on criteria shared in earlier posts, we identified hickory-proto as the best candidate. It has excellent maintenance, over 75% test coverage, and widespread adoption in the Rust community. Its pervasiveness points to long-term support and its potential as the de facto DNS choice. Although hickory-proto initially lacked no_std support, which is needed for bare-metal environments (see our previous post on this topic), we were able to add support to it and its dependencies.
The work to enable no_std for hickory-proto was mostly mechanical. We shared the process in a previous post. We modified hickory_proto and its dependencies to enable no_std support. The upstream no_std work also produced a no_std URL parser, beneficial to other projects.
The above PRs are great examples of how to extend no_std support to existing std-only crates.
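The bulk of such a conversion follows one pattern: swap `std` imports for their `core`/`alloc` equivalents behind a feature flag, leaving the rest of the code untouched. A minimal, crate-agnostic sketch of the pattern (not the actual hickory-proto diff):

```rust
// Sketch of the mechanical no_std pattern: the same types come from `std`
// when the `std` feature is enabled, and from `alloc` otherwise.
extern crate alloc;

#[cfg(feature = "std")]
use std::{string::String, vec::Vec};
#[cfg(not(feature = "std"))]
use alloc::{string::String, vec::Vec};

// Code below compiles identically in both configurations.
fn labels(name: &str) -> Vec<String> {
    name.split('.').map(String::from).collect()
}

fn main() {
    assert_eq!(labels("example.com"), ["example", "com"]);
    println!("ok");
}
```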
Code size is one of the factors that we evaluated when picking the DNS library to use.
We built prototypes and measured size with size-optimized settings. As expected, hickory_proto was not designed with embedded use in mind and is not optimized for size. As the Pixel modem is not tightly memory constrained, we prioritized community support and code quality, leaving code size optimizations as future work.
However, the additional code size may be a blocker for other embedded systems. This could be addressed by adding feature flags to conditionally compile only the required functionality; implementing that modularity would be valuable future work.
Before building the Rust DNS library, we defined several Rust unit tests to cover basic arithmetic, dynamic allocations, and FFI to verify the integration of Rust with the existing modem firmware code base.
While cargo is the default build tool in the Rust ecosystem, it presents challenges when integrating into existing build systems. We evaluated two options: (1) use cargo to build each Rust component into a staticlib and feed the results to the existing linker, or (2) bypass cargo and invoke rustc directly from the existing build system.
Option #1 does not scale if we are going to add more Rust components in the future, as linking multiple staticlibs may cause duplicated symbol errors. We chose option #2 as it scales more easily and allows tighter integration into our existing build system. Our existing C/C++ codebase uses Pigweed to drive the primary build system. Pigweed supports Rust targets (example) with direct calls to rustc through rust tools defined in GN.
We compiled all the Rust crates, including hickory-proto, its dependencies, and core, compiler_builtins, and alloc, to rlibs. Then, we created a staticlib target with a single lib.rs file that references all the rlib crates using extern crate declarations.
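Conceptually, the combining staticlib target's lib.rs is just a list of extern crate items (a sketch, not the exact file; the FFI crate name here is illustrative):

```rust
// lib.rs of the combining staticlib target: each `extern crate` pulls the
// corresponding rlib into the produced archive; nothing else is needed here.
#![no_std]

extern crate alloc;
extern crate compiler_builtins;
extern crate hickory_proto;
extern crate dns_ffi; // hypothetical crate exporting the C-callable API
```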
Android’s Rust toolchain distributes the source code of core, alloc, and compiler_builtins, and we leveraged this for the modem. They can be included in the build graph by adding a GN target with crate_root pointing to the root lib.rs of each crate.
Pixel modem firmware already has a well-tested and specialized global memory allocation system to support some dynamic memory allocations. alloc support was added by implementing GlobalAlloc with FFI calls to the allocator's C APIs:
```rust
use core::alloc::{GlobalAlloc, Layout};

extern "C" {
    fn mem_malloc(size: usize, alignment: usize) -> *mut u8;
    fn mem_free(ptr: *mut u8, alignment: usize);
}

struct MemAllocator;

unsafe impl GlobalAlloc for MemAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        mem_malloc(layout.size(), layout.align())
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        mem_free(ptr, layout.align());
    }
}

#[global_allocator]
static ALLOCATOR: MemAllocator = MemAllocator;
```
Pixel modem firmware already implements a backend for the Pigweed crash facade as the global crash handler. Exposing it into Rust panic_handler through FFI unifies the crash handling for both Rust and C/C++ code.
```rust
#![no_std]
use core::panic::PanicInfo;

extern "C" {
    pub fn PwCrashBackend(signature: *const i8, file_name: *const i8, line: u32);
}

#[panic_handler]
fn panic(panic_info: &PanicInfo) -> ! {
    let mut filename = "";
    let mut line_number: u32 = 0;
    if let Some(location) = panic_info.location() {
        filename = location.file();
        line_number = location.line();
    }
    let mut cstr_buffer = [0u8; 128];
    // Never write to the last byte, to make sure `cstr_buffer` is always
    // zero-terminated.
    let (_, writer) = cstr_buffer.split_last_mut().unwrap();
    for (place, ch) in writer.iter_mut().zip(filename.bytes()) {
        *place = ch;
    }
    unsafe {
        PwCrashBackend(
            "Rust panic\0".as_ptr() as *const i8,
            cstr_buffer.as_ptr() as *const i8,
            line_number,
        );
    }
    loop {}
}
```
The Pixel modem firmware build has a step that invokes the linker on all the objects generated from C/C++ code. We use llvm-ar -x to extract the object files from the combined Rust staticlib and supply them to the same linker, so the Rust code is included in the final modem image.
We experienced a performance issue due to weak symbols during linking. The inclusion of the Rust core and compiler_builtins crates caused unexpected power and performance regressions on various tests. Upon analysis, we realized that the optimized memset and memcpy implementations provided by the modem firmware were accidentally replaced by those defined in compiler_builtins. This happens because both the compiler_builtins crate and the existing codebase define these symbols as weak, so the linker has no way to decide which should win. We fixed the regression by stripping the compiler_builtins objects from the staticlib before linking, using a one-line shell command.
```shell
llvm-ar -t <rust staticlib> | grep compiler_builtins | xargs llvm-ar -d <rust staticlib>
```
For the DNS parser, we declared the DNS response parsing API in C and then implemented the same API in Rust.
```c
int32_t process_dns_response(uint8_t*, int32_t);
```
The Rust function returns an integer error code. The DNS answers received in the response must be written into in-memory data structures that are coupled with the original C implementation; therefore, we reuse the existing C functions for this, dispatching them from the Rust implementation.
```rust
pub unsafe extern "C" fn process_dns_response(
    dns_response: *const u8,
    response_len: i32,
) -> i32 {
    // ... validate inputs `dns_response` and `response_len`.

    // SAFETY:
    // This is safe because `dns_response` is null-checked above, and
    // `response_len` is safe as long as it is set correctly by vendor code.
    match process_response(unsafe {
        slice::from_raw_parts(dns_response, response_len as usize)
    }) {
        Ok(()) => 0,
        Err(err) => err.into(),
    }
}

fn process_response(response: &[u8]) -> Result<()> {
    let response = hickory_proto::op::Message::from_bytes(response)?;
    let response = hickory_proto::xfer::DnsResponse::from_message(response)?;
    for answer in response.answers() {
        match answer.record_type() {
            hickory_proto::RecordType::... => {
                // SAFETY:
                // This is safe because the callback function does not store
                // references to the inputs or their members.
                unsafe {
                    callback_to_c_function(...)?;
                }
            }
            // ... more match arms omitted.
        }
    }
    Ok(())
}
```
In our case, the DNS response parsing API is simple enough to hand-write, while the callbacks into the C functions that handle the response involve complex data type conversions. Therefore, we leveraged bindgen to generate the FFI code for the callbacks.
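In a cargo project, this would typically live in a build.rs; in a GN setup like ours, the equivalent bindgen invocation runs as a build action. A sketch assuming the bindgen crate (the header and output file names are illustrative):

```rust
// build.rs sketch: generate no_std-friendly Rust bindings for the C
// callback declarations in a hypothetical dns_callbacks.h header.
fn main() {
    let bindings = bindgen::Builder::default()
        .header("dns_callbacks.h")
        .use_core() // emit core::ffi types instead of std, for no_std builds
        .generate()
        .expect("failed to generate bindings");
    bindings
        .write_to_file("src/dns_callbacks.rs")
        .expect("failed to write bindings");
}
```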
Even with all features disabled, hickory-proto pulls in more than 30 dependent crates. With manually written build rules, it is difficult to ensure correctness, and they scale poorly when upgrading dependencies to new versions.
Fuchsia has developed cargo-gnaw to build its third-party Rust crates. cargo-gnaw works by invoking cargo metadata to resolve dependencies, then parsing the result and generating GN build rules. This ensures correctness and eases maintenance.
The Pixel 10 series of phones marks a pivotal moment: these are the first Pixel devices to integrate a memory-safe language into the modem.
While replacing one piece of risky attack surface is itself valuable, this project lays the foundation for future integration of memory-safe parsers and code into the cellular baseband, ensuring the baseband’s security posture will continue to improve as development continues.
Following our April 2024 announcement, Device Bound Session Credentials (DBSC) is now entering public availability for Windows users on Chrome 146, and expanding to macOS in an upcoming Chrome release. This project represents a significant step forward in our ongoing efforts to combat session theft, which remains a prevalent threat in the modern security landscape.
Session theft typically occurs when a user inadvertently downloads malware onto their device. Once active, the malware can silently extract existing session cookies from the browser or wait for the user to log in to new accounts, before exfiltrating these tokens to an attacker-controlled server. Infostealer malware families, such as LummaC2, have become increasingly sophisticated at harvesting these credentials. Because cookies often have extended lifetimes, attackers can use them to gain unauthorized access to a user’s accounts without ever needing their passwords; this access is then often bundled, traded, or sold among threat actors.
Crucially, once sophisticated malware has gained access to a machine, it can read the local files and memory where browsers store authentication cookies. As a result, there is no reliable way to prevent cookie exfiltration using software alone on any operating system. Historically, mitigating session theft relied on detecting the stolen credentials after the fact using a complex set of abuse heuristics – a reactive approach that persistent attackers could often circumvent. DBSC fundamentally changes the web's capability to defend against this threat by shifting the paradigm from reactive detection to proactive prevention, ensuring that successfully exfiltrated cookies cannot be used to access users’ accounts.
DBSC protects against session theft by cryptographically binding authentication sessions to a specific device. It does this using hardware-backed security modules, such as the Trusted Platform Module (TPM) on Windows and the Secure Enclave on macOS, to generate a unique public/private key pair that cannot be exported from the machine. The issuance of new short-lived session cookies is contingent upon Chrome proving possession of the corresponding private key to the server. Because attackers cannot steal this key, any exfiltrated cookies quickly expire and become useless to those attackers. This design allows large and small websites to upgrade to secure, hardware-bound sessions by adding dedicated registration and refresh endpoints to their backends, while maintaining complete compatibility with their existing front-end. The browser handles the complex cryptography and cookie rotation in the background, allowing the web app to continue using standard cookies for access just as it always has.
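The challenge-response shape of the refresh flow can be made concrete with a deliberately simplified toy model. Everything below is illustrative: the "signature" is a stand-in hash rather than real asymmetric cryptography, and in actual DBSC the private key lives in the TPM or Secure Enclave and never leaves the hardware.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for a hardware-held device key. In real DBSC this secret
// never leaves the TPM/Secure Enclave; here it is a plain number so the
// flow is runnable.
struct DeviceKey(u64);

impl DeviceKey {
    // Toy "signature": a deterministic hash over the secret and challenge.
    fn sign(&self, challenge: u64) -> u64 {
        let mut h = DefaultHasher::new();
        (self.0, challenge).hash(&mut h);
        h.finish()
    }

    // Toy symmetric stand-in for the public key registered with the server.
    fn public_verifier(&self) -> u64 {
        self.0
    }
}

// Server side: issue a fresh short-lived cookie only if the proof shows
// possession of the key registered at session start.
fn refresh_cookie(registered: u64, challenge: u64, proof: u64) -> Option<&'static str> {
    let mut h = DefaultHasher::new();
    (registered, challenge).hash(&mut h);
    if h.finish() == proof {
        Some("short-lived-cookie")
    } else {
        None
    }
}

fn main() {
    let key = DeviceKey(42);
    let registered = key.public_verifier(); // sent once at registration
    let challenge = 7; // server-chosen nonce for this refresh
    // The legitimate device proves possession and gets a fresh cookie.
    assert!(refresh_cookie(registered, challenge, key.sign(challenge)).is_some());
    // An attacker who stole only cookies, not the key, cannot refresh.
    assert!(refresh_cookie(registered, challenge, 0xdead_beef).is_none());
    println!("ok");
}
```

The point of the design is visible even in the toy: stolen cookies expire, and minting new ones requires a per-challenge proof that only the device holding the key can produce.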
Google rolled out an early version of this protocol over the last year. For sessions protected by DBSC, we have observed a significant reduction in session theft since its launch.
An overview of the DBSC protocol showing the interaction between the browser and server.
A core tenet of the DBSC architecture is the preservation of user privacy. Each session is backed by a distinct key, preventing websites from using these credentials to correlate a user's activity across different sessions or sites on the same device. Furthermore, the protocol is designed to be lean: it does not leak device identifiers or attestation data to the server beyond the per-session public key required to certify proof of possession. This minimal information exchange ensures DBSC helps secure sessions without enabling cross-site tracking or acting as a device fingerprinting mechanism.
DBSC was designed from the beginning to be an open web standard, through the W3C process and adoption by the Web Application Security Working Group. Through this process, we partnered with Microsoft on the design and gathered input from many across the industry who are responsible for web security, to ensure the standard works for the whole web.
Additionally, over the past year, we have conducted two Origin Trials to ensure DBSC effectively serves the requirements of the broader web community. Many web platforms, including Okta, actively participated in these trials and ran their own testing, providing essential feedback to ensure the protocol addresses their diverse needs.
If you are a web developer looking for a way to secure your users against session theft, refer to our developer guide for implementation details. Additionally, all the details about DBSC can be found in the spec and the corresponding GitHub repository. Feel free to use the issues page to report bugs or provide feature requests.
As we continue to evolve the DBSC standard, future iterations will focus on increasing support across diverse ecosystems and introducing advanced capabilities tailored for complex enterprise environments.
Indirect prompt injection (IPI) is an evolving threat vector targeting users of complex AI applications with multiple data sources, such as Workspace with Gemini. This technique enables the attacker to influence the behavior of an LLM by injecting malicious instructions into the data or tools used by the LLM as it completes the user’s query. This may even be possible without any input directly from the user.
IPI is not the kind of technical problem you “solve” and move on. Sophisticated LLMs with increasing use of agentic automation combined with a wide range of content create an ultra-dynamic and evolving playground for adversarial attacks. That’s why Google takes a sophisticated and comprehensive approach to these attacks. We’re continuously improving LLM resistance to IPI attacks and launching AI application capabilities with ever-improving defenses. Staying ahead of the latest indirect prompt injection attacks is critical to our mission of securing Workspace with Gemini.
In our previous blog “Mitigating prompt injection attacks with a layered defense strategy”, we reviewed the layered architecture of our IPI defenses. In this blog, we’ll share more detail on the continuous approach we take to improve these defenses and to solve for new attacks.
By proactively discovering and cataloging new attack vectors through internal and external programs, we can identify vulnerabilities and deploy robust defenses ahead of adversarial activity.
Human Red-Teaming uses adversarial simulations to uncover security and safety vulnerabilities. Specialized teams execute attacks based on realistic user profiles to exploit weaknesses, coordinating with product teams to resolve identified issues.
Automated Red-Teaming is done via dynamic, machine-learning-driven frameworks to stress-test environments. By algorithmically generating and iterating on attack payloads, we can mimic the behavior of sophisticated threats at scale. This allows us to map complex attack paths and validate the effectiveness of our security controls across a much wider range of edge cases than manual testing could achieve on its own.
The Google AI Vulnerability Rewards Program (VRP) is a critical tool for enabling collaboration between Google and external security researchers who discover new attacks leveraging IPI. Through this VRP, we recognize and reward contributors for their research. We also host regular, live hacking events where we provide invited researchers access to pre-release features, proactively uncovering novel vulnerabilities. These partnerships enable Google to quickly validate, reproduce, and resolve externally-discovered issues.
Google utilizes open-source intelligence feeds to stay on top of the latest publicly disclosed IPI attacks, across social media, press releases, blogs, and more. From there, new AI vulnerabilities are sourced, reproduced, and catalogued internally to ensure our products are not impacted.
All newly discovered vulnerabilities go through a comprehensive analysis process performed by the Google Trust, Security, & Safety teams. Each new vulnerability is reproduced, checked for duplicates, mapped to an attack technique and impact category, and assigned to the relevant owners. The combination of new attack-discovery sources and the vulnerability cataloging process helps Google stay on top of the latest attacks in an actionable manner.
After we discover, curate, and catalog new attacks, we use Simula to generate synthetic data expanding these new attacks. This process is essential because it allows the team to develop attack variants for completeness and coverage, and to prepare new training and validation data sets. This accelerated workflow has boosted synthetic data generation by 75%, supporting large-scale defense model evaluation and retraining, as well as updating the data set used for calculating and reporting on defense effectiveness.
Continually updating and enhancing our defense mechanisms allows us to address a broader range of attack techniques, effectively reducing the overall attack surface. Updating each defense type requires different tasks, from config updates, to prompt engineering and ML model retraining.
Deterministic defenses, including user confirmation, URL sanitization, and tool chaining policies, are designed for rapid response against new or emerging prompt injection attacks by relying on simple configuration updates. These defenses are governed by a centralized Policy Engine, with configurations for policies like baseline tool calls, URL sanitization, and tool chaining. For immediate threats, this configuration-based system facilitates a streamlined process for "point fixes," such as regex takedowns, providing an agile defense layer that acts faster than traditional ML/LLM model refresh cycles.
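As an illustration of this kind of configuration-driven defense, a deterministic URL-sanitization check can be as simple as a deny-list consulted before a tool call. This is a toy sketch of the general idea, not Google's Policy Engine:

```rust
// Toy sketch of a configuration-driven URL sanitization policy. A "point
// fix" against a new exfiltration endpoint is just one more config entry,
// with no model retraining involved.
struct UrlPolicy {
    blocked_substrings: Vec<String>,
}

impl UrlPolicy {
    fn allows(&self, url: &str) -> bool {
        !self
            .blocked_substrings
            .iter()
            .any(|pattern| url.contains(pattern.as_str()))
    }
}

fn main() {
    // In practice this list would come from the centrally managed config.
    let policy = UrlPolicy {
        blocked_substrings: vec!["exfil.example".to_string()],
    };
    assert!(policy.allows("https://docs.google.com/document/d/1"));
    assert!(!policy.allows("https://exfil.example/leak?data=secret"));
    println!("ok");
}
```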
After generating synthetic data that expands new attacks into variants, the next step is to retrain our ML-based defenses to mitigate these new attacks. We partition the synthetic data described above into separate training and validation sets to ensure performance is evaluated against held-out examples. This approach ensures repeatability, data consistency for fixed training/testing, and establishes a scalable architecture to support future extensions towards fully automated model refresh.
Using the new synthetic data examples, our LLM-based defenses go through prompt engineering with refined system instructions. The goal is to iteratively optimize these prompts against agreed-upon defense effectiveness metrics, ensuring the models remain resilient against evolving threat vectors.
Beyond system-level guardrails and application-level defenses, we prioritize ‘model hardening’, a process that improves the Gemini model's internal capability to identify and ignore harmful instructions within data. By utilizing synthetic datasets and fresh attack patterns, we can model various threat iterations. This enables us to strengthen the Gemini model's ability to disregard harmful embedded commands while following the user's intended request. Through this process of model hardening, Gemini has become significantly more adept at detecting and disregarding injected instructions. This has led to a reduction in the success rate of attacks without compromising the model's efficiency during routine operations.
To measure the real-world impact of defense improvements, we simulate attacks against many Workspace features. This process leverages the newly generated synthetic attack data described on this blog, to create a robust, end-to-end evaluation. The simulation is run against multiple Workspace apps, such as Gmail and Docs, using a standardized set of assets to ensure reliable results. To determine the exact impact of a defense improvement (e.g., an updated ML model or a new LLM prompt optimization), the end-to-end evaluation is run with and without the defense enabled. This comparative testing provides the essential "before and after" metrics needed to validate defense efficacy and drive continuous improvement.
Our commitment to AI security is rooted in the principle that every day you’re safer with Google. While the threat landscape of indirect prompt injection evolves, we are building Workspace with Gemini to be a secure and trustworthy platform for AI-first work. IPI is a complex security challenge, which requires a defense-in-depth strategy and continuous mitigation approach. To get there, we’re combining world-class security research, automated pipelines, and advanced ML/LLM-based models. This robust and iterative framework helps to ensure we not only stay ahead of evolving threats but also provide a powerful, secure experience for both our users and customers.