At Google, our Threat Intelligence teams are dedicated to staying ahead of real-world adversarial activity, proactively monitoring emerging threats before they can impact users. Right now, Indirect Prompt Injection (IPI) is a top priority for the security community, anticipating it as a primary attack vector for adversaries to target and compromise AI agents. But while the danger of IPI is widely discussed, are threat actors actually exploiting this vector today – and if so, how?
To answer these questions and to uncover real-world abuse, we initiated a broad sweep of the public web to monitor for known indirect prompt injection patterns. This is what we found.
The threat of indirect prompt injection
Unlike a direct injection where a user "jailbreaks" a chatbot, IPI occurs when an AI system processes content—like a website, email, or document—that contains malicious instructions. When the AI reads this poisoned content, it may silently follow the attacker's commands instead of the user's original intent.
This is not a new area of concern for us and Google has been working tirelessly to combat these threats. Our efforts involve cross-functional collaboration between researchers at Google DeepMind (GDM) and defenders like the Google Threat Intelligence Group (GTIG). We have previously detailed our work in this area and researchers have further highlighted the evolving nature of these vulnerabilities.
Despite this collective focus, a fundamental question remains: to what degree are real-world malicious actors currently operationalizing these attacks?
Proactive monitoring at Google
The landscape of IPI on the web
There are many channels through which attackers might try to send prompt injections. However, one location is particularly easy to observe - the public web. Here, threat actors may simply seed prompt injections on websites in hope of corrupting AI systems that browse them.
Public research confirms these attacks are possible; consequently, we should expect real-world adversaries to exploit these vulnerabilities to cause harm.
Thus, we ask a basic question: What outcomes are real attackers trying to achieve today?
For ease of access and reproducibility, we chose to use Common Crawl, which is a large repository of crawled websites from the English-speaking web. Common Crawl provides monthly snapshots of 2-3 billion pages each. These are mostly static websites, which includes self-published content such as blogs, forums and comments on these sites, but as a caveat it does not contain most social media content (e.g., LinkedIn, Facebook, X, …) as Common Crawl skips websites with login walls and anti-crawl directives.
This means that, while prompt injections have been observed on social media, we reserve these for an upcoming separate study. For a first look, we can observe prompt injections even in standard HTML, for which Common Crawl conveniently provides not just the source, but also the parsed plaintext.
The challenge of false positives
The task of scanning large amounts of documents for prompt injections may sound simple, but in reality is hindered by an overwhelming number of false positive detections.
Early experiments revealed a significant volume of "benign" prompt injection text, which illustrates the complexity of distinguishing between functional threats and harmless content. Many prompt injections were found in research papers, educational blog posts, or security articles discussing this very topic.

False positives: Most prompt injections in web content tend to be education material for researchers. (Source: GitHub/swisskyrepo)
When searching for prompt injections naively, the majority of detections are benign content – false positives in our case. Therefore, we opted for a coarse-to-fine filtering approach:
Pattern Matching: We initially identified candidate pages by searching for a range of popular prompt injection signatures, like “ignore … instructions”, “if you are an AI”, etc.
LLM-Based Classification: These candidates were then processed by Gemini to classify the intent of the suspicious text, and to understand whether they were part of the overall document narrative or suspiciously out of place.
Human Validation: A final round of manual review was conducted on the classified results to ensure high confidence in our findings.
While this approach is not exhaustive and might miss uncommon signatures, it can serve as a starting point for understanding the quality of prompt injections in the wild.
What we found
Our analysis revealed a range of attempts that, if successful, would try to manipulate AI systems browsing the website. Most of the prompt injections we observed fall into these categories:
Harmless Prank
This class of prompt injection aims to cause mostly harmless side effects in AI assistants reading the website. We found many instances of this – consider the source code of this website, which contains an invisible prompt injection that instructs agents reading the website to change their conversational tone:

Helpful Guidance
We also observed website authors who wanted to exert control over AI summaries in order to provide the best service to their readers. We consider this a benign example, since the prompt injection does not attempt to prevent AI summary, but instead instructs it to add relevant context.
We note that this example could easily turn malicious if the instruction tried to add misinformation or attempted to redirect the user to third party websites.

Search Engine Optimization (SEO)
Some websites include prompt injections for the purpose of SEO, trying to manipulate AI assistants into promoting their business over others:

While the above example is simple, we have also started to see more sophisticated SEO prompt injection attempts. Consider the intricate prompt below, which was seemingly generated by an automated SEO suite and inserted into website text:

Deterring AI agents
Some websites try to prevent retrieval by AI agents via prompt injection. There exist many examples of “If you are an AI, then do not crawl this website”. However, we also observed more insidious implementations:

This injection tries to lure AI readers onto a separate page which, when opened, streams an infinite amount of text that never finishes loading. In this way, the author might hope to waste resources or cause timeout errors during the processing of their website.
Malicious: Exfiltration
We were able to observe a small number of prompt injections that aim at theft of data. However, for this class of attacks, sophistication seemed much lower. Consider this example: 
As we can see, this is a website author performing an experiment. We did not observe significant amounts of advanced attacks (e.g. using known exfiltration prompts published by security researchers in 2025). This seems to indicate that attackers have yet not productionized this research at scale.
Malicious: Destruction
Finally, we observed a number of websites that attempt to vandalize the machine of anyone using AI assistants. If executed, the commands in this example would try to delete all files on the user’s machine:

While potentially devastating, we consider this simple injection unlikely to succeed, which makes it similar to those in the other categories: We mostly found individual website authors who seemed to be running experiments or pranks, without replicating advanced IPI strategies found in recently published research.
What does this mean?
Our results indicate that attackers are experimenting with IPI on the web. While the observed activity suggests limited sophistication, this might be only part of the bigger picture.
For one, we scanned only an archive of the public web (CommonCrawl), which does not capture major social media sites. Additionally, even though sophistication was low, we observed an uptick in detections over time: We saw a relative increase of 32% in the malicious category between November 2025 and February 2026, repeating the scan on multiple versions of the archive. This upward trend indicates growing interest in IPI attacks.
In general, threat actors tend to engage based on cost/benefit considerations. In the past, IPI attacks were considered exotic and difficult. And even when compromised, AI systems often were not able to execute malicious actions reliably.
We believe that this could change soon. Today’s AI systems are much more capable, increasing their value as targets, while threat actors have simultaneously begun automating their operations with agentic AI, bringing down the cost of attack. As a result, we expect both the scale and sophistication of attempted IPI attacks to grow in the near future.
Moving forward
Our findings indicate that, while past attempts at IPI attacks on the web have been low in sophistication, their upward trend suggests that the threat is maturing and will soon grow in both scale and complexity.
At Google, we are prepared to face this emergent threat, as we continue to invest in hardening our AI models and products. Our dedicated red teams have been relentlessly pressure-testing our systems to ensure Gemini is robust to adversarial manipulation, and our AI Vulnerability Reward Program allows external researchers to participate.
Finally, Google’s established ability to process global-scale data in real-time allows us to identify and neutralize threats before they can impact users. We remain committed to keeping the Internet safe and will continue to share intelligence with the community.
To learn more about Google’s progress and research on generative AI threat actors, attack techniques, and vulnerabilities, take a look at the following resources:
No comments :
Post a Comment