February 11, 2008

All Your iFrame Are Point to Us



It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e., web pages that attempt to exploit their visitors by automatically installing and running malware. In that time we have examined billions of URLs and found more than three million unique URLs on over 180,000 web sites that automatically install malware. Our research has looked not only at the prevalence of drive-by downloads but also at how users are exposed to malware and how it is distributed. Our research paper is currently under peer review, but we are making a technical report [PDF] available now. The technical report contains much more detail; here we present some high-level findings:

Search Results Containing a URL Labeled as Harmful


The above graph shows the percentage of daily queries that contain at least one search result labeled as harmful. In the past few months, more than 1% of all queries returned at least one search result that we believe points to malicious content, and the trend appears to be increasing.
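The graphed metric counts queries, not individual results: a query is flagged once if any of its results carries a harmful label. The following is a minimal Python sketch of that computation, assuming each query is represented as a list of result URLs and that the labels come from a hypothetical set named `harmful`; it is an illustration, not the pipeline used for the graph.

```python
from typing import Iterable, Sequence, Set


def harmful_query_rate(queries: Iterable[Sequence[str]], harmful: Set[str]) -> float:
    """Return the percentage of queries with at least one result labeled harmful.

    `queries` holds one list of result URLs per query issued on a given day;
    `harmful` is a hypothetical set of URLs carrying a harmful label.
    """
    total = 0
    flagged = 0
    for results in queries:
        total += 1
        # A query counts once, no matter how many of its results are harmful.
        if any(url in harmful for url in results):
            flagged += 1
    return 100.0 * flagged / total if total else 0.0


# Four hypothetical queries, one of which returns a labeled URL.
day = [
    ["http://a.example/", "http://b.example/"],
    ["http://c.example/"],
    ["http://d.example/", "http://bad.example/x.html"],
    [],
]
print(harmful_query_rate(day, {"http://bad.example/x.html"}))  # 25.0
```

Counting per query rather than per result is what makes "at least one harmful result" the natural unit: a page of ten results with one bad entry exposes the user just as much as a page with several.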

Browsing Habits

Good computer hygiene, such as running automatic updates for the operating system and third-party applications and installing anti-virus products, goes a long way in protecting your home computer. However, we wondered whether users' browsing habits affect the likelihood of encountering malicious web pages. To study this, we took a sample of ~7 million URLs and mapped them to DMOZ categories. Although we found that adult web pages may increase the risk of exploitation, every DMOZ category was affected.

Malicious Content Injection

To understand whether malicious content on a web server is due to poor web server security, we analyzed the version numbers reported by the web servers on which we found malicious pages. Specifically, we looked at the Apache and PHP versions advertised in each server's response. We found that over 38% of both Apache and PHP versions were outdated, increasing the risk of remote content injection on these servers.
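Apache and PHP typically advertise their versions in the `Server` and `X-Powered-By` response headers (e.g. `Apache/2.0.52 (CentOS)` or `PHP/4.3.9`). The sketch below shows one way such a check could work. It assumes the headers are already available as a dictionary with canonical header names, and the `MINIMUM` cut-offs are illustrative placeholders, not the criteria used in the study.

```python
import re
from typing import Dict, List, Mapping, Tuple

# Illustrative cut-offs only (hypothetical); they are NOT the study's criteria.
# Any advertised version below these is treated as "outdated" in this sketch.
MINIMUM: Dict[str, Tuple[int, ...]] = {"Apache": (2, 2, 8), "PHP": (5, 2, 5)}

# Matches tokens such as "Apache/2.0.52" or "PHP/4.3.9-1ubuntu2" (suffix ignored).
VERSION_RE = re.compile(r"\b(Apache|PHP)/(\d+(?:\.\d+)*)")


def parse_versions(headers: Mapping[str, str]) -> Dict[str, Tuple[int, ...]]:
    """Extract Apache/PHP version tuples from the Server and X-Powered-By headers."""
    found: Dict[str, Tuple[int, ...]] = {}
    for name in ("Server", "X-Powered-By"):
        for product, version in VERSION_RE.findall(headers.get(name, "")):
            found[product] = tuple(int(part) for part in version.split("."))
    return found


def outdated_products(headers: Mapping[str, str],
                      minimum: Mapping[str, Tuple[int, ...]] = MINIMUM) -> List[str]:
    """List the products whose advertised version is below the cut-off."""
    # Tuples compare element by element, so (2, 0, 52) < (2, 2, 8).
    return sorted(p for p, v in parse_versions(headers).items() if v < minimum[p])


print(outdated_products({"Server": "Apache/2.0.52 (CentOS)",
                         "X-Powered-By": "PHP/5.2.5"}))  # ['Apache']
```

Comparing integer tuples avoids the string-ordering trap where "2.0.52" would sort after "2.0.6". Servers that suppress or falsify these headers cannot be classified this way, so a check like this only covers servers that report a version.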

Our "Ghost In the Browser [PDF]" paper highlighted third-party content as one potential vector of malicious content. Today, much third-party content comes from advertising. To assess the extent to which advertising contributes to drive-by downloads, we analyzed the distribution chain of malware, i.e., all the intermediary URLs a browser downloads before reaching a malware payload. We checked each distribution chain for membership in about 2,000 known advertising networks. If any URL in the distribution chain corresponds to a known advertising network, we count the whole page as infectious due to ads. In our analysis, we found that on average 2% of malicious web sites were delivering malware via advertising. The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google AdWords are not affected, any advertising network that practices syndication needs to study this problem carefully. Our technical report [PDF] contains more detail, including an analysis based on the popularity of web sites.
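The attribution rule described above ("any URL in the chain belongs to a known ad network, so the whole page counts as ad-delivered") can be sketched in a few lines of Python. This assumes that "corresponds to an advertising network" means the URL's host equals, or is a subdomain of, a listed ad-network domain; the actual matching in the report may differ, and the domains below are hypothetical.

```python
from typing import Iterable, List, Sequence, Set
from urllib.parse import urlsplit


def in_ad_network(host: str, ad_domains: Set[str]) -> bool:
    """True if `host` or one of its parent domains is a listed ad-network domain."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in ad_domains for i in range(len(labels)))


def infectious_due_to_ads(chain: Sequence[str], ad_domains: Set[str]) -> bool:
    """A page counts as ad-delivered if ANY URL in its distribution chain is an ad network."""
    for url in chain:
        host = (urlsplit(url).hostname or "").lower()
        if host and in_ad_network(host, ad_domains):
            return True
    return False


def ad_share(chains: Iterable[Sequence[str]], ad_domains: Set[str]) -> float:
    """Percentage of distribution chains that pass through a known ad network."""
    chain_list: List[Sequence[str]] = list(chains)
    if not chain_list:
        return 0.0
    hits = sum(infectious_due_to_ads(c, ad_domains) for c in chain_list)
    return 100.0 * hits / len(chain_list)


# Hypothetical chains: landing page -> intermediaries -> malware payload.
ads = {"adnet.example"}
via_ad = ["http://news.example/", "http://cdn.adnet.example/serve?id=7",
          "http://bad.example/load.exe"]
direct = ["http://blog.example/", "http://bad.example/load.exe"]
print(ad_share([via_ad, direct], ads))  # 50.0
```

Because one ad hop is enough to attribute the whole chain, syndication matters: the final payload may come from a party several redirects removed from the network the site owner actually contracted with.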

Structural Properties of Malware Distribution


Finally, we also investigated the structural properties of malware distribution sites. Some malware distribution sites had as many as 21,000 regular web sites pointing to them. We also found that the majority of malware was hosted on web servers located in China. Interestingly, Chinese malware distribution sites are mostly pointed to by Chinese web servers.
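Both structural observations reduce to simple counts over a graph of (landing site, distribution site) links. The Python sketch below shows one way to compute them. The edge list, the `country` mapping from site to hosting country, and the function names are all hypothetical inputs for illustration; how links and locations were actually derived is described in the technical report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

Edge = Tuple[str, str]  # (landing site, malware distribution site)


def distribution_in_degree(edges: Iterable[Edge]) -> Dict[str, int]:
    """Count the distinct landing sites that point to each distribution site."""
    pointing: Dict[str, Set[str]] = defaultdict(set)
    for landing, dist in edges:
        pointing[dist].add(landing)
    return {dist: len(sites) for dist, sites in pointing.items()}


def same_country_share(edges: Iterable[Edge], country: Mapping[str, str],
                       target: str) -> float:
    """Fraction of distinct links into `target`-hosted distribution sites
    that originate from landing sites hosted in the same country."""
    inbound = {(landing, dist) for landing, dist in edges
               if country.get(dist) == target}
    if not inbound:
        return 0.0
    same = sum(1 for landing, _ in inbound if country.get(landing) == target)
    return same / len(inbound)


# Hypothetical data; the duplicate link is counted once.
edges = [("a.example", "d1.example"), ("b.example", "d1.example"),
         ("a.example", "d1.example"), ("c.example", "d2.example")]
country = {"a.example": "CN", "b.example": "CN", "c.example": "US",
           "d1.example": "CN", "d2.example": "CN"}
print(distribution_in_degree(edges))  # {'d1.example': 2, 'd2.example': 1}
print(same_country_share(edges, country, "CN"))  # 2 of the 3 distinct links
```

Counting distinct landing sites (a set per distribution site) rather than raw links keeps a single site with many infected pages from inflating the in-degree.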

We hope that an analysis such as this will help us better understand the malware problem and allow us to protect users all over the Internet from malicious web sites as best we can. One thing is clear: we have a lot of work ahead of us.

23 comments:

  1. It was just a matter of time before malware distributors started exploiting hosts. For the last several years, Open Directory volunteer editors have noticed hosts that were exploited by programs that put hidden porn and drug links and text on the sites on that host.

    There are also some parking hosts that are either adding the malware themselves or are being exploited.

    Blogs may be next, if they are not a target already. We saw an explosion of "hijacked" blogs about 3-4 years ago. I assume the blog owner's password was hacked. Off-topic links and copied text were substituted for the original content. For a search engine there is little context to know what the original content was, yet it is quite evident from the original title and description that the site has been hacked/hijacked. Of course, once a search engine is instructed what to look for, it is effective in searching for similar sites. One example:
    --hamster-dwarf.blogspot.com-- The site was originally listed in Open Directory as " Hamster Hang Out - A general guide on the care of Campbell's Russian Dwarf hamsters. Includes information on care, diet and health." I think the content has changed :)

    Even earlier than exploiting blogs, hackers/hijackers were changing content of free-hosted sites. I imagine it is fertile ground for malware producers. One example:
    -jwscattergood.mysite.wanadoo-members.co.uk- That particular free host is no worse than others; most were exploited.

  2. Yes, it's become very bad. I really appreciate the Google Safe Browsing API being available. While I haven't gotten to use it yet, it's another tool that can be used to prevent the spread of malware.

    As for causes, I'd say most of them lie in the web application area. There are tons of new exploits and vulnerabilities found daily, and all it takes is a handful of people forgetting to upgrade for another handful of websites to end up serving malware.

  3. Most of the malware hosting runs along the same lines as spam: older domain URLs that have been purchased as placeholders to serve up some kind of PPC ads. Normally about 6 months to a year after the first purchase, a second purchase may occur; the page then has a refresh tag to an inside URL with a 26+ character page name (26+.html, etc.), which has a large image of some kind at the top and drive-by malware at the bottom. By the time the image loads, it's too late.

    I think better policing of DEAD URLs will go a long way toward fixing this problem.

    thanks for the heads up.. good article :)

  4. Lots of information. Thanks guys!

    On the analysis of the network connections: did you also investigate new listening ports? I am wondering whether compromised hosts are abused as phishing sites (which might be promoted by some spam malware that is pushed onto the client machine).

    On the anti-virus scan: it would be great if you could include some stats on the classification of the malware. In our work, we mostly saw fraudulent applications (approx. 37%), spyware/adware (approx. 6%), and bots/rootkits/spam apps (< 5%). While our data set only covered about 200 malicious URLs, it would be interesting to see results on the gigantic data set Google has available.

    Christian

  5. It's interesting that while Google has spent so much time researching drive-by downloads, they don't know how to test a product's protection against them. They still use AV scanners to test drive-by downloads. That approach is just plain wrong, because when you do that, you are testing only one aspect of the product: the AV engine.

    I have been looking at a specific feature in NIS/NAV2008 called Browser Defender that, according to Symantec, was specifically designed to detect and block drive-by downloads even if they are obfuscated.

    I have to say, it works incredibly well even if you modify the JScript or tweak the shellcode. Google's tests did not take this into account, so the finding in their paper that the best protection they found was 70% is very misleading.

    Google, you need to fix your test methodology. What you should do is install the entire security product under test, then launch the browser with the offending URL and see if it detects it. Oh, one important point: you have to have the ActiveX control being exploited actually installed on the machine.

  6. The Google report was interesting reading, and it was satisfying to notice that it repeated some of the findings of the recent WOT study of dangerous websites: http://www.mywot.com/en/press/february

    In this study we found out that the 3 categories of websites causing most damage to users are adult content (28% of the dangerous sites analyzed), software (27%), and entertainment (16%).

    The study is based on analysis of 17 million websites rated by the WOT user community: www.mywot.com

  7. This comment has been removed by a blog administrator.

  8. This comment has been removed by a blog administrator.

  9. This comment has been removed by a blog administrator.

  10. This comment has been removed by a blog administrator.

  11. This comment has been removed by a blog administrator.

  12. Question: when will you solve the problem with iclk script that's being used as a redirector for spam, phishing and malware?

  13. The "malvertisement" problem has sadly been around for almost two years now (at least as far as I know), and it's worrisome that it's getting worse. One of the problems is indeed the increasing number of ad networks and hence the longer redirect stream.

    If anyone is interested, I've written extensively about the advertising problem: http://www.mikeonads.com/what-is-errorsafe-and-how-do-we-stop-it/

    Sandi has a more up to date list of "bad ads" on her blog here: http://msmvps.com/blogs/spywaresucks/Default.aspx

    -mike

  14. It is tough to blame the ad-networks for this problem simply because there are more of them. That is like blaming car dealers for an increase in carjackings.

    Do you (Google) contact the owner of the potentially affected host and let them know your findings? It may be helpful to give them your data so they can take measures to deal with the malware.

    And McAfee SiteAdvisor (www.siteadvisor.com) is a tool for web users looking to verify whether sites have been infected. This, along with Google's own system, seems to do a decent job of keeping people from accessing infected sites.

    www.mbridge.com

  15. This comment has been removed by a blog administrator.

  16. Nice work! But is there any permanent solution for ridding the Internet of this malware? Can Google remove such sites from search results so that visitors stop going to them?

  17. Given the impossibility of policing the internet we believe a client side browser security solution is needed. ZoneAlarm ForceField virtualizes the browser so that any malware received in a drive by download is trapped in the virtual session. More information is available at www.zonealarm.com.
    Laura Yecies
    General Manager, Check Point ZoneAlarm Consumer Division

  18. This comment has been removed by a blog administrator.

  19. This comment has been removed by a blog administrator.

  20. This comment has been removed by a blog administrator.

  21. The trouble with this is that it becomes more of a shock if a Google result turns out to be malware! :)
    I had a malware search result today. The URL was http://www.gbminis.lhosting.info/burris-b2a/international-sim-card-uk.html
    It would be nice if there was a way of reporting a search result as potentially harmful..
    Regards
    Rick

  22. The simple fact is that a browser, connected to the largest network in modern history, should not have the privilege to create and execute files, unattended, all over the OS. If browser developers are unwilling to adopt a 'sandbox' security model, we will continue to be vulnerable to internet-based attacks. Whether a site is trusted or not, it should not have any ability to permanently modify the browser or OS. Our security, software, and identities are continually compromised because the 'good guys' have the same interest as the 'bad guys': accessing detailed system/user information and exploiting it. Therefore, I assert that we will remain exposed to internet-based 'attacks' because it is in the interest of browser makers to serve up the greatest access to OS/user information to advertisers and site traffic tools.

  23. This blog is really useful and full of great information. Many thanks.

    Written by http://www.cataniaroma.com


You are welcome to contribute comments, but they should be relevant to the conversation. We reserve the right to remove off-topic remarks in the interest of keeping the conversation focused and engaging. Shameless self-promotion is, well, shameless, and will get canned.
