August 29, 2012

Content hosting for the modern web

Our applications host a variety of web content on behalf of our users, and over the years we have learned that even something as simple as serving a profile image can be surprisingly fraught with pitfalls. Today, we wanted to share some of our findings about content hosting, along with the approaches we developed to mitigate the risks.

Historically, all browsers and browser plugins were designed simply to excel at displaying several common types of web content, and to be tolerant of any mistakes made by website owners. In the days of static HTML and simple web applications, giving the owner of the domain authoritative control over how the content is displayed wasn’t of any importance.

It wasn’t until the mid-2000s that we started to notice a problem: a clever attacker could manipulate the browser into interpreting seemingly harmless images or text documents as HTML, Java, or Flash—thus gaining the ability to execute malicious scripts in the security context of the application displaying these documents (essentially, a cross-site scripting flaw). For increasingly sensitive web applications, this was very bad news.

During the past few years, modern browsers began to improve. For example, the browser vendors limited the amount of second-guessing performed on text documents, certain types of images, and unknown MIME types. However, there are many standards-enshrined design decisions—such as ignoring MIME information on any content loaded through <object>, <embed>, or <applet>—that are much more difficult to fix; these practices may lead to vulnerabilities similar to the GIFAR bug.
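
This improved behavior is something a server can lean on: declare an authoritative Content-Type, send X-Content-Type-Options: nosniff, and force a download for anything that should never render inline. The following is a minimal sketch using Python's standard library; the type whitelist is an illustrative assumption, not the exact configuration used for any Google service:

```python
# Sketch: serve user-supplied files with headers that discourage MIME
# sniffing. Serves the current directory; whitelist is illustrative.
import http.server

# Types assumed safe to render inline in a sandboxed origin.
SAFE_TYPES = {"image/png", "image/jpeg", "image/gif", "text/plain"}

class UserContentHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        # Ask the browser to trust the declared Content-Type instead of
        # sniffing the payload for HTML, Flash, or Java signatures.
        self.send_header("X-Content-Type-Options", "nosniff")
        # Force a download for anything outside the whitelist, so it is
        # never rendered in this origin.
        if self.guess_type(self.path) not in SAFE_TYPES:
            self.send_header("Content-Disposition", "attachment")
        super().end_headers()

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), UserContentHandler).serve_forever()
```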

Google’s security team played an active role in investigating and remediating many content sniffing vulnerabilities during this period. In fact, many of the enforcement proposals were first prototyped in Chrome. Still, the overall progress is slow; for every resolved problem, researchers discover a previously unknown flaw in another browser mechanism. Two recent examples are the Byte Order Mark (BOM) vulnerability reported to us by Masato Kinugawa, and the MHTML attacks that we have seen happening in the wild.

For a while, we focused on content sanitization as a possible workaround, but in many cases we found it to be insufficient. For example, Aleksandr Dobkin managed to construct a purely alphanumeric Flash applet, and in our internal work the Google security team created images that can be forced to include a particular plaintext string in their body after being scrubbed and recoded in a deterministic way.

In the end, we reacted to this raft of content hosting problems by placing some of the high-risk content in separate, isolated web origins—most commonly *.googleusercontent.com. There, the “sandboxed” files pose virtually no threat to the applications themselves, or to google.com authentication cookies. For public content, that’s all we need: we may use random or user-specific subdomains, depending on the degree of isolation required between unrelated documents, but otherwise the solution just works.
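
A minimal sketch of the user-specific subdomain idea, assuming a keyed hash over the user ID; the domain name and key below are hypothetical, and this post does not describe Google's actual scheme:

```python
# Sketch: derive a stable, unguessable per-user subdomain under a
# sandbox domain. Domain and key are hypothetical placeholders.
import hashlib
import hmac

SANDBOX_DOMAIN = "sandbox-usercontent.example.com"      # hypothetical
SUBDOMAIN_KEY = b"server-side secret, rotated offline"  # hypothetical

def sandbox_origin(user_id: str) -> str:
    # A keyed hash keeps the label stable for one user while remaining
    # unpredictable to everyone else, so unrelated users' documents
    # never share a web origin.
    label = hmac.new(SUBDOMAIN_KEY, user_id.encode(),
                     hashlib.sha256).hexdigest()[:16]
    return f"https://{label}.{SANDBOX_DOMAIN}"
```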

The situation gets more interesting for non-public documents, however. Copying users’ normal authentication cookies to the “sandbox” domain would defeat the purpose. The natural alternative is to move the secret token used to confer access rights from the Cookie header to a value embedded in the URL, and make the token unique to every document instead of keeping it global.
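
A minimal sketch of such per-document, URL-borne tokens, assuming a random capability plus a server-side lookup table; the host name and storage are illustrative assumptions:

```python
# Sketch: one random capability token per document, carried in the URL
# rather than in the Cookie header. Host and store are illustrative.
import secrets

TOKEN_STORE: dict[str, str] = {}  # token -> document id (server-side)

def mint_document_url(doc_id: str) -> str:
    token = secrets.token_urlsafe(32)  # 256 bits; infeasible to guess
    TOKEN_STORE[token] = doc_id
    return f"https://sandbox.example.com/{token}/{doc_id}"

def resolve(token: str) -> str | None:
    # Knowing the URL is the capability; no cookie is consulted.
    return TOKEN_STORE.get(token)
```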

While this solution eliminates many of the significant design flaws associated with HTTP cookies, it trades one imperfect authentication mechanism for another. In particular, it’s important to note that there are more ways to accidentally leak a capability-bearing URL than there are to accidentally leak cookies; the most notable risk is disclosure through the Referer header for any document format capable of including external subresources or of linking to external sites.

In our applications, we take a risk-based approach. Generally speaking, we tend to use three strategies:
  • In higher risk situations (e.g. documents with elevated risk of URL disclosure), we may couple the URL token scheme with short-lived, document-specific cookies issued for specific subdomains of googleusercontent.com. This mechanism, known within Google as FileComp, relies on a range of attack mitigation strategies that are too disruptive for Google applications at large, but work well in this highly constrained use case.
  • In cases where the risk of leaks is limited but responsive access controls are preferable (e.g., embedded images), we may issue URLs bound to a specific user, or ones that expire quickly, as in the sketch after this list.
  • In low-risk scenarios, where usability requirements necessitate a more balanced approach, we may opt for globally valid, longer-lived URLs.
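
To make the middle option concrete, here is a minimal sketch of URLs bound to a user and an expiry time via an HMAC signature. The key, host, and parameter names are assumptions made for illustration, not the mechanism Google actually deploys:

```python
# Sketch: short-lived URLs bound to a specific user (second strategy
# above). Key, host, and parameter names are illustrative assumptions.
import hashlib
import hmac
import time

SIGNING_KEY = b"hypothetical signing key"  # server-side secret

def signed_url(doc_id: str, user_id: str, ttl_seconds: int = 300) -> str:
    expires = int(time.time()) + ttl_seconds
    payload = f"{doc_id}|{user_id}|{expires}".encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (f"https://sandbox.example.com/{doc_id}"
            f"?user={user_id}&expires={expires}&sig={sig}")

def verify(doc_id: str, user_id: str, expires: str, sig: str) -> bool:
    payload = f"{doc_id}|{user_id}|{expires}".encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison, plus a hard expiry check.
    return hmac.compare_digest(expected, sig) and time.time() < int(expires)
```
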
Of course, research into the security of web browsers continues, and the landscape of web applications is evolving rapidly. We are constantly tweaking our solutions to protect Google users even better, and even the approaches described here may change. Our commitment to making the Internet a safer place, however, will never waver.

18 comments:

  1. Interesting...
    What about using a subdomain and having authentication cookies tied to *.domain.com with the HTTPOnly flag set? It does sound risky but I can't think of any attack.

  2. It not only sounds risky; hosting user content on subdomains is risky. I've seen several times how this has opened the way to exploiting session fixation issues. There are further attack vectors for such setups, such as cross-domain policies, CORS, or document.domain.

    So putting user-provided content on a separate domain is a very good idea.

  3. oam: it's an improvement, but there are at least two problems with just using something like http[s]://userfiles.example.com/predictable_URL.pdf:

    1) If the attacker knows the URL of any interesting private document within userfiles.example.com, and can host his own malicious file in the same origin, it is fairly easy to steal sensitive data.

    2) Although httponly cookies can't be read back by scripts (save for semi-frequent plugin bugs), they can typically be overwritten with minimal effort, which often has very serious consequences, especially for complex web apps.

  4. Yeah it makes sense. Thanks !

  5. Was the "Byte Order Mark (BOM) vulnerability reported to us by Masato Kinugawa" described anywhere in more detail?

  6. Probably not in English :-) But the basic idea is that Internet Explorer would give precedence to BOM indicators in the file over the charset= value present in Content-Type or META, allowing many documents to suddenly become UTF-7 or so.

    I believe that Microsoft folks changed this behavior earlier this year.

  7. To oam's question about subdomains, I believe that if you allow this and you have loose cookie rules, you are vulnerable to cookie tossing, aka "Same Origin Policy Abuse Techniques".

    http://webapp-hardening.heroku.com/cookietossing

  8. The internet takes the path of Linux/Unix. All the design flaws will be changed in time. Changing the entire internet protocol suite is option 2. Think about writing a replacement for TCP/IP; it's a funny one.

  9. Very informative post! Thanks a lot!

  10. Security is one of the major issues with my website. I had a site built only with HTML, using hardly any dynamic features. After a good start I began to find success online and decided to move to a WordPress website, but within a few weeks of the new site's launch I hit a real setback: the site started showing errors and a hack message. I don't know enough about these things, and my developer failed to handle the situation, so I had the website restored by my web host. But I am still worried it could get even worse.

  11. Ideally, I think companies start out with shared hosting services and move up to VPS/dedicated hosting. A nice brief on all types of hosting!


You are welcome to contribute comments, but they should be relevant to the conversation. We reserve the right to remove off-topic remarks in the interest of keeping the conversation focused and engaging. Shameless self-promotion is, well, shameless, and will get canned.

Note: Only a member of this blog may post a comment.