Security Blog
The latest news and insights from Google on security and safety on the Internet
Announcing the Google Security and Privacy Research Awards
November 29, 2018
Posted by Elie Bursztein and Oxana Comanescu, Google Security and Privacy Group
We believe that cutting-edge research plays a key role in advancing the security and privacy of users across the Internet. While we do significant in-house research and engineering to protect users’ data, we maintain strong ties with academic institutions worldwide. We provide seed funding through
faculty research grants
,
cloud credits
to unlock new experiments, and foster active collaborations, including
working with visiting scholars
and
research interns
.
To accelerate the next generation of security and privacy breakthroughs, we recently created the Google Security and Privacy Research Awards program. These awards, selected via internal Google nominations and voting, recognize academic researchers who have made recent, significant contributions to the field.
We’ve been developing this program for several years. It began as a pilot when we awarded researchers for their work in 2016, and we expanded it more broadly for work from 2017. So far, we awarded $1 million dollars to 12 scholars. We are preparing the shortlist for 2018 nominees and will announce the winners next year. In the meantime, we wanted to highlight the previous award winners and the influence they’ve had on the field.
2017 Awardees
Lujo Bauer
, Carnegie Mellon University
Research area: Password security and attacks against facial recognition
Dan Boneh
, Stanford University
Research area: Enclave security and post-quantum cryptography
Aleksandra Korolova
, University of Southern California
Research area: Differential privacy
Daniela Oliveira
, University of Florida
Research area: Social engineering and phishing
Franziska Roesner
, University of Washington
Research area: Usable security for augmented reality and at-risk populations
Matthew Smith
, Universität Bonn
Research area: Usable security for developers
2016 Awardees
Michael Bailey
, University of Illinois at Urbana-Champaign
Research area: Cloud and network security
Nicolas Christin
, Carnegie Mellon University
Research area: Authentication and cybercrime
Damon McCoy
, New York University
Research area: DDoS services and cybercrime
Stefan Savage
, University of California San Diego
Research area: Network security and cybercrime
Marc Stevens
, Centrum Wiskunde & Informatica
Research area: Cryptanalysis and lattice cryptography
Giovanni Vigna
, University of California Santa Barbara
Research area: Malware detection and cybercrime
Congratulations to all of our award winners.
Industry collaboration leads to takedown of the “3ve” ad fraud operation
November 27, 2018
Posted by Per Bjorke, Product Manager, Ad Traffic Quality
For years, Google has been waging a comprehensive, global fight against invalid traffic through a combination of technology, policy, and operations teams to protect advertisers and publishers and increase transparency throughout the advertising industry.
Last year, we identified one of the most complex and sophisticated ad fraud operations we have seen to date, working with cyber security firm
White Ops
, and referred the case to law enforcement. Today, the U.S. Attorney’s Office for the Eastern District of New York
announced
criminal charges associated with this fraud operation. This takedown marks a major milestone in the industry’s fight against ad fraud, and we’re proud to have been a key contributor.
In partnership with White Ops, we have published a white paper about how we identified this ad fraud operation, the steps we took to protect our clients from being impacted, and the technical work we did to detect patterns across systems in the industry. Below are some of the highlights from the white paper, which you can download
here
.
All about 3ve: A creative and sophisticated threat
Referred to as 3ve (pronounced “Eve”), this ad fraud operation evolved over the course of 2017 from a modest, low-level botnet into a large and sophisticated operation that used a broad set of tactics to commit ad fraud. 3ve operated on a significant scale: At its peak, it controlled over 1 million IPs from both residential malware infections and corporate IP spaces primarily in North America and Europe.
Through our investigation, we discovered that 3ve was comprised of three unique sub-operations that evolved rapidly, using sophisticated tactics aimed at exploiting data centers, computers infected with malware, spoofed fraudulent domains, and fake websites. Through its varied and complex machinery, 3ve generated billions of fraudulent ad bid requests (i.e., ad spaces on web pages that advertisers can bid to purchase in an automated way), and it also created thousands of spoofed fraudulent domains. It should be noted that our analysis of ad bid requests indicated growth in activity, but not necessarily growth in transactions that would result in charges to advertisers. It’s also worth noting that 3+ billion daily ad bid requests made 3ve an extremely large ad fraud operation, but its bid request volume was only a small percentage of overall bid request volume across the industry.
Our objective
Trust and integrity are critical to the digital advertising ecosystem. Investments in our ad traffic quality systems made it possible for us to tackle this ad fraud operation and to limit the impact it had on our clients as quickly as possible, including crediting advertisers.
3ve’s focus, like many ad fraud schemes, was not a single player or system, but rather the whole advertising ecosystem. As we worked to protect our ad systems against traffic from this threat, we identified that others also had observed this traffic, and we partnered with them to help remove the threat from the ecosystem. The working group, which included nearly 20 partners, was a key component that shaped our broader investigation into 3ve, enabling us to engage directly with each other and to work towards a mutually beneficial outcome.
Industry collaboration helps bring 3ve down
While ad fraud traditionally has been seen as a faceless crime in which bad actors don’t face much risk of being identified or consequences for their actions, 3ve’s takedown demonstrates that there are risks and consequences to committing ad fraud. We’re confident that our collective efforts are building momentum and moving us closer to finding a resolution to this challenge.
For example, industry initiatives such as the Interactive Advertising Bureau (IAB) Tech Lab’s ads.txt standard, which has experienced and continues to see very rapid adoption (over 620,000 domains have an ads.txt), as well as the increasing number of buy-side platforms and exchanges offering refunds for invalid traffic, are valuable steps towards cutting off the money flow to fraudsters.
As we announced last year
, we’ve made, and will continue to make investments in our automated refunds for invalid traffic, including our work with supply partners to provide advertisers with refunds for invalid traffic detected up to 30 days after monthly billing.
Industry bodies such as the IAB, Trustworthy Accountability Group (TAG), Media Rating Council, and the Joint Industry Committee for Web Standards, who are serving as agents of change and collaboration across our industry, are instrumental in the fight against ad fraud. We have a long history of working with these bodies, including ongoing participation in TAG and IAB leadership and working groups, as well as our inclusion in the TAG Certified Against Fraud program. That program’s value was reinforced with the IAB’s requirement that all members need to be TAG certified by the middle of this year.
Successful disruption
A coordinated takedown of infrastructure related to 3ve’s operations occurred recently. The takedown involved disrupting as much of the related infrastructure as possible to make it hard to rebuild any of 3ve’s operations. As the graph below demonstrates, declining volumes in invalid traffic indicate that the disruption thus far has been successful, bringing the bid request traffic close to zero within 18 hours of starting the coordinated takedown.
Looking ahead
We’ll continue to be vigilant, working to protect marketers, publishers, and users, while continuing to collaborate with the broader industry to safeguard the integrity of the digital advertising ecosystem that powers the open web. Our work to take down 3ve is another example of our collaboration with the broader ecosystem to improve trust in digital advertising. We are committed to helping to create a better digital advertising ecosystem — one that is more valuable, transparent, and trusted for everyone.
Combating Potentially Harmful Applications with Machine Learning at Google: Datasets and Models
November 15, 2018
Posted by Mo Yu, Damien Octeau, and Chuangang Ren, Android Security & Privacy Team
[Cross-posted from the
Android Developers Blog
]
In a
previous blog post
, we talked about using machine learning to combat
Potentially Harmful Applications (PHAs)
. This blog post covers how Google uses machine learning techniques to detect and classify PHAs. We'll discuss the challenges in the PHA detection space, including the scale of data, the correct identification of PHA behaviors, and the evolution of PHA families. Next, we will introduce two of the datasets that make the training and implementation of machine learning models possible, such as app analysis data and Google Play data. Finally, we will present some of the approaches we use, including logistic regression and deep neural networks.
Using Machine Learning to Scale
Detecting PHAs is challenging and requires a lot of resources. Our security experts need to understand how apps interact with the system and the user, analyze complex signals to find
PHA behavior
, and evolve their tactics to stay ahead of PHA authors. Every day,
Google Play Protect
(GPP) analyzes over half a million apps, which makes a lot of new data for our security experts to process.
Leveraging machine learning helps us detect PHAs faster and at a larger scale. We can detect more PHAs just by adding additional computing resources. In many cases, machine learning can find PHA signals in the training data without human intervention. Sometimes, those signals are different than signals found by security experts. Machine learning can take better advantage of this data, and discover hidden relationships between signals more effectively.
There are two major parts of Google Play Protect's machine learning protections: the data and the machine learning models.
Data Sources
The quality and quantity of the data used to create a model are crucial to the success of the system. For the purpose of PHA detection and classification, our system mainly uses two anonymous data sources: data from analyzing apps and data from how users experience apps.
App Data
Google Play Protect analyzes every app that it can find on the internet. We created a dataset by decomposing each app's APK and extracting PHA signals with deep analysis. We execute various processes on each app to find particular features and behaviors that are relevant to the PHA categories in scope (for example, SMS fraud, phishing, privilege escalation). Static analysis examines the different resources inside an APK file while dynamic analysis checks the behavior of the app when it's actually running. These two approaches complement each other. For example, dynamic analysis requires the execution of the app regardless of how obfuscated its code is (obfuscation hinders static analysis), and static analysis can help detect cloaking attempts in the code that may in practice bypass dynamic analysis-based detection. In the end, this analysis produces information about the app's characteristics, which serve as a fundamental data source for machine learning algorithms.
Google Play Data
In addition to analyzing each app, we also try to understand how users perceive that app. User feedback (such as the number of installs, uninstalls, user ratings, and comments) collected from Google Play can help us identify problematic apps. Similarly, information about the developer (such as the certificates they use and their history of published apps) contribute valuable knowledge that can be used to identify PHAs. All these metrics are generated when developers submit a new app (or new version of an app) and by millions of Google Play users every day. This information helps us to understand the quality, behavior, and purpose of an app so that we can identify new PHA behaviors or identify similar apps.
In general, our data sources yield raw signals, which then need to be transformed into machine learning features for use by our algorithms. Some signals, such as the permissions that an app requests, have a clear semantic meaning and can be directly used. In other cases, we need to engineer our data to make new, more powerful features. For example, we can aggregate the ratings of all apps that a particular developer owns, so we can calculate a rating per developer and use it to validate future apps. We also employ several techniques to focus in on interesting data.To create compact representations for sparse data, we use
embedding
. To help streamline the data to make it more useful to models, we use
feature selection
. Depending on the target, feature selection helps us keep the most relevant signals and remove irrelevant ones.
By combining our different datasets and investing in
feature engineering
and feature selection, we improve the quality of the data that can be fed to various types of machine learning models.
Models
Building a good machine learning model is like building a skyscraper: quality materials are important, but a great design is also essential. Like the materials in a skyscraper, good datasets and features are important to machine learning, but a great algorithm is essential to identify PHA behaviors effectively and efficiently.
We train models to identify PHAs that belong to a specific category, such as SMS-fraud or phishing. Such categories are quite broad and contain a large number of samples given the number of PHA families that fit the definition. Alternatively, we also have models focusing on a much smaller scale, such as a family, which is composed of a group of apps that are part of the same PHA campaign and that share similar source code and behaviors. On the one hand, having a single model to tackle an entire PHA category may be attractive in terms of simplicity but precision may be an issue as the model will have to generalize the behaviors of a large number of PHAs believed to have something in common. On the other hand, developing multiple PHA models may require additional engineering efforts, but may result in better precision at the cost of reduced scope.
We use a variety of modeling techniques to modify our machine learning approach, including supervised and unsupervised ones.
One supervised technique we use is logistic regression, which has been widely adopted in the industry. These models have a simple structure and can be trained quickly. Logistic regression models can be analyzed to understand the importance of the different PHA and app features they are built with, allowing us to improve our feature engineering process. After a few cycles of training, evaluation, and improvement, we can launch the best models in production and monitor their performance.
For more complex cases, we employ deep learning. Compared to logistic regression, deep learning is good at capturing complicated interactions between different features and extracting hidden patterns. The millions of apps in Google Play provide a rich dataset, which is advantageous to deep learning.
In addition to our targeted feature engineering efforts, we experiment with many aspects of deep neural networks. For example, a deep neural network can have multiple layers and each layer has several neurons to process signals. We can experiment with the number of layers and neurons per layer to change model behaviors.
We also adopt unsupervised machine learning methods. Many PHAs use similar abuse techniques and tricks, so they look almost identical to each other. An unsupervised approach helps define clusters of apps that look or behave similarly, which allows us to mitigate and identify PHAs more effectively. We can automate the process of categorizing that type of app if we are confident in the model or can request help from a human expert to validate what the model found.
PHAs are constantly evolving, so our models need constant updating and monitoring. In production, models are fed with data from recent apps, which help them stay relevant. However, new abuse techniques and behaviors need to be continuously detected and fed into our machine learning models to be able to catch new PHAs and stay on top of recent trends. This is a continuous cycle of model creation and updating that also requires tuning to ensure that the precision and coverage of the system as a whole matches our detection goals.
Looking forward
As part of Google's AI-first strategy, our work leverages many machine learning resources across the company, such as tools and infrastructures developed by Google Brain and Google Research. In 2017, our machine learning models
successfully detected 60.3% of PHAs identified by Google Play Protect
, covering over 2 billion Android devices. We continue to research and invest in machine learning to scale and simplify the detection of PHAs in the Android ecosystem.
Acknowledgements
This work was developed in joint collaboration with Google Play Protect, Safe Browsing and Play Abuse teams with contributions from Andrew Ahn, Hrishikesh Aradhye, Daniel Bali, Hongji Bao, Yajie Hu, Arthur Kaiser, Elena Kovakina, Salvador Mandujano, Melinda Miller, Rahul Mishra, Sebastian Porst, Monirul Sharif, Sri Somanchi, Sai Deep Tetali, and Zhikun Wang.
Introducing the Android Ecosystem Security Transparency Report
November 8, 2018
Posted by Jason Woloz and Eugene Liderman, Android Security & Privacy Team
Update: We identified a bug that affected how we calculated data from Q3 2018 in the Transparency Report. This bug created inconsistencies between the data in the report and this blog post. The data points in this blog post have been corrected.
As shared during the
What's new in Android security
session at Google I/O 2018, transparency and openness are important parts of Android's ethos. We regularly blog about new features and enhancements and publish an
annual Android Security Year in Review
, which highlights Android ecosystem trends. To provide more frequent insights, we're introducing a quarterly
Android Ecosystem Security Transparency Report
. This report is the latest addition to our
Transparency Report
site, which began in 2010 to show how the policies and actions of governments and corporations affect privacy, security, and access to information online.
This Android Ecosystem Security Transparency Report covers how often a routine, full-device scan by
Google Play Protect
detects a device with PHAs installed. Google Play Protect is built-in protection on Android devices that scans over 50 billion apps daily from inside and outside of Google Play. These scans look for evidence of
Potentially Harmful Applications
(PHAs). If the scans find a PHA, Google Play Protect warns the user and can disable or remove PHAs. In Android's first annual Android Security Year in Review from 2014, fewer than 1% of devices had PHAs installed. The percentage has declined steadily over time and this downward trend continues through 2018. The transparency report covers PHA rates in three areas: market segment (whether a PHA came from Google Play or outside of Google Play), Android version, and country.
Devices with Potentially Harmful Applications installed by market segment
Google works hard to protect your Android device: no matter where your apps come from. Continuing the trend from previous years, Android devices that only download apps from Google Play are 9 times less likely to get a PHA than devices that download apps from other sources. Before applications become available in Google Play they undergo an application review to confirm they comply with Google Play policies. Google uses a risk scorer to analyze apps to detect potentially harmful behavior. When Google’s application risk analyzer discovers something suspicious, it flags the app and refers the PHA to a security analyst for manual review if needed. We also scan apps that users download to their device from outside of Google Play. If we find a suspicious app, we also protect users from that—even if it didn't come from Google Play.
In the Android Ecosystem Security Transparency Report, the Devices with Potentially Harmful Applications installed by market segment chart shows the percentage of Android devices that have one or more PHAs installed over time. The chart has two lines: PHA rate for devices that exclusively install from Google Play and PHA rate for devices that also install from outside of Google Play. In 2017, on average 0.09% of devices that exclusively used Google Play had one or more PHAs installed. The first three quarters in 2018 averaged a lower PHA rate of 0.08%.
The security of devices that installed apps from outside of Google Play also improved. In 2017, ~0.82% of devices that installed apps from outside of Google Play were affected by PHA; in the first three quarters of 2018, ~0.68% were affected. Since 2017, we've reduced this number by expanding the auto-disable feature which we covered on page 10 in the
2017 Year in Review
. While malware rates fluctuate from quarter to quarter, our metrics continue to show a consistent downward trend over time. We'll share more details in our 2018 Android Security Year in Review in early 2019.
Devices with Potentially Harmful Applications installed by Android version
Newer versions of Android are less affected by PHAs. We attribute this to many factors, such as continued platform and API hardening, ongoing security updates and app security and developer training to reduce apps' access to sensitive data. In particular, newer Android versions—such as Nougat, Oreo, and Pie—are more resilient to privilege escalation attacks that had previously allowed PHAs to gain persistence on devices and protect themselves against removal attempts. The Devices with Potentially Harmful Applications installed by Android version chart shows the percentage of devices with a PHA installed, sorted by the Android version that the device is running.
Devices with Potentially Harmful Applications rate by top 10 countries
Overall, PHA rates in the ten largest Android markets have remained steady. While these numbers fluctuate on a quarterly basis due to the fluidity of the marketplace, we intend to provide more in depth coverage of what drove these changes in our annual
Year in Review
in Q1, 2019.
The
Devices with Potentially Harmful Applications rate by top 10 countries
chart shows the percentage of devices with at least one PHA in the ten countries with the highest volume of Android devices. India saw the most significant decline in PHAs present on devices, with the average rate of infection dropping by 34 percent. Indonesia, Mexico, and Turkey also saw a decline in the likelihood of PHAs being present on devices in the region. South Korea saw the lowest number of devices containing PHA, with only 0.1%.
Check out the report
Over time, we'll add more insights into the health of the ecosystem to the
Android Ecosystem Security Transparency Report
. If you have any questions about terminology or the products referred to in this report please review the
FAQs section of the Transparency Report
. In the meantime, check out our new
blog post
and
video
outlining Android’s performance in Gartner’s Mobile OSs and Device Security: A Comparison of Platforms report.
A New Chapter for OSS-Fuzz
November 6, 2018
Posted by Matt Ruhstaller, TPM and Oliver Chang, Software Engineer, Google Security Team
Open Source Software (OSS) is extremely important to Google, and we rely on OSS in a variety of customer-facing and internal projects. We also understand the difficulty and importance of securing the open source ecosystem, and are continuously looking for ways to simplify it.
For the OSS community, we currently provide
OSS-Fuzz
, a free continuous fuzzing infrastructure hosted on the
Google Cloud Platform
. OSS-Fuzz uncovers security vulnerabilities and stability issues, and reports them directly to developers. Since
launching
in December 2016, OSS-Fuzz has reported over
9,000
bugs directly to open source developers.
In addition to OSS-Fuzz, Google's security team maintains several internal tools for identifying bugs in both Google internal and Open Source code. Until recently, these issues were
manually reported
to various public bug trackers by our security team and then monitored until they were
resolved
. Unresolved bugs were eligible for the
Patch Rewards Program
. While this reporting process had some success, it was overly complex. Now, by unifying and automating our fuzzing tools, we have been able to consolidate our processes into a single workflow, based on OSS-Fuzz. Projects integrated with OSS-Fuzz will benefit from being reviewed by both our internal and external fuzzing tools, thereby increasing code coverage and discovering bugs faster.
We are committed to helping open source projects benefit from integrating with our OSS-Fuzz fuzzing infrastructure. In the coming weeks, we will reach out via email to critical projects that we believe would be a good fit and support the community at large. Projects that integrate are eligible for rewards ranging from $1,000 (initial integration) up to $20,000 (
ideal integration
); more details are available
here
. These rewards are intended to help offset the cost and effort required to properly configure fuzzing for OSS projects. If you would like to integrate your project with OSS-Fuzz, please submit your project for
review
.
Our goal is to admit as many OSS projects as possible and ensure that they are continuously fuzzed.
Once contacted, we might provide a sample
fuzz target
to you for easy integration. Many of these fuzz targets are generated with new technology that understands how library APIs are used appropriately. Watch this space for more details on how Google plans to further automate fuzz target creation, so that even more open source projects can benefit from continuous fuzzing.
Thank you for your continued contributions to the Open Source community. Let’s work together on a more secure and stable future for Open Source Software.
Labels
#sharethemicincyber
#supplychain #security #opensource
android
android security
android tr
app security
big data
biometrics
blackhat
C++
chrome
chrome enterprise
chrome security
connected devices
CTF
diversity
encryption
federated learning
fuzzing
Gboard
google play
google play protect
hacking
interoperability
iot security
kubernetes
linux kernel
memory safety
Open Source
pha family highlights
pixel
privacy
private compute core
Rowhammer
rust
Security
security rewards program
sigstore
spyware
supply chain
targeted spyware
tensor
Titan M2
VDP
vulnerabilities
workshop
Archive
2024
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2023
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2022
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2021
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2020
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2019
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2018
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2017
Dec
Nov
Oct
Sep
Jul
Jun
May
Apr
Mar
Feb
Jan
2016
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Aug
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Sep
Aug
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
2010
Nov
Oct
Sep
Aug
Jul
May
Apr
Mar
2009
Nov
Oct
Aug
Jul
Jun
Mar
2008
Dec
Nov
Oct
Aug
Jul
May
Feb
2007
Nov
Oct
Sep
Jul
Jun
May
Feed
Follow @google
Follow
Give us feedback in our
Product Forums
.