The Android platform team is committed to securing Android for every user across every device. In addition to monthly security updates to patch vulnerabilities reported to us through our Vulnerability Rewards Program (VRP), we also proactively architect Android to protect against undiscovered vulnerabilities through hardening measures such as applying compiler-based mitigations and improving sandboxing. This post focuses on the decision-making process that goes into these proactive measures: in particular, how we choose which hardening techniques to deploy and where they are deployed. As device capabilities vary widely within the Android ecosystem, these decisions must be made carefully, guided by data available to us to maximize the value to the ecosystem as a whole.
The overall approach to Android Security is multi-pronged and leverages several principles and techniques to arrive at data-guided solutions to make future exploitation more difficult. In particular, when it comes to hardening the platform, we try to answer the following questions:
By shedding some light on the process we use to choose security features for Android, we hope to provide a better understanding of Android's overall approach to protecting our users.
We use a variety of sources to determine what areas of the platform would benefit the most from different types of security mitigations. The Android Vulnerability Rewards Program (VRP) is one very informative source: all vulnerabilities submitted through this program are analyzed by our security engineers to determine the root cause of each vulnerability and its overall severity (based on these guidelines). Other sources are internal and external bug-reports, which identify vulnerable components and reveal coding practices that commonly lead to errors. Knowledge of problematic code patterns combined with the prevalence and severity of the vulnerabilities they cause can help inform decisions about which mitigations are likely to be the most beneficial.
Types of Critical and High severity vulnerabilities fixed in Android Security Bulletins in 2019
Relying purely on vulnerability reports is not sufficient as the data are inherently biased: often, security researchers flock to "hot" areas, where other researchers have already found vulnerabilities (e.g. Stagefright). Or they may focus on areas where readily-available tools make it easier to find bugs (for instance, if a security research tool is posted to Github, other researchers commonly utilize that tool to explore deeper).
To ensure that mitigation efforts are not biased only toward areas where bugs and vulnerabilities have been reported, internal Red Teams analyze less scrutinized or more complex parts of the platform. Also, continuous automated fuzzers run at-scale on both Android virtual machines and physical devices. This also ensures that bugs can be found and fixed early in the development lifecycle. Any vulnerabilities uncovered through this process are also analyzed for root cause and severity, which inform mitigation deployment decisions.
The Android VRP rewards submissions of full exploit-chains that demonstrate a full end-to-end attack. These exploit-chains, which generally utilize multiple vulnerabilities, are very informative in demonstrating techniques that attackers use to chain vulnerabilities together to accomplish their goals. Whenever a researcher submits a full exploit chain, a team of security engineers analyzes and documents the overall approach, each link in the chain, and any innovative attack strategies used. This analysis informs which exploit mitigation strategies could be employed to prevent pivoting directly from one vulnerability to another (some examples include Address Space Layout Randomization and Control-Flow Integrity) and whether the process’s attack surface could be reduced if it has unnecessary access to resources.
There are often multiple different ways to use a collection of vulnerabilities to create an exploit chain. Therefore a defense-in-depth approach is beneficial, with the goal of reducing the usefulness of some vulnerabilities and lengthening exploit chains so that successful exploitation requires more vulnerabilities. This increases the cost for an attacker to develop a full exploit chain.
Keeping up with developments in the wider security community helps us understand the current threat landscape, what techniques are currently used for exploitation, and what future trends look like. This involves but is not limited to:
All of these data sources provide feedback for the overall security hardening strategy, where new mitigations should be deployed, and what existing security mitigations should be improved.
Analyzing the data reveals areas where broader mitigations can eliminate entire classes of vulnerabilities. For instance, if parts of the platform show a large number of vulnerabilities due to integer overflow bugs, they are good candidates to enable Undefined Behavior Sanitizer (UBSan) mitigations such as the Integer Overflow Sanitizer. When common patterns in memory access vulnerabilities appear, they inform efforts to build hardened memory allocators (enabled by default in Android 11) and implement mitigations (such as CFI) against exploitation techniques that provide better resilience against memory overflows or Use-After-Free vulnerabilities.
Before discussing how the data can be used, it is important to understand how we classify our overall efforts in hardening the platform. There are a few broadly defined buckets that hardening techniques and mitigations fit into (though sometimes a particular mitigation may not fit cleanly into any single one):
With the broad arsenal of mitigation techniques available, which of these to employ and where to apply them depends on the type of problem being solved. For instance, a monolithic process that handles a lot of untrusted data and does complex parsing would be a good candidate for all of these. The media frameworks provide an excellent historical example where an architectural decomposition enabled incrementally turning on more exploit mitigations and deprivileging.
Architectural decomposition and isolation of the Media Frameworks over time
Remotely reachable attack surfaces such as NFC, Bluetooth, WiFi, and media components have historically housed the most severe vulnerabilities, and as such these components are also prioritized for hardening. These components often contain some of the most common vulnerability root causes that are reported in the VRP, and we have recently enabled sanitizers in all of them.
Libraries and processes that enforce or sit at security boundaries, such as libbinder, and widely-used core libraries such as libui, libcore, and libcutils are good targets for exploit mitigations since these are not process-specific. However, due to performance and stability sensitivities around these core libraries, mitigations need to be supported by strong evidence of their security impact.
Finally, the kernel’s high level of privilege makes it an important target for hardening as well. Because different codebases have different characteristics and functionality, susceptibility to and prevalence of certain kinds of vulnerabilities will differ. Stability and performance of mitigations here are exceptionally important to avoid negatively impacting the user experience, and some mitigations that make sense to deploy in user space may not be applicable or effective. Therefore our considerations for which hardening strategies to employ in the kernel are based on a separate analysis of the available kernel-specific data.
This data-driven approach has led to tangible and measurable results. Starting in 2015 with Stagefright, a large number of Critical severity vulnerabilities were reported in Android's media framework. These were especially sensitive because many of these vulnerabilities were remotely reachable. This led to a large architectural decomposition effort in Android Nougat, followed by additional efforts to improve our ability to patch media vulnerabilities quickly. Thanks to these changes, in 2020 we had no internet-reachable Critical severity vulnerabilities reported to us in the media frameworks.
Some of these mitigations provide more value than others, so it is important to focus engineering resources where they are most effective. This involves weighing the performance cost of each mitigation as well as how much work is required to deploy it and support it without negatively affecting device stability or user experience.
Understanding the performance impact of a mitigation is a critical step toward enabling it. Adding too much overhead to some components or the entire system can negatively impact user experience by reducing battery life and making the device less responsive. This is especially true for entry-level devices, which should benefit from hardening as well. We thus want to prioritize engineering efforts on impactful mitigations with acceptable overheads.
When investigating performance, important factors include not just CPU time but also memory increase, code size, battery life, and UI jank. These factors are especially important to consider for more constrained entry-level devices, to ensure that the mitigations perform well across the entire Android ecosystem.
The system-wide performance impact of a mitigation is also dependent on where that mitigation is enabled, as certain components are more performance-sensitive than others. For example, binder is one of the most used paths for interprocess communication, so even small additional overhead could significantly impact user experience on a device. On the other hand, video players only need to ensure that frames are rendered at the source framerate; if frames are rendered much faster than the rate at which they are displayed, additional overhead may be more acceptable.
Benchmarks, if available, can be extremely useful to evaluate the performance impact of a mitigation. If there are no benchmarks for a certain component, new ones should be created, for instance by calling impacted codec code to decode a media file. If this testing reveals unacceptable overhead, there are often a few options to address it:
Most of these improvements involve changes or contributions to the LLVM project. By working with upstream LLVM, these improvements have impact and benefit beyond Android. At the same time Android benefits from upstream improvements when others in the LLVM community make improvements as well.
There is more to consider when enabling a mitigation than its security benefit and performance cost, such as the cost of short-term deployment and long-term support.
One important issue is whether a mitigation can contain false positives. For example, if the Bounds Sanitizer produces an error, there is definitely an out-of-bounds access (although it might not be exploitable). But the Integer Overflow Sanitizer can produce false positives, as many integer overflows are harmless or even perfectly expected and correct.
It is thus important to consider the impact of a mitigation on the stability of the system. Whether a crash is due to a false positive or a legitimate security issue, it still disrupts the user experience and so is undesirable. This is another reason to carefully consider which components should have which mitigations, as crashes in some components are worse than others. If a mitigation causes a crash in a media codec, the user’s video playback will be stopped, but if netd crashes during an update, the phone could be bricked. For a mitigation like Bounds Sanitizer, where false positives are not an issue, we still need to perform extensive testing to ensure the device remains stable. Off-by-one errors, for example, may not crash during normal operation, but Bounds Sanitizer would abort execution and result in instability.
netd
Another consideration is whether it is possible to enumerate everything a mitigation might break. For example, it is not easy to contain the risk of the Integer Overflow Sanitizer without extensive testing, as it is difficult to determine which overflows are intentional/benign (and thus should be allowed) and which could lead to vulnerabilities.
We must consider not just issues caused by deploying mitigations but also how to support them long-term. This includes the developer time to integrate a mitigation into existing systems, enable and debug it, deploy it onto devices, and support it after launch. SELinux is a good example of this; it takes a significant amount of effort to write the policy for a new device, and even once enforcing mode is enabled, the policy must be supported for years as code changes and functionality is added or removed.
We try to make mitigations less disruptive and spread awareness of how they affect developers. This is done by making documentation available on source.android.com and by improving existing algorithms to reduce false positives. Making it easier to debug mitigations when something goes wrong reduces the developer maintenance burden that can accompany mitigations. For example, when developers found it difficult to identify UBSan errors, we enabled support for the UBSan Minimal Runtime by default in the Android build system. The minimal runtime itself was first upstreamed by others at Google specifically for this purpose. When the Integer Overflow Sanitizer crashes a program, that adds the following hint to the generic SIGABRT crash message:
Abort message: 'ubsan: sub-overflow'
Developers who see this message then know to enable diagnostics mode, which prints out details about the crash:
frameworks/native/services/surfaceflinger/SurfaceFlinger.cpp:2188:32: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
Similarly, upstream SELinux provides a tool called audit2allow that can be used to suggest rules to allow blocked behaviors:
adb logcat -d | audit2allow -p policy #============= rmt ============== allow rmt kmem_device:chr_file { read write };
A debugging tool does not need to be perfect to be helpful; audit2allow does not always suggest the correct options, but for developers without detailed knowledge of SELinux it provides a strong starting point.
With every Android release, our team works hard to balance security improvements that benefit the entire ecosystem with performance and stability, drawing heavily from the data that are available to us. We hope that this sheds some light on the particular challenges involved and the overall process that leads to mitigations introduced in each Android release.
Post a Comment
No comments :
Post a Comment