March 26, 2024

Address Sanitizer for Bare-metal Firmware

With steady improvements to Android userspace and kernel security, we have noticed an increasing interest from security researchers directed towards lower level firmware. This area has traditionally received less scrutiny, but is critical to device security. We have previously discussed how we have been prioritizing firmware security, and how to apply mitigations in a firmware environment to mitigate unknown vulnerabilities.

In this post we will show how the Kernel Address Sanitizer (KASan) can be used to proactively discover vulnerabilities earlier in the development lifecycle. Despite the narrow application implied by its name, KASan is applicable to a wide-range of firmware targets. Using KASan enabled builds during testing and/or fuzzing can help catch memory corruption vulnerabilities and stability issues before they land on user devices. We've already used KASan in some firmware targets to proactively find and fix 40+ memory safety bugs and vulnerabilities, including some of critical severity.

Along with this blog post we are releasing a small project which demonstrates an implementation of KASan for bare-metal targets leveraging the QEMU system emulator. Readers can refer to this implementation for technical details while following the blog post.

Address Sanitizer (ASan) overview

Address sanitizer is a compiler-based instrumentation tool used to identify invalid memory access operations during runtime. It is capable of detecting the following classes of temporal and spatial memory safety bugs:

  • out-of-bounds memory access
  • use-after-free
  • double/invalid free
  • use-after-return

ASan relies on the compiler to instrument code with dynamic checks for virtual addresses used in load/store operations. A separate runtime library defines the instrumentation hooks for the heap memory and error reporting. For most user-space targets (such as aarch64-linux-android) ASan can be enabled as simply as using the -fsanitize=address compiler option for Clang due to existing support of this target both in the toolchain and in the libclang_rt runtime.

However, the situation is rather different for bare-metal code which is frequently built with the none system targets, such as arm-none-eabi. Unlike traditional user-space programs, bare-metal code running inside an embedded system often doesn’t have a common runtime implementation. As such, LLVM can’t provide a default runtime for these environments.

To provide custom implementations for the necessary runtime routines, the Clang toolchain exposes an interface for address sanitization through the -fsanitize=kernel-address compiler option. The KASan runtime routines implemented in the Linux kernel serve as a great example of how to define a KASan runtime for targets which aren’t supported by default with -fsanitize=address. We'll demonstrate how to use the version of address sanitizer originally built for the kernel on other bare-metal targets.

KASan 101

Let’s take a look at the KASan major building blocks from a high-level perspective (a thorough explanation of how ASan works under-the-hood is provided in this whitepaper).

The main idea behind KASan is that every memory access operation, such as load/store instructions and memory copy functions (for example, memmove and memcpy), are instrumented with code which performs verification of the destination/source memory regions. KASan only allows the memory access operations which use valid memory regions. When KASan detects memory access to a memory region which is invalid (that is, the memory has been already freed or access is out-of-bounds) then it reports this violation to the system.

The state of memory regions covered by KASan is maintained in a dedicated area called shadow memory. Every byte in the shadow memory corresponds to a single fixed-size memory region covered by KASan (typically 8-bytes) and encodes its state: whether the corresponding memory region has been allocated or freed and how many bytes in the memory region are accessible.

Therefore, to enable KASan for a bare-metal target we would need to implement the instrumentation routines which verify validity of memory regions in memory access operations and report KASan violations to the system. In addition we would also need to implement shadow memory management to track the state of memory regions which we want to be covered with KASan.

Enabling KASan for bare-metal firmware

KASan shadow memory

The very first step in enabling KASan for firmware is to reserve a sufficient amount of DRAM for shadow memory. This is a memory region where each byte is used by KASan to track the state of an 8-byte region. This means accommodating the shadow memory requires a dedicated memory region equal to 1/8th the size of the address space covered by KASan.

KASan maps every 8-byte aligned address from the DRAM region into the shadow memory using the following formula:

shadow_address = (target_address >> 3 ) + shadow_memory_base where target_address is the address of a 8-byte memory region which we want to cover with KASan and shadow_memory_base is the base address of the shadow memory area.

Implement a KASan runtime

Once we have the shadow memory tracking the state of every single 8-byte memory region of DRAM we need to implement the necessary runtime routines which KASan instrumentation depends on. For reference, a comprehensive list of runtime routines needed for KASan can be found in the linux/mm/kasan/kasan.h Linux kernel header. However, it might not be necessary to implement all of them and in the following text we focus on the ones which were needed to enable KASan for our target firmware as an example.

Memory access check

The routines __asan_loadXX_noabort, __asan_storeXX_noabort perform verification of memory access at runtime. The symbol XX denotes size of memory access and goes as a power of 2 starting from 1 up to 16. The toolchain instruments every memory load and store operations with these functions so that they are invoked before the memory access operation happens. These routines take as input a pointer to the target memory region to check it against the shadow memory.

If the region state provided by shadow memory doesn’t reveal a violation, then these functions return to the caller. But if any violations (for example, the memory region is accessed after it has been deallocated or there is an out-of-bounds access) are revealed, then these functions report the KASan violation by:

  • Generating a call-stack.
  • Capturing context around the memory regions.
  • Logging the error.
  • Aborting/crashing the system (optional)

Shadow memory management

The routine __asan_set_shadow_YY is used to poison shadow memory for a given address. This routine is used by the toolchain instrumentation to update the state of memory regions. For example, the KASan runtime would use this function to mark memory for local variables on the stack as accessible/poisoned in the epilogue/prologue of the function respectively.

This routine takes as input a target memory address and sets the corresponding byte in shadow memory to the value of YY. Here is an example of some YY values for shadow memory to encode state of 8-byte memory regions:

  • 0x00 -- the entire 8-byte region is accessible
  • 0x01-0x07 -- only the first bytes in the memory region are accessible
  • 0xf1 -- not accessible: stack left red zone
  • 0xf2 -- not accessible: stack mid red zone
  • 0xf3 -- not accessible: stack right red zone
  • 0xfa -- not accessible: globals red zone
  • 0xff -- not accessible

Covering global variables

The routines __asan_register_globals, __asan_unregister_globals are used to poison/unpoison memory for global variables. The KASan runtime calls these functions while processing global constructors/destructors. For instance, the routine __asan_register_globals is invoked for every global variable. It takes as an argument a pointer to a data structure which describes the target global variable: the structure provides the starting address of the variable, its size not including the red zone and size of the global variable with the red zone.

The red zone is extra padding the compiler inserts after the variable to increase the likelihood of detecting an out-of-bounds memory access. Red zones ensure there is extra space between adjacent global variables. It is the responsibility of __asan_register_globals routine to mark the corresponding shadow memory as accessible for the variable and as poisoned for the red zone.

As the readers could infer from its name, the routine __asan_unregister_globals is invoked while processing global destructors and is intended to poison shadow memory for the target global variable. As a result, any memory access to such a global will cause a KASan violation.

Memory copy functions

The KASan compiler instrumentation routines __asan_loadXX_noabort, __asan_storeXX_noabort discussed above are used to verify individual memory load and store operations such as, reading or writing an array element or dereferencing a pointer. However, these routines don't cover memory access in bulk-memory copy functions such as memcpy, memmove, and memset. In many cases these functions are provided by the runtime library or implemented in assembly to optimize for performance.

Therefore, in order to be able to catch invalid memory access in these functions, we would need to provide sanitized versions of memcpy, memmove, and memset functions in our KASan implementation which would verify memory buffers to be valid memory regions.

Avoiding false positives for noreturn functions

Another routine required by KASan is __asan_handle_no_return, to perform cleanup before a noreturn function and avoid false positives on the stack. KASan adds red zones around stack variables at the start of each function, and removes them at the end. If a function does not return normally (for example, in case of longjmp-like functions and exception handling), red zones must be removed explicitly with __asan_handle_no_return.

Hook heap memory allocation routines

Bare-metal code in the vast majority of cases provides its own heap implementation. It is our responsibility to implement an instrumented version of heap memory allocation and freeing routines which enable KASan to detect memory corruption bugs on the heap.

Essentially, we would need to instrument the memory allocator with the code which unpoisons KASan shadow memory corresponding to the allocated memory buffer. Additionally, we may want to insert an extra poisoned red zone memory (which accessing would then generate a KASan violation) to the end of the allocated buffer to increase the likelihood of catching out-of-bounds memory reads/writes.

Similarly, in the memory deallocation routine (such as free) we would need to poison the shadow memory corresponding to the free buffer so that any subsequent access (such as, use-after-free) would generate a KASan violation.

We can go even further by placing the freed memory buffer into a quarantine instead of immediately returning the free memory back to the allocator. This way, the freed memory buffer is suspended in quarantine for some time and will have its KASan shadow bytes poisoned for a longer period of time, increasing the probability of catching a use-after-free access to this buffer.

Enable KASan for heap, stack and global variables

With all the necessary building blocks implemented we are ready to enable KASan for our bare-metal code by applying the following compiler options while building the target with the LLVM toolchain.

The -fsanitize=kernel-address Clang option instructs the compiler to instrument memory load/store operations with the KASan verification routines.

We use the -asan-mapping-offset LLVM option to indicate where we want our shadow memory to be located. For instance, let’s assume that we would like to cover address range 0x40000000 - 0x4fffffff and we want to keep shadow memory at address 0x4A700000. So, we would use -mllvm -asan-mapping-offset=0x42700000 as 0x40000000 >> 3 + 0x42700000 == 0x4A700000.

To cover globals and stack variables with KASan we would need to pass additional options to the compiler: -mllvm -asan-stack=1 -mllvm -asan-globals=1. It’s worth mentioning that instrumenting both globals and stack variables will likely result in an increase in size of the corresponding memory which might need to be accounted for in the linker script.

Finally, to prevent significant increase in size of the code section due to KASan instrumentation we instruct the compiler to always outline KASan checks using the -mllvm -asan-instrumentation-with-call-threshold=0 option. Otherwise, the compiler might inline

__asan_loadXX_noabort, __asan_storeXX_noabort routines for load/store operations resulting in bloating the generated object code.

LLVM has traditionally only supported sanitizers with runtimes for specific targets with predefined runtimes, however we have upstreamed LLVM sanitizer support for bare-metal targets under the assumption that the runtime can be defined for the particular target. You’ll need the latest version of Clang to benefit from this.

Conclusion

Following these steps we managed to enable KASan for a firmware target and use it in pre-production test builds. This led to early discovery of memory corruption issues that were easily remediated due to the actionable reports produced by KASan. These builds can be used with fuzzers to detect edge case bugs that normal testing fails to trigger, yet which can have significant security implications.

Our work with KASan is just one example of the multiple techniques the Android team is exploring to further secure bare-metal firmware in the Android Platform. Ideally we want to avoid introducing memory safety vulnerabilities in the first place so we are working to address this problem through adoption of memory-safe Rust in bare-metal environments. The Android team has developed Rust training which covers bare-metal Rust extensively. We highly encourage others to explore Rust (or other memory-safe languages) as an alternative to C/C++ in their firmware.

If you have any questions, please reach out – we’re here to help!

Acknowledgements: Thank you to Roger Piqueras Jover for contributions to this post, and to Evgenii Stepanov for upstreaming LLVM support for bare-metal sanitizers. Special thanks also to our colleagues who contribute and support our firmware security efforts: Sami Tolvanen, Stephan Somogyi, Stephan Chen, Dominik Maier, Xuan Xing, Farzan Karimi, Pirama Arumuga Nainar, Stephen Hines.

No comments:

Post a Comment

You are welcome to contribute comments, but they should be relevant to the conversation. We reserve the right to remove off-topic remarks in the interest of keeping the conversation focused and engaging. Shameless self-promotion is well, shameless, and will get canned.

Note: Only a member of this blog may post a comment.