July 19, 2022

DNS-over-HTTP/3 in Android

Posted by Matthew Maurer and Mike Yu, Android team

To help keep Android users’ DNS queries private, Android supports encrypted DNS. In addition to existing support for DNS-over-TLS, Android now supports DNS-over-HTTP/3 which has a number of improvements over DNS-over-TLS.

Most network connections begin with a DNS lookup. While transport security may be applied to the connection itself, that DNS lookup has traditionally not been private by default: the base DNS protocol is raw UDP with no encryption. While the internet has migrated to TLS over time, DNS has a bootstrapping problem. Certificate verification relies on the domain of the other party, which requires either DNS itself, or moves the problem to DHCP (which may be maliciously controlled). This issue is mitigated by central resolvers like Google, Cloudflare, OpenDNS and Quad9, which allow devices to configure a single DNS resolver locally for every network, overriding what is offered through DHCP.

In Android 9.0, we announced the Private DNS feature, which uses DNS-over-TLS (DoT) to protect DNS queries when enabled and supported by the server. Unfortunately, DoT incurs overhead for every DNS request. An alternative encrypted DNS protocol, DNS-over-HTTPS (DoH), is rapidly gaining traction within the industry as DoH has already been deployed by most public DNS operators, including the Cloudflare Resolver and Google Public DNS. While using HTTPS alone will not reduce the overhead significantly, HTTP/3 uses QUIC, a transport that efficiently multiplexes multiple streams over UDP using a single TLS session with session resumption. All of these features are crucial to efficient operation on mobile devices.

DNS-over-HTTP/3 (DoH3) support was released as part of a Google Play system update, so by the time you’re reading this, Android devices from Android 11 onwards1 will use DoH3 instead of DoT for well-known2 DNS servers which support it. Which DNS service you are using is unaffected by this change; only the transport will be upgraded. In the future, we aim to support DDR which will allow us to dynamically select the correct configuration for any server. This feature should decrease the performance impact of encrypted DNS.

Performance

DNS-over-HTTP/3 avoids several problems that can occur with DNS-over-TLS operation:

  • As DoT operates on a single stream of requests and responses, many server implementations suffer from head-of-line blocking3. This means that if the request at the front of the line takes a while to resolve (possibly because a recursive resolution is necessary), responses for subsequent requests that would have otherwise been resolved quickly are blocked waiting on that first request. DoH3 by comparison runs each request over a separate logical stream, which means implementations will resolve requests out-of-order by default.
  • Mobile devices change networks frequently as the user moves around. With DoT, these events require a full renegotiation of the connection. By contrast, the QUIC transport HTTP/3 is based on can resume a suspended connection in a single RTT.
  • DoT intends for many queries to use the same connection to amortize the cost of TCP and TLS handshakes at the start. Unfortunately, in practice several factors (such as network disconnects or server TCP connection management) make these connections less long-lived than we might like. Once a connection is closed, establishing the connection again requires at least 1 RTT.

    In unreliable networks, DoH3 may even outperform traditional DNS. While unintuitive, this is because the flow control mechanisms in QUIC can alert either party that packets weren’t received. In traditional DNS, the timeout for a query needs to be based on expected time for the entire query, not just for the resolver to receive the packet.

Field measurements during the initial limited rollout of this feature show that DoH3 significantly improves on DoT’s performance. For successful queries, our studies showed that replacing DoT with DoH3 reduces median query time by 24%, and 95th percentile query time by 44%. While it might seem suspect that the reported data is conditioned on successful queries, both DoT and DoH3 resolve 97% of queries successfully, so their metrics are directly comparable. UDP resolves only 83% of queries successfully. As a result, UDP latency is not directly comparable to TLS/HTTP3 latency because non-connection-oriented protocols have a different notion of what a "query" is. We have still included it for rough comparison.

Memory Safety

The DNS resolver processes input that could potentially be controlled by an attacker, both from the network and from apps on the device. To reduce the risk of security vulnerabilities, we chose to use a memory safe language for the implementation.

Fortunately, we’ve been adding Rust support to the Android platform. This effort is intended exactly for cases like this — system level features which need to be performant or low level (both in this case) and which would carry risk to implement in C++. While we’ve previously launched Keystore 2.0, this represents our first foray into Rust in Mainline Modules. Cloudflare maintains an HTTP/3 library called quiche, which fits our use case well, as it has a memory-safe implementation, few dependencies, and a small code size. Quiche also supports use directly from C++. We considered this, but even the request dispatching service had sufficient complexity that we chose to implement that portion in Rust as well.

We built the query engine using the Tokio async framework to simultaneously handle new requests, incoming packet events, control signals, and timers. In C++, this would likely have required multiple threads or a carefully crafted event loop. By leveraging asynchronous in Rust, this occurs on a single thread with minimal locking4. The DoH3 implementation is 1,640 lines and uses a single runtime thread. By comparison, DoT takes 1,680 lines while managing less and using up to 4 threads per DoT server in use.

Safety and Performance — Together at Last

With the introduction of Rust, we are able to improve both security and the performance at the same time. Likewise, QUIC allows us to improve network performance and privacy simultaneously. Finally, Mainline ensures that such improvements are able to make their way to more Android users sooner.

Acknowledgements

Special thanks to Luke Huang who greatly contributed to the development of this feature, and Lorenzo Colitti for his in-depth review of the technical aspects of this post.


  1. Some Android 10 devices which adopted Google Play system updates early will also receive this feature. 

  2. Google DNS and Cloudflare DNS at launch, others may be added in the future. 

  3. DoT can be implemented in a way that avoids this problem, as the client must accept server responses out of order. However, in practice most servers do not implement this reordering. 

  4. There is a lock used for the SSL context which is accessed once per DNS server, and another on the FFI when issuing a request. The FFI lock could be removed with changes to the C++ side, but has remained because it is low contention. 

No comments:

Post a Comment

You are welcome to contribute comments, but they should be relevant to the conversation. We reserve the right to remove off-topic remarks in the interest of keeping the conversation focused and engaging. Shameless self-promotion is well, shameless, and will get canned.

Note: Only a member of this blog may post a comment.