It's been almost five years since we released the first of multiple BeyondCorp papers, describing the motivation and design principles that eliminated network-based trust from our internal networks. With that anniversary looming and many organizations actively working to adopt models like BeyondCorp (which has also become known as Zero Trust in the industry), we thought it would be a good time to revisit topics we have previously explored in those papers, share the lessons that we have learned over the years, and describe where BeyondCorp is going as businesses move to the cloud.
This is the first post in a series that will focus on Google’s internal implementation of BeyondCorp, providing necessary context for how Google adopted BeyondCorp.
Why did we adopt BeyondCorp?With a traditional enterprise perimeter security model, access to services and resources is provided by a device being connected to a privileged network. If an employee is in a corporate office, on the right network, services are directly accessible. If they're outside the office, at home or in a coffee shop, they frequently use a VPN to get access to services behind the enterprise firewall. This is the way most organizations protect themselves.
By 2011, it became clear to Google that this model was problematic, and we needed to rethink how enterprise services are accessed and protected for the following reasons:
- A growing number of employees were not in the office at all times. They were working from home, a coffee shop, a hotel or even on a bus or airplane. When they were outside the office, they needed to connect via a VPN, creating friction and extending the network perimeter.
- The user experience of a VPN client may be acceptable, even if suboptimal, from a laptop. Use of VPN is less acceptable, from both employees and admins perspectives, when considering growing use of devices such as smartphones and tablets to perform work.
- A number of users were contractors or other partners who only needed selective access to some of our internal resources, even though they were working in the office.
- The expanded use of public clouds and software-as-a-service (SaaS) apps meant that some of our corporate services were no longer deployed on-premises, further blurring the traditional perimeter and trust domain. This introduced new attack vectors that needed to be protected against.
- There was ongoing concern about relying solely on perimeter defense, especially when the perimeter was growing consistently. With the proliferation of laptops and mobile devices, vulnerable and compromised devices were regularly brought within the perimeter.
- Finally, if a vulnerability was observed or an attack did happen, we wanted the ability to respond as quickly and automatically as possible.
How did we do it?In order to address these challenges, we implemented a new approach that we called BeyondCorp. Our mission was to have every Google employee work successfully from untrusted networks on a variety of devices without using a client-side VPN. BeyondCorp has three core principles:
- Connecting from a particular network does not determine which service you can access.
- Access to services is granted based on what the infrastructure knows about you and your device.
- All access to services must be authenticated, authorized and encrypted for every request (not just the initial access).
High level architecture for BeyondCorp
What lessons did we learn?Given this was uncharted territory at the time, we had to learn quickly and adapt when we encountered surprises. Here are some key lessons we learned.
Obtain executive support early on and keep itMoving to BeyondCorp is not a quick, painless exercise. It took us several years just to get most of the basics in place, and to this day we are still continuing to improve and refine our implementation. Before embarking on this journey to implement BeyondCorp, we got buy in from leadership very early in the project. With a mandate, you can ask for support from lots of different groups along the way.
We make a point to re-validate this buy-in on an ongoing basis, ensuring that the business still understands and values this important shift.
Recognize data quality challenges from the very beginningAccess decisions depend on the quality of your input data. More specifically, it depends on trust analysis, which requires a combination of employee and device data.
If this data is unreliable, the result will be incorrect access decisions, suboptimal user experiences and, in the worst case, an increase in system vulnerability, so the stakes are definitely high.
We put in a lot of work to make sure our data is clean and reliable before making any impactful changes, and we have both workflows and technical measures in place to ensure data quality remains high going forward.
Enable painless migration and usageThe migration should be a zero-touch or invisible experience for your employees, making it easy for them to continue working without interruptions or added steps. If you make it difficult for your employees to migrate or maintain productivity, they might feel frustrated by the process. Complex environments are difficult to fully migrate with initial solutions, so be prepared to review, grant and manage exceptions at least in the early stages. With this in mind, start small, migrate a small number of resources, apps, users and devices, and only increase coverage after confirming the solution is reliable.
Assign employee and helpdesk advocatesWe also had employee and helpdesk advocates on the team who represented the user experience from those perspectives. This helped us architect our implementation in a way that avoided putting excess burden on employees or technical support staff.
Clear employee communicationsCommunicating clearly with employees so that they know what is happening is very important. We sent our employees, partners, and company leaders regular communications whenever we made important changes, ensuring motivations were well understood and there was a window for feedback and iteration prior to enforcement changes.
Run highly reliable systemsSince every request goes through the core BeyondCorp infrastructure, we needed a global, highly reliable and resilient set of services. If these services are degraded, employee productivity suffers.
We used Site Reliability Engineering (SRE) principles to run our BeyondCorp services.
Next timeIn the next post in this series, we will go deeper into when you should trust a device, what data you should use to determine whether or not a device should be trusted, and what we have learned by going through that process.
In the meantime, if you want to learn more, you can check out the BeyondCorp research papers. In addition, getting started with BeyondCorp is now easier using zero trust solutions from Google Cloud (context-aware access) and other enterprise providers.
This post was updated on July 3 to include Justin McWilliams as an author.