by Greg Wood, Ranjit Kalidasan and Pratik Mankad
This post was written in collaboration with Amazon Web Services (AWS). We thank co-authors Ranjit Kalidasan, senior solutions architect, and Pratik Mankad, partner solutions architect, of AWS for their contributions.
Last week, we were excited to announce the release of AWS PrivateLink for Databricks Workspaces, now in public preview, which enables new patterns and functionalities to meet the governance and security requirements of modern cloud workloads. One pattern we’ve often been asked about is the ability to use custom DNS servers with a customer-managed VPC for a Databricks workspace. To provide this functionality in AWS PrivateLink-enabled Databricks workspaces, we partnered with AWS to create a scalable, repeatable architecture. In this blog, we’ll discuss how we implemented Amazon Route 53 Resolvers to enable this use case, and how you can recreate the same architecture for your own Databricks workspace.
Many enterprises configure their cloud VPCs to use their own DNS servers. They may do this because they want to limit the use of externally controlled DNS servers, and/or because they have on-prem, private domains that need to be resolved by cloud applications. In general, this is not an issue when using Databricks because our standard deployments, even with Secure Cluster Connectivity (i.e. private subnets), use domains that are resolvable by AWS.
AWS PrivateLink for Databricks, however, requires private DNS resolution for the back-end and front-end interface endpoints to work. If a customer configures their own DNS servers for the workspace VPC, those servers will not be able to resolve these VPC endpoints, and connectivity between the Databricks Data Plane and Control Plane will be broken. In order to deploy Databricks with AWS PrivateLink and custom DNS, Route 53 can be used to resolve these private DNS names in the Data Plane.
Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It is designed to give developers and businesses an extremely reliable and cost-effective way to route end users to Internet applications by translating names like www.example.com into the numeric IP addresses like 192.0.2.1 that computers use to connect to each other. Route 53 consists of different components, such as hosted zones, policies and domains. In this blog, we focus on Route 53 Resolver Endpoints (specifically, Outbound Endpoints) and the applied Endpoint Rules.
At a high level, the architecture to create Private DNS names for an interface Amazon virtual private cloud (VPC) endpoint on the service consumer side is shown below:
Route 53 in this case provides an outbound resolver endpoint. This gives us a way of resolving local, private domains with Route 53 while using the custom DNS servers for any remaining, unresolved domains. Technically, this architecture consists of Route 53 outbound resolver endpoints deployed in the DNS server VPC, and Route 53 Resolver rules that tell the service how and where to resolve domains. For more information on how Route 53 private hosted zone entries are resolved by AWS, refer to Private DNS for Interface Endpoints and Working with Private Hosted Zones. Note that this works similarly when the DNS server is hosted on-prem; in that case, the VPC in which the outbound resolvers are deployed should be the same VPC hosting the Direct Connect endpoint to your on-prem data center.
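As a concrete illustration, the sketch below shows how these pieces might be created with boto3 (the AWS SDK for Python). The region, subnet, security group, and VPC IDs, the corporate domain name, and the DNS server IP are placeholders you would replace with your own values; the same resources can equally be created through the AWS console, CloudFormation, or Terraform.

import boto3

r53r = boto3.client("route53resolver", region_name="us-east-1")  # placeholder region

# Outbound endpoint in the DNS server VPC: Route 53 Resolver forwards
# queries out to your custom DNS servers through these network interfaces.
endpoint = r53r.create_resolver_endpoint(
    CreatorRequestId="databricks-custom-dns-outbound",  # idempotency token
    Name="databricks-outbound-endpoint",
    Direction="OUTBOUND",
    SecurityGroupIds=["sg-0123456789abcdef0"],          # placeholder
    IpAddresses=[
        {"SubnetId": "subnet-0aaa1111bbb2222cc"},       # placeholder
        {"SubnetId": "subnet-0ddd3333eee4444ff"},       # placeholder
    ],
)["ResolverEndpoint"]

# Forwarding rule: queries for your private corporate domain go to the
# custom DNS servers; everything else, including the Databricks
# privatelink domains, stays with the AWS-provided resolver.
rule = r53r.create_resolver_rule(
    CreatorRequestId="databricks-custom-dns-rule",
    Name="forward-corp-domain",
    RuleType="FORWARD",
    DomainName="corp.example.com",                      # placeholder domain
    TargetIps=[{"Ip": "10.10.0.2", "Port": 53}],        # placeholder DNS server
    ResolverEndpointId=endpoint["Id"],
)["ResolverRule"]

# Associate the rule with the Databricks workspace (Data Plane) VPC so that
# instances in that VPC pick it up.
r53r.associate_resolver_rule(
    ResolverRuleId=rule["Id"],
    Name="workspace-vpc-association",
    VPCId="vpc-0fedcba9876543210",                      # placeholder
)

With the rule associated, clusters in the workspace VPC resolve the privatelink private hosted zone through the AWS-provided resolver, while queries for your internal domains are forwarded to your own DNS servers.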
Below, we walk through the steps for setting up a Route 53 outbound resolver with the appropriate rules. We assume that an AWS PrivateLink-enabled Databricks workspace is already deployed and running. To start, run the following command from a notebook:
%sh dig region.privatelink.cloud.databricks.com
Where region will change depending on the region you are in. For us-east-1, this will be nvirginia. This command should return something similar to the following: