Now generally available: an upgraded platform architecture that adds customer-managed IP access lists, customer-managed VPC, the Account API, multiple workspaces per account, cluster-level policies, IAM credential passthrough, and more.
We are excited to announce the general availability of a major upgrade to the Databricks Unified Data Analytics Platform on Amazon Web Services (AWS) that adds more security, scalability, and simpler management with features like IP Access List, Customer Managed VPC, Cluster level policies, and much more.
Data leaders are tasked with creating data-driven business value by ensuring secure and manageable access to all of their data for all of their users. It’s extremely difficult to do this without a data analytics and machine learning (ML) platform that provides strong data security and governance, simplified management of users and data initiatives, and high reliability and performance, so data teams can trust it to run business-critical workloads at scale.
In this blog, you will learn about the new features that let data teams innovate faster by experimenting more and getting business-critical workloads into production at scale, while enforcing the right security and governance. You’ll also learn how customers like Wejo and Expedia are starting to benefit from this major platform upgrade.
Comprehensive platform security
Enterprises need to balance data democratization with enterprise data security and governance. As data grows, enterprises often default to a defensive lockdown of all data. This limits innovation and the ability to use the data to create new insights, build new data products, and improve operations. To help make data accessible while enforcing the right security controls and governance, Databricks provides capabilities that help you securely generate value from your data:
- IP Access List – Databricks workspaces can be configured so that employees connect to the service only through existing corporate networks with a secure perimeter. Databricks customers can use the IP access lists feature to define a set of approved IP addresses. All incoming access to the web application and REST APIs then requires that the user connect from an authorized IP address or VPN.
- Customer-managed VPC – Deploy the Databricks data plane in your own enterprise-managed VPC, so you can make the customizations your cloud engineering and security teams require.
- Secure Cluster Connectivity – Databricks establishes secure connectivity between the scalable control plane and the clusters in your private VPC data plane. We don’t need a single Public IP in your cluster infrastructure to interact with the control plane.
- Customer-managed Keys for Notebooks – Databricks stores customer notebooks in the scalable control plane to provide a fast, responsive user experience via the web interface. You can now choose to use your own AWS KMS key to encrypt those notebooks.
- IAM Credential Passthrough – Access S3 buckets and other IAM-enabled AWS data services using the identity that you use to log in to Databricks, either with SAML 2.0 Federation or SCIM.
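As a rough illustration of the IP Access List idea, here is a minimal Python sketch that validates a set of approved addresses and assembles a request body. The field names (`label`, `list_type`, `ip_addresses`) follow the IP access lists REST API as we understand it, so check them against the current API reference before relying on them. The helper names, the `office-vpn` label, and the example addresses are hypothetical, and `is_allowed` is only a local illustration of the matching logic, not how the service enforces the list.

```python
import ipaddress


def build_ip_access_list(label, entries, list_type="ALLOW"):
    """Validate entries locally and return a request body as a dict."""
    if list_type not in ("ALLOW", "BLOCK"):
        raise ValueError(f"unsupported list_type: {list_type}")
    for entry in entries:
        # Raises ValueError for anything that is not an IP address or CIDR range.
        ipaddress.ip_network(entry, strict=False)
    return {"label": label, "list_type": list_type, "ip_addresses": list(entries)}


def is_allowed(client_ip, payload):
    """Local illustration only: does this client IP match any listed entry?"""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in payload["ip_addresses"]
    )


# Hypothetical corporate ranges (documentation-reserved addresses).
office = build_ip_access_list("office-vpn", ["192.0.2.0/24", "198.51.100.7"])
print(is_allowed("192.0.2.44", office))   # inside the /24 range
print(is_allowed("203.0.113.9", office))  # not on the list
```

Validating the entries before sending them keeps a malformed range from reaching the workspace configuration in the first place.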
The new capabilities are already enabling Wejo, the global leader in connected car data, to build a new connected car data platform-as-a-service. “Having the ability to effectively and efficiently digest, process, and extract value from over 15M active connected cars delivering over 2 trillion data points, is critical to our success at Wejo,” said Daniel Tibble, Head of Analytics at Wejo. “With Databricks, we are building a rich connected car data platform-as-a-service to enable our customers and partners, from global data providers to city traffic planning commissions, that simplifies analyzing and running machine learning workloads on all of our connected car data. Databricks platform is enabling our customers’ data teams to work in a more collaborative, more secure, and highly scalable solution without the need to invest in their own infrastructure.”
With the proliferation of siloed tools for data analytics and machine learning, IT administrators are bogged down managing an ever-growing, complex infrastructure. Databricks helps IT teams manage users, costs, and a single unified platform for analytics and ML with full control. With a consistent experience across clouds, you can now deliver cloud-native data environments with:
- Create on-demand data analytics workspaces in minutes – Setting up a workspace and the infrastructure for a new project can take months in some cases – the multi-workspace feature brings this down to minutes. Get a new project and team up and running with a few API calls while implementing existing policies and configuration. If you use Terraform, you could also utilize the Databricks Terraform Resource Provider to bootstrap and operate a workspace.
- Trust But Verify with Databricks – Get visibility into relevant cloud platform activity in terms of who’s doing what and when, by configuring Databricks Audit Logs and other related audit logs in AWS. See how you could process the Databricks Audit Logs for continuous monitoring.
- Cluster Policies – Implement cluster policies across multiple workspaces to make the cluster creation interface relevant for different data personas, and to enforce different security and cost controls.
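To make the Cluster Policies idea more concrete, the sketch below builds a small policy definition that pins a team tag and caps autoscaling and idle time. It assumes the policy definition format in which each cluster attribute maps to a rule such as `{"type": "fixed", ...}` or `{"type": "range", "maxValue": ...}`, and that the definition is submitted as a JSON string next to a `name`. The function names, the `team` tag, and the limits are hypothetical, and `policy_violations` is a simplified local check for illustration, not the enforcement logic Databricks applies.

```python
import json


def build_cluster_policy(name, team_tag, max_workers, max_idle_minutes):
    """Return a policy creation body; the definition is serialized as JSON text."""
    definition = {
        "custom_tags.team": {"type": "fixed", "value": team_tag},
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
        "autotermination_minutes": {"type": "range", "maxValue": max_idle_minutes},
    }
    return {"name": name, "definition": json.dumps(definition)}


def policy_violations(cluster, policy_request):
    """Simplified local check of a flat cluster config against the policy rules."""
    rules = json.loads(policy_request["definition"])
    problems = []
    for attribute, rule in rules.items():
        value = cluster.get(attribute)
        if rule["type"] == "fixed" and value != rule["value"]:
            problems.append(f"{attribute} must be {rule['value']!r}")
        elif rule["type"] == "range" and value is not None and value > rule["maxValue"]:
            problems.append(f"{attribute} must be <= {rule['maxValue']}")
    return problems


policy = build_cluster_policy("marketing-ds", "marketing", max_workers=10, max_idle_minutes=60)
oversized = {
    "custom_tags.team": "marketing",
    "autoscale.max_workers": 50,
    "autotermination_minutes": 30,
}
print(policy_violations(oversized, policy))
```

Keeping the policy as data makes it straightforward to apply the same definition to several workspaces from one script.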
Your data teams can now use fully configured data environments and APIs to quickly take initiatives from development to production, reducing the complexity and inefficiencies of manual processes that can add months to data initiatives. Once in production, they can use on-demand autoscaling to optimize performance and reduce downtime of data pipelines and ML models by efficiently matching resources to demand. New features enabling this include:
- Productionize and Automate Your Data Platform at Scale – Create fully configured data environments and bootstrap them with users / groups, cluster policies, clusters, notebooks, object permissions etc. all through APIs.
- CI/CD for your Data Workloads – Streamline your application development and deployment process by integrating with DevOps tools like Jenkins, Azure DevOps, and CircleCI. Use REST API 2.0 under the hood to deploy your application artifacts and provision workspace-level objects.
- Databricks Pools – Enable clusters to start and scale faster by creating a managed cache of virtual machine instances that can be acquired for use when needed.
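As an example of what provisioning through REST API 2.0 might look like from a CI job, the sketch below prepares (but does not send) a request that creates a pool. It assumes the instance pools endpoint `/api/2.0/instance-pools/create` and its `instance_pool_name`, `node_type_id`, `min_idle_instances`, and `max_capacity` fields; confirm these against the current API reference. The workspace URL, token, pool name, and the `build_pool_request` helper are hypothetical placeholders.

```python
import json
import urllib.request


def build_pool_request(host, token, pool_name, node_type_id, min_idle, max_capacity):
    """Prepare a POST request for pool creation without sending it."""
    if min_idle > max_capacity:
        raise ValueError("min_idle cannot exceed max_capacity")
    body = {
        "instance_pool_name": pool_name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle,   # instances kept warm for fast cluster start
        "max_capacity": max_capacity,     # upper bound on idle plus in-use instances
    }
    return urllib.request.Request(
        url=f"{host}/api/2.0/instance-pools/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace URL and token; a real pipeline would read these from secrets.
request = build_pool_request(
    "https://example.cloud.databricks.com", "dummy-token",
    "ci-pool", "i3.xlarge", min_idle=2, max_capacity=10,
)
print(request.get_method(), request.full_url)
```

A pipeline step would pass the prepared request to `urllib.request.urlopen` to actually create the pool; building it separately keeps the payload easy to review and test.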
Customers like Expedia.com are using Databricks to engage with their customers in a whole new way. “At Expedia we are future proofing the way we think about and engage with our customers to provide more personalized, seamless, and stellar experiences across our platforms as they plan their next big adventure,” says Ashin Moodithaya, Director of Technical Product Management at Expedia. “We are expanding the way we use Databricks to now include Expedia.com. The ease of use, simple configuration, and collaborative environment of Databricks Unified Data Analytics Platform, will drastically improve our marketing data science teams productivity by using relevant data from across the enterprise in a secure and compliant manner to reimagine the customer experience across Expedia.com and our partner platforms.”
Unleash your data teams’ potential
Databricks Unified Data Analytics Platform is the highly secure, scalable, simple to manage, data analytics and machine learning platform enabling all your data teams to solve your toughest data problems. Securely democratize all your data to enable your data teams to extract insights, build new data products, and introduce new data-driven operational efficiencies. Get your data teams creating new value within minutes while maintaining control across workspaces, clusters, and users. Do more data analytics and machine learning, faster, securely, and at scale.
The new features for the Unified Data Analytics Platform on AWS are now available in the following AWS Regions: us-west-1, us-west-2, us-east-1, us-east-2, ca-central-1, eu-west-1, and eu-central-1. Learn more about how we are enabling you with comprehensive platform security, elastic scalability, and 360° administration for all your data analytics and machine learning needs.