Automate Azure Databricks Platform Provisioning and Configuration
Table of Contents: Introduction, Automation options, Common workflow, Pre-Requisites, Create Azure Resource Group and Virtual Network, Provision Azure Application / Service Principal, Assign Role to Service Principal, Configure Postman Environment, Provision Azure Databricks Workspace, Generate AAD Access Token, Deploy Workspace using the ARM template, Get workspace URL, Generate Access Token for Auth, Generate AAD Access...
Enterprise Cloud Service Public Preview on AWS
At Databricks, we have had the opportunity to collaborate with companies that have transformed the way people live. Some of our customers have developed life-saving drugs, delivered industry-first user experiences, and provided edge-of-your-seat entertainment (much needed during shelter-in-place). These companies transformed their business by building efficiencies in how they operate,...
Azure Databricks Security Best Practices
Azure Databricks is a Unified Data Analytics Platform that is a part of the Microsoft Azure Cloud. Built upon the foundations of Delta Lake, MLflow, Koalas and Apache Spark™, Azure Databricks is a first party PaaS on Microsoft Azure cloud that provides one-click setup, native integrations with other Azure cloud services, interactive workspace, and enterprise-grade...
Data Exfiltration Protection with Azure Databricks
In the previous blog, we discussed how to securely access Azure Data Services from Azure Databricks using Virtual Network Service Endpoints or Private Link. Building on those best practices, in this article we walk through detailed steps on how to harden your Azure Databricks deployment from a network security perspective in order to prevent...
Trust but Verify with Databricks
As enterprises modernize their data infrastructure to make data-driven decisions, teams across the organization become consumers of that platform. Data workloads grow exponentially, the cloud data lake becomes the centralized storage for enterprise-wide functions, and teams use different tools and technologies to gain insights from it. For cloud security teams, the addition of...
Securely Accessing Azure Data Sources from Azure Databricks
Azure Databricks is a Unified Data Analytics Platform that is a part of the Microsoft Azure Cloud. Built upon the foundations of Delta Lake, MLflow, Koalas and Apache Spark, Azure Databricks is a first party service on Microsoft Azure cloud that provides one-click setup, native integrations with other Azure services, interactive workspace, and enterprise-grade...
Azure Databricks Highlights Adoption of Delta Lake, MLflow, and Integration with Azure Machine Learning at Microsoft Ignite 2019
At Microsoft Ignite 2019, thousands of attendees participated in hands-on workshops, breakout sessions, and theater presentations to learn how customers are achieving phenomenal results with Azure Databricks! It was an action-packed week of making new connections and learning about new innovations across data science, data engineering, and business analytics. We shared the news that over...
Simplify Data Lake Access with Azure AD Credential Passthrough
Azure Databricks brings together the best of Apache Spark, Delta Lake, and the Azure cloud. The close partnership provides integrations with Azure services, including Azure’s cloud-based role-based access control, Azure Active Directory (AAD), and Azure’s cloud storage service, Azure Data Lake Storage (ADLS). Even with these close integrations, data access control continues to prove a challenge for...
Azure Databricks – Bring Your Own VNET
Azure Databricks Unified Analytics Platform is the result of a joint product/engineering effort between Databricks and Microsoft. It’s available as a managed first-party service on Azure Public Cloud. Along with one-click setup (manual/automated), managed clusters (including Delta), and collaborative workspaces, the platform has native integration with other Azure first-party services, such as Azure Blob Storage,...
Analyze Games from European Soccer Leagues with Apache Spark and Databricks
The global sports market is huge, made up of players, teams, leagues, fan clubs, sponsors, and more, and all of these entities interact in myriad ways, generating an enormous amount of data. Some of that data is used internally to help make better decisions, and there are a number of use cases within the media industry...