Let’s face it, the landscape of different analytics services and products is complicated and constantly evolving. The Databricks and Microsoft partnership that created Azure Databricks began 4 years ago, and in that time Azure Databricks has evolved along with other Azure services like Azure Synapse. What remains constant is a great story from Databricks and Microsoft working together to enable joint customers like Unilever, Daimler and GSK to build their analytics on Azure with the best of both. It all starts with a common vision for an analytics platform.
Get your data in one place
There is a universal goal within analytics teams to establish a common data source that serves every type of analytics from one place. This eliminates the primary source of frustration and complexity for analytics, namely separate silos of data. To build that common data source, cloud storage is the most compelling option for its performance, scale and value. If you take away nothing else from this post, remember that getting all your data into a data lake built on cloud storage like Azure Data Lake Storage (ADLS) is the best first step in your analytics journey. There are plenty of great options, such as Azure Data Factory, to sync or move all your data directly into ADLS.
The next important thing to remember is that data lakes built on cloud storage do not natively provide all the database-like features that are commonly needed for analytics. Historically this caused a lot of pain for teams implementing a data lake using data formats like Parquet, but in the last several years we have seen innovations with transaction logs and related features (e.g. indexing) for data lakes. Delta Lake is the best example, originally created by Databricks and now an open-source project managed by the Linux Foundation. To ensure data is ready for analytics, Delta Lake provides transaction support and data quality capabilities to curate data, enforce schema and ensure reliable data. The majority of data processed with Azure Databricks is already in Delta Lake. Customers like Starbucks, Grab, Mars Petcare and Cerner are examples of companies using Delta Lake to create a foundation for their data platform.
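To make the idea of a transaction log with schema enforcement a little more concrete, here is a minimal sketch in plain Python. It is a toy for illustration only and is not Delta Lake's API or on-disk format: the `ToyTable` class, the `_toy_log` directory and the JSON file layout are all hypothetical names invented for this example. The sketch assumes a single local directory and shows two behaviors: a batch is rejected before anything is written if it does not match the declared schema, and a batch becomes visible to readers only once its log entry has been created.

```python
import json
import os
import tempfile
import uuid


class ToyTable:
    """Toy append-only table: data files plus an ordered commit log.

    Hypothetical illustration only; not Delta Lake's API or file format.
    """

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> expected Python type
        self.log_dir = os.path.join(root, "_toy_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def append(self, rows):
        # Schema enforcement: validate the whole batch before writing anything.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"{col!r} must be {typ.__name__}")

        # Write the data under a unique name; readers cannot see it yet.
        data_file = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_file), "w") as f:
            json.dump(rows, f)

        # Commit: the next version number is the count of existing log entries.
        # Mode "x" raises FileExistsError if that version was already claimed.
        version = len(os.listdir(self.log_dir))
        log_path = os.path.join(self.log_dir, f"{version:05d}.json")
        with open(log_path, "x") as f:
            json.dump({"add": data_file}, f)
        return version

    def read(self):
        # Readers replay the log in order and load only committed data files.
        rows = []
        for entry in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, entry)) as f:
                data_file = json.load(f)["add"]
            with open(os.path.join(self.root, data_file)) as f:
                rows.extend(json.load(f))
        return rows


table = ToyTable(tempfile.mkdtemp(), {"id": int, "city": str})
table.append([{"id": 1, "city": "Seattle"}])
try:
    table.append([{"id": "2", "city": "London"}])  # wrong type for "id"
except ValueError as err:
    print("rejected:", err)
print(table.read())
```

The second batch never reaches the table because validation runs before any file is written, so readers only ever see the first, well-formed batch. A real system has far more to handle (concurrent writers, updates and deletes, schema evolution), which is exactly the work Delta Lake takes off your hands.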
Use Azure Databricks, Azure Synapse and Power BI together
The combination of ADLS with Delta Lake is at the heart of Databricks and Microsoft’s shared vision for analytics on Azure. Key analytics services like Azure Databricks, Azure Synapse and Power BI are primed and ready to tap into this data in one place, making it easy to address the spectrum of analytics scenarios across BI, data science and data engineering. Azure Databricks provides the best environment for empowering data engineers and data scientists with a productive, collaborative platform and code-first data pipelines. Azure Synapse provides high-performance data warehousing for low-latency, high-concurrency BI, integrated with no-code / low-code development. Both have services for analysts to perform analytics using the most common syntax for data, SQL, directly on the lake, giving users on Azure a lot to cheer about.
These services on Azure also integrate with each other to form a mesh of interconnected analytics. Azure Databricks has a built-in and highly optimized connector to Synapse that today is the most popular service connector across all of Databricks. This is no surprise, as many customers like Marks & Spencer and Rockwell Automation have used Azure Databricks and Synapse together to move their analytics platforms to the cloud for high performance and scalability. Power BI is already part of Synapse Studio, and the new Power BI connector to Azure Databricks makes it easier and more performant to deliver great BI visualizations and reports through the same Power BI service. The combination of these services operating together on the same underlying data lake makes Azure a great place for analytics.
What makes Azure Databricks special
Delivering a cloud analytics platform is hard. Developing analytics software has always been complex, and now that complexity is combined with the subtleties of architecting a cloud-scale solution. To peek under the hood at what it takes, see what Databricks co-founder and chief technologist Matei Zaharia presented on developing large-scale cloud software and the lessons learned.
What quickly becomes apparent is how much depends on great engineering collaboration with the underlying cloud infrastructure and services. This is amplified for Azure Databricks, which operates at cloud scale, spinning up millions of VM hours every day and processing exabytes of data each month. That amount of processing relies on the underlying Azure services for compute, storage and networking, and it would be impossible to achieve great performance without serious joint engineering work that gets into details like compute resource request protocols and network throttling.
This is a big part of what makes Azure Databricks special. Because it is a first-party service from Microsoft, the Databricks and Azure engineering teams work together all the time, constantly enhancing performance and scalability across dozens of dimensions, and monitoring the fleet of environments while providing mission-critical support for any issues. We jointly plan new features and releases on Azure. For example, we recently hosted an exclusive public preview of the new Photon engine first on Azure. This collaboration has been underway for 4 years now, with hundreds of thousands of hours put into making Databricks run really well specifically on Azure!
The big picture
Beyond the specifics for any one service or technology, there are a few tenets that stand out. First, put data into one place with data lake services on cloud storage as the best foundation. Second, make that data open and accessible to the analytics services in the ecosystem to address any use case. When new features or services become available, as always happens, this architecture is flexible and future-ready to feed data wherever it needs to go. Databricks and Microsoft have worked together for years to make analytics on Azure a compelling platform for any organization by following these tenets and constantly innovating to provide simple, effective analytics services for Azure customers!