Upgrade Production Workloads to Be Safer, Easier, and Faster With Databricks Runtime 7.3 LTS
What a difference a year makes. One year ago, Databricks Runtime version (DBR) 6.4 was released -- followed by 8 more DBR releases. But now it’s time to plan for an upgrade to 7.3 for Long-Term Support (LTS) and compatibility, as support for DBR 6.4 will end on April 1, 2021. (Note that a new...
Top 5 Reasons to Convert Your Cloud Data Lake to a Delta Lake
If you examine the agenda for any of the Spark Summits in the past five years, you will notice that there is no shortage of talks on how best to architect a data lake in the cloud using Apache Spark™ as the ETL and query engine and Apache Parquet as the preferred file format. There...
Building Complex Data Pipelines with Unified Analytics Platform
Big data practitioners often post recurring questions on Quora: What is data engineering? How to become a data scientist? What’s a data analyst? Apart from understanding these roles and respective responsibilities, more important questions to pose are: How can three different personas, three different experiences, and three different requirements collaborate and combine their efforts?...
Managing and Securing Credentials in Databricks for Apache Spark Jobs
Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those credentials in the clear can be a security risk if not stringently administered. To mitigate that risk, Databricks makes it easy and secure to connect to S3 with either Access Keys via...
Apache Spark Scala Library Development with Databricks
The movie Toy Story was released in 1995 by Pixar as the first feature-length computer-animated film. Even though the animators had professional workstations to work with, they started sketching out the story by hand, a practice that is still followed today for all of Pixar’s films. Whenever you’re developing an Apache Spark application, sometimes...