How to Save up to 50% on Azure ETL While Improving Data Quality
The challenges of data quality: One of the most common issues our customers face is maintaining high data quality standards, especially as they rapidly increase the volume of data they process, analyze and publish. Data validation, data transformation and de-identification can be complex and time-consuming. As data volumes grow, new downstream use cases and applications...
Leveling the Playing Field: HorovodRunner for Distributed Deep Learning Training
This is a guest post authored by Sr. Staff Data Scientist/User Experience Researcher Jing Pan and Senior Data Scientist Wendao Liu of leading health insurance marketplace eHealth. None generates Taichi; Taichi generates two complementary forces; Two complementary forces generate four aggregates; Four aggregates generate eight trigrams; Eight trigrams determine myriads of phenomena. —Classic of Changes...
Data Access Governance and 3 Signs You Need it
This is a guest-authored post by Heather Devane, content marketing manager, Immuta. Cloud data analytics is only as powerful as the ability to access that data for use. Yet, the data stewards responsible for managing data governance often find themselves in a holding pattern, waiting for approval from various stakeholders to operationalize data assets...
Over 200K Enrolled in Databricks’ Certification and Training
More than 200,000 individuals have participated in Databricks' certification and training over the past four years, including thousands of partners. In the past year alone, over 75,000 individuals have been trained and over 1,500 customers and partners have also earned their Databricks Academy Certifications. Today, we are pleased to announce new digital badges so you...
Lakehouse Architecture Realized: Enabling Data Teams With Faster, Cheaper and More Reliable Open Architectures
Databricks was founded under the vision of using data to solve the world’s toughest problems. We started by building upon our open source roots in Apache Spark™ and creating a thriving collection of projects, including Delta Lake, MLflow, Koalas and more. We’ve now built a company with over 1,500 employees helping thousands of data teams...
A Step-by-step Guide for Debugging Memory Leaks in Spark Applications
This is a guest-authored post by Shivansh Srivastava, software engineer, Disney Streaming Services. It was originally published on Medium.com. Just a bit of context: We at Disney Streaming Services use Apache Spark across the business and Spark Structured Streaming to develop our pipelines. These applications run on the Databricks Runtime (DBR) environment which is quite...
Top Questions from Our Lakehouse Event
We recently held a virtual event, featuring CEO Ali Ghodsi, that showcased the vision of Lakehouse architecture and how Databricks helps customers make it a reality. Lakehouse is a data platform architecture that implements similar data structures and data management features to those in a data warehouse directly on the low-cost, flexible storage used for...
Handling Late Arriving Dimensions Using a Reconciliation Pattern
This is a guest community post authored by Chaitanya Chandurkar, Senior Software Engineer in the Analytics and Reporting team at McGraw Hill Education. Special thanks to MHE Analytics team members Nick Afshartous, Principal Engineer; Kapil Shrivastava, Engineering Manager; and Steve Stalzer, VP of Engineering / Analytics and Data Science, for their contributions. Processing facts and...
Learn How Disney+ Built Their Streaming Data Analytics Platform With Databricks and AWS to Improve the Customer Experience
Martin Zapletal, Software Engineering Director at Disney+, is presenting at re:Invent 2020 with the session How Disney+ uses fast data ubiquity to improve the customer experience (must be registered to watch but registration is free!). In this breakout session, Martin will showcase Disney+’s architecture using Databricks on AWS for processing and analyzing millions of real-time...
Databricks Is Named a Visionary in the 2020 Gartner Magic Quadrant for Cloud Database Management Systems (DBMS)
Last week, Gartner published the Magic Quadrant (MQ) for Cloud Database Management Systems, where Databricks was recognized as a Visionary in the market.1 This was the first time Databricks was included in a database-related Gartner Magic Quadrant. We believe this is due in large part to our investment in Delta Lake and its ability to...