Diving Into Delta Lake: DML Internals (Update, Delete, Merge)
In the previous posts in this series, Diving Into Delta Lake: Unpacking The Transaction Log and Diving Into Delta Lake: Schema Enforcement & Evolution, we described how the Delta Lake transaction log works and the internals of schema enforcement and evolution. Delta Lake supports DML (data manipulation language) commands including DELETE, UPDATE, and MERGE. These commands simplify change data...
Fine-Grained Time Series Forecasting At Scale With Facebook Prophet And Apache Spark
Advances in time series forecasting are enabling retailers to generate more reliable demand forecasts. The challenge now is to produce these forecasts in a timely manner and at a level of granularity that allows the business to make precise adjustments to product inventories. Leveraging Apache Spark™ and Facebook Prophet, more and more enterprises facing these...
Spark + AI in Amsterdam: European Summit Recap, Keynote Videos, & Announcements
Spark + AI Summit Europe 2019 came to Amsterdam this past week! Over 2,300 data scientists, data engineers, and global business leaders from 63 different countries descended upon the RAI Amsterdam Convention Centre for the latest community and open source developments around Apache Spark™, Delta Lake, MLflow, Koalas, and more. Check out the keynote recordings...
Diving Into Delta Lake: Schema Enforcement & Evolution
Data, like our experiences, is always evolving and accumulating. To keep up, our mental models of the world must adapt to new data, some of which contains new dimensions - new ways of seeing things we had no conception of before. These mental models are not unlike a table's schema, defining how we categorize and...
Diving Into Delta Lake: Unpacking The Transaction Log
The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this article, we’ll explore what the Delta Lake transaction log is, how it works at the file level, and how...
Productionizing Machine Learning with Delta Lake
For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but...
Efficient Databricks Deployment Automation with Terraform
Managing cloud infrastructure and provisioning resources can be a headache that DevOps engineers are all too familiar with. Even the most capable cloud admins can get bogged down with managing a bewildering number of interconnected cloud resources - including data streams, storage, compute power, and analytics tools. Take, for example, the following scenario: a customer...
Understanding Dynamic Time Warping
This blog is part 1 of our two-part series Using Dynamic Time Warping and MLflow to Detect Sales Trends. For part 2, see Using Dynamic Time Warping and MLflow to Detect Sales Trends. The phrase “dynamic time warping,” at first read, might evoke images of Marty McFly driving his DeLorean at 88...
Using Dynamic Time Warping and MLflow to Detect Sales Trends
This blog is part 2 of our two-part series Using Dynamic Time Warping and MLflow to Detect Sales Trends. The phrase “dynamic time warping,” at first read, might evoke images of Marty McFly driving his DeLorean at 88 MPH in the Back to the Future series. Alas, dynamic time warping does not involve time travel; instead, it’s...