Company blog

A Guide to Data + AI Summit Sessions: Machine Learning, Data Engineering, Apache Spark and More

April 26, 2021 by Jules Damji in Company Blog
We are only a few weeks away from Data + AI Summit, returning May 24–28. If you haven’t signed up yet, take...
Engineering blog

7 Reasons to Learn PyTorch on Databricks

April 14, 2021 by Jules Damji in Engineering Blog
What expedites the process of learning new concepts, languages or systems? When learning a new task, do you look for analogs from skills...
Engineering blog

MLflow Model Registry on Databricks Simplifies MLOps With CI/CD Features

MLflow helps organizations manage the ML lifecycle through the ability to track experiment metrics, parameters, and artifacts, as well as deploy models to...
Engineering blog

MLflow 1.12 Features Extended PyTorch Integration

MLflow 1.12 features include extended PyTorch integration, SHAP model explainability, autologging MLflow entities for supported model flavors, and a number of UI...
Company blog

A Guide to MLflow Talks at Data + AI Summit Europe 2020

November 5, 2020 by Jules Damji in Company Blog
In the last two years since its release, MLflow has seen rapid adoption among enterprises and the data science community. With over...
Engineering blog

Ten Simple Databricks Notebook Tips & Tricks for Data Scientists

October 29, 2020 by Jules Damji in Engineering Blog
Often, small things make a huge difference, hence the adage that "some of the best ideas are simple!" Over the course of a...
Company blog

Data + AI Summit Europe Goes Virtual With a Data-Centric Agenda

October 4, 2020 by Ben Lorica, Jules Damji and Jen Aman in Company Blog
Technical conferences evolve over time. They expand beyond their initial focus, adding new technologies, attracting new attendees and broadening their range of sessions...
Company blog

Spark + AI Summit Reflections

July 15, 2020 by Jules Damji in Company Blog
Developers attending a conference have high expectations: what knowledge gaps they’ll fill; what innovative ideas or inspirational thoughts they’ll take away; who to...
Company blog

Databricks Extends MLflow Model Registry with Enterprise Features

We are excited to announce new enterprise-grade features for the MLflow Model Registry on Databricks. The Model Registry is now enabled by...
Company blog

Spark + AI Summit Is Going Virtual with an Expanded Agenda

March 24, 2020 by Ben Lorica and Jules Damji in Company Blog
Over the years, technical conferences tend to expand beyond their initial focus, adding new technologies, types of attendees, and a broader range of...
Engineering blog

How to Display Model Metrics in Dashboards using the MLflow Search API

February 18, 2020 by Avesh Singh, Jules Damji and Max Allen in Engineering Blog
Machine learning engineers and data scientists frequently train models to optimize a loss function. With optimization methods like gradient descent, we iteratively improve...
Engineering blog

Managed MLflow Now Available on Databricks Community Edition

In February 2016, we introduced Databricks Community Edition, a free edition for big data developers to learn and get started quickly with...
Company blog

A Guide to Training Sessions at Spark + AI Summit, Europe

September 30, 2019 by Jules Damji and Taggart McCurdy in Company Blog
Education and the pursuit of knowledge are lifelong journeys: they never complete; there is always something new to learn; a new professional certification...
Company blog

A Guide to Developer, Deep Dive, and Apache Spark Tutorial Talks at Spark + AI Summit, Europe

September 5, 2019 by Jules Damji in Company Blog
You might have heard the famous saying, “Why software is eating the world.” But if software is eating the world, you may...
Engineering blog

MLflow v0.9.0 Features SQL Backend, Projects in Docker, and Customization in Python Models

March 28, 2019 by Sue Ann Hong and Jules Damji in Engineering Blog
Company blog

A Guide to Developer, Deep Dive, and Continuous Streaming Applications Talks at Spark + AI Summit

February 19, 2019 by Jules Damji in Company Blog
In January 2013 when Stephen O’Grady, an analyst at RedMonk, published “The New Kingmakers: How Developers Conquered the World,” the book’s...
Company blog

A Guide to AI, Machine Learning, and Deep Learning Talks at Spark + AI 2019

December 18, 2018 by Jules Damji in Company Blog
To a good degree, this back-of-the-envelope flowchart by Karen Hao of MIT Technology Review helps elucidate what constitutes the use of...
Company blog

Preliminary Agenda Announced for Spark + AI Summit 2019

November 27, 2018 by Jules Damji and Jen Aman in Company Blog
As part of the organizing committee and program chairs for this summit, we are delighted to share two achievements with you. First, we...
Engineering blog

MLflow v0.8.0 Features Improved Experiment UI and Deployment Tools

November 21, 2018 by Aaron Davidson and Jules Damji in Engineering Blog
Last week we released MLflow v0.8.0 with multiple new features, including improved UI experience and support for deploying models directly via Docker containers...
Engineering blog

Introducing HorovodRunner for Distributed Deep Learning Training

Today, we are excited to introduce HorovodRunner in our Databricks Runtime 5.0 ML! HorovodRunner provides a simple way to scale up your...
Company blog

MLflow v0.7.0 Features New R API by RStudio

Today, we’re excited to announce MLflow v0.7.0, released with new features, including a new MLflow R client API contributed by RStudio...
Engineering blog

How to Use MLflow To Reproduce Results and Retrain Saved Keras ML Models

September 21, 2018 by Jules Damji in Engineering Blog
In part 2 of our series on MLflow blogs, we demonstrated how to use MLflow to track experiment results for a Keras network...
Engineering blog

New Features in MLflow v0.6.0

September 13, 2018 by Aaron Davidson and Jules Damji in Engineering Blog
Today, we’re excited to announce MLflow v0.6.0, released early in the week with new features. Now available on PyPI and Maven...
Company blog

A Guide to Apache Spark Use Cases, Streaming, and Research Talks at Spark + AI Summit Europe

September 5, 2018 by Jules Damji in Company Blog
For much of Apache Spark’s history, its capacity to process data at scale and its capability to unify disparate workloads have led Spark developers...
Engineering blog

How to Use MLflow to Experiment a Keras Network Model: Binary Classification for Movie Reviews

August 23, 2018 by Jules Damji in Engineering Blog
In the last blog post, we demonstrated the ease with which you can get started with MLflow, an open-source platform to...
Engineering blog

New Features in MLflow v0.5.2 Release

August 21, 2018 by Aaron Davidson and Jules Damji in Engineering Blog
Today, we’re excited to announce MLflow v0.5.0, MLflow v0.5.1, and MLflow v0.5.2, which were released last week with some new features. MLflow 0.5.2...
Company blog

A Guide to Data Science, Developer, and Deep Dive Talks at Spark + AI Summit Europe

August 7, 2018 by Jules Damji in Company Blog
In October 2012, Harvard Business Review put a spotlight on the data science career with a dedicated issue and a catchy claim: Data...
Company blog

Bay Area Apache Spark Meetup Summary @ Databricks HQ

July 25, 2018 by Jules Damji in Company Blog
On July 19, we held our monthly Bay Area Spark Meetup (BASM) at Databricks HQ in San Francisco. At the Spark + AI...
Company blog

MLflow v0.3.0 Released

July 24, 2018 by Aaron Davidson and Jules Damji in Company Blog
Today, we’re excited to announce MLflow v0.3.0, which we released last week with some of the requested features from internal clients and open...
Company blog

A Guide to AI, Machine Learning, and Deep Learning Talks at Spark + AI Summit Europe

July 23, 2018 by Jules Damji in Company Blog
Within a couple of years of its release as an open-source machine learning and deep learning framework, TensorFlow has seen an amazing rate...
Engineering blog

How to Use MLflow, TensorFlow, and Keras with PyCharm

July 10, 2018 by Jules Damji in Engineering Blog
At Spark + AI Summit in June, we announced MLflow, an open-source platform for the complete machine learning lifecycle. The platform’s philosophy...
Company blog

Spark + AI Summit Europe Agenda Announced

June 21, 2018 by Jules Damji in Company Blog
London, as a financial center and cosmopolitan city, has its historical charm, cultural draw, and technical allure for everyone, whether you are an...
Company blog

A Guide to Developer, Apache Spark Use Cases, and Deep Dives Talks at Spark + AI Summit

May 23, 2018 by Jules Damji in Company Blog
Apache Spark is tackling new frontiers through innovations by unifying new workloads. This enables developers to combine data and AI to develop intelligent...
Company blog

A Guide to AI, Machine Learning, and Data Science Talks at Spark + AI Summit

May 15, 2018 by Jules Damji in Company Blog
By any measurement today, in the digital media, technical conferences and citations, or searches on Google Trends, the frequency of terms like...
Company blog

A Guide to TensorFlow Talks at Spark + AI Summit 2018

May 8, 2018 by Jules Damji in Company Blog
Within a couple of years of its release as an open-source machine learning and deep learning framework, TensorFlow has seen an amazing rate...
Engineering blog

Benchmarking Apache Spark on a Single Node Machine

Apache Spark has become the de facto unified analytics engine for big data processing in a distributed environment. Yet we are seeing more...
Company blog

5 Reasons to Attend Spark + AI Summit

April 19, 2018 by Jules Damji in Company Blog
Spark + AI Summit will be held in San Francisco on June 4-6, 2018. Check out the full agenda and get your ticket...
Company blog

Women in Big Data and Apache Spark: Bay Area Apache Spark Meetup Summary

April 17, 2018 by Jules Damji in Company Blog
In collaboration with the local chapter of Women in Big Data Meetup and the continuing effort by the Databricks diversity team to have more...
Company blog

Selected Sessions to Watch for at Spark + AI Summit 2018

March 15, 2018 by Jules Damji in Events
Early last month, we announced our agenda for Spark + AI Summit 2018, with over 180 selected talks with 11 tracks and...
Engineering blog

Introducing Apache Spark 2.3

Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want...
Company blog

Databricks and Apache Spark™ 2017 Year in Review

January 3, 2018 by Jules Damji in Company Blog
At Databricks we welcome the dawn of the New Year 2018 by reflecting on what we achieved collectively as a company and community...
Company blog

Women in Big Data, Apache Spark and AI: Bay Area Spark Meetup at Databricks Summary

November 27, 2017 by Jules Damji in Company Blog
When Fei-Fei Li, the director of Stanford’s AI Lab and now a chief scientist at Google Cloud, was asked in an interview...
Platform blog

Cloud-based Relational Database Management Systems at Databricks

Databricks and Microsoft have jointly developed a new cloud service called Microsoft Azure Databricks, which makes Apache Spark analytics fast, easy, and...
Company blog

Spark Summit EU 2017 Recap and Reflections

November 6, 2017 by Jules Damji in Company Blog
“Dublin is now a truly cosmopolitan capital, with an influx of people, energy, and ideas infusing the ever-beguiling, multi-layered city with fresh flavors...
Engineering blog

Arbitrary Stateful Processing in Apache Spark’s Structured Streaming

October 17, 2017 by Bill Chambers and Jules Damji in Engineering Blog
This is the seventh post in a multi-part series about how you can perform complex streaming analytics using Apache Spark and Structured Streaming...
Engineering blog

Building Complex Data Pipelines with Unified Analytics Platform

October 5, 2017 by Jules Damji and Jason Pohl in Engineering Blog
Big data practitioners often post recurring questions on Quora: What is data engineering? How to become a data scientist? What’s a data...
Company blog

Bay Area Apache Spark Meetup at HPE/Aruba Networks Summary

September 22, 2017 by Jules Damji in Company Blog
On September 7th, we held our monthly Bay Area Apache Spark Meetup (BASM) at HPE/Aruba Networks in Santa Clara. We had two Apache...
Company blog

Learn about Apache Spark APIs and Best Practices

September 12, 2017 by Jules Damji and Silvio Fiorito in Company Blog
Since Apache Spark 1.3, Spark and its APIs have evolved to become easier, faster, and smarter. The goal has been to unify...
Company blog

Bay Area Apache Spark Meetup at Pinterest Summary

August 28, 2017 by Jules Damji in Company Blog
On August 22, we held our monthly Bay Area Apache Spark Meetup (BASM) at Pinterest in San Francisco. In all, we had three...
Engineering blog

Anthology of Technical Assets on Apache Spark's Structured Streaming

August 24, 2017 by Jules Damji in Engineering Blog
Older anthologies collated a collection of contributions from various authors around a theme—bounded then as a journal or periodical. Newer anthologies, however, include...
Platform blog

Best Practices for Coarse Grained Data Security in Databricks

August 23, 2017 by Bill Chambers and Jules Damji in Platform Blog
At Databricks, we work with hundreds of companies, all pushing the bleeding edge in their respective industries. We want to share patterns for...
Engineering blog

On-Demand Webinar and FAQ: Parallelize R Code Using Apache Spark

August 21, 2017 by Hossein Falaki and Jules Damji in Engineering Blog
On August 15th, Data Science Central hosted a live webinar, Parallelize R Code Using Apache Spark, with Databricks’ Hossein Falaki. This webinar introduced SparkR...
Company blog

Apache Spark’s Structured Streaming with Amazon Kinesis on Databricks

August 9, 2017 by Jules Damji in Company Blog
On July 11, 2017, we announced the general availability of Apache Spark 2.2.0 as part of Databricks Runtime 3.0 (DBR) for the Unified...
Company blog

On-Demand Webinar and FAQ: Accelerate Data Science with Better Data Engineering on Databricks

On July 13th, we hosted a live webinar, Accelerate Data Science with Better Data Engineering on Databricks. This webinar focused on...
Company blog

4 SQL High-Order and Lambda Functions to Examine Complex and Structured Data in Databricks

June 27, 2017 by Jules Damji in Company Blog
Engineering blog

Five Spark SQL Utility Functions to Extract and Explore Complex Data Types

June 13, 2017 by Jules Damji in Engineering Blog
For developers, often the how is as important as the why. While our in-depth blog explains the...
Company blog

10th Spark Summit Sets Another Record of Attendance

June 9, 2017 by Jules Damji and Wayne Chan in Company Blog
We have assembled a selected collage of highlights from Databricks’ speakers at our 10th Spark Summit, a milestone for the Apache Spark community and...
Company blog

Bay Area Apache Spark Meetup Summary

May 26, 2017 by Jules Damji in Company Blog
On May 16, we held our monthly Bay Area Apache Spark Meetup (BASM) at SalesforceIQ in Palo Alto. In all, we had three...
Engineering blog

On-Demand Webinar and FAQ: Deep Learning and Apache Spark: Workflows and Best Practices

May 23, 2017 by Tim Hunter and Jules Damji in Engineering Blog
On May 4th, we hosted a live webinar, Deep Learning and Apache Spark: Workflows and Best Practices. Rather than comparing deep...
Company blog

The Tenth Spark Summit with a Terrific Agenda for All

March 30, 2017 by Jules Damji in Company Blog
The number 10 is often used as a measuring yardstick to denote achievement, attainment or accomplishment: the 10th anniversary; a perfect score of...
Engineering blog

On-Demand Webinar and FAQ: Apache Spark MLlib 2.x: How to Productionize your Machine Learning Models

On March 9th, we hosted a live webinar, Apache Spark MLlib 2.x: How to Productionize your Machine Learning Models, to address the following...
Company blog

Spark Summit East 2017: Another Record-Setting Spark Summit

February 9, 2017 by Jules Damji, Wayne Chan and Dave Wang in Company Blog
We’ve put together a short recap of the keynotes and highlights from Databricks’ speakers for Apache Spark enthusiasts who could not attend the...
Company blog

5 Reasons to Attend Spark Summit East 2017

January 10, 2017 by Jules Damji in Company Blog
Spark Summit East will be held in Boston on Feb 7-9, 2017. Check out the full agenda and get your ticket before it...
Company blog

Databricks and Apache Spark 2016 Year in Review

Engineering blog

Top 10 Apache Spark Blog Posts from 2016

December 30, 2016 by Jules Damji in Engineering Blog
Here’s...
Company blog

On Demand Webinar and FAQ: Apache Spark MLlib 2.x: Migrating ML Workloads to DataFrames

December 14, 2016 by Joseph Bradley and Jules Damji in Company Blog
Last week, we held a live webinar, Apache Spark MLlib 2.x: Migrating ML Workloads to DataFrames, to demonstrate the ease with which...
Engineering blog

Databricks Bi-Weekly Apache Spark Digest: 11/16/16

November 16, 2016 by Jules Damji in Engineering Blog
Spark Summit Talks and Apache Spark Roundup: Databricks and partners set a new world record for CloudSort 2016 Benchmark using Apache Spark...
Company blog

Databricks Voices From Spark Summit EU 2016 Day 2

October 27, 2016 by Jules Damji and Dave Wang in Company Blog
Read the recap from Day 1 of Spark Summit EU. Update: The videos of the presentations are now available. Find them below. Spark...
Company blog

Databricks Voices From Spark Summit EU 2016 Day 1

October 26, 2016 by Jules Damji in Company Blog
Update: The videos of the presentations are now available. Find them below. Spark Summit Keynotes: Brussels’ October morning overcast or morning-commute traffic did...
Engineering blog

Databricks Bi-Weekly Apache Spark Digest: 10/4/16

October 4, 2016 by Jules Damji in Engineering Blog
Here’s our recap of what’s transpired with Apache Spark since our previous digest. Databricks Apache Spark Survey 2016 Report published and now...
Company blog

Apache Spark Survey 2016 Results Now Available

September 27, 2016 by Jules Damji in Company Blog
In July 2016, we conducted our Apache Spark Survey to identify insights on how organizations are using Spark and highlight growth trends since...
Company blog

Apache Spark Earns Datanami Awards for Machine Learning, Real-time Analytics, and More

September 19, 2016 by Jules Damji in Company Blog
Today, the Datanami Readers’ and Editors’ Choice Awards recognized the sweeping changes Apache Spark is bringing to the Big Data landscape with four...
Engineering blog

Databricks Bi-Weekly Digest: 8/31/16

August 31, 2016 by Jules Damji in Engineering Blog
Here’s our recap of what’s transpired with Apache Spark since our previous digest. Databricks CTO and Co-founder Matei Zaharia presented “Unifying big...
Engineering blog

How to use SparkSession in Apache Spark 2.0

August 15, 2016 by Jules Damji in Engineering Blog
Generally, a session is an interaction between two or more entities. In computer parlance, its usage is prominent in the realm of networked...
Engineering blog

Databricks Bi-Weekly Digest: 8/8/16

August 8, 2016 by Jules Damji in Engineering Blog
Continuing with our bi-weekly digest series, here’s our recap of what’s transpired over the last two weeks with Apache Spark since our previous...
Company blog

Code 4 San Francisco Hack Nite Highlights

July 20, 2016 by Jules Damji in Company Blog
For a speechwriter, JFK’s words “Ask not what your country can do for you. Ask what you can...
Engineering blog

Databricks Bi-Weekly Digest: 7/18/16

July 18, 2016 by Jules Damji in Engineering Blog
Today, we're kicking off a new series: the Databricks Bi-Weekly Digest. Our goal with this digest is to summarize Spark-related content, compiled...
Engineering blog

A Tale of Three Apache Spark APIs: RDDs vs DataFrames and Datasets

July 14, 2016 by Jules Damji in Engineering Blog
Of all the developers' delights, none is more attractive than a set of APIs that make developers productive, that are easy to use...
Company blog

Introducing Getting Started with Apache Spark on Databricks

June 30, 2016 by Jules Damji and Denny Lee in Company Blog
We are proud to introduce the Getting Started with Apache Spark on Databricks Guide. This step-by-step guide illustrates how to leverage the...
Company blog

Share your Thoughts in our Apache Spark Survey Today

June 23, 2016 by Jules Damji in Company Blog
Since our survey in 2015, Apache Spark has...
Engineering blog

Apache Spark Key Terms, Explained

June 22, 2016 by Jules Damji and Denny Lee in Engineering Blog
This article was originally posted on KDnuggets. As observed in...
Company blog

Another Record-Setting Spark Summit

The lure of San Francisco is indisputable as is its position as the preeminent high-tech hub. On day one of Spark Summit 2016...
Engineering blog

Apache Spark 2.0: An Anthology of Technical Assets

June 1, 2016 by Jules Damji in Engineering Blog
Older anthologies collated a collection of contributions from various authors around a theme—bounded then as a journal or periodical. Newer anthologies include multiple...
Company blog

6 Reasons to Attend Spark Summit 2016

May 18, 2016 by Jules Damji in Company Blog
Temples of Developer Knowledge: “Developers are the new kingmakers,” wrote Stephen O’Grady in his book “The New Kingmakers: How Developers Conquered the...
Company blog

Spark Saturday DC: A Meetup Summary

May 9, 2016 by Jules Damji in Company Blog
On a rainy and foggy Saturday morning, April 30th, in McLean, VA, more than 275 Apache Spark enthusiasts, forsaking the comfort of Saturday...
Company blog

How to Process IoT Device JSON Data Using Apache Spark Datasets and DataFrames

March 28, 2016 by Jules Damji in Company Blog
Today, I joined Databricks, the company behind Apache Spark, as a Spark Community Evangelist. In the past, I've worked as an individual contributor...