7 Reasons to Learn PyTorch on Databricks
What expedites the process of learning new concepts, languages or systems? When learning a new task, do you look for analogs from skills you already possess? Across all learning endeavors, three favorable characteristics stand out: familiarity, clarity and simplicity. Familiarity eases the transition because of a recognizable link between the old and new ways of...
Databricks and University of Rochester
At Databricks, we strongly believe (“know” you could say) that data and AI are mission-critical for solving the biggest problems our world faces. From healthcare to sustainability to transportation, data is a key to understanding and analyzing these issues at the deepest level – often in real time – and in turn shapes effective solutions....
Identifying Financial Fraud With Geospatial Clustering
For most financial service institutions (FSI), fraud prevention often implies a complex ecosystem made of various components: a mixture of traditional rules-based controls and artificial intelligence (AI), and a patchwork of on-premises systems, proprietary frameworks and open source cloud technologies. Combined with strict regulatory requirements (such as model explainability), high governance frameworks, low latency...
Efficiently Building ML Models for Predictive Maintenance in the Oil and Gas Industry With Databricks
Guest-authored post by Halliburton’s Varun Tyagi, Data Scientist, and Daili Zhang, Principal Data Scientist, as part of the Databricks Guest Blog Program. Halliburton is an oilfield services company with a 100-year-long proven track record of best-in-class oilfield offerings. With operations in over 70 countries, Halliburton provides services related to exploration, development and production...
Benchmark: Koalas (PySpark) and Dask
Koalas is a data science library that implements the pandas APIs on top of Apache Spark so data scientists can use their favorite APIs on datasets of all sizes. This blog post compares the performance of Dask’s implementation of the pandas API and Koalas on PySpark. Using a repeatable benchmark, we have found that Koalas...
Fine-Grained Time Series Forecasting at Scale With Facebook Prophet and Apache Spark: Updated for Spark 3
Advances in time series forecasting are enabling retailers to generate more reliable demand forecasts. The challenge now is to produce these forecasts in a timely manner and at a level of granularity that allows the business to make precise adjustments to product inventories. Leveraging Apache Spark™ and Facebook Prophet, more and more enterprises facing these...
Data + AI Summit Is Back
Data + AI Summit, the global event for the data community, returns May 24-28. We are thrilled to announce that registration for this free virtual event is now open! The future is open. Data and AI are rapidly opening up new possibilities and solving stubborn problems. Unifying data opens up collaboration and solutions that were...
Data Democratization: A Key to Building a Healthy Data Culture
Building a thriving data culture is a strategic priority for many organizations, but only 24% of enterprises have managed to forge a data culture. What is a thriving data culture anyway? In its purest form, it’s when the entire organization, from the C-suite to the front-line workers, is making informed business decisions every...
Introducing Delta Time Travel for Future Data Sets
We are thrilled to introduce enhanced time travel capabilities in Databricks Delta, the next-gen unified analytics engine built on top of Apache Spark, for all of our users. With this new feature, Delta can automatically extrapolate big datasets stored in your data lake, enabling access to any future version of your data today. This temporal...
Top Questions from Customers About Data Management
Last week, we hosted a virtual event highlighting Delta Lake, an open source storage layer that brings reliability, performance and security to your data lake. We had amazing engagement from the audience, with almost 200 thoughtful questions submitted! While we can’t answer them all in this blog, we thought we should share answers to some of...