Engineering population scale Genome-Wide Association Studies with Apache Spark™, Delta Lake, and MLflow
The advent of genome-wide association studies (GWAS) in the late 2000s enabled scientists to begin to understand the causes of complex diseases such as diabetes and Crohn’s disease at their most fundamental level. However, academic bioinformatics tools to perform GWAS have not kept pace with the growth of genomic data, which has been doubling globally...
Guest Blog: Using Databricks, MLflow, and Amazon SageMaker at Brandless to Bring Recommendation Systems to Production
This is a guest blog from Adam Barnhard, Head of Data at Brandless, Inc., and Bing Liang, Data Scientist at Brandless, Inc. Launched in July 2017, Brandless makes hundreds of high-quality items, curated for every member of your family and room of your home, and all sold at more accessible price points than similar products on the market. We...
Building Foot-Traffic Insights Dataset
Businesses want to understand both the physical world around them and how people interact with the physical world. Where should I build my next coffee shop? How far away are my 3 closest coffee competitors? How far are people traveling to get to my stores? Which other...
Guest Blog: How Virgin Hyperloop One reduced processing time from hours to minutes with Koalas
At Virgin Hyperloop One, we work on making Hyperloop a reality, so we can move passengers and cargo at airline speeds but at a fraction of the cost of air travel. In order to build a commercially viable system, we collect and analyze a large, diverse quantity of data, including Devloop Test Track runs, numerous...
Diving Into Delta Lake: Unpacking The Transaction Log
The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this article, we’ll explore what the Delta Lake transaction log is, how it works at the file level, and how...
Deep Learning on Medical Images at Population Scale: On-Demand Webinar and FAQ Now Available!
On June 26th, we hosted a live webinar, Deep Learning on Medical Images at Population-scale, with members of the data science and engineering teams from Human Longevity Inc (HLI), a leader in medical imaging and genomics. During the webinar, HLI shared how they use MRI images, whole-genome sequencing data, and other clinical data sets...
Protecting the Securities Market with Predictive Fraud Detection
FINRA (Financial Industry Regulatory Authority), a regulatory body charged with protecting the U.S. securities market, spoke at the Spark + AI Summit on how they use the Databricks Unified Analytics Platform to analyze up to 100 billion stock market events per day for fraud detection and prevention. This is a summary of their story from Summit....
Efficient Databricks Deployment Automation with Terraform
Managing cloud infrastructure and provisioning resources can be a headache that DevOps engineers are all too familiar with. Even the most capable cloud admins can get bogged down with managing a bewildering number of interconnected cloud resources, including data streams, storage, compute power, and analytics tools. Take, for example, the following scenario: a customer...
Tangible Impacts of AI on the Business
With 2019 in full swing, the excitement for data- and AI-driven innovation continues. Over the past few years, we’ve seen leading innovators, like Riot Games, Regeneron and Shell, become early adopters of the latest machine learning and AI technologies, building and deploying AI applications into production. But with great promise comes great...
Challenges of AI in the Mainstream
Artificial Intelligence (AI) adoption is picking up steam. Enterprises are looking for new ways to deploy AI to transform their business and gain a competitive edge. With the power of machine learning (ML) fueled by the insights mined from massive volumes of data, AI is no longer only for the 1%, but quickly becoming an...