Engineering | Databricks Blog

Better Machine Learning through Active Learning

January 15, 2020 by Sean Owen in Company

Try this notebook to reproduce the steps outlined below. Machine learning models can seem like magical savants. They can distinguish hot dogs from...

Processing Geospatial Data at Scale With Databricks

December 4, 2019 by Nima Razavi and Michael Johns in Solutions

This blog was written 3 years ago. Please refer to these articles for up-to-date approaches to geospatial processing and analytics with your Databricks...

Streamlining Variant Normalization on Large Genomic Datasets with Glow

December 4, 2019 by Kiavash Kianfar in Engineering

Cross-posted from the Glow blog. Many research and drug development projects in the genomics world involve large genomic variant data sets...

New Databricks Integration for Jupyter Bridges Local and Remote Workflows

December 2, 2019 by Bernhard Walter in Solutions

For many years now, data scientists have developed specific workflows on premises using local filesystem hierarchies, source code revision systems and CI/CD...

Migration from Hadoop to Modern Cloud Platforms: The Case for Hadoop Alternatives

November 27, 2019 by Anand Venugopal and James Nguyen in Open Source

Companies rely on their big data and analytics platforms to support innovation and digital transformation strategies. However, many Hadoop users struggle with complexity...

Deep Learning Tutorial Demonstrates How to Simplify Distributed Deep Learning Model Inference Using Delta Lake and Apache Spark™

November 20, 2019 by Cyrielle Simeone in Platform

On October 10th, our team hosted a live webinar, Simple Distributed Deep Learning Model Inference, with Xiangrui Meng, Software Engineer at Databricks. Model...

Using AutoML Toolkit's FamilyRunner Pipeline APIs to Simplify and Automate Loan Default Predictions

Scalable Near Real-Time S3 Access Logging Analytics with Apache Spark™ and Delta Lake

Scaling Hyperopt to Tune Machine Learning Models in Python

Scaling Financial Time Series Analysis Beyond PCs and Pandas: On-Demand Webinar, Slides and FAQ Now Available!