Solving the World’s Toughest Problems with the Growing Open Source Ecosystem and Databricks

January 23, 2020 by Reynold Xin
We started Databricks in 2013 in a tiny little office in Berkeley with the belief that data has the potential to solve the...

Better Machine Learning through Active Learning

January 15, 2020 by Sean Owen
Try this notebook to reproduce the steps outlined below. Machine learning models can seem like magical savants. They can distinguish hot dogs from...

Processing Geospatial Data at Scale With Databricks

December 4, 2019 by Nima Razavi and Michael Johns
This blog was written 3 years ago. Please refer to these articles for up-to-date approaches to geospatial processing and analytics with your Databricks...

Streamlining Variant Normalization on Large Genomic Datasets with Glow

December 4, 2019 by Kiavash Kianfar
Cross-posted from the Glow blog. Many research and drug development projects in the genomics world involve large genomic variant data sets...

Migration from Hadoop to Modern Cloud Platforms: The Case for Hadoop Alternatives

November 27, 2019 by Anand Venugopal and James Nguyen
Companies rely on their big data and analytics platforms to support innovation and digital transformation strategies. However, many Hadoop users struggle with complexity...

Using AutoML Toolkit's FamilyRunner Pipeline APIs to Simplify and Automate Loan Default Predictions

November 5, 2019 by Jas Bali and Denny Lee
Try this Loan Risk with AutoML Pipeline API Notebook in Databricks. Introduction: In the post Using AutoML Toolkit to Automate Loan Default Predictions...

Scalable Near Real-Time S3 Access Logging Analytics with Apache Spark™ and Delta Lake

Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. The original blog is...

Scaling Hyperopt to Tune Machine Learning Models in Python

October 28, 2019 by Joseph Bradley and Max Pumperla
Try the Hyperopt notebook to reproduce the steps outlined below and watch our on-demand webinar to learn more. Hyperopt is one of the...

Scaling Financial Time Series Analysis Beyond PCs and Pandas: On-Demand Webinar, Slides and FAQ Now Available!

On Oct 9th, 2019, we hosted a live webinar — Scaling Financial Time Series Analysis Beyond PCs and Pandas — with Junta Nakai...

Democratizing Financial Time Series Analysis with Databricks

October 8, 2019 by Ricardo Portilla
Try this notebook in Databricks. Introduction: The role of data scientists, data engineers, and analysts at financial institutions includes (but is not limited...