Data Science and ML | Databricks Blog

Introducing Apache Spark 2.3

February 28, 2018 by Sameer Agarwal, Xiao Li, Reynold Xin and Jules Damji in Engineering

Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want...

Databricks Achieves AWS Machine Learning Competency Status

November 28, 2017 by Brian Dirking in Partners

Today we announced that Amazon has awarded Databricks the Amazon Web Services (AWS) Machine Learning (ML) Competency status. This designation recognizes Databricks...

Introducing Pandas UDF for PySpark

October 30, 2017 by Li Jin in Solutions

NOTE: Spark 3.0 introduced a new pandas UDF. You can find more details in the following blog post: New Pandas UDFs and Python...

Introducing the Natural Language Processing Library for Apache Spark

October 19, 2017 by David Talby in Solutions

This is a community blog and effort from the engineering team at John Snow Labs, explaining their contribution to an open-source Apache Spark...

Accelerating R Workflows on Databricks

October 6, 2017 by Hossein Falaki in Engineering

At Databricks we strive to make our Unified Analytics Platform the best place to run big data analytics. For big data, Apache Spark...

Building Complex Data Pipelines with Unified Analytics Platform

October 5, 2017 by Jules Damji and Jason Pohl in Platform

Big data practitioners often post recurring questions on Quora: What is data engineering? How to become a data scientist? What’s a data...

Developing Custom Machine Learning Algorithms in PySpark

August 30, 2017 by Ajay Saini and Joseph Bradley in Engineering

Developing custom Machine Learning (ML) algorithms in PySpark—the Python API for Apache Spark—can be challenging and laborious. In this blog post, we describe...

On-Demand Webinar and FAQ: Parallelize R Code Using Apache Spark

August 21, 2017 by Hossein Falaki and Jules Damji in Engineering

On August 15th, Data Science Central hosted a live webinar—Parallelize R Code Using Apache Spark—with Databricks’ Hossein Falaki. This webinar introduced SparkR...

Breaking the “curse of dimensionality” in Genomics using “wide” Random Forests

July 26, 2017 by Denis C. Bauer, Lynn Langit, Oscar Luo, Piotr Szul and Aidan O’Brien in Engineering

This is a guest blog from members of CSIRO’s transformational bioinformatics team in Sydney, Australia. CSIRO, Australia’s government research agency, is in the...

A Vision for Making Deep Learning Simple

June 6, 2017 by Sue Ann Hong, Tim Hunter and Reynold Xin in Engineering

When MapReduce was introduced 15 years ago, it showed the world a glimpse into the future. For the...