Databricks Blog

Top 5 Reasons for Choosing S3 over HDFS

May 31, 2017 by Reynold Xin, Josh Rosen and Kyle Pistor in Company

At Databricks, our engineers guide thousands of organizations to define their big data and cloud strategies. When migrating big data workloads to the...

Entropy-based Log Redaction for Apache Spark on Databricks

May 30, 2017 by Weiluo Ren and Yu Peng in Engineering

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, tooling, monitoring, and provisioning. We love...

Bay Area Apache Spark Meetup Summary

May 26, 2017 by Jules Damji in Company

On May 16, we held our monthly Bay Area Apache Spark Meetup (BASM) at SalesforceIQ in Palo Alto. In all, we had three...

Using sparklyr in Databricks

May 25, 2017 by Hossein Falaki in Engineering

In September 2016, RStudio announced sparklyr, a new...

Working with Nested Data Using Higher Order Functions in SQL on Databricks

May 24, 2017 by Herman van Hövell and Bill Chambers in Product

Nested data types offer Databricks customers and Apache Spark users powerful ways to manipulate structured data. In particular...

Databricks Runtime 3.0 Beta Delivers Cloud Optimized Apache Spark

May 24, 2017 by Reynold Xin in Product

A major value Databricks provides is the automatic provisioning, configuration, and tuning of clusters of machines that process data. Running on these machines...

On-Demand Webinar and FAQ: Deep Learning and Apache Spark: Workflows and Best Practices

May 23, 2017 by Tim Hunter and Jules Damji in Engineering

On May 4th, we hosted a live webinar, Deep Learning and Apache Spark: Workflows and Best Practices. Rather than comparing deep...

Running Streaming Jobs Once a Day For 10x Cost Savings

May 22, 2017 by Burak Yavuz and Tyson Condie in Engineering

This is the sixth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Traditionally, when people...

Persistent Clusters: Simplifying Cluster Management for Analytics

May 19, 2017 by Evan Ye, Haogang Chen, Henry Davidge and Prakash Chockalingam in Company

Today we are excited to announce persistent clusters for analytics in Databricks. With persistent clusters, users no longer need to go through the...

Taking Apache Spark’s Structured Streaming to Production

May 18, 2017 by Bill Chambers and Michael Lumb in Engineering

This is the fifth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. At Databricks, we’ve...