Engineering | Databricks Blog

Making Apache Spark the Fastest Open Source Streaming Engine

June 6, 2017 by Michael Lumb in Engineering

We started building Structured Streaming in Apache Spark one year ago as a new, simpler way to develop continuous applications. Not only...

Transactional Writes to Cloud Storage on Databricks

May 31, 2017 by Eric Liang, Srinath Shankar and Bill Chambers in Platform

In another blog post published today, we showed the top five reasons for choosing S3 over HDFS. With the dominance of simple...

Entropy-based Log Redaction for Apache Spark on Databricks

May 30, 2017 by Weiluo Ren and Yu Peng in Engineering

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, tooling, monitoring, and provisioning. We love...

Using sparklyr in Databricks

May 25, 2017 by Hossein Falaki in Engineering

Try this notebook on Databricks with all instructions as explained in this post. In September 2016, RStudio announced sparklyr, a new...

On-Demand Webinar and FAQ: Deep Learning and Apache Spark: Workflows and Best Practices

May 23, 2017 by Tim Hunter and Jules Damji in Engineering

On May 4th, we hosted a live webinar, Deep Learning and Apache Spark: Workflows and Best Practices. Rather than comparing deep...

Running Streaming Jobs Once a Day For 10x Cost Savings

May 22, 2017 by Burak Yavuz and Tyson Condie in Engineering

This is the sixth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Traditionally, when people...

Taking Apache Spark’s Structured Streaming to Production

May 18, 2017 by Bill Chambers and Michael Lumb in Engineering

This is the fifth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. At Databricks, we’ve...

Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

May 9, 2017 by Yun Ni, Kelvin Chu and Joseph Bradley in Solutions

This is a cross blog post effort between Databricks and Uber Engineering. Yun Ni is a software engineer on Uber’s Machine Learning Platform...

Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming

May 8, 2017 by Tathagata Das in Engineering

This is the fourth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Continuous applications often...

Processing Data in Apache Kafka with Structured Streaming in Apache Spark 2.2

April 26, 2017 by Kunal Khamar, Tyson Condie and Michael Armbrust in Engineering

This is the third post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. In this blog...