
Making Apache Spark the Fastest Open Source Streaming Engine

June 6, 2017 by Michael Lumb
We started building Structured Streaming in Apache Spark one year ago as a new, simpler way to develop continuous applications. Not only...

Transactional Writes to Cloud Storage on Databricks

In another blog post published today, we showed the top five reasons for choosing S3 over HDFS. With the dominance of simple...

Entropy-based Log Redaction for Apache Spark on Databricks

May 30, 2017 by Weiluo Ren and Yu Peng
This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, tooling, monitoring, and provisioning. We love...

Using sparklyr in Databricks

May 25, 2017 by Hossein Falaki
Try this notebook on Databricks with all instructions as explained in this post. In September 2016, RStudio announced sparklyr, a new...

On-Demand Webinar and FAQ: Deep Learning and Apache Spark: Workflows and Best Practices

May 23, 2017 by Tim Hunter and Jules Damji
On May 4th, we hosted a live webinar, "Deep Learning and Apache Spark: Workflows and Best Practices." Rather than comparing deep...

Running Streaming Jobs Once a Day For 10x Cost Savings

This is the sixth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Traditionally, when people...

Taking Apache Spark’s Structured Streaming to Production

This is the fifth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. At Databricks, we’ve...

Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

May 9, 2017 by Yun Ni, Kelvin Chu and Joseph Bradley
This is a cross-blog post effort between Databricks and Uber Engineering. Yun Ni is a software engineer on Uber's Machine Learning Platform...

Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming

This is the fourth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Continuous applications often...

Processing Data in Apache Kafka with Structured Streaming in Apache Spark 2.2

This is the third post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. In this blog...