Engineering blog

SQL Subqueries in Apache Spark 2.0

In the upcoming Apache Spark 2.0 release, we have substantially expanded Spark SQL's support for the SQL standard. In this brief...

Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop

When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...

Apache Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs

To try the new features highlighted in this blog post, download Spark 1.5 or sign up for a 14-day free trial of Databricks today...

Improvements to Kafka integration of Spark Streaming

Apache Kafka is rapidly becoming one of the most popular open source stream ingestion platforms. We see the same trend among the users...

Introducing DataFrames in Apache Spark for Large Scale Data Science

Today, we are excited to announce a new DataFrame API designed to make big data processing even easier for a wider audience. When...