Posts by Reynold Xin

Introducing Apache Spark 2.3

February 28, 2018 by Sameer Agarwal, Xiao Li, Reynold Xin and Jules Damji in Engineering Blog

Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of Databricks Runtime 4.0. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.3 release. Continuing with the objective of making Spark faster, easier, and smarter, Spark 2.3 marks a major milestone...

Meltdown and Spectre: Exploits and Mitigation Strategies

January 16, 2018 by Chris Stevens, Nicolas Poggi, Thomas Desrosiers and Reynold Xin in Engineering Blog

In an earlier blog post, we analyzed the performance impact of Meltdown and Spectre on big data workloads in the cloud. In this blog post, we explain these exploits, their mitigation strategies and how they impact Databricks from a security and performance perspective. Meltdown Meltdown breaks a fundamental assumption in operating system security: an...

Meltdown and Spectre’s Performance Impact on Big Data Workloads in the Cloud

January 13, 2018 by Chris Stevens, Nicolas Poggi, Thomas Desrosiers and Reynold Xin in Engineering Blog

Last week, the details of two industry-wide security vulnerabilities, known as Meltdown and Spectre, were released. These exploits enable cross-VM and cross-process attacks by allowing untrusted programs to scan other programs’ memory. On Databricks, the only place where users can execute arbitrary code is in the virtual machines that run Apache Spark clusters. There,...

Databricks Cache Boosts Apache Spark Performance

January 9, 2018 by Alicja Luszczak, Michał Szafrański, Michał Świtakowski and Reynold Xin in Company Blog

We are excited to announce the general availability of Databricks Cache, a Databricks Runtime feature as part of the Unified Analytics Platform that can improve the scan speed of your Apache Spark workloads up to 10x, without any application code change. In this blog, we introduce the two primary focuses of this new feature: ease-of-use...

Benchmarking Big Data SQL Platforms in the Cloud

July 12, 2017 by Juliusz Sompolski and Reynold Xin in Engineering Blog

For a deeper dive on these benchmarks, watch the webinar featuring Reynold Xin. Performance is often a key factor in choosing big data platforms. Given that SQL is the lingua franca for big data analysis, we wanted to make sure we are offering one of the most performant SQL platforms in our Unified Analytics Platform. In...

A Vision for Making Deep Learning Simple

June 6, 2017 by Sue Ann Hong, Tim Hunter and Reynold Xin in Engineering Blog

When MapReduce was introduced 15 years ago, it showed the world a glimpse into the future. For the first time, engineers at Silicon Valley tech companies could analyze the entire Internet. MapReduce, however, provided low-level APIs that were incredibly difficult to use, and as a result, this "superpower" was a luxury — only a small...

Top 5 Reasons for Choosing S3 over HDFS

May 31, 2017 by Reynold Xin, Josh Rosen and Kyle Pistor in Company Blog

At Databricks, our engineers guide thousands of organizations to define their big data and cloud strategies. When migrating big data workloads to the cloud, one of the most commonly asked questions is how to evaluate HDFS versus the storage systems provided by cloud providers, such as Amazon’s S3, Microsoft’s Azure Blob Storage, and Google’s Cloud...

Databricks Runtime 3.0 Beta Delivers Cloud Optimized Apache Spark

May 24, 2017 by Reynold Xin in Company Blog

A major value Databricks provides is the automatic provisioning, configuration, and tuning of clusters of machines that process data. Running on these machines are the Databricks Runtime artifacts, which include Apache Spark and additional software such as Scala, Python, DBIO, and DBES. For customers these artifacts provide value: they relieve them from the onus of...

Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?

February 16, 2017 by Reynold Xin, Ala Luszczak and Bogdan Raducanu in Engineering Blog

This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root cause of the failing test case spans multiple layers, from Apache Spark to the JVM JIT compiler, we wanted to share our analysis in this post. Spark as a compiler The vast majority...

Databricks and Apache Spark 2016 Year in Review

January 4, 2017 by Reynold Xin, Jules Damji, Dave Wang and Matei Zaharia in Company Blog

In 2016, Apache Spark released its second major version, 2.0, and outgrew our wildest expectations: 4X growth in meetup members, reaching 240,000 globally, and 2X growth in code contributors, reaching 1000. In addition to contributing to the success of Spark, Databricks also had a phenomenal year. We have rolled out a large number of features...

Databricks Inc.
160 Spear Street, 13th Floor
San Francisco, CA 94105
1-866-330-0121

© Databricks 2018. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation.