Product | Databricks Blog

Serverless Continuous Delivery with Databricks and AWS CodePipeline

July 13, 2017 by Kevin Rasmussen in Product

Two characteristics commonly mark many companies' success. First, they quickly adapt to new technology. Second, as a result, they gain technological leadership and...

4 SQL High-Order and Lambda Functions to Examine Complex and Structured Data in Databricks

June 27, 2017 by Jules Damji in Product

Shell Oil Use Case: Parallelizing Large Simulations with Apache SparkR on Databricks

June 23, 2017 by Wayne W. Jones, Dennis Vallinga and Hossein Falaki in Product

This blog post is a joint engineering effort between Shell’s Data Science Team (Wayne W. Jones and Dennis Vallinga) and Databricks...

Managing and Securing Credentials in Databricks for Apache Spark Jobs

June 20, 2017 by Jason Pohl in Platform

Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those...

Analysing Metro Operations Using Apache Spark on Databricks

June 14, 2017 by Even Vinge, Senior Manager - EY Advisory, Data & Analytics in Product

This is a guest blog from EY Advisory Data & Analytics team, who have been working with Sporveien in Oslo building a platform...

Databricks Serverless: Next Generation Resource Management for Apache Spark

June 7, 2017 by Greg Owen, Eric Liang, Prakash Chockalingam and Srinath Shankar in Product

As the amount of data in an organization grows, more and more engineers, analysts and data scientists need to analyze this data using...

Integrating Apache Spark with Cucumber for Behavioral-Driven Development

June 2, 2017 by Aaron Colcord and Zachary Nanfelt in Company

This is a guest blog from FIS Global. One of the most difficult scenarios in data processing is ensuring that the data is...

Apache Spark Cluster Monitoring with Databricks and Datadog

June 1, 2017 by Caryl Yuhas and Ilan Rabinovitch in Company

This blog post is a joint effort between Caryl Yuhas, Databricks’ Solutions Architect, and Ilan Rabinovitch, Datadog’s Director of Technical Community and Evangelism...

Top 5 Reasons for Choosing S3 over HDFS

May 31, 2017 by Reynold Xin, Josh Rosen and Kyle Pistor in Company

At Databricks, our engineers guide thousands of organizations to define their big data and cloud strategies. When migrating big data workloads to the...

Entropy-based Log Redaction for Apache Spark on Databricks

May 30, 2017 by Weiluo Ren and Yu Peng in Engineering

This blog post is part of our series of internal engineering blogs on Databricks platform, infrastructure management, tooling, monitoring, and provisioning. We love...