Simplify Machine Learning on Apache Spark with Databricks
June 3, 2015 by Denny Lee in Product
As many data scientists and engineers can attest, the majority of the time is spent not on the models themselves but on the...

Statistical and Mathematical Functions with DataFrames in Apache Spark
June 2, 2015 by Burak Yavuz and Reynold Xin in Engineering Blog
We introduced DataFrames in Apache Spark 1.3 to make Apache Spark much easier to use. Inspired by data frames in R and Python...

Databricks Launches MOOC: Data Science on Apache Spark
May 31, 2015 by Ameet Talwalkar and Anthony Joseph in Announcements
For the past several months, we have been working in collaboration with professors from the University of California Berkeley and University of California...

Tuning Java Garbage Collection for Apache Spark Applications
May 28, 2015 by Daoyuan Wang and Jie Huang in Partners
This is a guest post from our friends in the SSG STO Big Data Technology group at Intel. Join us at the Spark...

NTT DATA: Operating Apache Spark clusters at thousands-core scale and use cases for Telco and IoT
May 14, 2015 by Masaru Dobashi, Kousuke Saruta, Toru Shimogaki and Masayoshi Tsuzuki in Company Blog
This is a guest blog from one of our partners: NTT DATA Corporation. About NTT DATA Corporation: NTT DATA Corporation is a...

Project Tungsten: Bringing Apache Spark Closer to Bare Metal
April 28, 2015 by Reynold Xin and Josh Rosen in Engineering Blog
In a previous blog post, we looked back and surveyed performance improvements made to Apache Spark in the past year. In this...

Recent performance improvements in Apache Spark: SQL, Python, DataFrames, and More
April 24, 2015 by Reynold Xin in Engineering Blog

Big Graph Analytics with LynxKite & Apache Spark
April 23, 2015 by Daniel Darabos in Company Blog
This is a guest blog from one of our partners: Lynx Analytics. About Lynx Analytics: Lynx Analytics is a data analytics consultancy...

Analyzing Apache Access Logs with Databricks
April 21, 2015 by Ion Stoica and Vida Ha in Partners
Databricks provides a powerful platform to process, analyze, and visualize big and small data in one place. In this blog, we will illustrate...

New MLlib Algorithms in Apache Spark 1.3: FP-Growth and Power Iteration Clustering
April 17, 2015 by Jacky Li, Fan Jiang, Youhua Zhang, Stephen Boesch and Bing Xiao in Engineering Blog
This is a guest blog post from Huawei’s big data global team. Huawei, a Fortune Global 500 private company, has put together a...