Topic modeling with LDA: MLlib meets GraphX

Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or used for featurization and dimensionality reduction in later stages of a Machine Learning (ML) pipeline. With Spark 1.3, MLlib now supports Latent Dirichlet Allocation (LDA), one of the most successful topic models.…

What’s new for Spark SQL in Spark 1.3

The Spark 1.3 release represents a major milestone for Spark SQL. In addition to several major features, we are very excited to announce that the project has officially graduated from Alpha, after being introduced only a little under a year ago. In this blog post we will discuss exactly what this step means for compatibility…

Using MongoDB with Spark

This is a guest blog from Matt Kalan, a Senior Solution Architect at MongoDB. Introduction: The broad spectrum of data management technologies available today makes it difficult for users to discern hype from reality. While I know the immense value of MongoDB as a real-time, distributed operational database for applications, I started to experiment with Apache…

PanTera Big Data Visualization Leverages the Power of Databricks Cloud

This is a guest blog from one of our partners: Uncharted, formerly known as Oculus Info, Inc. About PanTera™: PanTera was created with the fundamental guiding principles that visualization, interaction, and nuance are critical to understanding data. PanTera unlocks the specific opportunity and richness presented by large amounts of data by enabling interactive visual…

Databricks Launches “Jobs” Feature for Production Workloads

Databricks Cloud now includes a new feature called Jobs, enabling support for running production pipelines consisting of standalone Spark applications. Jobs includes a scheduler that lets data scientists and engineers specify a periodic schedule for their production jobs, which are then executed according to that schedule. Notebooks as Jobs: In addition to supporting…

Spark’ing an Anti Money Laundering Revolution

This is a guest blog from one of our partners: Tresata. Tresata and Databricks announced a real-time, Spark- and Hadoop-powered Anti-Money Laundering (AML) solution earlier today. Tresata's predictive analytics application, TEAK, offers, for the first time in the market, an at-scale, real-time AML investigation and resolution engine. The performance, speed, predictive power and precision TEAK…

Announcing Spark 1.3!

Today I’m excited to announce the general availability of Spark 1.3! Spark 1.3 introduces the widely anticipated DataFrame API, an evolution of Spark’s RDD abstraction designed to make crunching large datasets simple and fast. Spark 1.3 also boasts a large number of improvements across the stack, from Streaming, to ML, to SQL. The release has…

Sharethrough Selects Databricks to Discover Hidden Patterns in Ad Serving Platform

We’re really excited to announce that Sharethrough has selected Databricks Cloud to discover hidden patterns in customer behavior data. Press release: http://www.marketwired.com/press-release/sharethrough-implements-databricks-cloud-discover-hidden-patterns-advertising-serving-1998953.htm Sharethrough builds software for delivering ads into the natural flow of content sites and apps (also known as native advertising). Because Sharethrough serves ads on some of the most popular digital properties such as Forbes and…

Radius Intelligence implements Databricks Cloud for real-time insights on targeted marketing campaigns

We’re thrilled to share that Radius Intelligence has selected Databricks Cloud as its preferred big data processing platform, to deliver real-time insights in support of targeted marketing campaigns. Press release: http://www.marketwired.com/press-release/radius-intelligence-implements-databricks-cloud-maximize-data-processing-throughput-1997836.htm Radius is a marketing intelligence platform that enables B2B marketers to acquire new customers intelligently. By matching customer intelligence data to Radius’ weekly-updated data set…

Databricks Cloud: From raw data, to insights and data products in an instant!

Enterprises have been collecting ever-larger amounts of data with the goal of extracting insights and creating value. Yet apart from a few innovative companies that have successfully exploited big data, the promised returns of big data remain beyond the grasp of many enterprises. One notable and rapidly growing open source technology that has…