Findify’s Smart Search Gets Smarter with Spark MLlib and Databricks

Spark Summit East is just around the corner! If you haven’t registered yet, you can get tickets here with this promo code for 20% off: Databricks20. We are happy to announce that Findify has deployed Databricks as its machine learning and analytics platform, achieving faster time to complete projects, more efficient operations, and improved collaboration. You can

Read

How Elsevier Labs Implemented Dictionary Annotation at Scale with Apache Spark on Databricks

This is a guest blog from our friend at Elsevier Labs. Sujit Pal is a Technical Research Director at Elsevier Labs. His interests are Search and Natural Language Processing. Elsevier is

Read

Reshaping Data with Pivot in Spark

This is a guest blog from our friend at Silicon Valley Data Science. Dr. Andrew Ray is passionate about big data and has extensive experience working with Spark. Andrew

Read

Auto-scaling scikit-learn with Spark

Data scientists often spend hours or days tuning models to get the highest accuracy. This tuning typically involves running a large number of independent Machine Learning (ML) tasks

Read

Inneractive Optimizes the Mobile Ad Buying Experience at Scale with Machine Learning on Databricks

We are happy to announce that Inneractive chose Databricks as their primary data warehousing and analytics platform — allowing them to ingest and explore data at scale without hampering performance. You can read the press release here. Inneractive is a global mobile ad exchange focused on empowering mobile publishers to realize their properties’ full potential

Read

Databricks Democratizes Data and Reduces Infrastructure Costs for Eyeview

We are happy to announce that Eyeview has selected Databricks as their enterprise data platform — doubling the pace of innovation and development of new product features through faster data processing and simpler operations. You can read the press release here. Eyeview is a video advertising technology company and a leader in providing brands with

Read

An Illustrated Guide to Advertising Analytics

To learn the latest developments in Apache Spark, register today to join the Spark community at Spark Summit in New York City! This is a joint blog with our friend at Celtra. Grega Kešpret is the Director of Engineering. He leads a team of engineers and data scientists to build analytics pipeline and optimization systems

Read

Faster Stateful Stream Processing in Spark Streaming

Many complex stream processing pipelines must maintain state across a period of time. For example, if you are interested in understanding user behavior on your website in real time, you will have to maintain

Read

Deep Learning with Spark and TensorFlow

To learn more about Spark, attend Spark Summit East in New York in Feb 2016. Neural networks have seen spectacular progress during the last few years and they are now the state of the art in image recognition and automated translation. TensorFlow is a new framework released by Google for numerical computations and neural networks.

Read

MLlib Highlights in Spark 1.6

With the latest release, Apache Spark’s Machine Learning library includes many improvements and new features. Users can now save and load ML Pipelines, use extended R and Python APIs, and run new ML algorithms. This blog post highlights major developments

Read