We’re really excited to announce that Sharethrough has selected Databricks to discover hidden patterns in customer behavior data.

Sharethrough builds software for delivering ads into the natural flow of content sites and apps (also known as native advertising). Because Sharethrough serves ads on some of the most popular digital properties, such as Forbes and People, the need for a high-performance platform that can process data at big data scale permeates every aspect of their business.

Initially, Sharethrough attempted to establish a big data platform with self-hosted Hadoop clusters, using Hive as the ad hoc query tool. However, this initial platform severely impeded the team's productivity: the self-hosted clusters were too labor-intensive to maintain, and Hive was too slow for ad hoc querying.

To overcome these challenges, Sharethrough turned to Databricks to implement a big data platform that is simultaneously high performance and easy to maintain. They were able to deploy Databricks in Sharethrough’s Virtual Private Cloud (VPC) in AWS within days. The cluster management interface in Databricks was simple enough to enable their engineering team to create, scale, and terminate Spark clusters with a few clicks, instead of dedicating full-time engineers to this task, as was the case with the self-hosted Hadoop clusters.

Once the Apache Spark clusters were in place, Sharethrough was able to easily bring their clickstream data from AWS S3 into the interactive workspace of Databricks. The interactive workspace provides “notebooks”, enabling users to work with the data in their preferred language: SQL, Python, Java, or Scala.

As a result of deploying Databricks, Sharethrough gained a number of benefits:

  • Faster prototyping of new applications
  • Easier debugging of complex pipelines
  • Improved overall engineering team productivity