We’re really excited to announce that Sharethrough has selected Databricks to discover hidden patterns in customer behavior data.
Sharethrough builds software for delivering ads into the natural flow of content sites and apps (also known as native advertising). Because Sharethrough serves ads on some of the most popular digital properties, such as Forbes and People, the need for a high-performance platform that processes data at scale permeates every aspect of their business.
Initially, Sharethrough attempted to establish a big data platform with self-hosted Hadoop clusters, leveraging Hive as the ad hoc query tool. However, this platform severely impeded the productivity of the Sharethrough team because the self-hosted clusters were too labor-intensive to maintain and Hive was too slow for ad hoc querying.
To overcome these challenges, Sharethrough turned to Databricks to implement a big data platform that is both high-performing and easy to maintain. They were able to deploy Databricks in Sharethrough’s Virtual Private Cloud (VPC) in AWS within days. The cluster management interface in Databricks was simple enough for their engineering team to create, scale, and terminate Spark clusters with a few clicks, rather than dedicating full-time engineers to the task as the self-hosted Hadoop clusters had required.
Once the Apache Spark clusters were in place, Sharethrough was able to easily bring their clickstream data from AWS S3 into the interactive workspace of Databricks. The interactive workspace provides “notebooks”, enabling users to work with the data in their preferred language: SQL, Python, Java, or Scala.
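To make the notebook workflow concrete, here is a minimal Python sketch of reading clickstream data from S3 and running a simple ad hoc aggregation. It is an illustration only, not Sharethrough's actual pipeline: the bucket name, prefix, JSON-lines layout, and the `event_type` and `placement_id` columns are all hypothetical, and `spark` stands for the Spark entry point that a notebook typically provides.

```python
def s3_uri(bucket, prefix):
    """Build an S3 URI for Spark to read from.

    The bucket and prefix are hypothetical placeholders; the "s3a://"
    scheme is an assumption about how the cluster reaches S3.
    """
    return "s3a://{}/{}/".format(bucket, prefix.strip("/"))


def load_clickstream(spark, bucket, prefix):
    """Read JSON-lines clickstream files into a Spark DataFrame.

    `spark` is the entry point supplied by the notebook; this sketch
    assumes it exposes the standard `read.json(path)` reader.
    """
    return spark.read.json(s3_uri(bucket, prefix))


def clicks_per_placement(df):
    """Count click events per ad placement.

    Assumes (hypothetically) that each record carries `event_type`
    and `placement_id` fields.
    """
    return df.filter(df.event_type == "click").groupBy("placement_id").count()
```

In a notebook cell, this might be used as `clicks_per_placement(load_clickstream(spark, "example-bucket", "clickstream/2015")).show()`, with the bucket and prefix swapped for real locations.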
As a result of deploying Databricks, Sharethrough gained a number of benefits:
- Faster prototyping of new applications
- Easier debugging of complex pipelines
- Improved overall engineering team productivity