Kik

Customer Case Study

Kik is a messaging platform with 300 million users. It is especially popular in the youth market, where it is growing fastest among teenagers. Instead of phone numbers, Kik uses usernames to let friends connect and chat with one another. Based in Waterloo, Ontario, this fast-growing startup is valued at over $1B.

The Challenges

  • Limited Personnel Resources: Competing with giants like Google and Facebook with a small team
  • Poor Choices for Engineering: Developing fast came at the expense of scalable engineering
  • Scalability a Concern: Scaling to 300 million users while anticipating continued rapid growth
  • Disjointed Data Science Efforts: Multiple systems and tools for data analysis led to complexity and a lack of standards
  • Overload of Data: Amazon Redshift was starting to come apart at the seams under 300 data pipelines and 5TB of new data per day

The Solution

Databricks has allowed Kik to improve team productivity while significantly reducing data engineering overhead at scale:
  • Support for Multiple Languages: The team can work in familiar SQL, and finding SQL expertise is much more cost-effective than hiring for other languages
  • Performant and Reliable Data Pipelines: Ability to handle 5TB of new data per day streaming into the data lake
  • Fully Managed Platform: Managed Apache Spark™ removes the need for in-house Spark expertise and for understanding the internals of Spark
  • Collaborative Workspace: Streamlined data analysis and fostered collaboration through support for multiple languages, easy search, and real-time commenting

The Results

  • Massive Reduction in Development Demands: Data engineering effort reduced by 70%
  • Performance Gains for Jobs: Big jobs now run 2x faster
  • Improved Efficiency of Workflows and Collaboration: Sharing data is instant via links to notebooks, versus putting together a PowerPoint
  • Better Data Analysis: Ability to combine data and ensure it is clean enough to trust for analysis

“Moving to Databricks was a huge part of easing a lot of the pain of running a 300 node cluster on EMR; we moved those exact same jobs to Databricks and it ran in half the time.”

Joel Cumming
Senior Principal Data Scientist