The Little Warehouse That Couldn’t, Or: How We Learned to Stop Worrying and Move to Spark

Download Slides

Shopify’s commerce platform now powers over 150K merchants and continues to grow. The huge volume and variety of our data pushed our homegrown reporting and warehousing system to the edge. As the maintenance and performance costs became too much we moved to HDFS and Spark, using both python and scala to transform our vast amounts of operational data into dimensional models to provide better insight. Find out the lessons we’ve learned moving our entire organization onto fully conformed facts and dimensions, the clusters we’ve cratered, the walls we’ve hit, and what we did to overcome them to build our Starscream framework.