Cloud Cost Management and Apache Spark


The cloud computing market is growing faster than virtually any other IT market today, according to Gartner [1]. Providing a unified analytics platform in public clouds, Databricks invests heavily in cloud computing. As a result, cloud expense has become a major component of our cost of goods sold (COGS) and operating expense (OPEX). Many companies share the same story as ours: embracing the cloud while facing the rising challenge of managing its cost.

In this session, we will share our experience with cloud cost management: the mistakes we made, the data we gathered, the lessons we learned, and the solutions we built. We will discuss general principles for managing accounts and services, assigning budgets, and attributing cost to internal teams. Using AWS as a concrete example, with Databricks and Spark as part of our solution, we will show how we: 1) make AWS cost and usage data available to finance and budget owners, 2) build data products that help budget owners monitor cost and take action, such as buying reserved instances and setting retention policies, and 3) use data science techniques to detect changes and produce forecasts. The general principles and solutions we built are applicable to other cloud providers too.
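To make step 3 concrete, change detection on daily spend can be sketched as a trailing-window z-score test: flag any day whose cost deviates sharply from the recent baseline. The sketch below uses plain Python for brevity rather than Spark, and the function name, window size, and threshold are illustrative assumptions, not Databricks' actual implementation:

```python
from statistics import mean, stdev

def flag_cost_changes(daily_costs, window=7, threshold=3.0):
    """Flag days whose cost deviates sharply from a trailing baseline.

    daily_costs: list of (date, cost) tuples in chronological order.
    Returns the dates whose cost lies more than `threshold` standard
    deviations from the mean of the preceding `window` days.
    """
    flagged = []
    for i in range(window, len(daily_costs)):
        baseline = [cost for _, cost in daily_costs[i - window:i]]
        mu, sigma = mean(baseline), stdev(baseline)
        date, cost = daily_costs[i]
        if sigma > 0 and abs(cost - mu) / sigma > threshold:
            flagged.append(date)
    return flagged

# Example: a steady ~$100/day spend with one sudden spike
# (e.g. a cluster accidentally left running).
costs = [("2018-06-%02d" % d, 100 + d % 3) for d in range(1, 15)]
costs.append(("2018-06-15", 450))
print(flag_cost_changes(costs))  # → ['2018-06-15']
```

In practice the same logic runs as a Spark job over the billing data, and the baseline can be made seasonality-aware (e.g. comparing against the same weekday), but the core idea is this simple deviation test.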

Session hashtag: #DSSAIS13

About Xuan Wang

Xuan Wang is a data scientist/engineer at Databricks. He works on building data products and ETL pipelines on top of Databricks' Unified Analytics Platform and Apache Spark. Prior to joining Databricks, he was a postdoctoral researcher working on probabilistic models in random graphs and random media. He received his Ph.D. in Statistics from The University of North Carolina at Chapel Hill in 2014.