MyFitnessPal Delivers New Feature, Speeds Up Pipeline, and Boosts Team Productivity with Databricks
To learn more about how Databricks helped MyFitnessPal with analytics, check out an earlier article in the Wall Street Journal (log-in required).
We are excited to announce that MyFitnessPal (an Under Armour company) uses Databricks to build the production pipeline for its new “Verified Foods” feature, gaining many performance and productivity benefits in the process.
MyFitnessPal aims to build the largest health and fitness community online by helping people achieve healthier lifestyles through better diet and more exercise. Health-conscious people can use the MyFitnessPal website or the smartphone app to track their diet and exercise patterns and use the information to reach their fitness goals. MyFitnessPal wanted to further streamline its diet tracking functionality by offering a feature called “Verified Foods”, which lets users get accurate and up-to-date nutritional information for food items simply by typing the name of the food into the MyFitnessPal application.
To deliver the functionality of “Verified Foods”, MyFitnessPal needed to create an accurate food database with a set of sophisticated algorithms. Prior attempts to implement these algorithms without Databricks proved neither scalable nor fast enough: they took weeks to run because of the enormous volume of data and the extreme complexity of the algorithms.
MyFitnessPal chose Databricks to implement these algorithms in a production pipeline based on Apache Spark because Databricks delivers the speed and flexibility of Apache Spark in a simple-to-use, zero-management platform. Thanks to the high reliability and fast performance of the data pipeline powered by Databricks, the “Verified Foods” database now includes a comprehensive list of items with readily available and highly accurate nutritional information.
In addition to powering the “Verified Foods” feature, Databricks also delivered a number of key benefits to the Data Engineering & Science team at MyFitnessPal:
- 10X speed improvement, reducing the algorithm run time from weeks to mere hours.
- Dramatically higher team productivity as measured by the number of projects completed in the past quarter.
- Improved team efficiency due to the availability of mature libraries in Spark, and the ability to easily share and re-use code in the Databricks platform.