This talk walks through the typical workflow of a data scientist or data analyst at Uber: how they get access to Uber’s big data and fast data sources for ad hoc and experimental analysis, and how the data platforms make it easy to discover datasets, run interactive queries against our petabyte-scale data lake to identify features of interest, and wrangle and prepare data for advanced analytics and machine learning. Our platforms also let them run iterative machine learning and deep learning training seamlessly, on single nodes or distributed across our big data and GPU clusters; analyze, visualize, and share the results of their experiments with colleagues and peers to get feedback; and even productionize data analytics jobs and ML models, all without a degree in CS. Interested? Come learn how Uber’s big data platforms and Data Science Workbench put the power of Spark in the hands of our data scientists and data analysts for advanced analytics and ML/DL use cases.
Hari Subramanian leads several efforts within Uber's data infrastructure, including building a highly scalable cross-datacenter data processing layer using Apache Spark and SQL analytics-as-a-service using Apache Hive. He also works at the intersection of infrastructure, machine learning, and data science, building platforms that help Uber's data scientists take machine learning models seamlessly from conception to production. Prior to Uber, Hari led software engineering teams at Amazon, where he built and operated cloud compute infrastructure for AWS EC2. Hari is also an inventor with over 10 granted software patents to his name.