Vickye Jain, ZS Associates

Vickye jointly runs the Big Data expertise center within ZS and has extensive experience implementing large-scale Big Data platforms for Fortune 200 companies in the US. He and his team have implemented very large-scale ETL offloading use cases, data lakes, and high-performance data processing platforms that have had a transformational business impact on Commercial, R&D, and Operations organizations within Life Sciences.

SESSIONS

High Performance Enterprise Data Processing with Apache Spark

Data engineering to support reporting and analytics for commercial Life Sciences groups involves complex, interdependent processing with intricate business rules (thousands of transformations on hundreds of data sources). We will talk about our experiences in building a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance. We will touch upon optimizing enterprise-grade Spark architecture for data warehousing and data mart type applications, optimizing end-to-end pipelines for extreme performance, running hundreds of jobs in parallel in Spark, orchestrating across multiple Spark clusters, and some guidelines for high-speed platform and application development within enterprises.

Key takeaways:
- example architecture for complex data warehousing and data mart applications on Spark
- architecture to build high-performance Spark platforms for enterprises that balance functionality with total cost of ownership
- orchestrating multiple elastic Spark clusters while running hundreds of jobs in parallel
- business benefits of high-performance data engineering, especially for Life Sciences

Session hashtag: #EUde3