Sandeep is a Principal at ZS Associates and heads ZS's Big Data practice. For over 17 years, he has helped enterprises build cutting-edge technology solutions. As chief architect and technology leader focused on Big Data, he has helped clients shape their vision, define roadmaps, and deliver large-scale enterprise platforms.
Data engineering to support reporting and analytics for commercial life sciences groups involves complex, interdependent processing governed by intricate business rules (thousands of transformations on hundreds of data sources). We will share our experience building a high-performance data processing platform powered by Spark that balances extreme performance, speed of development, and cost of maintenance. We will touch upon optimizing enterprise-grade Spark architecture for data warehousing and data mart applications, optimizing end-to-end pipelines for extreme performance, running hundreds of jobs in parallel in Spark, orchestrating across multiple Spark clusters, and guidelines for high-speed platform and application development within enterprises.

Key takeaways:
- An example architecture for complex data warehousing and data mart applications on Spark
- An architecture for building high-performance Spark platforms for enterprises that balances functionality with total cost of ownership
- Orchestrating multiple elastic Spark clusters while running hundreds of jobs in parallel
- Business benefits of high-performance data engineering, especially for life sciences

Session hashtag: #EUde3