Accelerating Machine Learning and Deep Learning At Scale…With Apache Spark

Deep learning is a fast-growing subset of machine learning. There is an emerging trend to run deep learning in the same cluster as the existing data processing pipelines that support feature engineering and traditional machine learning. Because Spark is the leading framework for distributed ML, we believe that adding deep learning to it is important: it lets Spark developers perform a range of data analysis tasks within a single framework and avoid the complexity inherent in using multiple frameworks and libraries. As one of the early and top contributors to Apache Spark, Intel is thrilled to share with the community a major contribution to open source Spark: “BigDL”, a distributed deep learning framework built organically on the Big Data (Apache Spark) platform. It combines the benefits of “high performance computing” and “Big Data” architecture for rich deep learning support. With BigDL on Spark, customers can eliminate large volumes of unnecessary dataset transfer between separate systems, eliminate separate hardware clusters by moving toward a CPU cluster, and reduce both system complexity and the latency of end-to-end learning. Ultimately, customers can achieve better scale, higher resource utilization, easier use and development, and lower total cost of ownership (TCO). Prominent features of BigDL include feature parity with Caffe and Torch, a significant performance boost when combined with Intel’s Math Kernel Library (MKL), scale-out, fault tolerance, elasticity, and dynamic resource sharing.
The BigDL open source project will be launched at the 2017 Spark Summit East. This keynote will spotlight the new contribution and its benefits to the Spark developer community, and encourage broad contribution and collaboration. We will also showcase some real-world applications of BigDL from early customer adoption.

About Ziya Ma

Ziya Ma is an Intel vice president and the director of the Big Data Software Technologies organization in Intel's Software and Services Group (SSG), System Technologies and Optimization (STO) Division. Her organization focuses on optimizing big data software on Intel platforms, leading open source efforts in the Apache community, and linking innovation in industry analytics to bring about the best and most complete big data experiences. Her organization has also advised many companies on implementing and optimizing Hadoop and Spark ecosystems on Intel platforms. Ziya started her career at Motorola and moved to Intel in 2000. She has held various management positions in Intel’s Technology Manufacturing Group (TMG), where she was responsible for delivering embedded software for factory equipment, databases for manufacturing execution and process control, UI software, and more. Before her recent role within SSG, Ziya was also product development software director of Intel IT, where she delivered software life-cycle management tools, infrastructure, and analytics solutions to Intel software teams worldwide. Ziya received her Ph.D. and M.S. in CSE from Arizona State.