Jason is currently a Sr. Principal Engineer and Chief Architect of Big Data Technologies at Intel, leading the development of advanced Big Data analytics (including distributed machine learning and deep learning). He is an internationally recognized expert on big data, cloud, and distributed machine learning; he is the co-chair of the Strata Data Conference Beijing, a committer and PMC member of the Apache Spark project, and the chief architect of the BigDL project (https://github.com/intel-analytics/BigDL/), a distributed deep learning framework on Apache Spark.
While GraphX provides nice abstractions and dataflow optimizations for parallel graph processing on top of Spark, there are still many challenges in applying it in an Internet-scale production setting (e.g., graph algorithms and underlying frameworks optimized for billions of graph edges and thousands of iterations). In this talk, we will present our efforts in building real-world, large-scale graph analysis applications using GraphX for some of the largest organizations and websites in the world, including both algorithm-level and framework-level optimizations (e.g., minimizing graph state replication and optimizing long RDD lineages).
There is increasing interest in the community in running deep learning on the Apache Spark platform, and a growing number of applications for it (e.g., BigDL, TensorFrames, and Caffe/TensorFlow-on-Spark). In this BoF discussion, we would like to cover related topics for running deep learning on Spark, such as experiences and wish lists, best practices and pitfalls, and architectural tradeoffs.
BigDL is a distributed deep learning framework for Apache Spark, open-sourced by Intel. BigDL helps make deep learning more accessible to the Big Data community by letting its members continue to use familiar tools and infrastructure to build deep learning applications. With BigDL, users can write their deep learning applications as standard Spark programs, which can then run directly on top of existing Spark or Hadoop clusters. In this session, we will introduce BigDL, describe how our customers use it to build end-to-end ML/DL applications, and cover the platforms on which it is deployed. We will also provide an update on the latest improvements in BigDL v0.1 and talk about upcoming features in the BigDL v0.2 release (e.g., support for TensorFlow models and 3D convolutions).