While GraphX provides nice abstractions and dataflow optimizations for parallel graph processing on top of Spark, there are still many challenges in applying it to an Internet-scale, production setting (e.g., optimizing graph algorithms and the underlying frameworks for billions of graph edges and thousands of iterations). In this talk, we will present our efforts in building real-world, large-scale graph analysis applications using GraphX for some of the largest organizations and websites in the world, including both algorithm-level and framework-level optimizations (e.g., minimizing graph state replication and optimizing long RDD lineages).
Jason is currently a Sr. Principal Engineer and Chief Architect of Big Data Technologies at Intel, leading the development of advanced Big Data analytics (incl. distributed machine learning and deep learning). He is an internationally recognized expert on big data, cloud, and distributed machine learning; he is the co-chair of Strata Data Conference Beijing, a committer and PMC member of the Apache Spark project, and the chief architect of the BigDL project (https://github.com/intel-analytics/BigDL/), a distributed deep learning framework on Apache Spark.