Yu is a senior manager at EMC, where he works in a combined role of data scientist, solution architect, and technical leader on customer-specific Big Data solution innovation for verticals such as social intelligence, smart grid, video surveillance, and healthcare, as well as applied research such as infrastructure (IaaS, PaaS, SDDC) analytics and software-defined storage intelligence. He also works on collaborative relationships between EMC and the academic community across the APJ region. He holds a Ph.D. in computer science from the National University of Singapore.
Video surveillance is critical to public security. The fast-growing number of online cameras calls for automated, real-time video analytics across multiple video streams. We have built a large-scale video security analytics platform, with Spark at its core, that recognizes, classifies, stores, and indexes humans and faces from camera feeds online. The platform detects suspected humans/faces on a pre-defined blacklist in real time, identifies abnormal or unsafe behaviors based on our home-grown security analytics model, and answers ad-hoc security questions such as "search all locations where a suspect appeared". This talk shares our recent efforts and experience with the Spark ecosystem in realizing such a platform. We equip Spark with rich video processing and analytics capabilities. We leverage Spark Streaming to recognize and classify humans/faces in real time, and use deep learning, with DeepDist, to extract visual features. We enable fast human/face search and security analytics at scale with the power of the Spark execution engine, novel visual feature indexing, and multi-tier video data storage atop Tachyon, SparkSQL and HDFS.
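As a rough illustration of the blacklist detection and the "search all locations" query mentioned above, the following minimal sketch matches per-detection feature vectors against a blacklist by cosine similarity. It is not the platform's implementation: the record layout (`camera`, `features`), the function names, the similarity measure, and the 0.9 threshold are all assumptions made for illustration, and the code is plain Python so it runs without a cluster. In a streaming setting, a function like `match_blacklist` could be applied to each detection in a micro-batch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_blacklist(detection, blacklist, threshold=0.9):
    """Return (person_id, score) for the best blacklist match, or None.

    `detection` is a hypothetical record: {"camera": str, "features": list}.
    `blacklist` maps a person id to a reference feature vector.
    """
    best = None
    for person_id, reference in blacklist.items():
        score = cosine_similarity(detection["features"], reference)
        if score >= threshold and (best is None or score > best[1]):
            best = (person_id, score)
    return best


def locations_of(person_id, detections, blacklist, threshold=0.9):
    """Ad-hoc query: sorted distinct cameras where `person_id` was matched."""
    cameras = set()
    for detection in detections:
        match = match_blacklist(detection, blacklist, threshold)
        if match is not None and match[0] == person_id:
            cameras.add(detection["camera"])
    return sorted(cameras)


# Toy data (assumed): three-dimensional feature vectors.
blacklist = {"suspect-1": [1.0, 0.0, 0.0], "suspect-2": [0.0, 1.0, 0.0]}
detections = [
    {"camera": "gate-a", "features": [0.99, 0.05, 0.0]},
    {"camera": "lobby", "features": [0.5, 0.5, 0.7]},
    {"camera": "gate-b", "features": [0.98, 0.0, 0.1]},
]
```

With the toy data, `locations_of("suspect-1", detections, blacklist)` scans every detection linearly. A feature index such as the one the abstract mentions would replace that scan at scale.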
Deep learning has attracted intense interest in recent years. In industry, a large number of deep learning frameworks have emerged, such as TensorFlow and Caffe. Most of these are single-machine frameworks that leverage GPU computing power to handle compute-intensive deep learning tasks such as convolution. Alternatively, mature distributed computing frameworks such as Spark are being applied to deep learning. We can leverage the scalability of these frameworks and distribute the computation across multiple nodes to efficiently process data that can hardly be handled on a single machine. Many in the industry have begun to work in this area, with projects such as DeepDist, SparkNet and DL4J. MLlib also has a corresponding algorithm under development. This talk will compare the products and technologies described above side by side and list several important technical points for using a distributed computing framework to do deep learning. KEY TAKE-AWAYS: (1) An introduction to most of the mainstream deep learning frameworks atop Spark, (2) A comparison between these frameworks.
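To make the idea of distributing training across nodes concrete, here is a minimal sketch of one common data-parallel scheme, periodic parameter averaging, simulated in plain Python on a one-parameter linear model. It is not the implementation of any project named above. The function names, the toy data, and the hyperparameters are assumptions for illustration. On a cluster, the per-partition loop would correspond to a map over data partitions and the averaging step to a reduce on the driver.

```python
def local_sgd(w, partition, lr=0.01, steps=5):
    """Run a few SGD passes over one partition, starting from shared weight w.

    The model is y = w * x with squared-error loss, so the gradient
    with respect to w for a single example is 2 * (w * x - y) * x.
    """
    for _ in range(steps):
        for x, y in partition:
            grad = 2.0 * (w * x - y) * x
            w -= lr * grad
    return w


def train_by_averaging(partitions, rounds=20, lr=0.01, steps=5):
    """Synchronous data-parallel training by periodic parameter averaging."""
    w = 0.0  # shared weight, "broadcast" to every worker each round
    for _ in range(rounds):
        # Each worker trains independently on its own partition ...
        local_weights = [local_sgd(w, p, lr, steps) for p in partitions]
        # ... and the driver averages the results into the new shared weight.
        w = sum(local_weights) / len(local_weights)
    return w


# Toy data (assumed): two partitions sampled from y = 2 * x.
partitions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (0.5, 1.0)],
]
w = train_by_averaging(partitions)
```

The design trade-off this exposes is the one such a comparison typically turns on: more local steps per round reduce communication, but let the workers' weights drift further apart before they are averaged.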