Jianfeng Qian is a researcher at Noah’s Ark Lab, Huawei Technologies. He holds a PhD in Computer Science and Technology from Zhejiang University. His main research interests are mobile data analysis and machine learning on data streams.
We present StreamDM, a new open-source software library for real-time analytics, built on top of Spark Streaming and developed at Huawei Noah's Ark Lab. The tools and algorithms in StreamDM are designed specifically for the data stream setting. Because of the large amount of data that is created, and must be processed, in real-time streams, such methods need to be extremely time-efficient while using very small amounts of memory. StreamDM is the first library to include advanced stream mining algorithms for Spark Streaming. It is intended to be the open-source gathering point for research on and implementations of data stream algorithms, while also being designed to allow practical deployments on real-world datasets. The library contains methods for classification, regression, clustering, and frequent pattern mining. In this talk, we will show how these methods work in practice, compare them with the methods available in MLlib and spark.ml, demonstrate their ease of use and extensibility, and discuss some big data analytics applications in telecommunication networks. We will also present the ongoing use cases of StreamDM at Huawei, including recommendation in the Huawei App Store and big data analytics at Huawei Global Technical Service.