We present StreamDM, a new open-source library for real-time analytics built on top of Spark Streaming and developed at Huawei Noah’s Ark Lab. The tools and algorithms in StreamDM are designed specifically for the data stream setting. Because real-time streams create large amounts of data that must also be processed in real time, such methods need to be extremely time-efficient while using very little memory. StreamDM is the first library to include advanced stream mining algorithms for Spark Streaming. It is intended to be an open-source hub for research on and implementation of data stream methods, while also supporting practical deployments on real-world datasets. The library contains methods for classification, regression, clustering, and frequent pattern mining. In this talk, we will show how these advanced methods work in practice, discuss several big data analytics applications in telecommunication networks, compare the methods with those available in MLlib and spark.ml, and demonstrate their ease of use and extensibility. We also present ongoing use cases of StreamDM at Huawei, including recommendation in the Huawei App Store and big data analytics at Huawei Global Technical Service.
Jianfeng Qian is a researcher at Noah’s Ark Lab, Huawei Technologies. He received his PhD in Computer Science and Technology from Zhejiang University. His main research interests are mobile data analysis and stream machine learning.
Cheng He is a principal engineer and research manager at Huawei’s Noah’s Ark Lab. He joined the Huawei Research Institute in 2006, and his research interests include traffic measurement and modeling, distributed stream computing, big data stream mining, and online learning. He has led projects including mobile broadband (MBB) traffic measurement and modeling, system design for distributed big data stream processing, and stream mining and online learning on massive telecom data for intelligent network management. To date, he has filed more than 20 patent applications in China, the EU, and the US in his research area. His current research focuses on designing and developing online machine learning and stream mining algorithms for distributed streaming systems to support intelligent management of large-scale telecom networks.