This is a guest blog from one of our partners: Huawei
It is not unusual for one or more terabytes of data to flow through a telco network every second, which translates to a volume on the order of exabytes every month. The challenges go beyond the speed and volume of network flow data. For example, location data arrives in its original wireless coding format, with a complex nested structure that leaves little room for compression. Signaling data, derived in real-time and batch modes from multi-interface devices supplied by multiple vendors, requires complex association rules to make it meaningful and easily interpretable. Finally, the dynamic relationships across these data layers, and among the data entities within each horizontal layer, create an exceedingly complex analytical problem. An effective and inherently unified data processing framework is the key to addressing this set of challenges.
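As a rough check on the scale mentioned above, the conversion from terabytes per second to exabytes per month can be sketched in a few lines of Python. The 30-day month and the decimal units (1 EB = 1,000,000 TB) are assumptions made for illustration, and the helper name `exabytes_per_month` is hypothetical.

```python
# Back-of-the-envelope conversion: TB per second -> EB per month.
# Assumptions (not from the original post): a 30-day month and
# decimal units, where 1 exabyte = 1,000,000 terabytes.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30
TB_PER_EB = 10**6


def exabytes_per_month(terabytes_per_second):
    """Convert a sustained rate in TB/s to a monthly volume in EB."""
    seconds_per_month = SECONDS_PER_DAY * DAYS_PER_MONTH
    return terabytes_per_second * seconds_per_month / TB_PER_EB


monthly_eb = exabytes_per_month(1)
print(f"1 TB/s sustained is about {monthly_eb:.3f} EB per month")
```

Under these assumptions, a sustained rate of one terabyte per second already amounts to a few exabytes per month, and the figure grows linearly with the rate.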
Why We Chose Spark
To solve telco data issues and meet data analytics needs in a cost-effective manner, two factors matter the most. The first is a scale-out platform, based on a parallel data flow model, that can handle different processing modes simultaneously while efficiently supporting diverse workloads on the same execution engine: from SQL to machine learning algorithms, from streaming to graph computing.
The second is an open framework that supports diverse, complex data sources in a consistent way, exposes multiple intuitive APIs, and offers rich libraries that are easy to extend. This way, IT, business, data science, and network users can keep using their existing skills without facing a steep learning curve, which significantly shortens the lifecycle of application development and onsite deployment.
Apache Spark allows us to address both of these needs with a single powerful platform, without the burden of coding against, managing, and integrating multiple processing frameworks.
How Huawei Leverages Spark
Spark is core to the data processing and analytics platform of Huawei’s big data solution, FusionInsight, which is used by more than 100 enterprise customers globally.
With Spark, raw data from multiple systems supplied by multiple vendors (e.g., CRM, billing, OSS and network) can be easily loaded into a single data processing layer. Data scientists and data engineers can use Spark SQL to explore the data, extract and group features, and develop models with MLlib algorithms. Application developers can use the output of these models or features to build specific applications (e.g., base station investment optimization), and publish dashboards or reports for subscriber profiling and network monitoring. Finally, business users can use Spark SQL for ad hoc queries, or continue to use existing BI systems or tools like SAS, R or Python through Spark’s powerful APIs.
As Huawei continues to build cutting-edge telecom solutions, we will increasingly adopt Spark as the core framework of our solutions, since it provides a robust programming framework, a rich set of APIs and libraries, a vibrant ecosystem, and an unparalleled pace of technology innovation.
Business Value Realized
At one of the top five mobile carriers in the world, which has more than 300 million subscribers, Huawei deployed Spark in one of its operating branches across mission-critical business areas. The system supports near-real-time analysis and ad hoc queries, especially over multiple data sources such as CRM, billing, OSS (Operational Support System), and the wireless network. It also allows analysts and data scientists to build models over large data sets more effectively, in some cases cutting the time to deliver a product from months to mere weeks.
We have also had success in using Spark for plan recommendation and churn prediction. After the project went live, the conversion rate from pre-paid to post-paid customers improved by 10-20% each month, the prediction for the top K churned customers improved by ~30%, and the system helped retain over 30,000 subscribers each month. This translates into a multi-million-dollar business benefit for this flagship branch.
Huawei and this customer are working together to expand Spark into other operating branches, and to unlock the potential of data in new business areas (e.g., providing site recommendations to leading ad agencies and retailers).
Huawei’s Commitment to Spark
Huawei’s relationship with Spark can be traced back to 2011, when AMP Lab was founded. Huawei was convinced by the vision of AMP Lab and became a corporate sponsor at an early stage. Over the years, Huawei has put together a global team to actively participate in the community and contribute back. The Spark 1.2 release had 10 contributors from Huawei, and the 1.3 release had 11.
To further the adoption of Spark in vertical industries, we have developed Spark SQL on HBase, a community package project designed to accelerate online data query and analytics for large data sets, and contributed thousands of lines of code back. The Huawei team has also contributed two new features to the Spark 1.3 release: the FP-growth algorithm, which solves the frequent pattern mining problem, and the Power Iteration Clustering algorithm, which identifies similar behaviors among subscribers, network clusters or other combinations.
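To make the frequent pattern mining problem concrete, here is a minimal sketch in plain Python. It is not FP-growth and not Spark's implementation: it counts itemsets by brute-force enumeration, which is only practical for toy data, whereas FP-growth exists precisely to avoid that enumeration at scale. The subscriber data, the function name `frequent_itemsets`, and the `min_count` and `max_size` parameters are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Return {itemset: count} for itemsets seen in at least min_count transactions.

    Brute-force enumeration for illustration only; this is NOT FP-growth.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical data: the services each subscriber used in a given period.
transactions = [
    ["voice", "sms", "data"],
    ["voice", "data"],
    ["data", "video"],
    ["voice", "sms"],
    ["voice", "data", "video"],
]

result = frequent_itemsets(transactions, min_count=3)
for itemset, n in sorted(result.items()):
    print(itemset, n)
```

On this toy input, the only itemsets that appear in at least three of the five transactions are `data`, `voice`, and the pair `data` with `voice`. A frequent pattern miner returns the same kind of answer; the difference lies in how efficiently it finds it on large data sets.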
Huawei will continue to contribute to Spark and work on community projects. Some of our planned efforts include: adding co-processor and custom filter support to Spark SQL on HBase; participating in Project Tungsten while exploring the possibility of bringing vectorized processing and LLVM-based compilation to it; bringing business-case-driven new algorithms into MLlib under the pipeline API and supporting MLlib feature transformers; and planning to support CEP processing in Spark Streaming. In short, Huawei is deeply committed to Spark and intends to participate extensively in joint community and industry efforts.