We launched a 1,000-node on-premises Hadoop cluster with NTT DOCOMO, the leading mobile carrier in Japan, and have operated it for five years without any data loss. We placed particular emphasis on fault tolerance and on the scalability needed to process the vast amount of data a mobile carrier generates.
Although Hadoop made it possible to handle petabytes of data, we now need more speed and flexibility. Demand for parallel distributed processing frameworks based on computational models other than MapReduce has been steadily increasing. In response, we launched a feasibility study of Spark, which we considered a promising candidate: it works alongside Hadoop, provides fast multi-stage computation, and simplifies application development. NTT DOCOMO gave us the opportunity to evaluate the scalability and operability of Spark on the 1,000-node cluster.
In this talk, we will present the results of this evaluation, along with the challenges we faced and our observations from the viewpoint of an enterprise Hadoop user and developer.