Parallelizing Large Simulations with Apache SparkR

Across all of its assets globally, Shell carries a huge stock of spare-part inventory, which ties up large quantities of working capital. Over the past two years, an interdisciplinary project team has produced a tool, the Inventory Optimization Analytics solution (IOTA), which is based on advanced analytical methods and helps assets optimise stock levels and purchase strategies. To calculate the recommended stocking level for a material, the Data Science team has written a Markov Chain Monte Carlo (MCMC) bootstrapping statistical model in R. Cumulatively, the computational task is large but, fortunately, embarrassingly parallel, because the model can be applied independently to each material. The original solution, which used the R “parallel” package, was deployed on a single 48-core PC and took 48 hours to run. In this presentation, we describe how we moved the original solution to a distributed, cloud-based Apache Spark framework. Using the new R User Defined Functions API in Apache Spark, and with only minimal code changes, we reduced the computational run time to 4 hours. Restructuring the architecture to “pipeline” the problem then brought the run time to less than 1 hour. This use case is important because it demonstrates the scalability and performance of SparkR.
Session hashtag: #EUds8
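
The "embarrassingly parallel" structure described in the abstract can be illustrated with a minimal sketch. The talk's model is written in R and is an MCMC bootstrapping model; the Python below is not that model. It assumes a plain bootstrap of historical demand, and every name in it (`recommend_stock_level`, `recommend_all`, the material IDs, and the `n_sims`, `horizon` and `service_level` parameters) is hypothetical. The point is only that each material is processed by a function that needs nothing from any other material, so the mapping step can be handed to any parallel backend.

```python
import random


def recommend_stock_level(item, n_sims=2000, horizon=12, service_level=0.95):
    """Bootstrap total demand over `horizon` periods for ONE material.

    `item` is a (material_id, demand_history) pair. This is a simplified
    stand-in for the talk's model, not a reproduction of it.
    """
    material_id, history = item
    # Seed per material so results do not depend on execution order.
    rng = random.Random(material_id)
    totals = sorted(
        sum(rng.choice(history) for _ in range(horizon))
        for _ in range(n_sims)
    )
    # Empirical quantile of simulated demand at the chosen service level.
    index = min(int(service_level * n_sims), n_sims - 1)
    return material_id, totals[index]


def recommend_all(demand_by_material, mapper=map):
    """Apply the per-material model independently to every material.

    `mapper` is any map-like callable; the built-in `map` runs serially.
    """
    items = sorted(demand_by_material.items())
    return dict(mapper(recommend_stock_level, items))


if __name__ == "__main__":
    # Hypothetical demand histories (units consumed per period).
    demand = {
        "MAT-001": [0, 0, 1, 0, 2, 0, 0, 1],
        "MAT-002": [3, 5, 4, 6, 2, 4, 5, 3],
    }
    print(recommend_all(demand))
```

Because `recommend_all` only needs a map-like callable, the serial `map` could be swapped for `concurrent.futures.ProcessPoolExecutor().map` on a single machine, which is loosely analogous to the single-PC setup the abstract describes; distributing the same per-material function across a cluster is the role Spark plays in the talk.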

About Wayne Jones

Wayne Jones is a senior Data Scientist in the Shell Advanced Analytics Centre of Excellence. He joined Shell in 2007 and, during his ten years at Shell, has worked on a wide variety of Data Science and statistical projects across many areas of the business, e.g. Upstream Materials Management, Treasury Cash Forecasting, Downstream Aviation, and Gas and Power Trading. Wayne is a chartered statistician and has a BSc honours degree in mathematics from Bangor University of Wales, an MSc in 'Mathematical Modelling for Industry' from the University of Loughborough and a PhD in Ecological Modelling from the University of Strathclyde.

About Daniel Jeavons

Dan is passionate about innovation from data & analytics (a recurring theme throughout his career) but also has extensive experience in business process design and improvement, business transformation and large system (SAP) implementation. He began his working life as an Accenture consultant working in their Upstream practice before joining Shell in 2008, where he performed a variety of roles in SAP implementation programmes, the Group CIO office and architecture. He led the Advanced Analytics CoE within TaCIT innovation from its formation in 2013, growing the team from nothing to around 80 people. The Advanced Analytics CoE now has active projects in most parts of the Shell group and has shown significant value from projects which are now publicly referenced, in particular spare part inventory optimization, carbon capture and storage (CCS) monitoring and subsurface analogue identification.