Andy Starzhinsky received a Master’s degree in Applied Mathematics and System Analysis from the National Technical University of Ukraine in 2007. He has worked with SoftElegance since 2010, gradually moving from web development to marketing and later becoming marketing director. Andy is now a VP at SoftEleganceData, a company that has created a framework for processing petabytes of industrial data. At Spark Summit Europe, Andy will speak about big data in the oil and gas industry.
The SoftElegance Data Department is building a unified Data Lake for the oil and gas industry, and Spark is an important part of that infrastructure. Only in recent years have Big Data technologies and IoT (the various sensors on oil rigs) made it possible to collect gigabytes or terabytes of raw data from the rigs, transfer and store it properly, process it in near real time, and run predictive analytics on it. The introduction will give an architectural overview of the Data Lake, with a short description of the technologies used and the business reasons for developing it. The main part of the presentation will show a practical example of using Spark Streaming to collect and preprocess data from oil rigs and then reusing that data through Apache Spark MLlib to build predictive maintenance. A mathematical model for predicting rod pump failure will be presented. The talk will also show the full data flow cycle, with the technologies used at each stage: data ingestion, preprocessing, analysis, and prediction, all executed while the data is streaming. The focus will be on Spark Streaming batch processing and MLlib. The talk will conclude with a few words on why it was not possible to develop such predictive models before.