Yaroslav Nedashkovsky received a Master’s degree in Applied Mathematics and System Analysis from the National Technical University of Ukraine in 2008, but took his first steps in software development around 2004. From 2004 to 2008 he worked under contract for the Institute of Petroleum UAS in Kyiv, Ukraine. He started as a software engineer on projects related to data mining and the visualization of seismic data. He has worked at SoftElegance since 2011, and as a System Architect since 2015. He has extensive experience building successful SaaS solutions and data lakes, and specializes mostly in distributed systems, IoT, and Big Data. He is a big fan of machine learning.
The SoftElegance Data Department is building a unified Data Lake for the oil and gas industry, and Spark is an important part of that infrastructure. Only in recent years have up-to-date Big Data technologies and IoT (various sensors on the oil rigs) made it possible to process gigabytes or terabytes of raw data collected from the rigs (once properly transferred and stored) in near real time and to run predictive analytics on it. The presentation will open with an architectural overview of the Data Lake, a short description of the technologies it uses, and the business reasons for developing it. The main part of the presentation will walk through a practical example: using Spark Streaming to collect and preprocess data from oil rigs, then reusing that data with Apache Spark MLlib to build predictive maintenance. A mathematical model for predicting rod pump failures will be presented. The talk will also show the full cycle of the data flow and the technologies used at each stage: data ingestion, preprocessing, analysis, and prediction, all executed during data streaming. The main focus will be on Spark Streaming batch processing and MLlib. In conclusion, a few words on why it was not possible to develop such predictive models before.