I am a senior tech adviser at Halliburton, focusing on predictive maintenance and process improvement. Before joining Halliburton, I spent over 8 years at GE and Siemens working on power plant predictive maintenance and on gas turbine simulation and modeling. I graduated from the Georgia Institute of Technology with a PhD in Aerospace Engineering and a master's degree in Statistics.
Each drilling site runs thousands of pieces of equipment simultaneously, 24/7, and in the oil & gas industry downtime can cost millions of dollars a day. Under current standard practice, most equipment is on scheduled maintenance, with standby units kept on hand to reduce downtime. Scheduled maintenance treats every piece of equipment the same way, using simple metrics such as calendar time or operating hours. Machine learning models that accurately predict when equipment will fail would let the business schedule maintenance accordingly, reducing both downtime and maintenance cost.

We have huge sets of time series data and maintenance records in our systems, but the data are inconsistent and of low quality. One particular challenge is that the data are not continuous, so we need to scan the whole data set to find where the data are continuous over a specified window. Transforming the data for different time windows presents another challenge: how can we quickly pick the optimal window size among the many candidates and perform the transformations in parallel? Data transformations such as Fourier or wavelet transforms are time-consuming, so we had to parallelize them. We adopted Spark DataFrames on Databricks for this computation.
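To make the continuity and window-size problem concrete, here is a minimal pure-Python sketch. It is not the Spark pipeline used in this work, only an illustration under stated assumptions: timestamps are sorted, "continuous" means no gap between consecutive samples exceeds a hypothetical `max_gap`, and the per-window transform is a naive discrete Fourier transform standing in for the Fourier or wavelet transforms mentioned above. The function names `continuous_segments`, `windows_in_segments`, and `dft_magnitudes` are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def continuous_segments(timestamps: Sequence[float], max_gap: float) -> List[Tuple[int, int]]:
    """Split sorted timestamps into (start, end) index runs, end exclusive.

    A new run starts wherever two consecutive samples are more than
    max_gap apart (assumption: that is what "not continuous" means).
    """
    segments: List[Tuple[int, int]] = []
    if not timestamps:
        return segments
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > max_gap:
            segments.append((start, i))
            start = i
    segments.append((start, len(timestamps)))
    return segments


def windows_in_segments(segments: Sequence[Tuple[int, int]], window: int, step: int) -> List[Tuple[int, int]]:
    """List fixed-length windows (start, end) that lie entirely inside one continuous run.

    window and step must be positive; runs shorter than window yield nothing.
    """
    out: List[Tuple[int, int]] = []
    for seg_start, seg_end in segments:
        for w_start in range(seg_start, seg_end - window + 1, step):
            out.append((w_start, w_start + window))
    return out


def dft_magnitudes(values: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for the non-negative frequency bins 0..n//2.

    Illustration only; a real pipeline would call an FFT routine instead.
    Each window is transformed independently of every other window.
    """
    n = len(values)
    return [
        abs(sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
```

For example, with timestamps `[0, 1, 2, 3, 10, 11, 12, 13, 14, 30]` and `max_gap=1.5`, `continuous_segments` returns `[(0, 4), (4, 9), (9, 10)]`, and counting non-overlapping windows per candidate size with `{w: len(windows_in_segments(segs, w, w)) for w in (2, 4, 8)}` gives `{2: 4, 4: 2, 8: 0}`. That count is one cheap way to compare window sizes before paying for any transform. Because every window is transformed independently, the transform step is embarrassingly parallel, which is what makes a distributed engine such as Spark a natural fit.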
Here are the two major steps we took to carry out efficient distributed computing for our data transformations: