Varun is a tech adviser in Halliburton. Before joining Halliburton, Varun worked at TGS-Nopec Geophysical Company for four years and CGG for six years in Houston, as a seismic data processing and imaging Geophysicist. His last role at TGS was as an advising Geophysicist and team leader for cleaning, processing, and analyzing large 3D seismic datasets in the Gulf of Mexico, Canada, West Africa, and Brazil. He holds a B.S. degree in Electrical Engineering and a M.S. degree in Engineering Science, both from Penn State University.
For each drilling site, there are thousands of different equipment operating simultaneously 24/7. For the oil & gas industry, the downtime can cost millions of dollars daily. As current standard practice, the majority of the equipment are on scheduled maintenance with standby units to reduce the downtime. Scheduled maintenance treats each equipment similarly with simple metrics, such as calendar time or operating time. Using machine learning models to predict equipment failure time accurately can help the business schedule the predictive maintenance accordingly to reduce the downtime and maintenance cost. We have huge sets of time series data and maintenance records in the system, but they are inconsistent with low quality. One particular challenge we have is that the data is not continuous and we need to go through the whole data set to find where the data are continuous over some specified window. Transforming the data for different time windows also presents a challenge: how can we quickly pick the optimized window size among the various choices available and perform transformation in parallel? Data transformations such as the Fourier transforms or wavelet transforms are time consuming and we have to parallelize the operation. We adopted Spark dataframes on Databricks for our computation.
Here are the two major steps we took to carry out the efficient distributed computing for our data transformations: