This is a collaborative post between Databricks and ARC Resources. We thank Ala Qabaja, Senior Cloud Data Scientist, ARC Resources, for their contribution.
As a leader in responsible energy development, Canadian company ARC Resources Ltd. (ARC) was looking for a way to optimize drilling performance to reduce time and costs, while also minimizing fuel consumption to lower carbon emissions.
To do so, they required a data analytics solution that could ingest and visualize field operational data, such as well logs, in real time. ARC's data team was tasked with delivering an analytics dashboard that lets drilling engineers see key operational metrics for active well logs side by side with historical well logs. To achieve near real-time results, the solution needed the right streaming and dashboard technologies.
ARC has deployed the Databricks Lakehouse Platform to enable its drilling engineers to monitor operational metrics in near real-time, so that they can proactively identify potential issues and apply mitigation measures quickly. In addition to improving drilling precision, this solution has helped ARC reduce drilling time for one of its fields. Time savings translate to a reduction in fuel used and therefore a reduction in the CO2 footprint that results from drilling operations.
For the project, ARC needed a streaming solution that would make it easy to ingest an ongoing stream of live events, as well as historical data points. It was critical that ARC's business users could see metrics from one or more active wells alongside selected historical wells at the same time.
With these requirements, the team needed to align streaming and historical well logs, normalized on drilling depth. Ideally, the data analytics solution wouldn't require replaying and streaming historical data for each active well, and would instead leverage Power BI's data integration features to provide this functionality.
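To make the depth alignment described above concrete, here is a minimal sketch in plain Python rather than Spark. It is not ARC's implementation: the function names, the 10-metre bucket size, and the single metric per record (for example, a rate-of-penetration reading) are all assumptions chosen for illustration. The idea is to average each well's readings within fixed depth buckets, then pair active and historical values at the same bucket.

```python
from collections import defaultdict
from statistics import mean


def bucket_by_depth(records, step_m=10.0):
    """Average a well's metric within fixed-size depth buckets.

    records: iterable of (depth_m, value) pairs (hypothetical shape).
    Returns {bucket_start_depth: mean_value}, ordered by depth.
    """
    buckets = defaultdict(list)
    for depth_m, value in records:
        # Floor the depth to the start of its bucket, e.g. 1234.5 -> 1230.0.
        bucket_start = (depth_m // step_m) * step_m
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def align_on_depth(active, historical, step_m=10.0):
    """Pair active and historical values at each depth bucket the active well has reached."""
    active_buckets = bucket_by_depth(active, step_m)
    historical_buckets = bucket_by_depth(historical, step_m)
    # A historical well may have no reading in a bucket; None marks that gap.
    return [
        (start, value, historical_buckets.get(start))
        for start, value in active_buckets.items()
    ]


# Made-up readings: (depth in metres, metric value).
active_log = [(1001.0, 20.0), (1004.0, 22.0), (1012.0, 25.0)]
historical_log = [(1000.0, 18.0), (1009.0, 20.0), (1025.0, 30.0)]

aligned = align_on_depth(active_log, historical_log)
for depth, active_value, historical_value in aligned:
    print(depth, active_value, historical_value)
```

Because both logs are keyed on the same depth buckets, the historical side can be computed once and reused for every active well, which fits the goal of not replaying historical data per well.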
This is where Delta Lake, an open storage format for the data lake, provided the necessary capabilities for working with the streaming and batch data required for well operations. After researching potential solutions, the project team determined that Delta Lake had all of the features needed to meet ARC's streaming and dashboarding requirements. During the process, the team identified four main advantages provided by Delta Lake that made it an appropriate choice for the application:
These characteristics solved all the pieces of the puzzle and enabled seamless data delivery to Power BI.
For active well logs, data is received into ARC's Azure tenant through internet of things (IoT) edge devices, which are managed by one of ARC's partners. Once received, messages are delivered to an Azure IoT Hub instance. From there, all data ingestion, calculation, and cleaning logic is done through Databricks.
First, Databricks reads the data through a Kafka connector, and then writes it to the Bronze storage layer. Once there, another Structured Streaming process picks it up, applies de-duplication and column renaming logic, and lands the data in the Silver layer. From the Silver layer, a final streaming process picks up changed data, applies calculations and aggregations, and directs the data into the active stream and the historical stream. Data in the active stream lands in the Gold layer and is consumed by the dashboard. Data in the historical stream also lands in the Gold layer, where it is used for machine learning experimentation and application, and serves as the source of historical data for the dashboard.
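The Bronze-to-Silver-to-Gold flow above can be sketched in plain Python to show the shape of the logic. This is an illustration under stated assumptions, not ARC's pipeline: the raw field names, the renaming map, the de-duplication key of well and event time, and the rule that routes rows by whether the well is currently active are all hypothetical. In Spark, the same steps would typically be written with DataFrame operations such as `dropDuplicates` and `withColumnRenamed` inside Structured Streaming jobs.

```python
# Hypothetical mapping from raw device field names to Silver column names.
RENAME = {
    "wellId": "well_id",
    "ts": "event_time",
    "dep": "depth_m",
    "rop": "rate_of_penetration",
}


def to_silver(bronze_events, seen_keys):
    """Rename fields and drop events whose (well_id, event_time) key was already seen.

    seen_keys is a set carried across batches so repeats in later batches are dropped too.
    """
    silver = []
    for event in bronze_events:
        row = {RENAME.get(name, name): value for name, value in event.items()}
        key = (row["well_id"], row["event_time"])
        if key in seen_keys:
            continue  # duplicate delivery; skip it
        seen_keys.add(key)
        silver.append(row)
    return silver


def route_to_gold(silver_rows, active_wells):
    """Split Silver rows into an active stream and a historical stream (assumed rule)."""
    active, historical = [], []
    for row in silver_rows:
        target = active if row["well_id"] in active_wells else historical
        target.append(row)
    return active, historical


# Made-up Bronze batch containing one duplicated message.
bronze_batch = [
    {"wellId": "W-1", "ts": 1, "dep": 1001.0, "rop": 20.0},
    {"wellId": "W-1", "ts": 1, "dep": 1001.0, "rop": 20.0},
    {"wellId": "W-2", "ts": 1, "dep": 950.0, "rop": 17.5},
]

seen = set()
silver_rows = to_silver(bronze_batch, seen)
active_rows, historical_rows = route_to_gold(silver_rows, active_wells={"W-1"})
print(len(silver_rows), len(active_rows), len(historical_rows))
```

Keeping the set of seen keys outside the function mirrors how a streaming job keeps de-duplication state between micro-batches. A real job would also bound that state, for example with a watermark.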
The goal for the dashboard was to refresh the data every minute, and for a complete refresh cycle to finish within 30 seconds, on average. Below are some of the obstacles the team overcame in the journey to deliver real-time analysis.
In the first version of the report, a complete refresh took 3-4 minutes, which was too slow for business users. To achieve the 30-second SLA, the team implemented the following changes:
Performing near real-time BI is challenging in and of itself when you are streaming logs or IoT data in real time. It is just as challenging to construct a near real-time dashboard that combines high-speed insight with large historical analytics in one view. ARC used Spark Structured Streaming, the lakehouse architecture, and Power BI to do just that: create a unified dashboard for monitoring key operational parameters for active well logs and comparing them to well log data for historical wells of interest. The ability to combine real-time streaming logs from live oil wells with enriched historical data from all wells supported the key use case.
As a result, the team was able to derive operational metrics in near real-time by combining Structured Streaming, the Delta Lake architecture, the speed and scalability of Databricks SQL, and the advanced dashboarding capabilities that Power BI provides.
ARC Resources Ltd. (ARC) is a global leader in responsible energy development, and Canada's third-largest natural gas producer and largest condensate producer. With a diverse asset portfolio in the Montney resource play in western Canada, ARC provides a long-term approach to strategic thinking, which delivers meaningful returns to shareholders.
Learn more at arcresources.com.
Acknowledgment:
This project was completed in collaboration with Databricks professional services, NOV – MD Totco and BDO Lixar.