
Accelerating Grid-Edge Analytics using COMTRADE Files with Apache Spark

Dan Sabin
Colton Peltier
Eric Golinko
Nichole Lu

This solution accelerator and blog were created in collaboration with Schneider Electric. We'd like to thank Dan Sabin, a Schneider Electric Distinguished Technical Expert and secretary of the IEEE/IEC Dual Logo Maintenance Team focused on the revision of the COMTRADE-2013 standard, for lending his expertise.

 

Stable, reliable electric power generation and delivery are essential to our modern way of life. But power grids are constantly evolving, more so now than in the first 100 years of their existence, making it essential that power generators adhere to established standards. Through adherence to standards, the transmission and distribution lines, substations, transformers, and other components that comprise the grid work in concert to deliver predictable, high-quality power to the consumer.

To ensure power delivery adheres to standards, Intelligent Electronic Devices (IEDs), including protection relays, digital fault recorders, phasor measurement units (PMUs), and power quality monitors, are deployed in electrical substations to closely monitor the characteristics of the power flowing through the grid. These IEDs rapidly sample and/or derive electrical quantities such as voltage, current, power, frequency, phase angle, RMS values, and harmonics. When abnormal conditions are detected, such as voltage sags or electrical faults, the IEDs bundle the measurements taken just prior to and during the anomalous event into industry-standard COMTRADE files. These files are an essential element of diagnosing and correcting the conditions that lead to such events.

The COMTRADE file format was first specified by the Institute of Electrical and Electronics Engineers in IEEE Std C37.111 in 1991 as a "Common Format for Transient Data Exchange for Power Systems". IEEE approved revisions in 1999 and 2013 to maintain the format's relevance in the face of evolving technical requirements. The 2013 edition was completed in collaboration with the International Electrotechnical Commission, where it was published simultaneously as IEC 60255-24:2013.

Today, a variety of specialized software packages support the analysis of COMTRADE-formatted files, but these packages typically process the data in a single, non-distributed process. As more practitioners in the field of electrical engineering explore the use of machine learning, and as the volume of data continues to grow, there is an increasing need to support COMTRADE data in distributed computing systems like Apache Spark.

Understanding COMTRADE Files

Many organizations in the electrical engineering space have accumulated vast troves of COMTRADE data. To record a single event, the COMTRADE format specifies a mandatory configuration (.cfg) file, which captures the higher-level details needed to interpret the readings, and a mandatory data (.dat) file, which contains the readings themselves. Additional files may be included in the collection of files surrounding a specific electrical disturbance (e.g. header files and info files), but these are optional and not always captured.

The variable structure of these files necessitates the inclusion of a configuration file. Additionally, it may be necessary to read multiple measurements stored in separate COMTRADE files spanning many seconds, minutes or even hours. These challenges combine to make the reading of COMTRADE files a formidable task for general-purpose data and analytics platforms. Specialized libraries, like the open source comtrade library in Python, have emerged to help organizations read individual records, but reading large volumes of these files can still be challenging.
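For readers who have not worked with these libraries, the snippet below is a minimal sketch of reading a single .cfg/.dat pair with the open source comtrade package (installed with pip install comtrade). The file names are illustrative, and the attribute names follow that library's documented interface, so verify them against the version you install.

```python
# Minimal sketch: read one COMTRADE record (a .cfg/.dat pair) with the
# open source `comtrade` Python package. File names are illustrative.
import comtrade

record = comtrade.load("event_001.cfg", "event_001.dat")

print(record.station_name)        # station identified in the .cfg file
print(record.frequency)           # nominal line frequency, e.g. 60 Hz
print(record.analog_channel_ids)  # names of the analog channels
print(record.total_samples)       # number of samples recorded in the .dat file

# record.time holds the sample times (seconds from the start of the record);
# record.analog holds one list of scaled values per analog channel.
for channel_id, samples in zip(record.analog_channel_ids, record.analog):
    print(channel_id, samples[:5])
```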

Using COMTRADE Files with Apache Spark

Thankfully, modern analytics platforms, like Apache Spark, are highly extensible. Through a user-defined function, the data in these files can be read using a specialized library and returned as a unified set of outputs. With support for complex data structures, the results can be organized in a manner that respects the relationship between higher-level configuration elements and lower-level readings, while still allowing analysts to easily access elements from different parts of the data structure.
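As a rough illustration of this pattern, the sketch below wraps the comtrade library in a Python user-defined function that returns a nested structure: top-level configuration fields plus an array of per-sample readings. The file_paths_df DataFrame, the schema, and the column names are assumptions made for illustration rather than the accelerator's exact implementation, and the file paths must be readable from the worker nodes (for example, cloud storage mounted to the cluster).

```python
# Hedged sketch: parse COMTRADE file pairs in parallel with a Spark UDF that
# returns a nested struct (configuration fields plus an array of readings).
# Assumes the `comtrade` package is installed on all cluster nodes.
import comtrade
from pyspark.sql import functions as F, types as T

comtrade_schema = T.StructType([
    T.StructField("station", T.StringType()),
    T.StructField("frequency", T.DoubleType()),
    T.StructField("readings", T.ArrayType(T.StructType([
        T.StructField("channel", T.StringType()),
        T.StructField("seconds_from_start", T.DoubleType()),
        T.StructField("value", T.DoubleType()),
    ]))),
])

@F.udf(returnType=comtrade_schema)
def parse_comtrade(cfg_path, dat_path):
    # Paths must point to files readable from the worker (e.g. /dbfs/... paths).
    record = comtrade.load(cfg_path, dat_path)
    readings = [
        {"channel": ch, "seconds_from_start": float(t), "value": float(v)}
        for i, ch in enumerate(record.analog_channel_ids)
        for t, v in zip(record.time, record.analog[i])
    ]
    return {"station": record.station_name,
            "frequency": float(record.frequency),
            "readings": readings}

# file_paths_df (hypothetical) holds one row per event with cfg_path and
# dat_path columns, typically built from a listing of the landing directory.
parsed_df = file_paths_df.withColumn("event", parse_comtrade("cfg_path", "dat_path"))
```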

While this data can be persisted in a form closely aligned with the native COMTRADE data structures, many analysts may find that further simplifications, such as separating configuration details into one table and lower-level readings into another, ease the retrieval of data for machine learning exercises without impacting the integrity of the information itself. Adopting this pattern allows organizations to make these data files more widely available to a broader array of analysts and data scientists, enabling them to explore new analysis approaches.
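Continuing the illustrative sketch above, the nested event column could be split into a per-event configuration table and a per-sample readings table, joined by the source file path. The Delta table and column names here are assumptions, not the accelerator's exact schema.

```python
# Hedged sketch: flatten the nested event column into two easy-to-query Delta
# tables, one row per event (configuration) and one row per sample (readings).
from pyspark.sql import functions as F

config_df = parsed_df.select(
    "cfg_path",
    F.col("event.station").alias("station"),
    F.col("event.frequency").alias("frequency"),
)

readings_df = (
    parsed_df
      .select("cfg_path", F.explode("event.readings").alias("reading"))
      .select(
          "cfg_path",
          F.col("reading.channel").alias("channel"),
          F.col("reading.seconds_from_start").alias("seconds_from_start"),
          F.col("reading.value").alias("value"),
      )
)

# The shared cfg_path column lets analysts join readings back to their configuration.
config_df.write.format("delta").mode("overwrite").saveAsTable("comtrade.event_config")
readings_df.write.format("delta").mode("overwrite").saveAsTable("comtrade.event_readings")
```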

Putting It Into Practice

To demonstrate this pattern, we have developed a solution accelerator focused on the use of data in the COMTRADE format. Using a large collection of COMTRADE-formatted files, we demonstrate how these may be accessed and processed at scale using Apache Spark running on the Databricks platform. The data extracted from these files is persisted to a small set of easy-to-access tables, and the data in those tables is used to train a simple fault detection model based on a convolutional neural network approach recently popularized by electrical engineers working alongside data scientists. It is our hope that organizations focused on electrical power generation, delivery, quality, and utilization will be able to take the patterns demonstrated in this accelerator and adapt them to enable greater use of their COMTRADE information assets.
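For context only, the snippet below sketches the general shape of such a 1-D convolutional fault classifier over fixed-length, multi-channel waveform windows. The layer sizes, window length, channel count, and class count are illustrative assumptions; the accelerator's notebook may use a different architecture and framework.

```python
# Hedged sketch of a simple 1-D CNN fault classifier over waveform windows
# shaped (window_length, n_channels). All sizes below are illustrative.
import tensorflow as tf

window_length, n_channels, n_fault_classes = 800, 6, 4  # illustrative values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_length, n_channels)),
    tf.keras.layers.Conv1D(32, kernel_size=16, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.Conv1D(64, kernel_size=8, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_fault_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_windows, train_labels, validation_data=(val_windows, val_labels))
```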

Download now
