Dataset of distribution transformers for predictive maintenance

In the electricity sector, it is possible to collect large quantities of data containing information on relevant processes and events that occur in a given period. These data describe the different operating conditions of the electrical network and its components. By processing and analyzing them, it is possible to propose market strategies as well as strategies for cost reduction, reduction of machine failures and repairs, and inventory reduction. The grid operator can also implement strategies to improve reliability and quality-of-service indicators. From a maintenance point of view, equipment operating time is a relevant factor in identifying and resolving failures without service interruptions. This paper presents data on the characteristics of distribution transformer failures, based on historical records collected by the grid operator (Compañía Energética de Occidente) in the Cauca Department (Colombia), in cooperation with the Universidad del Cauca and the Universidad del Valle. The dataset could be helpful to researchers and data scientists who use machine learning to develop applications that support engineers in predictive maintenance.



Value of the Data
• The data provide a collection of characteristics of electric power distribution transformers in the Cauca Department (Colombia).
• The dataset could be useful for researchers and data scientists studying predictive maintenance with machine learning algorithms.
• The dataset is suitable for classification and regression models built with machine learning algorithms.

Data Description
The dataset consists of two Excel files.

Experimental Design, Materials and Methods
The dataset contains all distribution transformers connected at the 13.2 kV and 34.5 kV voltage levels, located in rural and urban areas of the Cauca Department and owned by Compañía Energética de Occidente. Transformers that are privately owned (third parties), owned by the government, or owned by any party other than the network operator are excluded. The population comprises 15,869 transformers that meet the operating context and the interests of the company in the residential, commercial, industrial and official sectors.
The dataset was used to predict failures of distribution transformers with machine learning techniques, Alvarez [1]. From a machine learning point of view, this is a binary classification problem. The predictive model obtained through this approach supports the construction of a predictive maintenance plan, reducing operating costs and optimizing the resources assigned to the maintenance area of Compañía Energética de Occidente.
The binary classification algorithm used was the Support Vector Machine (SVM), which showed a lower percentage of error in predicting failures of distribution transformers. Fig. 1 shows the predictor variables X_i of the training data set that contribute most to the predicted variable Y_i in the model obtained through binary classification with the SVM algorithm.
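To make the binary classification setting concrete, the following is a minimal sketch of a linear SVM trained by batch subgradient descent on the hinge loss. It is not the model from Alvarez [1]: the two standardized features, the labels (+1 for failed, -1 for not failed), the hyperparameters and all numeric values are synthetic assumptions chosen only for illustration.

```python
# Minimal linear SVM (hinge loss + L2 penalty) trained by batch subgradient
# descent. Illustrative only: the data are synthetic and the two features are
# hypothetical standardized predictors, not columns of the published dataset.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return (w, b) approximately minimising lam/2*||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x as +1 (failed) or -1 (not failed)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Synthetic training set: X_i are predictor vectors, Y_i are class labels.
X_train = [[2.0, 2.0], [2.5, 1.5], [3.0, 2.5], [2.0, 3.0],
           [-2.0, -2.0], [-1.5, -2.5], [-3.0, -2.0], [-2.0, -3.0]]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
errors = sum(predict(w, b, xi) != yi for xi, yi in zip(X_train, y_train))
error_pct = 100.0 * errors / len(X_train)  # training error percentage
```

In practice a library implementation with a kernel and held-out validation data would normally be preferred. The sketch only shows how predictors X_i, labels Y_i and an error percentage relate in this kind of model.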
The dataset is pre-filtered and does not contain outliers or missing data, Bravo et al. [2].
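A user who wants to confirm this property after loading the files could run a check along the following lines. This is a sketch under stated assumptions: the column names `power_kva` and `failed` and the sample rows are hypothetical, and the 1.5 × IQR rule is only one common outlier criterion, not necessarily the one applied by Bravo et al. [2].

```python
import math
import statistics


def has_missing(rows):
    """Return True if any cell is None, an empty string, or NaN."""
    for row in rows:
        for value in row.values():
            if value is None or value == "":
                return True
            if isinstance(value, float) and math.isnan(value):
                return True
    return False


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical records standing in for rows read from one of the Excel files.
rows = [
    {"power_kva": 15.0, "failed": 0},
    {"power_kva": 25.0, "failed": 0},
    {"power_kva": 25.0, "failed": 1},
    {"power_kva": 37.5, "failed": 0},
    {"power_kva": 45.0, "failed": 0},
    {"power_kva": 75.0, "failed": 1},
    {"power_kva": 75.0, "failed": 0},
    {"power_kva": 112.5, "failed": 0},
]

missing_found = has_missing(rows)
outliers_found = iqr_outliers([r["power_kva"] for r in rows])
```

For these sample rows both checks come back clean. Categorical columns would need a different check, for example validating against the set of allowed category values.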

Ethics Statement
This work did not involve any human or animal subjects, nor data from social media platforms.

CRediT Author Statement
All authors contributed equally to this work.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.