Data-based investigation on the performance of an independent gas turbine for electricity generation using real power measurements and other closely related parameters

Abstract
Generally, sub-Saharan countries possess abundant energy resources, both renewable and fossil, with natural gas potentially being the most abundant resource after solar power. For conventional electrical energy generation, gas turbines are one of the most prominent technologies adopted for producing electricity from natural gas. Nigeria, for instance, has the largest natural gas reserves in Africa and the ninth largest in the world, and more than 80% of its electricity generation relies on gas turbines. To effectively monitor the state of these gas turbines, sensors are installed on the turbines to acquire data in real time. In this data article, we present the data acquired over a period of six months from a 5.68-MW gas turbine installed as an independent power producing unit in a community in Ogun State, Nigeria. In performing various descriptive analyses on the dataset, the real power measurement was taken as the target parameter. Using a correlation coefficient threshold of 0.5, only sixteen (16) parameters were found to be strongly positively correlated with the real power measurements. Thus, variations in the real power supplied by the gas turbine are expected to be accompanied by corresponding variations in each of the 16 parameters identified, which could help in troubleshooting or scheduling maintenance.


© 2019 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Specifications Table
Subject: Energy
Specific subject area: Energy Engineering and Power Technology
Type of data:

Value of the Data
This data could help data scientists seeking to use machine learning algorithms to identify faults or schedule maintenance in a gas turbine. Independent Power Producers (IPPs) could use these data to understand which features or areas of the gas turbine contribute most to its reliability and stability. The data are relevant to research in power system control, especially for power engineers troubleshooting gas turbines and localizing system dysfunctions. The data are also relevant to energy researchers proposing novel techniques to curtail the effects of ambient temperature on gas turbines so as to increase their efficiency.

Data
Descriptive findings from the correlation matrix of the entire dataset [1] reveal nineteen of the fifty recorded features that show positive correlation with the real power produced by the gas turbine. The real power is taken as the target feature (parameter) in this data investigation. Prior studies on the effects of ambient temperature on gas-powered plants were performed in Refs. [2,3]. Their results indicate significant reductions in turbine efficiency and electricity production capacity as the ambient temperature increases. Consequently, other researchers [4,5,6] proposed novel cooling strategies for natural gas combined cycle power plants (NGCPP). Although the natural gas used in gas turbines is a fossil-based energy resource, such conventional resources can also be used in hybrid energy systems such as micro/mini grids. In these systems, the conventional source (in this case, gas turbines) is combined with renewable energy resources to consistently supply consumer energy demand [7,8].

In this dataset, the sample space size for the six-month recording period (July 1st, 2017 to December 31st, 2017) is 4416 hourly observations (184 days × 24 h) [1]. However, the raw data were cleaned to remove outliers and null values. The null values were recorded when the turbine was shut down for scheduled maintenance or because of gas constraints. After this clean-up, the sample space size fell to 2946. Hence, each of the 19 related parameters has a total of 2946 observations, and the parameters have been divided into five sets, as seen in Tables 1–5. Each of the 19 related parameters considered in the analysis is briefly explained below. Tables 1–4 present the descriptive statistics of the first four sets of parameters related to the real power produced by the gas turbine, each set comprising four features. Table 5 presents the same descriptive statistics for the fifth set, which has only three features. The descriptive statistics considered were:
- the mean
- the median
- the median absolute deviation (MAD)
- skewness
- kurtosis
- the standard error (SE)
- the interquartile range (IQR)
- the standard deviation (SD)
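The summary statistics listed above can be sketched with Python's standard library. This is an illustrative approximation, not the authors' R code (which is given in Ref. [9]). The function name `describe` and the toy values are hypothetical.

Two conventions are assumptions worth noting:
- The MAD is returned unscaled unless `mad_scale` is set. R's `stats::mad` scales by 1.4826 by default.
- Skewness and excess kurtosis use plain moment ratios. These may differ slightly from the small-sample conventions used by the R `psych` package.

```python
import math
import statistics


def describe(values, mad_scale=1.0):
    """Descriptive statistics of the kind reported in Tables 1-5, for one feature.

    mad_scale=1.0 returns the raw median absolute deviation; pass 1.4826
    to match the default scaling of R's stats::mad.
    Skewness and kurtosis are plain moment ratios (kurtosis is reported as
    excess kurtosis, i.e. 0 for a normal distribution).
    """
    x = [float(v) for v in values]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(x)
    median = statistics.median(x)
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    # Central moments with an n denominator, used for the shape statistics.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    # Quartiles by linear interpolation (the same rule as R's default type 7).
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "mad": mad_scale * statistics.median([abs(v - median) for v in x]),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else float("nan"),
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else float("nan"),
        "se": sd / math.sqrt(n),  # standard error of the mean
        "iqr": q3 - q1,
        "sd": sd,
    }


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the turbine dataset.
    for key, value in describe([2, 4, 4, 4, 5, 5, 7, 9]).items():
        print(f"{key}: {value:.4f}")
```

For the toy values `[2, 4, 4, 4, 5, 5, 7, 9]`, this gives a mean of 5.0, a median of 4.5 and an IQR of 1.5.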
Figs. 1, 6, 11, 15 and 18 show the graphical correlation matrices of the first, second, third, fourth and fifth sets of related parameters, respectively. A correlation coefficient threshold of 0.5 was selected in this data investigation. Hence, only features with a correlation coefficient above 0.5 in these matrices were regarded as strongly correlated with the real power produced.
- First set (Fig. 1): the correlation coefficients of all four features exceed 0.5. A boxplot of each feature against the hour of operation of the gas turbine (H1 to H24) is given in Figs. 2–5.
- Second set (Fig. 6): all four features also have correlation coefficients greater than 0.5. Figs. 7–10 show the boxplot of each of these features against the hour variable (H1 to H24).
- Third set (Fig. 11): only three of the four features have correlation coefficients greater than the threshold. Figs. 12–14 show the boxplots of these features against the 24-h operating period of the gas turbine.
- Fourth set (Fig. 15): of the four features, only the compressor T5 average and the ceiling temperature have correlation coefficients greater than 0.5. Figs. 16 and 17 show the boxplots of these parameters against the 24-h operating period.
- Fifth set (Fig. 18): all three features have correlation coefficients above the 0.5 threshold. Figs. 19–21 depict the boxplots of these features against the hour variable (H1 to H24).

In total, 4 + 4 + 3 + 2 + 3 = 16 of the 19 related parameters pass the threshold.

These data were provided by an independent gas turbine power plant in Ogun State. The dataset is therefore likely representative of most south-western states in Nigeria, which experience relatively similar climatic conditions. It may also prove representative of some regions in other sub-Saharan countries, such as Benin Republic and Togo, whose climatic conditions are similar to those of south-western Nigeria.
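The threshold rule described above can be sketched as a simple filter. This sketch makes the following assumptions:
- It uses Pearson's correlation coefficient, the default of R's `cor`. The text does not name the coefficient used.
- `pearson`, `select_features` and the toy feature names are hypothetical.
- A feature is kept only when r is strictly greater than 0.5, matching the "greater than 0.5" wording.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def select_features(features, target, threshold=0.5):
    """Keep features whose correlation with the target is strictly above threshold.

    features: mapping of feature name -> list of observations aligned with target.
    Returns {name: r} for the retained features.
    """
    selected = {}
    for name, values in features.items():
        r = pearson(values, target)
        if r > threshold:  # strictly greater; negative correlations never pass
            selected[name] = r
    return selected


if __name__ == "__main__":
    real_power = [1, 2, 3, 4, 5]  # hypothetical target series
    features = {  # hypothetical features, not from the turbine dataset
        "feature_a": [2, 4, 6, 8, 10],
        "feature_b": [5, 4, 3, 2, 1],
        "feature_c": [1, 3, 2, 5, 4],
        "feature_d": [2, 1, 2, 1, 2],
    }
    print(select_features(features, real_power))
```

In the toy example, `feature_a` (r = 1.0) and `feature_c` (r = 0.8) pass. `feature_b` (r = -1.0) and `feature_d` (r close to 0) do not. A strongly negative correlation is discarded too, since the text screens for positive correlation only.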

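The per-hour boxplots summarize each feature separately for every hour label H1 to H24. The sketch below computes Tukey boxplot statistics per hour:
- Quartiles use Python's "inclusive" method, which uses the same interpolation as R's default type-7 quantiles.
- Whiskers reach the most extreme observations within 1.5 × IQR of the quartiles, which is the rule documented for ggplot2's `geom_boxplot`.
- Points beyond the whiskers are treated as outliers.

The `(hour_label, value)` record layout, the `"H<n>"` label format and the function name are assumptions for illustration. This is not the authors' plotting code.

```python
import statistics
from collections import defaultdict


def hourly_box_stats(records, coef=1.5):
    """Tukey boxplot statistics of one feature for each hour label.

    records: iterable of (hour_label, value) pairs with labels "H1".."H24".
    Returns {hour_label: stats}, ordered H1, H2, ..., H24.
    """
    groups = defaultdict(list)
    for hour, value in records:
        groups[hour].append(float(value))
    summary = {}
    for hour, vals in groups.items():
        vals.sort()
        if len(vals) > 1:
            q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        else:  # a single observation: all quartiles equal that value
            q1 = med = q3 = vals[0]
        iqr = q3 - q1
        low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
        summary[hour] = {
            # Whiskers stop at the most extreme observations inside the fences.
            "lower_whisker": min(v for v in vals if v >= low_fence),
            "q1": q1,
            "median": med,
            "q3": q3,
            "upper_whisker": max(v for v in vals if v <= high_fence),
            "outliers": [v for v in vals if v < low_fence or v > high_fence],
        }
    # Sort numerically so that H10 follows H9 rather than H1.
    return dict(sorted(summary.items(), key=lambda item: int(item[0].lstrip("H"))))
```

Grouping by hour before summarizing keeps each box comparable across the 24 operating hours. This is what lets a boxplot reveal hour-of-day patterns such as higher ambient temperatures around midday.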
Experimental design, materials, and methods
The relationships between the other system parameters (features) and the real power measured from the 5.68-MW gas turbine were ascertained using the R statistical programming software (version 3.5.3). A total of 50 features were recorded by the Turbomach Turbotronic 4 SCADA application. The application ran on a Core i5 2.40-GHz workstation with 1 TB of hard-disk space and 4 GB of RAM. It monitored the gas turbine in real time over each 24-h period (H1 to H24). When the turbine was shut down because of gas constraints or scheduled maintenance, no values were recorded for any of the 50 monitored parameters. The data were recorded by temperature and pressure sensors installed at various points on the turbine during operation. The reading of each sensor changed with every change in load demand and ambient temperature. A transmitter transferred the measured data to a remote Human-Machine Interface (HMI) in the control room via an Ethernet cable, and the displayed data were collated hourly from the HMI.

The complete R Markdown code used to run the descriptive analysis on the raw turbine dataset is provided in Ref. [9]. The packages used in this code include:
- 'ggplot2', which produced all the plots in this data article
- 'psych', which provided the descriptive statistics for each set of related parameters
- 'dplyr', which filtered the dataset to remove all 'not available' (NA) values and all non-significant features
- 'readxl', which read in the Excel spreadsheet
- 'writexl', which wrote data frames to Excel spreadsheets

From the descriptive analysis performed on the raw dataset, a total of nineteen (19) related features were identified. Of these, only 16 had a correlation coefficient greater than 0.5 with respect to the target variable (real power).
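The acquisition and clean-up steps above imply simple bookkeeping. An hourly record from 1 July to 31 December 2017 gives 184 days × 24 h = 4416 rows. Rows left empty during shutdowns must be dropped before analysis; the authors' R code does this with `dplyr`.

The following is a minimal Python sketch with these assumptions and limits:
- Each hourly record is a dict with hypothetical field names.
- It only drops missing values. It does not reproduce the authors' outlier-removal step, which the text does not specify.

```python
import math
from datetime import date


def expected_hourly_samples(start: date, end: date) -> int:
    """Number of hourly records between two calendar dates, both inclusive."""
    return ((end - start).days + 1) * 24


def is_missing(value) -> bool:
    """Treat None and float NaN as 'not available' (like R's NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete_rows(rows):
    """Keep only hourly records in which every field has a value.

    Shutdown hours (gas constraints, scheduled maintenance) leave fields
    empty, so they are removed here. Outlier removal is NOT performed.
    """
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


if __name__ == "__main__":
    print(expected_hourly_samples(date(2017, 7, 1), date(2017, 12, 31)))  # 4416
    # Hypothetical records; field names are illustrative only.
    rows = [
        {"hour": "H1", "real_power_mw": 4.2},
        {"hour": "H2", "real_power_mw": None},          # shutdown hour
        {"hour": "H3", "real_power_mw": float("nan")},  # missing reading
    ]
    print(len(drop_incomplete_rows(rows)))  # 1
```

Against the reported cleaned size of 2946, the 4416 expected rows imply that 1470 hourly records were removed as null values or outliers.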