Quality control test for unreliable meteorological and electrical photovoltaic measurements

Meteorological and electrical measurements, combined with predictive computational techniques, have been used in the analysis of photovoltaic system operation and maintenance. International standards establish only general, non-standardized criteria for the quality control and validation of these measurements. In the present work, a methodology has been developed to correct erroneous photovoltaic experimental measurements: radiation, temperature, current, and voltage. We validated the proposed approach with 12 case studies comprising more than 5,000 meteorological and electrical measurements from an experimental 3 kWp photovoltaic system. The approach is based on a set of non-intrusive criteria developed from the one diode model. It allowed the correction of about 80% of the erroneous data, 30% more than with polynomial regression. Compared with the regression methodology, the proposed methodology includes 4 meteorological-electrical variables, allowing a more rigorous analysis. For 75% of the cases evaluated, the proposed methodology achieves a better data correction. This is an open access article under the CC BY-SA license.


INTRODUCTION
The power generation of a photovoltaic system (PVS) depends on weather conditions; therefore, real-time monitoring of the PVS is a key factor for the system operator. PVS on rooftops, farms, parking lots and other sites present a challenge for real-time monitoring due to the multiple measurements involved [1]. The monitoring of meteorological and electrical variables is essential in PVS operation, e.g., for model validation, performance analysis, lifetime prediction, power prediction and optimization [2]-[10]. In this context, radiation is considered the reference variable in PVS [11], [12]. Pyranometers are the most reliable source of radiation data due to the use of high accuracy sensors and to cleaning, maintenance, and calibration processes [13]-[15]. Because a measurement can be affected by incomplete or erroneous information, statistical validation and data filtering must be performed [2], [8], [11], [16].
The accuracy of measurements in a PVS, which is associated with sensors and meteorological instruments, can be affected by calibration, sensor failure and limitations of the measurement system [11]. Inaccurate measurements do not allow proper modeling of the PVS and the power grid [3], [5], [16]-[18]. Therefore, data filtering and quality control are mandatory, and they depend on the installation site and the instrumentation used [11], [16]. The use of linear regression with a coefficient of determination (R²) greater than 0.8 allows the collection of accurate radiation, PVS temperature and electrical variables in alternating current (AC) and direct current (DC), such as AC power, DC current, and DC voltage. The use of cubic spline interpolation for redundant measurements presents an R² greater than 0.99 and allows following the trend of the data [5], [17]. However, this value does not have a standardized calculation process [4], [6]. Moreover, the use of multivariate adaptive spline regression has shown encouraging results [19]. In general, the information filter criterion can be applied in stages, with radiation as the base variable [11], [12], then the DC voltage and current values and finally the AC variables [11]. The present research proposes a methodology to correct unreliable measurements in PVS. The approach is based on the International Electrotechnical Commission (IEC) standard IEC 61724-3 and contributes new criteria and knowledge. In the scientific literature, regression techniques are used to filter data. Our approach builds a distance map of current and irradiance measurements; in this map, the optimal distance between measurements is estimated, then the data are filtered and finally the wrong data are replaced.
ISSN: 2088-8708

EXPERIMENTAL PHOTOVOLTAIC INSTALLATION
The experimental data has been collected from a 3 kWp photovoltaic system installed at the Faculty of Electrical Engineering, in the city of Huancayo (Mantaro Valley). The PVS has one Kipp & Zonen SMP3 pyranometer. It is located 1.5 m from the solar panels and is installed at an inclination of 12°; its datasheet is available in Table 1. The climate plays an important role in the PVS operation. Based on information from the National Meteorology and Hydrology Service of Peru, the Mantaro Valley has a semi-dry climate. The rainfall period runs from January to March, and the dry period from April to August. The maximum temperatures recorded are 19 ºC in February and 22 ºC in August. Meteorological frosts are more frequent and intense from May to September. In summer, Huancayo has the highest level and frequency of precipitation, which occurs after midday.

IDENTIFICATION OF INCONSISTENT MEASUREMENTS
Abnormal measurements can be detected in a practical way through time series and trend analysis [19]. Data quality requires that no false data remain after filters or logical rules are applied [13], [19], [20]. Based on the approaches of the IEC and other studies, Table 2 shows the valid range of radiation and temperature measurements [16], [20]-[22]. The justification is that the current intensity is derived from the radiation [11]. In both cases there are inconsistent measurements. Through the one diode model parameter identification and PVS approaches [16], [21], [23]-[26], the procedure to determine inconsistent measurements is based on the expression in (1).
where I_dc is the value of the current measured at the load and I_ph is the current generated by the PVS as a result of the incident solar radiation.
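Expression (1) is not reproduced in this text. As a minimal sketch, assuming it is the current difference I_ph − I_dc (a sign convention inferred from the later statement that negative values of the current difference identify erroneous data), the check could look as follows. The function names and the sample values are hypothetical and serve only as an illustration.

```python
def current_difference(i_ph, i_dc):
    """Assumed form of expression (1): photocurrent minus measured DC current, in A."""
    return i_ph - i_dc


def is_inconsistent(i_ph, i_dc):
    """Flag a measurement whose current difference is negative (assumed criterion)."""
    return current_difference(i_ph, i_dc) < 0.0


# Hypothetical (I_ph, I_dc) pairs in amperes, for illustration only.
samples = [(5.0, 4.6), (3.2, 3.9), (1.1, 1.1)]
flags = [is_inconsistent(i_ph, i_dc) for i_ph, i_dc in samples]
```

Under this assumption, only the second pair is flagged, because the measured load current exceeds the photocurrent predicted by the model.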

PROPOSED METHODOLOGY
The process of eliminating erroneous data leads to a simple solution that involves the loss of information. Another possibility is to apply the real limits of the measurements [11]. In the case of blank data, it is necessary to first evaluate the size of the missing information and then proceed with an estimation criterion [19]. A valid option for tackling missing data is to replace erroneous values with a set of data estimated from a model, and to use historical information [22]. If the information is null, multiple methods can be used to reconstruct the missing information: interpolation, a secondary sensor, historical data, or PVS models [20]. In addition, the linear interpolation criterion can be implemented. Based on the experience of [4], [6], the threshold of R² = 0.95 is acceptable to identify erroneous measurements. The criterion of similarity with historical measurements can also be implemented. It is defined as follows: i) if the set of measurements does not allow a linear regression model with R² greater than 0.95, a correction of measurements will be made using data with no more than 5% difference with respect to the inconsistent measurement, which applies to radiation and temperature simultaneously; and ii) the criterion allows some flexibility: when the requirements of the criterion are met, the current, temperature, voltage and radiation measurements are replaced.
The methodology described above allows correcting the PVS data, mainly the current measurements, based on radiation and temperature. However, as described in the previous section, the problem is that the I_ph calculations could be incorrect despite correcting the current measurements. For this reason, a methodology involving the variables I_dc and I_ph has been proposed based on the one diode model. The accuracy of I_ph relies on this model, which is a non-complex model accepted for its accuracy. A direct relationship between I_ph and G_b is shown in (2) [23].
where G_STC, T_STC, and I_SC are the radiation, temperature and short circuit current at standard test conditions (STC), T_cell is the cell temperature, and K_T is the temperature factor. These parameters are based on [23] and on the PVS datasheet available in section 2. The proposed methodology is presented in the next five steps.
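Equation (2) is not reproduced in this text. The symbols defined above match the photocurrent expression commonly used with the one diode model, I_ph = (G_b / G_STC) [I_SC + K_T (T_cell − T_STC)], and the following sketch assumes that form. The STC values are the usual 1000 W/m² and 25 ºC; the short circuit current and temperature factor in the example are hypothetical and are not taken from the datasheet of the installation.

```python
G_STC = 1000.0  # W/m^2, irradiance at standard test conditions
T_STC = 298.15  # K, cell temperature at standard test conditions (25 degC)


def photocurrent(g_b, t_cell, i_sc, k_t):
    """Assumed form of (2): I_ph = (G_b / G_STC) * (I_SC + K_T * (T_cell - T_STC)).

    g_b    : measured irradiance in W/m^2
    t_cell : cell temperature in K
    i_sc   : short circuit current at STC in A
    k_t    : temperature factor of the short circuit current in A/K
    """
    return (g_b / G_STC) * (i_sc + k_t * (t_cell - T_STC))


# Hypothetical module parameters, for illustration only.
i_ph_stc = photocurrent(1000.0, 298.15, i_sc=9.0, k_t=0.005)
i_ph_half = photocurrent(500.0, 298.15, i_sc=9.0, k_t=0.005)
```

At constant cell temperature the expression is linear in G_b, which is the direct relationship between I_ph and G_b referred to in the text.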
Step 1. Determine the optimal distance: Determine the smallest distance between (I_ph, G_b) and the other measurements. The other measurements do not include the measurement under analysis or the measurements whose values have been considered defective. The analysis plane corresponds to a plot of I_ph (y-axis) vs G_b (x-axis). Figure 2 shows the location of the pairs of measurements and the determination of the optimum distance. This distance is the minimum of the set of all possible distances that satisfy the established criterion.
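Step 1 can be sketched as a nearest-neighbour search in the (G_b, I_ph) plane. The text does not state which metric is used, so the Euclidean distance below is an assumption, as are the function name and the sample values.

```python
import math


def optimal_distance(target, candidates):
    """Return (index, distance) of the candidate closest to the target.

    target     : (g_b, i_ph) of the measurement under analysis
    candidates : list of (g, i_ph) pairs that already excludes the target
                 and the measurements considered defective
    The Euclidean metric is an assumption; the source does not name one.
    """
    best_index, best_distance = None, math.inf
    for index, (g, i_ph) in enumerate(candidates):
        distance = math.hypot(target[0] - g, target[1] - i_ph)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# Hypothetical measurements: irradiance in W/m^2, current in A.
target = (500.0, 4.5)
candidates = [(520.0, 4.7), (503.0, 4.5), (700.0, 6.3)]
index, d_i = optimal_distance(target, candidates)
```

In this example the second candidate is selected, at a distance of 3 from the measurement under analysis.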
Step 2. Evaluate the optimal distance values: The criterion used is a distance interval (d_i) of less than 10; d_i has been defined between the values 0 and 10. The sub-index i denotes the measurement analyzed, with the ith measured irradiance (G_i) and the ith current (I_ph,i). An upper limit of 10 has been set for the worst case between the differences G_b − G_i and I_ph − I_ph,i. A d_i within this interval means that the measurement can be corrected, because there are measurements with higher accuracy available for correcting the measurement with error.
Figure 2. Estimation of the distance for current and radiation measurements
Step 3. Evaluate the irradiance: Determine the residual (u), defined as u = G_b − G_i. This step is based on the irradiance as the main PVS parameter. This residual helps to choose the most convenient criterion to correct the measurements.
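Steps 2 and 3 reduce to two simple checks, sketched below. The limit of 10 and the definition u = G_b − G_i come from the text; the function names and the sample values are hypothetical.

```python
D_MAX = 10.0  # upper limit of the distance interval stated in Step 2


def is_correctable(d_i):
    """Step 2: the measurement can be corrected when 0 <= d_i < 10."""
    return 0.0 <= d_i < D_MAX


def residual(g_b, g_i):
    """Step 3: residual u = G_b - G_i between the two irradiance values."""
    return g_b - g_i


# Hypothetical values, for illustration only.
usable = is_correctable(3.0)
u = residual(503.0, 500.0)
```

The sign of u then selects which set of correction factors applies in the subsequent steps.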
Step 4. Parameter correction for (u > 0): Correct the current, temperature, voltage, and radiation measurements by the factors 1.02, 1.01, 0.98, and 1.01, respectively. After choosing the measurements closest to (I_ph, G_b), the temperature, current, radiation and voltage measurements are corrected based on the radiation level and the minimum distance d_i. When the inconsistent radiation measurement is greater than the radiation measurement that belongs to the neighborhood with the minimum value d_i, the erroneous measurements are corrected by applying weighting factors greater than 1 to the measurements that are correct. For current, the weighting factor is 1.02; for temperature, the factor is 1.01; for voltage, the factor is 0.98 so that the power value is kept constant; and for radiation, being directly proportional to the current, the factor used is 1.01.
Step 5. Parameter correction for (u < 0): Correct the current, temperature, voltage, and radiation measurements by the factors 0.98, 0.99, 1.02, and 0.99, respectively. When the inconsistent radiation measurement is less than the radiation measurement that belongs to the neighborhood with the minimum value d_i, the erroneous measurements are corrected by applying weighting factors less than 1 to the measurements that are correct. For current, the weighting factor is 0.98; for temperature, the factor is 0.99; for voltage, the factor is 1.02 so that the power value is kept constant; and for radiation, being directly proportional to the current, the factor used is 0.99.
A deviation can be introduced in the analysis depending on the volume of information lost or the data elimination method used [27]. This may cause false alarms and unnecessary maintenance activities. Satellite data could be used to retrieve information when dealing with large volumes of data. However, these data have low resolution due to clouds or aerosols covering the atmosphere [19]. This is the reason for the criterion of the lowest value of d_i and the smallest modification of the existing measurements. After applying the proposed methodology, V_dc varies by ±2%, while I_dc, G_b, and T_amb all vary by ±1%. This variation can be considered conservative because the uncertainty criterion of certain sensors is 5%.
Step 6. Replacing the wrong measurement: If the measurements are similar, replace the wrong measurement with the similar measurement. Apply this step to the current, temperature, voltage, and radiation variables. Among the set of variables, radiation can be considered the main variable that allows the operation of the PVS; if the absolute value of the difference between the radiation measurements with and without defect is less than or equal to 3, replace the measurements of temperature, current, radiation and voltage. The five steps of the proposed methodology are summarized in Figure 3, where the 5-step procedure is shown as a block diagram.
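The factor-based corrections of Steps 4 and 5 can be sketched as follows. The factors are those stated in the text. The reading that they are applied to the values of the closest reliable measurement is an interpretation of the description, and the handling of u = 0, the dictionary layout and the sample values are assumptions.

```python
# Weighting factors stated in Steps 4 and 5.
FACTORS_U_POSITIVE = {"current": 1.02, "temperature": 1.01, "voltage": 0.98, "radiation": 1.01}
FACTORS_U_NEGATIVE = {"current": 0.98, "temperature": 0.99, "voltage": 1.02, "radiation": 0.99}


def correct_measurement(neighbour, u):
    """Scale the values of the closest reliable measurement by the factors chosen by the sign of u.

    neighbour : dict with the keys 'current', 'temperature', 'voltage', 'radiation'
    u         : residual G_b - G_i from Step 3
    The case u == 0 is not covered by Steps 4 and 5; returning the neighbour
    values unscaled is an assumption of this sketch.
    """
    if u > 0:
        factors = FACTORS_U_POSITIVE
    elif u < 0:
        factors = FACTORS_U_NEGATIVE
    else:
        return dict(neighbour)
    return {name: value * factors[name] for name, value in neighbour.items()}


# Hypothetical reliable measurement (A, K, V, W/m^2), for illustration only.
neighbour = {"current": 4.0, "temperature": 290.0, "voltage": 100.0, "radiation": 500.0}
corrected_up = correct_measurement(neighbour, u=3.0)
corrected_down = correct_measurement(neighbour, u=-3.0)
```

Because the current and voltage factors multiply to approximately 1 (1.02 × 0.98 = 0.9996), the DC power of the corrected measurement stays close to that of the reference measurement.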
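The replacement rule of Step 6 can be sketched as a threshold test on the radiation difference. The threshold of 3 is taken from the text, which does not state its unit. The dictionary layout, the function name and the sample values are hypothetical.

```python
REPLACEMENT_TOLERANCE = 3.0  # limit on |radiation difference| stated in Step 6


def replace_if_similar(faulty, reference):
    """Return the reference measurement when the radiation values are similar.

    faulty, reference : dicts with the keys 'current', 'temperature',
                        'voltage', 'radiation'
    When the absolute radiation difference exceeds the tolerance, the
    faulty measurement is returned unchanged.
    """
    if abs(faulty["radiation"] - reference["radiation"]) <= REPLACEMENT_TOLERANCE:
        return dict(reference)
    return dict(faulty)


# Hypothetical measurements (A, K, V, W/m^2), for illustration only.
faulty = {"current": 5.1, "temperature": 291.0, "voltage": 99.0, "radiation": 502.0}
reference = {"current": 4.5, "temperature": 290.0, "voltage": 100.0, "radiation": 500.0}
replaced = replace_if_similar(faulty, reference)
```

All four variables are replaced together, so that the corrected record remains a consistent set of values taken at the same instant.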

Identification of inconsistent data
In the present project, 12 data sets, one for each month of 2020, have been used. The measurements cover the variables: i) current, ii) temperature, iii) radiation, and iv) voltage. The measurements have been recorded at 5-minute intervals during the day, and the current difference is then estimated based on (1), described in section 3. According to this equation, negative values are part of the erroneous data detected from the PVS model. Figures 4(a) and 4(b) show the current difference defined in section 3 for the different radiation values recorded during the first and second half of 2020, respectively. For both semesters the current difference is in the range of 0 to -3. The cases farther from the null value correspond to situations where there is a greater measurement error of radiation, current, temperature and voltage. In Figure 4(a), it is evident that the 2020-I measurements show a greater error compared to the 2020-II measurements in Figure 4(b), because the first half of the year had rainy weather, cloudy days and few days with clear skies.

Data correction process
Relation between variables
In the present investigation it has been determined that radiation is the main variable. Based on this variable, the current, temperature and voltage measurements in the PVS are analyzed. The one diode model and the relationships between meteorological and electrical variables yield a linear relationship between I_ph and G_b. Figure 5(a) shows the measurements used in the analysis of the present work, corresponding to 2020. All the data follow a linear trend because they are derived from the PVS one diode model. For this reason, Figure 5(a) does not reveal erroneous measurements that are out of trend.

Regarding the correlation between the I_dc and G_b measurements seen in Figure 5(b), there are erroneous measurements. Some measurements show a poor correlation between current and radiation at low and high radiation levels, and some of them are dispersed. The analysis through time series makes it possible to visually identify significant errors in the set of measurements and to perform a contextualized analysis [19].

Regression methodology and historical measurements
The time series analysis covers 7:00 to 17:00 hours for the measurements of 2020-01-06, as shown in Figure 6. The DC current behavior in grey refers to the regression method and in black to the base measurements. As shown in Figure 6, there are measurements between 9:00 and 11:00 hours, in the range from 3 A to 5 A, that are erroneous according to the established criteria. They need a correction, and the values corrected by regression are lower than the original values. Figure 7(a) shows the final measurements corrected with the regression process, marked with an asterisk, and the base measurements, marked with a dot. As shown, wrong measurements are out of the trend of the historical data. Consequently, they were corrected according to the criterion described in section 4. The same situation can be observed in the other cases evaluated. Figure 7(b) shows the measurements corresponding to 2020-02-21. In the range from 1 A to 3 A, some measurements marked with dots required correction and were corrected by the regression procedure. The corrected measurements, marked with asterisks in the range from 1 A to 3 A, have the same trend as the historical measurements. For some measurements there is stacked data, where the correction process through linear regression performs the correction in the expected range. Figure 7(c) shows the total cases for 2020-03-30, where the set of measurements obeys a linear regression model. The new corrected measurements, shown with asterisks, follow the trend of the whole data set. Defective measurements are marked with dots, clearly showing that they do not follow the trend.
In section 4, the proposed methodology for the correction of inconsistent data was presented. We proceeded to evaluate each of the 12 case studies, correcting the radiation, current, temperature and voltage data. The evaluation is based on the correlation of variables and regression analyses.
The analysis is focused on the relationships irradiance-current, irradiance-temperature and irradiance-voltage. Figure 8 shows the results of applying both methods to the 12 case studies. In general, the proposed methodology allows for a greater correction of the inconsistent data. In case #1, both methods can detect and correct all the erroneous data of the base measurements, which is why the final number of inconsistent measurements for the proposed and regression methodologies is null. In case #2, the regression method corrects 100% of the faulty measurements. In the other cases, the advantage of using the proposed methodology is clearly shown. In cases 3, 4, 6, 7, 11, and 12, a correction of up to 80% of the erroneous data is shown. Cases 5, 8 and 10 also show a correction of more than 20% of the erroneous data. Case #9 corresponds to a particular case where the measurements do not show any inconsistency.
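The regression criterion used for comparison relies on a linear fit whose R² is checked against the threshold of 0.95 given in section 4. A minimal sketch of that check, using an ordinary least squares fit written with the standard library, is given below. The function name and the sample data are hypothetical and do not reproduce the measurements of the case studies.

```python
R2_THRESHOLD = 0.95  # threshold stated in section 4


def linear_fit_r2(x, y):
    """Ordinary least squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical irradiance (W/m^2) and DC current (A) series, for illustration only.
irradiance = [200.0, 400.0, 600.0, 800.0]
clean_current = [2.0, 4.0, 6.0, 8.0]
faulty_current = [2.0, 4.0, 6.0, 3.0]

_, _, r2_clean = linear_fit_r2(irradiance, clean_current)
_, _, r2_faulty = linear_fit_r2(irradiance, faulty_current)
needs_correction = r2_faulty < R2_THRESHOLD
```

A single out-of-trend current value is enough to bring R² well below the threshold in this small example, which is the situation in which the similarity criterion with historical measurements is applied.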

Figure 9(a) shows a comparison of the initial current and radiation measurements, as well as the measurements corrected with the proposed method. As shown, the measurements in the vicinity of 3 A to 6 A have been corrected. The proposed methodology seeks minimum variations with respect to the initial measurements and, as shown in Figure 9(a), the trend and the grouping of the data for both scenarios are very similar. The use of data corrected through the proposed methodology could yield a realistic performance index and avoid alarms due to inaccurate indicator values. Figure 9(b) shows the comparison of corrected temperature measurements and the original measurements, and the correction of measurements in the range of 290 K and less than 500 W/m². In general, the trend of the data in the 280-300 K range is the same in both situations. Similarly, Figure 9(c) shows that the voltage variations between the original and corrected measurements are minimal. The proposed methodology preserves the trend of the initial data because the corrections correspond to measurements in the vicinity of the established base data.
The analysis of the experimental measurements has made it possible to accurately detect the measurements to be corrected. Table 4 shows the erroneous data, i.e., the group of measurements that have been catalogued as erroneous for correction. The first semester presents more than 50% of erroneous data, due to the weather and rain conditions described in section 2. In the case of March and December, there are even more erroneous data since there are different rain and cloud conditions during the day in those months. Table 4 allows concluding that the correction process is better achieved through the proposed methodology. That is, 80% of corrected measurements are recorded for December, a value much higher than that achieved by the regression methodology. Similarly, for the months of rain and cloudiness, corresponding to February, March, and April, more than 60% of corrected data has been achieved. As shown in Table 4 diode model, and it comprises measurement errors. Thus, data variation analysis is performed to analyze the level of variation between the initial measurements and the measurement values used to correct erroneous data. As shown in Table 5, measurement #111 corresponds to the base measurement for the correction of measurement #26. The latter does not meet the criteria described in section 4, since it has been catalogued as erroneous. After applying the proposed methodology, using measurement #111 as a reference, measurement #26 has varied by 2%. In the case of the regression methodology, the same reference measurements have been used, and measurement #26 shows a variation of 3.3%. The corrected values for both methodologies are shown in Table 5. The proposed methodology shows that the values of radiation and photovoltaic current are directly proportional and, accordingly, higher than the base measurement value.
Table 4.
Variation (%) of measurements after correction process, date 2020-01-06
In the work described in [4], abnormal data were detected by gap. In our methodology, the measurements are compared with a model to filter the data. In [5], the data reconstruction is performed with a group of two sensors and a spline interpolation is applied. The accuracy relies on the second set of measurements, which could also fail. In [5], [11], the bad data were removed based on the evaluation of one parameter. In PVS the data variation is very complex, so the evaluation is necessary for all parameters [5], [11], and removing data could be a bad decision. Therefore, both meteorological and electrical variables are analyzed in the proposed approach.

CONCLUSION
This work has developed a methodology to correct deficient meteorological and electrical experimental measurements associated with PVS: i) radiation, ii) current, iii) temperature, and iv) voltage. Taking radiation as the base parameter, this work has also proposed the correction of measurements based on a set of non-intrusive criteria developed from the one diode model of a photovoltaic cell. The proposal of this research is based on the IEC 61724-3 standard, which establishes the need to incorporate rules and criteria for the correction and filtering of measurements. The proposed approach has been validated with 5,808 measurements of radiation, current, temperature and voltage, recorded daily and organized in 12 case studies, one for each month of 2020. The results show that, by correcting measurements through the proximity of base measurements, it was possible to correct more than 60% of the measurements for the months of February, March, and April, which are the months with the highest cloudiness and variability of radiation. For six case studies, it was possible to correct more than 80% of the wrong measurements. Compared with the regression methodology, the proposed methodology encompasses 4 meteorological-electrical variables, allowing for a more rigorous analysis. Thus, in 75% of the evaluated cases, the proposed methodology leads to a greater proportion of corrected defective measurements.