Photo-Voltaic (PV) Monitoring System, Performance Analysis and Power Prediction Models in Doha, Qatar

This study aims to develop a customized, novel data acquisition system for photovoltaic (PV) systems under extreme climates by utilizing off-the-shelf components, enhanced with data analytics for performance evaluation and prediction. Microcontrollers and sensors are used to measure meteorological and electrical parameters. Customized signal conditioning that can withstand high temperature was developed, along with microcontroller development boards enhanced with appropriate interfacing shields and wireless data transmission to iCloud IoT platforms. In addition, an automatically controllable in-house electronic load for the PV system was developed to measure the maximum power available from the system. A LabVIEW™ program was used to allow ubiquitous access to, and processing of, the recorded data over the IoT platform. Furthermore, machine learning (ML) algorithms are utilized to predict the PV output power from data collected over a two-year span. The result of this study is the commissioning of original hardware for PV studies under extreme climates. This study also shows how specific ML algorithms such as the Artificial Neural Network (ANN) can provide accurate predictions with low root-mean-squared error (RMSE) between the predicted and actual power. The results support reliable integration of PV systems into smart grids for efficient energy planning and management, especially in arid and semi-arid regions.


Introduction
Qatar's rapid development over the past decade has led to remarkable growth in its economy and population, and hence to increasing demand for food, water, electronics and services. All of these rely on electricity to power industries such as desalination plants, farms, commercial infrastructure, semiconductor factories and more. According to the Qatar Water and Electricity Corporation (QWEC), a foremost power generator in the country, electricity demand is expected to increase at an estimated average rate of 6-7% per year in the coming years [1]. To address this increasing demand, the state is considering a new energy strategy that would foster sustainability and also contribute to the reduction of greenhouse gas emissions. Fortunately, the Gulf region, where the country is located, receives 6 kWh/m²/day, amounting to 4449 h/year, of which 70% comes from sunshine; thus, focusing on optimizing energy extraction from sunlight is a viable solution [2]. In fact, renewable energy sources such as photovoltaic (PV) plants are estimated to contribute 11% of global demand by 2050 according to the International Energy Agency (IEA) [3].
Another possible source of renewable energy in Qatar is wind. An assessment of wind energy potential in Qatar conducted by Qatar Petroleum [4] revealed that Qatar may employ small and medium wind turbines, since the wind speed over the country exceeds the critical speed of 3 m/s 80% of the time, with annual mean speeds over land and offshore of 4.3 and 5.7 m/s, respectively. It was estimated that 150 W/m² may be harnessed from a 5 m/s wind speed, but the power generated from wind turbines may be 8% less than that of gas-fired electricity. The projected cost of an offshore wind turbine is 10% less than that of its gas-based counterpart. Although wind turbines sound promising as a potential source of renewable energy, they present several disadvantages compared to PV plants: annual maintenance of the turbine's gearbox in contrast to minimal maintenance for PV, loud noise during operation for nearby inhabitants, and a shorter life span of 20-25 years compared to the 30-year life span of PV [5]. Qatar does not yet have immediate plans for installing wind turbines; instead, it has been focusing on solar energy by allocating a US $1 billion investment to a project that includes desalination plants and a 200 MW power plant by Kahramaa [4]. With the upcoming 2022 FIFA World Cup, the country aims to host the first carbon-neutral World Cup, utilizing solar energy to power air conditioning and fan zones. Since the state is leaning towards mostly solar energy to help power its industry, this study focuses primarily on a PV alternative designed specifically for Qatar's environment, in order to test and understand its performance through measurement, prediction and analysis, which should provide useful references for the country's solar industry.
Large-scale PV farms are usually situated where maximum solar energy conversion can be achieved, typically on semi-arid or desert land. However, soaring temperatures reaching 50°C or more, high humidity and heavy sandstorms are examples of environmental factors that may significantly reduce the power generation efficiency of PV systems. These issues are region-specific and may differ from one place to another, even within the Gulf region. Hence, it is important to investigate modern PV technology under the harsh conditions specifically present in Qatar so that performance can be strongly correlated to them [6]. One apparent benefit is that the uncertainty in PV performance will be greatly reduced, leading to more predictable and profitable solar megaprojects planned for construction in the area [7][8][9]. The results could also serve the interests of manufacturers, researchers and technology enthusiasts seeking to develop or innovate solutions.
Efficient energy management is among the benefits of understanding PV performance, since some modern communities now use hybrid systems that integrate renewable energy sources such as solar PV, and need to determine how these sources behave in such systems. In [10], the authors discussed the modeling and optimization of urban integrated energy systems to provide an energy plan or policy for better energy efficiency, aiming to mitigate the energy crisis experienced in urban communities. In addition, Menetti et al. [11] proposed an efficient energy management scheme that effectively uses energy storage systems for renewable energy sources and the electric grid to reduce the energy exchanged with, and the power peaks on, the grid. The data from a monitoring system becomes a necessary tool for conducting important analyses of a system in a given region, such as in [12], to determine its costs and profit throughout its operation and to assess its financial sustainability and the feasibility of applying it to other regions. It would also contribute to the continuing development of efficient operations in industry through exergy and energy analysis, such as in [13,14], and techno-economic analysis, as in [15,16]. With an increasing number of studies centered on renewable energy, especially solar energy and PV, this study should prove useful to the scientific community and may serve as a significant reference for similar studies conducted in Qatar.
Several similar investigations in Qatar with the same line of inquiry [17][18][19][20][21][22][23][24][25][26] have been conducted, but none has provided a cost-effective yet reliable system that satisfies the requirements for accessing, monitoring and predicting PV yield. Another major concern is the data acquisition system (DAS): most commercially available DAS tend to be costly when implemented for large solar PV plants. In addition, commercial DAS are difficult to reconfigure and modify for various scenarios, which limits their use. Numerous efforts have been made to design and implement PV monitoring systems that utilize several sensors and data acquisition [27]. The system in [28] included an off-the-shelf Agilent 24902A, from which the data were transmitted over a wired general purpose instrumentation bus to a computer running a LabVIEW™ program to determine the impact of solar irradiance and ambient temperature. Haba [29] developed a dedicated monitoring system for several PV panels that utilizes three gateways, intended for a weather station, current and voltage readings, and storm detection; the data were then sent to and hosted on an online cloud service, specifically freeboard.io. In [30], a readily available commercial DAS was used to investigate the impact of module temperature and solar irradiance on PV efficiency, transmitting the data to a server through a GPIB bus and a cloud service. The study in [31] used a system consisting of an LM35 temperature sensor and LDRs (light dependent resistors) for measuring the ambient temperature and solar irradiance of the PV module, respectively. The data were then transmitted wirelessly to a computer via Wi-Fi by connecting the microcontroller to an EGSR7150 modem through its serial interface.
Forecasting of PV performance was recently introduced to improve the quality of such systems, for example by providing dispatch management, control operations, and power ramp and flicker prediction on an hourly basis, and load consumption and production monitoring on a daily basis [32]. Parametric models have also been utilized for forecasting; these depend heavily on the performance of the component models and on factors that are not readily available, which affects the accuracy of the system [33]. Recently, ML was introduced to overcome these drawbacks; it is driven by the interactions between the input and output variables as observed in the data. Several studies have already been conducted, such as [34], where the solar potential of rooftops in Switzerland was determined by utilizing ML. Li et al. [35] used ML to predict solar irradiance in order to precisely determine the PV output, utilizing a Markov model and regression. Most of these forecasts were developed for a specific environment; hence, they would not provide the same accuracy when used in another location that exhibits different environmental parameters, such as Doha, which experiences unique intense heat and heavy dust storms that last year-long. Therefore, we deliberately harness ML for predicting the performance of PV systems from the various environmental parameters present in Doha throughout the year, to support the viability and bankability of PV as an energy source.
This study describes the development of an in-house customized DAS that is viable for monitoring PV systems under Qatar's climate and comprises two parts: hardware and software. The study is also enhanced by describing the calibration tools that are necessary in such studies. The remainder of the paper is organized as follows: Section 2 describes the hardware and signal acquisition. Section 3 describes the ML applied to the data gathered throughout the duration of the study. Section 4 discusses the results from the developed system and the ML results. Finally, the conclusion and future work are provided in Section 5.

Hardware and signal acquisition
The hardware and signal acquisition system was installed in the Solar Lab facility of the College of Engineering, Qatar University. The ground floor of the Solar Lab facility houses a computer workstation and a wireless access point, while its rooftop emulates the PV panel remote site, where the PV panels and the data acquisition hardware are mounted along with all environmental sensors and transducers. Qatar has an arid environment with extreme ambient temperatures that easily surpass 38°C during summer and often approach 50°C, with a humidity of 90% [36].
The authors developed an in-house, customized DAS that acquires six environmental parameters and two electrical parameters, enhanced by analog filters with gain and offset adjustments for calibration purposes. The in-house DAS was designed for flexibility, so that a customized signal conditioning circuit could be constructed for each sensor deemed appropriate for the range of parameter values in an arid environment. The selected sensors, along with the signal conditioning circuits and topology, were chosen in order to implement a robust DAS appropriate to Doha's harsh weather conditions. Figure 1 depicts the overall data acquisition framework. Data acquisition starts at the PV panel remote site, where the PV panels are installed to ensure maximum exposure to the sun's irradiance, free from shadows due to obstructions. The azimuth and tilt angles of the PV panels are also important mounting details that need to be considered. Two polycrystalline PV panels connected in series were installed at the remote site, where the electrical and environmental parameters are monitored periodically in a specified sequence of steps, as shown in the generalized flowchart in Figure 2. Periodic acquisitions are normally spaced 15 minutes apart to ensure seamless wireless transmission between the PV panel remote site and the research lab site, considering the response time of the hardware. Information collected at the research lab site is stored locally and on the file hosting service Dropbox™, and is visualized using ThingSpeak™ through an iCloud™ server. A detailed connection diagram showing the important components of the PV panel remote site is given in Figure 3.
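The gain and offset adjustment mentioned above is performed in analog hardware in this work. As a rough software analogue, the sketch below shows a two-point linear calibration that maps raw readings to physical values. It assumes a linear sensor response; the function names and the ADC counts are hypothetical and are not taken from the study.

```python
def fit_gain_offset(raw_lo, ref_lo, raw_hi, ref_hi):
    """Two-point calibration: return (gain, offset) so that
    gain * raw + offset reproduces the two reference values."""
    gain = (ref_hi - ref_lo) / (raw_hi - raw_lo)
    offset = ref_lo - gain * raw_lo
    return gain, offset


def calibrate(raw, gain, offset):
    """Convert a raw reading to a physical value with a linear model."""
    return gain * raw + offset


# Hypothetical temperature channel: 205 counts at 0 degC, 820 counts at 60 degC.
gain, offset = fit_gain_offset(raw_lo=205, ref_lo=0.0, raw_hi=820, ref_hi=60.0)
ambient_c = calibrate(512, gain, offset)
print(f"{ambient_c:.2f} degC")
```

A two-point fit is the simplest choice; a sensor with a nonlinear response would need more reference points or a lookup table.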
Six environmental and two electrical parameters, namely (1) ambient temperature, (2) irradiance level, (3) wind speed, (4) surface temperature, (5) relative humidity and (6) dust level, along with PV voltage and current, were carefully studied and chosen by the authors in [37,38] as those most likely to correlate with PV panel performance and efficiency, thus allowing higher reliability when applying the ML algorithms in [37,38,39]. The specifications of each sensor are listed in Table 1, which includes the actual part number of each off-the-shelf sensor along with the manufacturer and range of operation. The details of the DAS design and operation were presented by the authors in [37,38,39,40]. Figure 4 shows the simplified connection of the various elements that process the required signals for redundant storage and visualization in the research lab set-up. The computer workstation uses a LabVIEW™ program to process the data and to visualize recently acquired data, as depicted in Figure 5.
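For illustration, one acquisition record with the eight measured parameters could be represented as follows. This is only a sketch: the class name, field names, units and sample values are assumptions and do not reflect the actual data format used by the DAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PVSample:
    """One acquisition record; field names and units are illustrative assumptions."""
    ambient_temp_c: float
    irradiance_w_m2: float
    wind_speed_m_s: float
    surface_temp_c: float
    relative_humidity_pct: float
    dust_level: float
    pv_voltage_v: float
    pv_current_a: float

    @property
    def pv_power_w(self) -> float:
        """Electrical power as the product of measured voltage and current."""
        return self.pv_voltage_v * self.pv_current_a


# Hypothetical reading, not measured data.
sample = PVSample(41.5, 920.0, 3.2, 63.0, 35.0, 0.12, 34.0, 3.0)
print(sample.pv_power_w)  # 102.0
```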

Power prediction using machine learning
ML is the process of training a system to automatically predict an output from given inputs. The system is trained using an available set of inputs and their respective outputs. ML is useful in biomedical applications [41,42], power prediction [43] and, in general, any data processing and analysis study. Here, ML is used to learn from the large amount of monitoring data collected from the setup discussed in the previous section; this is the training phase. During the training phase, a part of the input data is held back for validating the trained network. The validation accuracy is a metric used to determine how good or bad a trained ML network is. The trained network is then tested on data that was unknown to it, to check whether it can actually predict the output correctly. The best performing ML network can later be used to predict future PV performance based on the environmental and electrical inputs. The stages involved in ML are shown in Figure 6 and are discussed in detail in the subsections below.
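The separation into training, validation and testing data described above can be sketched as a simple random split. The 70/15/15 proportions, the function name and the fixed seed are assumptions made for illustration; the proportions used in the study are not stated here.

```python
import random


def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices reproducibly, then cut them into train/validation/test."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * len(samples))
    n_val = round(val_frac * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]  # never seen in training
    return train, val, test


# Stand-in dataset of 100 record identifiers.
train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

For time-ordered monitoring data, a chronological split may be preferable to a random one so that the test period lies strictly after the training period.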

Pre processing
It is always important to make sure that the data given to the ML network for training is correctly formatted, and that all outliers and any incorrect or untrustworthy data are removed. The data should be put into a format accepted by the ML network on whichever platform it is operated. The ML Toolbox in Matlab 2019a was used in this study. Many other popular ML platforms are available, such as TensorFlow, Keras, Shogun, and RapidMiner.
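A minimal sketch of such a cleaning step is given below: it drops records with missing, non-finite or physically implausible values. The field names and plausibility limits are assumptions chosen for illustration, not the criteria applied in this study.

```python
import math


def is_valid(value, lower, upper):
    """True when value is present, finite and inside [lower, upper]."""
    return value is not None and math.isfinite(value) and lower <= value <= upper


def clean(records, limits):
    """Keep only records in which every limited field passes the validity check."""
    return [
        record for record in records
        if all(is_valid(record.get(name), lo, hi) for name, (lo, hi) in limits.items())
    ]


# Hypothetical plausibility limits per field.
limits = {"irradiance_w_m2": (0.0, 1500.0), "ambient_temp_c": (-10.0, 70.0)}
records = [
    {"irradiance_w_m2": 850.0, "ambient_temp_c": 41.2},
    {"irradiance_w_m2": -5.0, "ambient_temp_c": 40.0},           # negative irradiance
    {"irradiance_w_m2": 900.0, "ambient_temp_c": float("nan")},  # sensor dropout
    {"irradiance_w_m2": 300.0},                                  # missing temperature
]
cleaned = clean(records, limits)
print(len(cleaned))  # 1
```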

Feature selection
Once the data (input and output) for training and testing is ready, it is important to select the inputs that help in predicting the output better. Sometimes providing more inputs can lead to overfitting. Overfitting is an issue where an ML network is trained to work best only on the training dataset and predicts mostly wrong outputs in the testing phase. The process of selecting the input data that can increase the testing accuracy is called feature selection: selecting a subset of relevant, high-quality and non-redundant features to create learning models with better accuracy [44,45]. Two well-known feature selection techniques, Correlation feature selection (CFS) and Relief feature selection (ReliefF), were used in this study. The CFS technique selects feature subsets based on a correlation-based heuristic evaluation function, while ReliefF is an instance-based algorithm that assigns to each feature a relevance weight reflecting its ability to differentiate class values [43].
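The correlation-based heuristic used by CFS is commonly written as the merit $k\,\bar{r}_{cf} / \sqrt{k + k(k-1)\,\bar{r}_{ff}}$, where $k$ is the number of features in the subset, $\bar{r}_{cf}$ is the mean feature-target correlation and $\bar{r}_{ff}$ is the mean feature-feature correlation. The sketch below evaluates this merit using absolute Pearson correlation, which is an assumption; the toolbox used in the study may use a different correlation measure, and the data are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cfs_merit(features, target):
    """CFS merit of a feature subset given as {name: values}."""
    k = len(features)
    r_cf = sum(abs(pearson(col, target)) for col in features.values()) / k
    pairs = list(combinations(features.values(), 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Made-up data: power follows irradiance; the second feature is weakly related.
power = [1, 2, 3, 4, 5, 6]
irradiance = [10, 20, 30, 40, 50, 60]
weak = [3, 1, 2, 3, 1, 2]
print(cfs_merit({"irradiance": irradiance}, power))
print(cfs_merit({"irradiance": irradiance, "weak": weak}, power))
```

In this toy case the subset containing only the strongly correlated feature scores higher than the subset that also includes the weak feature, which is the behaviour the heuristic is designed to reward.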

Prediction models
Once the input data for the ML training phase is selected, several ML techniques can be applied to see which one achieves better performance. The techniques used in this study can be broadly classified into two categories: classical ML techniques and Artificial Neural Networks. These techniques are compared in terms of prediction performance during the testing phase, and the best performing technique is archived for future use.

Classical machine learning
Several simple and popular regression and prediction models are used in this work to estimate the PV output power, namely Simple Linear Regression [46], Gaussian Process Regression (GPR) [47] from the regression learner, and the M5P regression tree [37,48]. The simple linear regression model assumes a linear relationship between the output response and the input parameters. GPR involves a Gaussian process that uses lazy learning and a measure of similarity between points (the kernel function) to predict the value for an unseen point from the training data. The M5P regression tree uses an algorithm built from if and else statements [48,49]. In other words, the predicted power is the result of "if … then … else …" statements.
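To make the simplest of these models concrete, the sketch below fits a least-squares line relating a single input (irradiance) to PV power. Restricting the model to one input is a simplification for illustration, and the data points are hypothetical rather than measurements from the setup.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = slope * x + intercept (single input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept


def predict(x_new, slope, intercept):
    """Evaluate the fitted line at a new input value."""
    return slope * x_new + intercept


# Hypothetical irradiance (W/m^2) and PV power (W) pairs.
irradiance = [200, 400, 600, 800, 1000]
power = [22, 44, 66, 88, 110]
slope, intercept = fit_simple_linear(irradiance, power)
print(round(predict(500, slope, intercept), 2))  # 55.0
```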

Artificial neural network
An Artificial Neural Network (ANN) (Figure 7) can be thought of as a replication of how the human nervous system works; because it is artificial, it gets its name [50]. An ANN has three major layers: (1) the input layer, (2) the hidden layer and (3) the output layer. The input layer consists of the artificial neurons where the actual learning happens and is also the layer where the input is fed. Each neuron in this layer has specific weights, which are values used to solve a specific problem. The weighted, summed inputs are used in the hidden layers, or in the transfer functions. The outputs of the transfer functions are then inputs to the activation function, which tries to predict the output or feeds the error back to the network. This feedback acts as learning for the input layer, which again tries to provide inputs to the activation function to help in better prediction.
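The weighted sum followed by an activation can be illustrated with the forward pass of a small network with one hidden layer. This is a generic sketch, not the network of Figure 7: the tanh hidden activation, the linear output and all weight values are assumptions.

```python
import math


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: weighted sums, tanh in the hidden layer, linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Two inputs (for example scaled irradiance and temperature), two hidden neurons.
w_hidden = [[0.8, -0.2], [0.1, 0.5]]  # one row of weights per hidden neuron
b_hidden = [0.0, 0.1]
w_out = [1.5, -0.7]
b_out = 0.2
print(forward([0.9, 0.4], w_hidden, b_hidden, w_out, b_out))
```

Training adjusts the weights and biases so that the output of this pass approaches the measured power; only the prediction step is shown here.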
There are several training algorithms (TA) available in the Matlab implementation of ANN, each with its advantages and disadvantages, and for each application a specific TA can give better results than the others due to the nature of the data. It is therefore important to explore various combinations of the number of hidden layers and training functions to find the combination that predicts the PV power most accurately, as shown in Figure 8. The algorithm first varies the training algorithm, then the number of hidden layers, and then performs many trials using each combination. During each trial, the algorithm stores the network with the best performance for testing purposes. The final best network is used for predicting the PV power from the input variables. Figure 9 summarizes the network settings for the ANN-based PV power prediction. The optimum number of hidden layers providing the best model differed for all features (60), the CFS technique (260) and the ReliefF technique (180), and was found using the algorithm in Figure 8.
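The nested search described above can be sketched as three loops that keep the best network seen so far. The sketch is not the authors' implementation: `fake_train_and_score` is a hypothetical stand-in that returns a made-up validation RMSE instead of training a network, the candidate hidden sizes and trial count are arbitrary, and the Matlab training function names are used only as labels, since the algorithms actually compared are not listed here.

```python
import random


def search_best_network(train_and_score, algorithms, hidden_sizes, n_trials):
    """Vary the training algorithm, then the hidden size, then repeat trials,
    keeping the network with the lowest validation RMSE."""
    best = None
    for algorithm in algorithms:
        for hidden in hidden_sizes:
            for trial in range(n_trials):
                network, val_rmse = train_and_score(algorithm, hidden, seed=trial)
                if best is None or val_rmse < best["val_rmse"]:
                    best = {"algorithm": algorithm, "hidden": hidden,
                            "trial": trial, "val_rmse": val_rmse,
                            "network": network}
    return best


def fake_train_and_score(algorithm, hidden, seed):
    """Hypothetical stand-in for real training: returns (network, validation RMSE)."""
    rng = random.Random(f"{algorithm}-{hidden}-{seed}")
    val_rmse = 5.0 + abs(hidden - 60) / 20.0 + rng.random()  # arbitrary score
    return {"algorithm": algorithm, "hidden": hidden}, val_rmse


ALGORITHMS = ("trainlm", "trainbr", "trainscg")  # labels only
HIDDEN_SIZES = (20, 60, 100)                     # assumed candidates
N_TRIALS = 5
best = search_best_network(fake_train_and_score, ALGORITHMS, HIDDEN_SIZES, N_TRIALS)
print(best["algorithm"], best["hidden"], round(best["val_rmse"], 3))
```

Repeating trials matters because network weights are initialized randomly, so the same combination can yield different validation errors from run to run.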
In order to compare the various categories and techniques of ML, as well as the various feature selection techniques, the statistical parameters below were used as performance metrics [51].
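Root mean square error: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$, with $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - Y_i\right)^2$, where $X$ is the actual data vector of length $n$, and $Y$ and $\bar{Y}$ are the predicted data vector and the mean of the predicted data vector, respectively; $\bar{Y}$ is used in $\mathrm{MSE}(\mathrm{Baseline})$.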
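As a small illustration of this metric, the sketch below computes the RMSE between an actual and a predicted power vector using the standard definition (square root of the mean squared difference). The function name and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((x - y) ** 2 for x, y in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical actual and predicted PV power values in watts.
actual_w = [100.0, 50.0, 0.0]
predicted_w = [97.0, 54.0, 0.0]
print(round(rmse(actual_w, predicted_w), 3))  # 2.887
```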

Results from the setup and machine learning
The prototype system (setup shown in Figures 1 and 3) was used to collect the PV and environmental parameters and the PV power output data from November 2014 until October 2016. A summary of the PV and environmental parameters and of the data used for deriving the predictive model of the PV power is given in Table 2. Table 3 summarizes the parameters selected by the feature selection techniques CFS and ReliefF. Table 4 summarizes the performance of the different classical ML techniques with the different feature selection techniques, showing both the training and testing phase performance metrics. It can be clearly seen that the best performance is achieved by the GPR algorithm with CFS feature selection, with an RMSE of 12.7144 W compared to the maximum power of 114.2017 W generated by the setup, as shown in Table 2. Table 5 summarizes the performance of the best trained ANN found using the algorithm in Figure 8 with the different feature selection techniques. It can be clearly seen that the trained ANN outperforms the classical ML techniques. For the ANN, using no feature selection provides the best testing performance, with an RMSE of 5.48 W compared to the maximum power of 114.20 W generated by the setup, as shown in Table 2.
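To put the two reported RMSE values in proportion, they can be expressed as a percentage of the maximum generated power. The short calculation below uses only the figures quoted above; the helper name is hypothetical and the percentages are derived here rather than reported in the tables.

```python
def relative_rmse_percent(rmse_w, max_power_w):
    """RMSE expressed as a percentage of the maximum generated power."""
    return 100.0 * rmse_w / max_power_w


gpr_pct = relative_rmse_percent(12.7144, 114.2017)  # best classical model (GPR + CFS)
ann_pct = relative_rmse_percent(5.48, 114.20)       # best ANN (all features)
print(f"GPR: {gpr_pct:.1f}%  ANN: {ann_pct:.1f}%")  # GPR: 11.1%  ANN: 4.8%
```

On this basis the ANN error is roughly 2.3 times smaller than that of the best classical model.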

Conclusion and future work
A customized PV system was developed at Qatar University to monitor, analyze and evaluate the performance of PV using various weather factors. The study also showed details of how the data collected could be used for training different ML algorithms which were compared using different statistical analytical tools. Several feature selection techniques were also used to avoid the problem of overfitting. Comparison between the different ML techniques and different feature selection techniques helped in concluding an ANN model to be used for predicting PV performance using different environment and electrical parameters. The paper also showed the opportunity of tuning the ANN by varying the number of hidden layers and changing the training algorithm.