Data on Support Vector Machines (SVM) model to forecast photovoltaic power

The data concern photovoltaic (PV) power forecasted by a hybrid model that considers weather variations and applies a technique to reduce the input data size, as presented in the paper entitled “Photovoltaic forecast based on hybrid pca-lssvm using dimensionality reducted data” (M. Malvoni, M.G. De Giorgi, P.M. Congedo, 2015) [1]. The quadratic Renyi entropy criterion, together with principal component analysis (PCA), is applied to Least Squares Support Vector Machines (LS-SVM) to predict the PV power in the day-ahead time frame. The data shared here are the results of the proposed approach. Hourly PV power predictions at 1, 3, 6, 12 and 24 hours ahead, for different data reduction sizes, are provided in the Supplementary material.


Specifications
The pre-processed data of outdoor measurements and PV output power form the input set used to implement the forecasting models. LS-SVM, in combination with a technique to resize the training data, is implemented to predict the PV power in the day-ahead time frame.
Data source location: Lecce, Italy (40°19'32''16 N, 18°5'52''44 E)
Data accessibility: Data are within this article

Value of the data
The provided data may be useful to compare PV system performance under different climatic conditions.
The PV power predictions obtained with LS-SVM, in combination with a technique to reduce the dimension of the training data, may be used to compare the accuracy of prediction methods.
The forecasted PV power can be applied in the planning, operation and management of power systems, in view of the future development of smart grids.
The prediction of PV power can be used to optimize the integration of electric vehicles and to manage their recharging status [2].

Data
The hourly PV power predicted by LS-SVM, combined with a technique to reduce the size of the training data, is provided for different input data dimensions in the Supplementary file (online). For each size, hourly PV power predictions are given at 1, 3, 6, 12 and 24 hours ahead.

Experimental data
The shared data refer to a grid-connected 960 kWp PV system sited in Lecce, Southern Italy (40°19'32''16 N, 18°5'52''44 E), which supplies electricity to the utilities of the University of Salento campus. The 3000 monocrystalline silicon PV modules are installed on shelters that serve as car parking roofs. All PV modules have the same azimuth angle of 10°, but they are tilted at two different angles, 3° and 15°. More technical details may be found in [3]. An integrated web-connected system measures and monitors the weather parameters and the PV output power. The PV field is equipped with a pyranometer that measures the solar radiation on the module plane every 1 min. Temperature probes measure the ambient temperature and the module temperature every 10 min. The PV output power is sampled every 1 min. The monitored data are collected on a website with private access.

Data pre-processing
For more accurate predictions, PV energy forecasting models require knowledge of meteorological parameters [4,5]. Therefore weather data covering one year of measurements, from 06/03/2012 to 30/03/2013, were considered, for a total of 9,192 hourly samples. A data pre-processing step was performed to align all measurements to the same sampling step of 1 h: hourly average values of temperatures, solar irradiance and PV power were computed, as defined in [6].
The same meteorological data have already been applied by the authors in [7] to investigate a hybrid statistical model based on LS-SVM with Wavelet Decomposition (WD), and in [8] to propose a novel hybrid algorithm, GLSSVM (Group Least Square Support Vector Machine), based on the combination of LS-SVM and an unexplored neural network known as GMDH (Group Method of Data Handling), implementing different strategies (Direct, Recursive and DirRec) for multi-step-ahead forecasting.
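The hourly averaging step described above can be sketched as follows. This is a minimal illustration and not the procedure defined in [6]: the function name `hourly_means`, the sample readings, and the choice to label each hour by its starting instant are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(samples):
    """Average (timestamp, value) samples into one mean value per clock hour.

    `samples` is an iterable of (datetime, float) pairs. Readings given as
    None are skipped. The result maps the start of each hour to its mean.
    Labelling an hour by its start is an assumption of this sketch.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if value is None:
            continue  # ignore missing readings
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical 1-min irradiance readings (W/m^2), not taken from the dataset.
irradiance = [
    (datetime(2012, 3, 6, 10, 0), 500.0),
    (datetime(2012, 3, 6, 10, 1), 520.0),
    (datetime(2012, 3, 6, 11, 0), 600.0),
]
hourly_irradiance = hourly_means(irradiance)
```

The same function can be applied to the 1-min PV power and to the 10-min temperature series, so that all variables share the 1 h step.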

Data processing method
In [1], a hybrid method based on an active selection of the support vectors, using the quadratic Renyi entropy criterion in combination with PCA, is proposed to forecast the PV power output at different forecast horizons.
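To illustrate the idea of entropy-based active selection, the sketch below uses one common estimator of the quadratic Renyi entropy, the negative logarithm of the mean Gaussian kernel value over all pairs of points, and a simple swap search that keeps a candidate subset only when its entropy increases. It is not the implementation used in [1]: the function names, the bandwidth `sigma`, the number of iterations and the swap strategy are assumptions, and the PCA step is not shown.

```python
import math
import random


def renyi_quadratic_entropy(points, sigma=1.0):
    """Estimate the quadratic Renyi entropy of a set of points.

    Uses -log of the mean Gaussian kernel value over all pairs. The kernel
    normalization constant is omitted: it only shifts the entropy by a
    constant and does not change comparisons between subsets of equal size.
    """
    n = len(points)
    total = 0.0
    for a in points:
        for b in points:
            sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return -math.log(total / (n * n))


def select_subset(data, m, iterations=200, sigma=1.0, seed=0):
    """Pick m indices of `data` by random swaps that increase the entropy."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(data)), m)
    chosen_set = set(chosen)
    pool = [i for i in range(len(data)) if i not in chosen_set]
    best = renyi_quadratic_entropy([data[i] for i in chosen], sigma)
    for _ in range(iterations):
        if not pool:
            break
        ci = rng.randrange(m)
        pi = rng.randrange(len(pool))
        chosen[ci], pool[pi] = pool[pi], chosen[ci]  # try a swap
        candidate = renyi_quadratic_entropy([data[i] for i in chosen], sigma)
        if candidate > best:
            best = candidate  # keep the swap
        else:
            chosen[ci], pool[pi] = pool[pi], chosen[ci]  # undo the swap
    return sorted(chosen), best
```

A subset with higher entropy is more evenly spread over the input space, which is why this kind of criterion is used to choose a reduced set of training points.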
To implement the forecasting methods, an input dataset of the following parameters was considered:
S, a variable representing the season in which the data were measured;
H, the hour to which the data refer;
T_a, ambient temperature (°C);
T_mod, module temperature (°C);
G_{3,n} and G_{15,n}, normalized solar irradiance difference at tilt 3° and 15° respectively;
T_h, target PV output power at the prediction time horizon h.

The solar irradiance difference is defined as the difference between the solar irradiance on the tilted plane and the related hourly clear-sky irradiance for each hour i, and computed as follows:
G_k(i) = (I_k(i) - I_{sky,k}(i)) / I_0,  k = 3, 15
where I_k is the measured irradiance for each plane of array, I_{sky,k} is the corresponding solar irradiance in clear-sky conditions and I_0 = 1,000 W/m² is the solar irradiance at Standard Test Conditions. The difference time series G_k(i) was then normalized to obtain G_{k,n}(i).

The target T_h is the PV power cumulated over h consecutive hours and scaled to its maximum value; the cumulated quantity is P(i), the hourly average PV power computed during the data pre-processing.

The hourly mean ambient temperature, module temperature, solar irradiance measured on the two tilted planes (I_3 and I_15) and the hourly mean PV power are provided in [6]. The time series of G_{k,n} and T_h were then computed.

The input dataset was divided into two subsets. The samples from May, August, November and February (2856, about 31% of the total records) were used for testing and the remaining months (6336, about 69% of the records) for training.

The outcomes of the forecasting models are the cumulated and normalized power predicted for each time horizon, T_h(i). The output data for the train set and the test set are shared in the Supplementary material, File 1. The time series T_h(i) is provided at five time horizons, implementing the proposed methodology for four cases, as summarized in Table 1. More details are provided in [1].
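The construction of the inputs and targets described above can be sketched as follows. Several details are assumptions of this sketch rather than statements from [1]: the difference is taken as (I_k - I_sky,k) / I_0, the normalization is a min-max scaling, and the target window covers the h hours following hour i. All function names are hypothetical.

```python
TEST_MONTHS = {2, 5, 8, 11}  # February, May, August, November, as in the text


def irradiance_difference(measured, clear_sky, i0=1000.0):
    """G_k(i): measured minus clear-sky irradiance, scaled by I_0 (assumed form)."""
    return [(m - c) / i0 for m, c in zip(measured, clear_sky)]


def min_max_normalize(series):
    """Map a series onto [0, 1]. Min-max scaling is an assumption of this sketch."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def cumulated_target(power, h):
    """T_h(i): power summed over the h hours after hour i, divided by the largest sum."""
    if len(power) <= h:
        raise ValueError("need more than h hourly values")
    sums = [sum(power[i + 1:i + 1 + h]) for i in range(len(power) - h)]
    peak = max(sums)
    if peak == 0:
        return [0.0 for _ in sums]
    return [s / peak for s in sums]


def split_by_month(records):
    """Split (month, sample) pairs into train and test lists by calendar month."""
    train = [sample for month, sample in records if month not in TEST_MONTHS]
    test = [sample for month, sample in records if month in TEST_MONTHS]
    return train, test
```

Splitting by whole months, rather than by random sampling, keeps each test month unseen during training and covers one month per season.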

Transparency document. Supplementary material
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.08.024.