
German country-wide renewable power generation from solar plus wind mined with an optimized data matching algorithm utilizing diverse variables

  • Original Paper
  • Published in: Energy Systems

Abstract

Country-wide, hourly-averaged solar plus wind power generation (MW) data (8784 data records) published for Germany in 2016 is compiled to include ten influential variables relating to weather, ground-surface environmental conditions and a specifically calculated day-ahead electricity price index. The transparent open box (TOB) learning network, a recently developed optimized nearest-neighbour, data-matching prediction algorithm, accurately predicts MW and facilitates data mining for this historical dataset. The TOB analysis yields MW prediction outliers for about 1.5% of the data records. These outliers are revealed via TOB analysis to be related to uncommon conditions occurring on a few specific days, typically over hourly sequences involving rapid change in weather-related conditions. Such outliers are readily identified and explained individually by the TOB algorithm’s data mining capabilities. A slightly filtered dataset (excluding 129 identified outliers) improves TOB’s prediction accuracy. The TOB algorithm facilitates accurate predictions and detailed evaluation over a range of historical temporal scales on a country-wide basis and could also be applied to regional spatial predictions. These attributes make the TOB method well suited to eventual incorporation into forward-looking renewable forecasting frameworks.


References

  1. Alfadda, A., Adhikari, R., Kuzlu, M., Rahman, S.: Hour-ahead solar PV power forecasting using SVR based approach. In: 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, pp 1–5 (2017)

  2. Al-Shamisi, M.H., Assi, A.H., Hejase, H.A.N.: Artificial neural networks for predicting global solar radiation in Al-Ain City, UAE. Int. J. Green Energy 10, 443–456 (2013)

  3. Amarasinghe, P.A.G.M., Abeygunawardane, S.K.: Application of machine learning algorithms for solar power forecasting in Sri Lanka. In: 2nd International conference on electrical engineering (EECon) 28 Sep 2018, Sri Lanka, pp. 87–92 (2018)

  4. Antonanzas, J., Osorio, N., Escobar, R., Urraca, R., Martinez-de Pison, F.J., Antonanzas-Torres, F.: Review of photovoltaic power forecasting. Sol. Energy 136, 78–111 (2016)


  5. Arora, S., Singh, S.: The firefly optimization algorithm: convergence analysis and parameter selection. Int. J. Comput. Appl. 69(3), 48–52 (2013)


  6. Arora, S., Singh, S.: Performance research on firefly optimization algorithm with mutation. In: International Conference on Communication, Computing and Systems, pp. 168–172 (2014)

  7. Atkeson, C.G., Moore, A.W., Schaal, S.: Locally weighted learning. Artif. Intell. Rev. 11(1–5), 11–73 (1997)


  8. Bacher, P., Madsen, H., Nielsen, H.: Online short-term solar power forecasting. Sol. Energy 83, 1772–1783 (2009)


  9. Birattari, M., Bontempi, G., Bersini, H.: Lazy learning meets the recursive least squares algorithm. Adv. Neural Inf. Process. Syst. 11, 375–381 (1999). (MIT Press, Cambridge, MA)


  10. Bontempi, G., Birattari, M., Bersini, H.: Lazy learning for local modeling and control design. Int. J. Control 72(7/8), 643–658 (1999)


  11. Brown, B.G., Katz, R.W., Murphy, A.H.: Time series models to simulate and forecast wind speed and wind power. J. Clim. Appl. Meteorol. 23(8), 1184–1195 (1984)


  12. Catalao, J.P.S., Pousinho, H.M.I., Mendes, V.M.F.: An artificial neural network approach for short-term wind power forecasting in Portugal. In: 15th International Conference on Intelligent System Applications to Power Systems (2009)

  13. Chen, J.L., Liu, H.B., Wu, W.: Estimation of monthly solar radiation from measured temperatures using support vector machines. Renew. Energy 36(1), 413–420 (2011)


  14. Chen, G.H., Shah, D.: Explaining the success of nearest neighbor methods in prediction. Found. Trends Mach. Learn. 10(5–6), 337–588 (2018)


  15. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)


  16. Dowell, J., Pinson, P.: Very-short-term probabilistic wind power forecasts by sparse vector autoregression. IEEE Trans. Smart Grid 7, 763–770 (2016)


  17. Energinet. The Danish national transmission system operator for electricity and natural gas published Elspot Prices. https://www.energidataservice.dk/en/dataset/elspotprices (2019). Accessed 26 Mar 2019

  18. Erdem, E., Shi, J.: ARMA based approaches for forecasting the tuple of wind speed and direction. Appl. Energy 88, 1405–1414 (2011)


  19. Ezzat, A.A., Jun, M., Ding, Y.: Spatio-temporal asymmetry of local wind fields and its impact on short-term wind forecasting. IEEE Trans. Sustain. Energy 9, 1437–1447 (2018)


  20. Ezzat, A.A., Jun, M., Ding, Y.: Spatio-temporal short-term wind forecast: a calibrated regime-switching method. The Annals of Applied Statistics. Accepted. https://www.imstat.org/journals-and-publications/annals-of-applied-statistics/annals-of-applied-statistics-next-issues/ (2019). Accessed 12 July 2019

  21. Ferreira, H.: Predicting wind and solar generation from weather data using machine learning. https://nbviewer.jupyter.org/github/hugorcf/Renewable-energy-weather/blob/master/renewable.ipynb (2018). Accessed 26 Mar 2019

  22. Filipe, J.M., Bessa, R.J., Sumaili, J., Tomé, R., Sousa, S.N.: A hybrid short-term solar power forecasting tool. In: 2015 18th International Conference on Intelligent System Application to Power Systems (ISAP), Porto, pp. 1–6 (2015)

  23. Fix, E., Hodges, Jr., J.L.: Discriminatory analysis, nonparametric discrimination: consistency properties. Technical report, USAF School of Aviation Medicine (1951)

  24. Focken, U., Lange, M., Mönnich, K., Waldl, H., Beyer, H., Luig, A.: Short-term prediction of the aggregated power output of wind farms—a statistical analysis of the reduction of the prediction error by spatial smoothing effects. J. Wind. Eng. Ind. Aerodyn. 90(3), 231–246 (2002)


  25. Foley, A.M., Leahy, P.G., Marvuglia, A., McKeogh, E.J.: Current methods and advances in forecasting of wind power generation. Renew. Energy 37, 1–8 (2012)


  26. Frontline Solvers: Standard excel solver—limitations of nonlinear optimization. https://www.solver.com/standard-excel-solver-limitations-nonlinear-optimization (2019). Accessed 26 Mar 2019

  27. Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)


  28. Gensler, A., Henze, J., Sick, B., Raabe, N: Deep learning for solar power forecasting—an approach using AutoEncoder and LSTM Neural Networks. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, pp. 002858–002865 (2016)

  29. Giebel, G., Brownsword, R., Kariniotakis, G., Denhard, M., Draxl, C.: The state-of-the-art in short-term prediction of wind power: a literature overview, 2nd edn. Tech. Rep., ANEMOS.plus. (2011). https://doi.org/10.13140/rg.2.1.2581.4485

  30. Gneiting, T., Larson, K., Westrick, K., Genton, M., Aldrich, E.: Calibrated probabilistic forecasting at the Stateline wind energy center. J. Am. Stat. Assoc. 101, 968–979 (2006)

  31. Golestaneh, F., Pinson, P., Gooi, H.B.: Very short-term nonparametric probabilistic forecasting of renewable energy generation—with application to solar energy. IEEE Trans. Power Syst. (2016). https://doi.org/10.1109/TPWRS.2015.2502423


  32. Gul, A., Perperoglou, A., Khan, Z., Mahmoud, O., Miftahuddin, M., Adler, W., Lausen, B: Ensemble of a subset of kNN classifiers. Adv. Data. Anal. Classif. 12(4), 827–40 (2018). https://doi.org/10.1007/s11634-015-0227-5

  33. Han, S., Liu, Y., Yan, J.: Neural network ensemble method study for wind power prediction. In: Asia Pacific Power and Energy Engineering Conference (APPEEC) (2011)

  34. Heinermann, J., Kramer, O.: Precise wind power prediction with SVM ensemble regression. In: Artificial Neural Networks and Machine Learning—ICANN, pp. 797–804. Springer, Switzerland (2014)

  35. Hering, A., Genton, M.: Powering up with space-time wind forecasting. J. Am. Stat. Assoc. 105, 92–104 (2010)


  36. Hong, T., Pinson, P., Fan, S., Zareipour, H., Troccoli, A., Hyndman, R.J.: Probabilistic energy forecasting: global energy forecasting competition 2014 and beyond. Int. J. Forecast. 32(3), 896–913 (2016)


  37. Inman, R.H., Pedro, H.T.C., Coimbra, C.F.M.: Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci. 39(6), 535–576 (2013)


  38. Jia, F., Lei, Y., Lin, J., Zhou, X., Lu, N.: Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process 72–73, 303–315 (2016)


  39. Jursa, R., Rohrig, K.: Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. Int. J. Forecast. 24, 694–709 (2008)


  40. Kazem, H.A., Yousif, J.H., Chaichan, M.T.: Modelling of daily solar energy system prediction using support vector machine for oman. Int. J. Appl. Eng. Res. 11(20), 10166–10172 (2016)


  41. Khatib, T., Mohamed, A., Sopian, K., Mahmoud, M.: Solar energy prediction for Malaysia using artificial neural networks. Int. J. Energy 6(1), 1–16 (2012)


  42. Kostylev, V., Pavlovski, A.: Solar power forecasting performance—towards industry standards. In: 1st International Workshop on the Integration of Solar Power into Power Systems, Aarhus, Denmark (2011)

  43. Kramer, O., Gieseke, F.: Analysis of wind energy time series with kernel methods and neural networks. In: 7th International Conference on Natural Computation (2011)

  44. Kusiak, A., Zheng, H., Song, Z.: Short-term prediction of wind farm power: a data mining approach. IEEE Trans. Energy Convers. 24(1), 125–136 (2009)


  45. Lange, M., Focken, U.: Physical Approach to Short-Term Wind Power Prediction. Springer, Berlin (2006). (ISBN-10 3-540-25662-8S)


  46. Leahy, K., Hu, R.L., Konstantakopoulis, I.C., Spanos, C.J., Agogino, A.M.: Diagnosing wind turbine faults using machine learning techniques applied to operational data. In: IEEE International Conference on Prognostics and Health Management (ICPHM) 22–26 June 2016. (2016). https://doi.org/10.1109/icphm.2016.7542860

  47. Lever, J., Krywinski, M., Altman, N.: Model selection and overfitting. Nat Methods 13, 703–704 (2016). https://doi.org/10.1038/nmeth.3968


  48. Mohammed, A.A., Yaqub, W., Aung, Z.: Probabilistic forecasting of solar power: an ensemble learning approach. Intell. Decis. Technol. Smart Innov. Syst. Technol. 39, 449–458 (2015)


  49. Mohandes, M.A., Rehmann, S., Halawani, T.O.: A neural networks approach for wind speed prediction. Renew. Energy 13(3), 345–354 (1998)


  50. Mori, H., Takahashi, A.: A data mining method for selecting input variables for forecasting model of global solar radiation. In: Transmission and Distribution Conference and Exposition (T&D), IEEE, pp. 1–6 (2012)

  51. Nageem, R., Jayabarathi, R.: Predicting the power output of a grid-connected solar panel using multi-input support vector regression. Proc. Comput. Sci. 115, 723–730 (2017)


  52. OPSD: European power system data in five packages. Open Power System Data. https://data.open-power-system-data.org/ (2019). Accessed 26 Mar 2019

  53. OPSD Time Series: Load, wind and solar, prices in hourly resolution. https://doi.org/10.25832/time_series/2018-06-30 (2019). Accessed 26 Mar 2019

  54. OPSD Weather Data: Hourly geographically aggregated weather data for Europe. https://doi.org/10.25832/weather_data/2018-09-04 (2019). Accessed 26 Mar 2019

  55. Pal, S.K., Raj, C.S., Singh, A.P.: Comparative study of firefly algorithm and particle swarm optimization for noisy non-linear optimization problems. I. J. Intell. Syst. Appl. 10, 50–57 (2012)


  56. Pinson, P.: Wind energy: forecasting challenges for its operational management. Stat. Sci. 28, 564–585 (2013). https://doi.org/10.1214/13-STS445


  57. Rana, M., Koprinska, I., Agelidis, V.G.: Solar power forecasting using weather type clustering and ensembles of neural networks. In: International Joint Conference on Neural Networks (IJCNN). Vancouver, BC, pp. 4962–4969 (2016)

  58. Reikard, G.: Predicting solar radiation at high resolutions: a comparison of time series forecasts. Sol. Energy 83(3), 342–349 (2009)


  59. Samworth, R.: Optimal weighted nearest neighbour classifiers. Ann. Stat. 40(5), 2733–2763 (2012)


  60. Sanchez, I.: Short-term prediction of wind energy production. Int. J. Forecast. 22(1), 43–56 (2006)


  61. Shakhnarovich, G., Darrell, T., Indyk, P.: Nearest-neighbor methods in learning and vision: theory and practice (neural information processing). The MIT Press, Cambridge (2006). (ISBN:026219547X)


  62. Shin, Y.E., Ding, Y., Huang, J.Z.: Covariate matching methods for testing and quantifying wind turbine upgrades. Ann. Appl. Stat. 12, 1271–1292 (2018)


  63. Sharma, N., Sharma, P., Irwin, D., Shenoy, P.: Predicting solar generation from weather forecasts using machine learning. In: Proceedings of the 2011 IEEE International Conference on Smart Grid Communications, pp. 528–533 (2011)

  64. Sivaneasan, B., Yu, C.Y., Goh, K.P.: Solar forecasting using ANN with fuzzy logic pre-processing. Energy Proc. 143, 727–732 (2017)


  65. Soman, S.S., Zareipour, H., Malik, O., Mandal, P: A review of wind power and wind speed forecasting methods with different time horizons. In: North American Power Symposium (NAPS), pp. 1–8 (2010)

  66. Stetco, A., Dinmohammadi, F., Zhao, X.: Machine learning methods for wind turbine condition monitoring: a review. Renew. Energy 133, 620–635 (2019)


  67. Treiber, N.A., Heinermann, J., Kramer, O.: Wind power prediction with machine learning. In: Lässig J., Kersting K., Morik K. (eds) Computational Sustainability. Studies in Computational Intelligence, 645, 13–29. Springer, Cham (2016)

  68. Vladislavleva, E., Friedrich, T., Neumann, F., Wagner, M.: Predicting the energy output of wind farms based on weather data: important variables and their correlation. Renew. Energy 50, 236–243 (2013). https://doi.org/10.1016/j.renene.2012.06.036


  69. Voyant, C., Notton, G., Kalogirou, S., Nivet, M.L., Paoli, C., Motte, F., Fouilloy, A.: Machine learning methods for solar radiation forecasting: a review. Renew. Energy 105, 569–582 (2017)


  70. Wang, J., Li, P., Ran, R., Che, Y., Zhou, Y.: A short-term photovoltaic power prediction model based on the gradient boost decision tree. Appl. Sci. 8, 689 (2018). https://doi.org/10.3390/app8050689


  71. Wood, D.A.: Metaheuristic profiling to assess performance of hybrid evolutionary optimization algorithms applied to complex wellbore trajectories. J. Nat. Gas Sci. Eng. 33, 751–768 (2016). https://doi.org/10.1016/j.jngse.2016.05.041


  72. Wood, D.A.: Evolutionary memetic algorithms supported by metaheuristic profiling effectively applied to the optimization of discrete routing problems. J. Nat. Gas Sci. Eng. 35, 997–1014 (2016). https://doi.org/10.1016/j.jngse.2016.09.031

  73. Wood, D.A.: A transparent open-box learning network provides insight to complex systems and a performance benchmark for more-opaque machine learning algorithms. Adv. Geo-Energy Res. 2(2), 148–162 (2018)


  74. Wood, D.A.: Transparent open-box learning network provides auditable predictions for coal gross calorific value. Model. Earth Syst. Environ. (2018). https://doi.org/10.1007/s40808-018-0543-9. (published online 16 November, 2018)


  75. Wood, D.A.: Thermal maturity and burial history modelling of shale is enhanced by use of Arrhenius time-temperature index and memetic optimizer. Petroleum 4, 25–42 (2018). https://doi.org/10.1016/j.petlm.2017.10.004


  76. Wood, D.A., Choubineh, A., Vaferi, B.: Transparent open-box learning network provides auditable predictions: pool boiling heat transfer coefficient for alumina-water-based nanofluids. J. Therm. Anal. Calorim. (2018). https://doi.org/10.1007/s10973-018-7722-9. (Published online: 20 pages)


  77. Yan, J., Li, K., Bai, E., Deng, J., Foley, A.: Hybrid probabilistic wind power forecasting using temporally local Gaussian process. IEEE Trans. Sustain. Energy 7, 87–95 (2016)


  78. Yang, X.-S.: Firefly Algorithms for Multimodal Optimization, in: Stochastic Algorithms: Foundations and Applications, SAGA, Lecture Notes in Computer Sciences, vol. 5792, pp. 169–178 (2009)

  79. Yang, X.-S., He, X.: Firefly algorithm: recent advances and applications. Int. J. Swarm Intell. 1(1), 36–50 (2013)


  80. Zamo, M., Mestre, O., Arbogast, P., Pannekoucke, O.: A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part II: probabilistic forecast of daily production. Sol. Energy 105, 804–816 (2014)


  81. Zeng, J., Qiao, W.: Short-term solar power prediction using a support vector machine. Renew. Energy 52, 118–127 (2013)



Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to David A. Wood.

Ethics declarations

Conflict of interest

The author declares no conflicts of interest related to the topics addressed in this study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 1583 kb)

Appendices

Appendix A: dataset analysed

The compiled, pre-processed hourly-average dataset analysed in this study is included as a supplementary file. The underlying data are published online and freely available in two distinct OPSD databases [52–54], but are not assembled there in the convenient hourly-averaged form for each variable as compiled and used in this study.
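For readers wanting to reproduce the basic data handling, a minimal Python sketch of loading the compiled hourly dataset is given below. The file name is a placeholder for a locally saved copy of the supplementary XLSX file, and the printed shape and statistics simply mirror the dataset description above; the actual column labels in the supplementary file may differ.

```python
# Minimal sketch (assumptions: pandas is available; the supplementary XLSX is
# saved locally under the illustrative name "germany_2016_hourly.xlsx").
import pandas as pd

df = pd.read_excel("germany_2016_hourly.xlsx")  # hypothetical local file name
print(df.shape)       # expected: 8784 hourly records; ~10 predictors plus MW
print(df.describe())  # per-variable summary statistics (cf. TOB Step 3)
```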

Appendix B: transparent open box algorithm calculation steps

The fourteen steps of the transparent open-box (TOB) algorithm [73, 74] are outlined here.

2.1 TOB Stage 1 (initial prediction applying unweighted data matching)

Step 1 Data is arranged in a 2-D array (M data records; N independent variables plus one dependent variable, i.e., the variable to be predicted: in this study, megawatts of power generated).

Step 2 Data records are sorted into ascending or descending order of the prediction variable’s values.

Step 3 Each variable is expressed in basic statistical terms to characterize the variable distributions involved in the dataset to be evaluated (e.g., Table 1).

Step 4 Each variable’s maximum and minimum values are used to normalize the dataset over a scale of − 1 to + 1 by applying Eq. (10).

$$ X_{i}^{*} = 2 \times \left[ \left( X_{i} - X_{\min} \right) / \left( X_{\max} - X_{\min} \right) \right] - 1 $$
(10)

where \( X_{i} \) = the ith data record value of variable X (one of the N + 1 variables), \( X_{\min} \) = minimum of variable X for the entire dataset, \( X_{\max} \) = maximum of variable X for all data records, and \( X_{i}^{*} \) = normalized value of variable X for the ith record.

Step 5 Verify the normalized dataset, confirming that all variable values lie within the normalized range, i.e., \( -1 \le X_{i}^{*} \le +1 \).
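A minimal Python sketch of Steps 4 and 5 (the Eq. (10) normalization plus the range check) is shown below; the function name and NumPy usage are illustrative rather than the code used in the study.

```python
import numpy as np

def normalize(data):
    """Min-max scale each column of a 2-D NumPy array to [-1, +1], Eq. (10)."""
    x_min = data.min(axis=0)
    x_max = data.max(axis=0)
    scaled = 2.0 * (data - x_min) / (x_max - x_min) - 1.0
    # Step 5 verification: every normalized value must lie in [-1, +1]
    assert ((scaled >= -1.0) & (scaled <= 1.0)).all()
    return scaled, x_min, x_max
```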

Step 6 Allocate all the data records in the dataset to one of a training subset, a tuning subset or a testing subset. This allocation is best not performed in a random manner, to ensure that the full variable range is sampled by the tuning and testing subsets. Arranging the data records in ascending or descending order of dependent-variable values (Step 2) enables the dataset to be sampled repeatedly at different offset intervals across the full value range when populating the tuning and testing subsets. Sensitivity analysis is required to establish the optimum allocations to each subset that provide the most accurate predictions. Typically, the training subset is likely to constitute more than seventy percent of the dataset records. The two-stage process of the TOB facilitates such sensitivity analysis: a series of TOB cases is run with different sizes of tuning subsets, and comparing the prediction performances of TOB stages 1 and 2 identifies the minimum number of data records needed in the tuning subset for the stage 2 TOB predictions to consistently outperform the stage 1 predictions.

Separate tuning and training subsets are required so that the optimizer can tune the weights applied to the squared errors of each independent variable across the records of the training subset. This tuning approach can provide accurate predictions from a representative, but relatively small, number of data records in the tuning subset. For large datasets (e.g., many thousands of data records and multiple variables), large tuning subsets (e.g., substantially greater than about 150 records) tend to increase the computational effort without providing any benefit to the optimization process of TOB stage 2. By focusing on relatively small tuning subsets, computational effort is reduced.
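The ordered, offset-sampled allocation described in Step 6 can be sketched as follows. The helper name and the subset sizes are illustrative assumptions; in practice the sizes are set by the sensitivity analysis described above.

```python
import numpy as np

def allocate_subsets(y, n_tune=100, n_test=100):
    """Sketch of Step 6: order records by the dependent variable (Step 2), draw
    the tuning and testing subsets at regular offsets across that ordering so
    the full value range is sampled, and leave the rest as the training subset."""
    order = np.argsort(y)                                   # value-ordered indices
    pick_tune = np.linspace(0, len(order) - 1, n_tune, dtype=int)
    keep = np.ones(len(order), dtype=bool)
    keep[pick_tune] = False
    tune_idx = order[pick_tune]
    rest = order[keep]                                      # still value-ordered
    pick_test = np.linspace(0, len(rest) - 1, n_test, dtype=int)
    keep_rest = np.ones(len(rest), dtype=bool)
    keep_rest[pick_test] = False
    return rest[keep_rest], tune_idx, rest[pick_test]       # train, tune, test
```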

Step 7 Compute the variable squared error (VSE) for each of J tuning-subset records versus the K training-subset records using Eq. (11):

$$ VSE(X)_{jk} = \left[ X_{k}(tr) - X_{j}(tu) \right]^{2} $$
(11)

where \( X_{k}(tr) \) = value of variable X for the kth training-subset record, \( X_{j}(tu) \) = value of variable X for the jth tuning-subset record, and \( VSE(X)_{jk} \) = variable-squared error (VSE) for variable X for the jth tuning-subset record versus the kth training-subset record. The weighted sum of the computed VSE values, \( \sum VSE_{jk} \), is obtained by applying Eq. (12):

$$ \sum VSE_{jk} = \sum\limits_{n = 1}^{N + 1} VSE(X_{n})_{jk} \times W_{n} $$
(12)

where \( VSE(X_{n})_{jk} \) = variable-squared error (VSE) for variable \( X_{n} \) for the jth tuning-subset record versus the kth training-subset record, and \( \sum VSE_{jk} \) = sum of the variable-squared errors across the N + 1 variables (including the dependent variable) for the jth tuning-subset record versus the kth training-subset record.

\( W_{n} \) = weight (\( 0 < W_{n} \le 1 \)) applied to the calculated VSE of each of the N + 1 variables involved in the prediction. Each weight is set to a constant value (0.5 is used in this study) in TOB stage 1. This avoids any bias being introduced into the ranked initial matches derived for the tuning versus training subsets.

Step 8 Rank the matching data records in the training subset versus each tuning-subset record. The training-subset record with the smallest calculated ∑VSE value is identified as the best-matching record for a specific tuning-subset data record. The top-Q-matching training-subset records, established for each tuning-subset record, are then selected for the initial TOB-stage-one prediction. Q = 10 has empirically been found to be sufficient to provide accurate TOB-stage-one predictions for a wide range of datasets. Increasing Q much above ten adds to the computational effort without providing sufficient improvement in prediction accuracy.
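A compact sketch of Steps 7 and 8 (Eqs. (11)–(12) plus the top-Q ranking) is shown below. The function name is illustrative, and the columns compared are simply whatever variable columns are supplied, so the caller decides whether the dependent variable is included, as the appendix describes for the tuning stage.

```python
import numpy as np

def rank_matches(train, tune_record, weights, q=10):
    """Weighted variable-squared errors between one tuning-subset record and
    every training-subset record, then the Q best (smallest-error) matches.
    `train` is (K, V), `tune_record` is (V,), and `weights` holds one Wn per
    column (all set to 0.5 in TOB stage 1)."""
    vse = (train - tune_record) ** 2            # Eq. (11), per variable
    sum_vse = (vse * weights).sum(axis=1)       # Eq. (12), weighted sum
    top_q = np.argsort(sum_vse)[:q]             # Step 8: rank and keep Q best
    return top_q, sum_vse[top_q]
```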

Step 9 The top ten best-matching records in the training subset for the jth tuning-subset record each contribute fractionally to the TOB-stage-one prediction for that record. The contribution fraction applied to each of those top-ten matching data records is established with Eqs. (13)–(15), calculated from the ∑VSE values for each of those training-subset records versus the jth tuning-subset record.

$$ f_{jq} = \frac{\sum VSE_{jq}}{\sum\limits_{r = 1}^{Q} \sum VSE_{jr}} $$
(13)

where q = the qth of the Q top-ranking training-subset records for the jth tuning-subset record, r = the rth of the Q top-ranking training-subset records for the jth tuning-subset record, and \( f_{q} \) = the contribution fraction calculated for the qth of the Q top-ranking records for the jth tuning-subset record.

Equation (14) constrains the \( f_{q} \) values to sum to 1.

$$ \sum\limits_{q = 1}^{Q} f_{q} = 1 $$
(14)

The best-matching training-subset record (i.e., the one with the lowest \( \sum VSE_{jk} \) value) should make the greatest contribution to the prediction of the dependent variable associated with the jth tuning-subset record. This is achieved by applying \( (1 - f_{q}) \) multipliers in Eq. (15).

$$ \left( X_{N + 1} \right)_{j}^{predicted} = \sum\limits_{q = 1}^{Q} \left[ \left( X_{N + 1} \right)_{q} \times \left( 1 - f_{q} \right) \right] $$
(15)

where \( \left( X_{N + 1} \right)_{q} \) = dependent-variable value for the qth training-subset record (i.e., one of the Q best-matching records), and \( \left( X_{N + 1} \right)_{j}^{predicted} \) = TOB-stage-one predicted dependent-variable value for the jth tuning-subset record.

The output from Step 9 represents the TOB-stage-one prediction. It is provisional because it applies equal weights (Wn) to the variable squared errors (VSE). This prediction is further refined in TOB stage 2 optimization.
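A sketch of the Step 9 blending (Eqs. (13)–(15)) follows. One stated assumption: the (1 − fq) multipliers are renormalized to sum to one so that the result is a weighted average of the Q matched dependent-variable values; the appendix does not spell out that detail, so treat this as one plausible reading rather than the study's exact implementation.

```python
import numpy as np

def stage1_predict(y_top, sum_vse_top):
    """Blend the dependent-variable values of the Q best-matching training
    records, giving the closest match (smallest error sum) the largest share.
    `y_top` and `sum_vse_top` come from the ranking sketch above."""
    f = sum_vse_top / (sum_vse_top.sum() + 1e-12)   # Eq. (13): error fractions
    w = (1.0 - f) / (1.0 - f).sum()                 # assumed renormalization of (1 - f_q)
    return float(np.sum(y_top * w))                 # Eq. (15)-style blended prediction
```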

Step 10 Various statistical metrics are used to assess the accuracy of the TOB-stage-one predictions (see Sect. 3.3).
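For illustration, accuracy metrics of the kind referred to here can be computed as in the short sketch below; RMSE, MAE and R² are common choices, while the paper's own metric set is defined in its Sect. 3.3, which is not reproduced here.

```python
import numpy as np

def accuracy_metrics(y_true, y_pred):
    """Standard prediction-error metrics for assessing TOB stage-one results."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "R2": r2}
```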

2.2 TOB Stage 2 (optimizing contributions from the best matching records)

Step 11 Optimized predictions are achieved by minimizing the root mean squared error (RMSE) metric (Eq. 3, Sect. 3.3). RMSE is computed using the squared prediction errors calculated for all J tuning-subset records. Three optimizers are run in this study to verify the results and generate a range of potentially optimum solutions (see Sect. 3.2).

Two TOB stage 2 minimization control parameters, Q and Wn, are employed (a minimal tuning sketch follows this list):

  1. The N input-variable weights (Wn) are allowed to vary across the fully constrained range (0 ≤ Wn ≤ 1), leaving the optimizer free to select the values that minimize RMSE. It is not unusual for quite small non-zero values to be assigned to certain Wn by the optimizer; however, these low weights often have significant impacts that improve the prediction accuracy of TOB stage 2.

  2. The optimizer is allowed to vary Q (2 ≤ Q ≤ 10) for the TOB stage 2 calculations conducted with Eqs. (13)–(15).
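The sketch below illustrates this stage 2 tuning loop. It reuses the rank_matches and stage1_predict sketches above, searches Q with an outer loop and the weights Wn with SciPy's differential_evolution as a generic stand-in; the study itself runs three specific optimizers (its Sect. 3.2), which are not reproduced here.

```python
import numpy as np
from scipy.optimize import differential_evolution

def tune_stage2(train_X, train_y, tune_X, tune_y, q_range=range(2, 11)):
    """Sketch of Step 11: minimize tuning-subset RMSE over the weights Wn
    (each in [0, 1]) and the number of contributing matches Q (2-10)."""

    def rmse_for(weights, q):
        preds = []
        for j in range(len(tune_X)):
            top_q, sum_vse_top = rank_matches(train_X, tune_X[j], weights, q)
            preds.append(stage1_predict(train_y[top_q], sum_vse_top))
        return np.sqrt(np.mean((np.asarray(preds) - tune_y) ** 2))

    best = None
    for q in q_range:                                   # outer search over Q
        res = differential_evolution(rmse_for,
                                     bounds=[(0.0, 1.0)] * train_X.shape[1],
                                     args=(q,), maxiter=20, seed=1)  # kept small
        if best is None or res.fun < best[0]:
            best = (res.fun, q, res.x)
    return best                                         # (RMSE, Q, weights Wn)
```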

Step 12 Statistical accuracy metrics for the TOB-stage-two predictions are computed and compared with those generated for the TOB-stage-one predictions. Performing sensitivity analysis by optimizing with different fixed values of Q (i.e., Q = 2 to 10) also generates a set of sub-optimal solutions that help to assess potential underfitting or overfitting issues with the dataset being evaluated.

Step 13 Compute TOB-stage-one and TOB-stage-two predictions for the independent testing-subset records, applying the optimum values of Wn and Q established in Step 11 by tuning against the tuning subset. Assess these predictions with the statistical accuracy metrics and compare them with those obtained for the tuning subset (e.g., Tables 3 and 5). They should be in comparable ranges; if they are not, the tuning process may be either underfitting or overfitting the dataset. Further verification of the prediction performance of the TOB stage 2 optimum solutions can be obtained by applying those solutions to all records in the dataset and assessing the prediction accuracy achieved (e.g., Table 6).

As part of this step it is often appropriate to audit the intermediate calculation steps to reveal which variable errors (VSE) are weighted to contribute the most to the TOB stage 2 predictions. Reviewing the intermediate calculations can also facilitate comprehensive outlier analysis (e.g., see Sect. 5.1), providing insight into why some data records lead to less-accurate predictions than the main trend. In addition, segmental analysis (see Sect. 4.3) identifies regions of the dependent-variable range for which the TOB method provides greater or lesser prediction accuracy.
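A simple way to screen for such prediction outliers, sketched below, is to flag records whose prediction error is unusually large; the three-sigma threshold used here is an illustrative assumption, not the criterion applied in the study.

```python
import numpy as np

def flag_outliers(y_true, y_pred, n_sigma=3.0):
    """Return indices of records whose prediction error deviates from the mean
    error by more than n_sigma standard deviations (illustrative threshold)."""
    err = y_pred - y_true
    return np.where(np.abs(err - err.mean()) > n_sigma * err.std())[0]
```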

Step 14 Compare the prediction accuracy provided by the TOB algorithm with other prediction methods (e.g., regression-based, machine learning algorithms and empirical methods). Such analysis for the renewable power generation dataset studied here will be presented in future studies.


About this article


Cite this article

Wood, D.A. German country-wide renewable power generation from solar plus wind mined with an optimized data matching algorithm utilizing diverse variables. Energy Syst 11, 1003–1045 (2020). https://doi.org/10.1007/s12667-019-00347-x

