
German country-wide renewable power generation from solar plus wind mined with an optimized data matching algorithm utilizing diverse variables

  • Original Paper
  • Published in: Energy Systems

Abstract

Country-wide, hourly-averaged solar plus wind power generation (MW) data (8784 data records) published for Germany in 2016 is compiled to include ten influential variables relating to weather, ground-surface environmental conditions and a specifically calculated day-ahead electricity price index. The transparent open box (TOB) learning network, a recently developed optimized nearest-neighbour, data-matching prediction algorithm, accurately predicts MW and facilitates data mining for this historical dataset. The TOB analysis yields MW prediction outliers for about 1.5% of the data records. These outliers are revealed via TOB analysis to be related to uncommon conditions occurring on a few specific days, typically over hourly sequences involving rapid change in weather-related conditions. Such outliers are readily identified and explained individually by the TOB algorithm’s data mining capabilities. A slightly filtered dataset (excluding 129 identified outliers) improves TOB’s prediction accuracy. The TOB algorithm facilitates accurate predictions and detailed evaluation over a range of historical temporal scales on a country-wide basis and could also be applied to regional spatial predictions. These attributes make the TOB method well suited to eventual incorporation into forward-looking renewable forecasting frameworks.


References

  1. Alfadda, A., Adhikari, R., Kuzlu, M., Rahman, S.: Hour-ahead solar PV power forecasting using SVR based approach. In: 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, pp 1–5 (2017)

  2. Al-Shamisi, M.H., Assi, A.H., Hejase, H.A.N.: Artificial neural networks for predicting global solar radiation in Al-Ain City, UAE. Int. J. Green Energy 10, 443–456 (2013)

  3. Amarasinghe, P.A.G.M., Abeygunawardane, S.K.: Application of machine learning algorithms for solar power forecasting in Sri Lanka. In: 2nd International conference on electrical engineering (EECon) 28 Sep 2018, Sri Lanka, pp. 87–92 (2018)

  4. Antonanzas, J., Osorio, N., Escobar, R., Urraca, R., Martinez-de Pison, F.J., Antonanzas-Torres, F.: Review of photovoltaic power forecasting. Sol. Energy 136, 78–111 (2016)


  5. Arora, S., Singh, S.: The firefly optimization algorithm: convergence analysis and parameter selection. Int. J. Comput. Appl. 69(3), 48–52 (2013)


  6. Arora, S., Singh, S.: Performance research on firefly optimization algorithm with mutation. In: International Conference on Communication, Computing and Systems, pp. 168–172 (2014)

  7. Atkeson, C.G., Moore, A.W., Schaal, S.: Locally weighted learning. Artif. Intell. Rev. 11(1–5), 11–73 (1997)


  8. Bacher, P., Madsen, H., Nielsen, H.: Online short-term solar power forecasting. Sol. Energy 83, 1772–1783 (2009)


  9. Birattari, M., Bontempi, G., Bersini, H.: Lazy learning meets the recursive least squares algorithm. Adv. Neural Inf. Process. Syst. 11, 375–381 (1999). (MIT Press, Cambridge, MA)


  10. Bontempi, G., Birattari, M., Bersini, H.: Lazy learning for local modeling and control design. Int. J. Control 72(7/8), 643–658 (1999)


  11. Brown, B.G., Katz, R.W., Murphy, A.H.: Time series models to simulate and forecast wind speed and wind power. J. Clim. Appl. Meteorol. 23(8), 1184–1195 (1984)


  12. Catalao, J.P.S., Pousinho, H.M.I., Mendes, V.M.F.: An artificial neural network approach for short-term wind power forecasting in Portugal. In: 15th International Conference on Intelligent System Applications to Power Systems (2009)

  13. Chen, J.L., Liu, H.B., Wu, W.: Estimation of monthly solar radiation from measured temperatures using support vector machines. Renew. Energy 36(1), 413–420 (2011)


  14. Chen, G.H., Shah, D.: Explaining the success of nearest neighbor methods in prediction. Found. Trends Mach. Learn. 10(5–6), 337–588 (2018)


  15. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)


  16. Dowell, J., Pinson, P.: Very-short-term probabilistic wind power forecasts by sparse vector autoregression. IEEE Trans. Smart Grid 7, 763–770 (2016)


  17. Energinet. The Danish national transmission system operator for electricity and natural gas published Elspot Prices. https://www.energidataservice.dk/en/dataset/elspotprices (2019). Accessed 26 Mar 2019

  18. Erdem, E., Shi, J.: ARMA based approaches for forecasting the tuple of wind speed and direction. Appl. Energy 88, 1405–1414 (2011)


  19. Ezzat, A.A., Jun, M., Ding, Y.: Spatio-temporal asymmetry of local wind fields and its impact on short-term wind forecasting. IEEE Trans. Sustain. Energy 9, 1437–1447 (2018)


  20. Ezzat, A.A., Jun, M., Ding, Y.: Spatio-temporal short-term wind forecast: a calibrated regime-switching method. The Annals of Applied Statistics. Accepted. https://www.imstat.org/journals-and-publications/annals-of-applied-statistics/annals-of-applied-statistics-next-issues/ (2019). Accessed 12 July 2019

  21. Ferreira, H.: Predicting wind and solar generation from weather data using machine learning. https://nbviewer.jupyter.org/github/hugorcf/Renewable-energy-weather/blob/master/renewable.ipynb (2018). Accessed 26 Mar 2019

  22. Filipe, J.M., Bessa, R.J., Sumaili, J., Tomé, R., Sousa, S.N.: A hybrid short-term solar power forecasting tool. In: 2015 18th International Conference on Intelligent System Application to Power Systems (ISAP), Porto, pp. 1–6 (2015)

  23. Fix, E., Hodges, Jr., J.L.: Discriminatory analysis, nonparametric discrimination: consistency properties. Technical report, USAF School of Aviation Medicine (1951)

  24. Focken, U., Lange, M., Mönnich, K., Waldl, H., Beyer, H., Luig, A.: Short-term prediction of the aggregated power output of wind farms—a statistical analysis of the reduction of the prediction error by spatial smoothing effects. J. Wind. Eng. Ind. Aerodyn. 90(3), 231–246 (2002)


  25. Foley, A.M., Leahy, P.G., Marvuglia, A., McKeogh, E.J.: Current methods and advances in forecasting of wind power generation. Renew. Energy 37, 1–8 (2012)


  26. Frontline Solvers: Standard excel solver—limitations of nonlinear optimization. https://www.solver.com/standard-excel-solver-limitations-nonlinear-optimization (2019). Accessed 26 Mar 2019

  27. Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)


  28. Gensler, A., Henze, J., Sick, B., Raabe, N: Deep learning for solar power forecasting—an approach using AutoEncoder and LSTM Neural Networks. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, pp. 002858–002865 (2016)

  29. Giebel, G., Brownsword, R., Kariniotakis, G., Denhard, M., Draxl, C.: The state-of-the-art in short-term prediction of wind power: a literature overview, 2nd edn. Tech. Rep., ANEMOS.plus. (2011). https://doi.org/10.13140/rg.2.1.2581.4485

  30. Gneiting, T., Larson, K., Westrick, K., Genton, M., Aldrich, E.: Calibrated probabilistic forecasting at the Stateline wind energy center. J. Am. Stat. Assoc. 101, 968–979 (2006)

  31. Golestaneh, F., Pinson, P., Gooi, H.B.: Very short-term nonparametric probabilistic forecasting of renewable energy generation—with application to solar energy. IEEE Trans. Power Syst. (2016). https://doi.org/10.1109/TPWRS.2015.2502423


  32. Gul, A., Perperoglou, A., Khan, Z., Mahmoud, O., Miftahuddin, M., Adler, W., Lausen, B: Ensemble of a subset of kNN classifiers. Adv. Data. Anal. Classif. 12(4), 827–40 (2018). https://doi.org/10.1007/s11634-015-0227-5

  33. Han, S., Liu, Y., Yan, J.: Neural network ensemble method study for wind power prediction. In: Asia Pacific Power and Energy Engineering Conference (APPEEC) (2011)

  34. Heinermann, J., Kramer, O.: Precise wind power prediction with SVM ensemble regression. In: Artificial Neural Networks and Machine Learning—ICANN, pp. 797–804. Springer, Switzerland (2014)

  35. Hering, A., Genton, M.: Powering up with space-time wind forecasting. J. Am. Stat. Assoc. 105, 92–104 (2010)


  36. Hong, T., Pinson, P., Fan, S., Zareipour, H., Troccoli, A., Hyndman, R.J.: Probabilistic energy forecasting: global energy forecasting competition 2014 and beyond. Int. J. Forecast. 32(3), 896–913 (2016)


  37. Inman, R.H., Pedro, H.T.C., Coimbra, C.F.M.: Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci. 39(6), 535–576 (2013)


  38. Jia, F., Lei, Y., Lin, J., Zhou, X., Lu, N.: Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process 72–73, 303–315 (2016)


  39. Jursa, R., Rohrig, K.: Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. Int. J. Forecast. 24, 694–709 (2008)


  40. Kazem, H.A., Yousif, J.H., Chaichan, M.T.: Modelling of daily solar energy system prediction using support vector machine for oman. Int. J. Appl. Eng. Res. 11(20), 10166–10172 (2016)


  41. Khatib, T., Mohamed, A., Sopian, K., Mahmoud, M.: Solar energy prediction for Malaysia using artificial neural networks. Int. J. Energy 6(1), 1–16 (2012)


  42. Kostylev, V., Pavlovski, A.: Solar power forecasting performance—towards industry standards. In: 1st International Workshop on the Integration of Solar Power into Power Systems, Aarhus, Denmark (2011)

  43. Kramer, O., Gieseke, F.: Analysis of wind energy time series with kernel methods and neural networks. In: 7th International Conference on Natural Computation (2011)

  44. Kusiak, A., Zheng, H., Song, Z.: Short-term prediction of wind farm power: a data mining approach. IEEE Trans. Energy Convers. 24(1), 125–136 (2009)


  45. Lange, M., Focken, U.: Physical Approach to Short-Term Wind Power Prediction. Springer, Berlin (2006). (ISBN-10 3-540-25662-8S)


  46. Leahy, K., Hu, R.L., Konstantakopoulis, I.C., Spanos, C.J., Agogino, A.M.: Diagnosing wind turbine faults using machine learning techniques applied to operational data. In: IEEE International Conference on Prognostics and Health Management (ICPHM) 22–26 June 2016. (2016). https://doi.org/10.1109/icphm.2016.7542860

  47. Lever, J., Krywinski, M., Altman, N.: Model selection and overfitting. Nat Methods 13, 703–704 (2016). https://doi.org/10.1038/nmeth.3968


  48. Mohammed, A.A., Yaqub, W., Aung, Z.: Probabilistic forecasting of solar power: an ensemble learning approach. Intell. Decis. Technol. Smart Innov. Syst. Technol. 39, 449–458 (2015)


  49. Mohandes, M.A., Rehmann, S., Halawani, T.O.: A neural networks approach for wind speed prediction. Renew. Energy 13(3), 345–354 (1998)


  50. Mori, H., Takahashi, A.: A data mining method for selecting input variables for forecasting model of global solar radiation. In: Transmission and Distribution Conference and Exposition (T&D), IEEE, pp. 1–6 (2012)

  51. Nageem, R., Jayabarathi, R.: Predicting the power output of a grid-connected solar panel using multi-input support vector regression. Proc. Comput. Sci. 115, 723–730 (2017)


  52. OPSD: European power system data in five packages. Open Power System Data. https://data.open-power-system-data.org/ (2019). Accessed 26 Mar 2019

  53. OPSD Time Series: Load, wind and solar, prices in hourly resolution. https://doi.org/10.25832/time_series/2018-06-30 (2019). Accessed 26 Mar 2019

  54. OPSD Weather Data: Hourly geographically aggregated weather data for Europe. https://doi.org/10.25832/weather_data/2018-09-04 (2019). Accessed 26 Mar 2019

  55. Pal, S.K., Raj, C.S., Singh, A.P.: Comparative study of firefly algorithm and particle swarm optimization for noisy non-linear optimization problems. I. J. Intell. Syst. Appl. 10, 50–57 (2012)


  56. Pinson, P.: Wind energy: forecasting challenges for its operational management. Stat. Sci. 28, 564–585 (2013). https://doi.org/10.1214/13-STS445


  57. Rana, M., Koprinska, I., Agelidis, V.G.: Solar power forecasting using weather type clustering and ensembles of neural networks. In: International Joint Conference on Neural Networks (IJCNN). Vancouver, BC, pp. 4962–4969 (2016)

  58. Reikard, G.: Predicting solar radiation at high resolutions: a comparison of time series forecasts. Sol. Energy 83(3), 342–349 (2009)


  59. Samworth, R.: Optimal weighted nearest neighbour classifiers. Ann. Stat. 40(5), 2733–2763 (2012)


  60. Sanchez, I.: Short-term prediction of wind energy production. Int. J. Forecast. 22(1), 43–56 (2006)


  61. Shakhnarovich, G., Darrell, T., Indyk, P.: Nearest-neighbor methods in learning and vision: theory and practice (neural information processing). The MIT Press, Cambridge (2006). (ISBN:026219547X)


  62. Shin, Y.E., Ding, Y., Huang, J.Z.: Covariate matching methods for testing and quantifying wind turbine upgrades. Ann. Appl. Stat. 12, 1271–1292 (2018)


  63. Sharma, N., Sharma, P., Irwin, D., Shenoy, P.: Predicting solar generation from weather forecasts using machine learning. In: Proceedings of the 2011 IEEE International Conference on Smart Grid Communications, pp. 528–533 (2011)

  64. Sivaneasan, B., Yu, C.Y., Goh, K.P.: Solar forecasting using ANN with fuzzy logic pre-processing. Energy Proc. 143, 727–732 (2017)


  65. Soman, S.S., Zareipour, H., Malik, O., Mandal, P: A review of wind power and wind speed forecasting methods with different time horizons. In: North American Power Symposium (NAPS), pp. 1–8 (2010)

  66. Stetco, A., Dinmohammadi, F., Zhao, X.: Machine learning methods for wind turbine condition monitoring: a review. Renew. Energy 133, 620–635 (2019)


  67. Treiber, N.A., Heinermann, J., Kramer, O.: Wind power prediction with machine learning. In: Lässig J., Kersting K., Morik K. (eds) Computational Sustainability. Studies in Computational Intelligence, 645, 13–29. Springer, Cham (2016)

  68. Vladislavleva, E., Friedrich, T., Neumann, F., Wagner, M.: Predicting the energy output of wind farms based on weather data: important variables and their correlation. Renew. Energy 50, 236–243 (2013). https://doi.org/10.1016/j.renene.2012.06.036


  69. Voyant, C., Notton, G., Kalogirou, S., Nivet, M.L., Paoli, C., Motte, F., Fouilloy, A.: Machine learning methods for solar radiation forecasting: a review. Renew. Energy 105, 569–582 (2017)


  70. Wang, J., Li, P., Ran, R., Che, Y., Zhou, Y.: A short-term photovoltaic power prediction model based on the gradient boost decision tree. Appl. Sci. 8, 689 (2018). https://doi.org/10.3390/app8050689


  71. Wood, D.A.: Metaheuristic profiling to assess performance of hybrid evolutionary optimization algorithms applied to complex wellbore trajectories. J. Nat. Gas Sci. Eng. 33, 751–768 (2016). https://doi.org/10.1016/j.jngse.2016.05.041


  72. Wood, D.A.: Evolutionary memetic algorithms supported by metaheuristic profiling effectively applied to the optimization of discrete routing problems. J. Nat. Gas Sci. Eng. 35, 997–1014 (2016). https://doi.org/10.1016/j.jngse.2016.09.031

  73. Wood, D.A.: A transparent open-box learning network provides insight to complex systems and a performance benchmark for more-opaque machine learning algorithms. Adv. Geo-Energy Res. 2(2), 148–162 (2018)


  74. Wood, D.A.: Transparent open-box learning network provides auditable predictions for coal gross calorific value. Model. Earth Syst. Environ. (2018). https://doi.org/10.1007/s40808-018-0543-9. (published online 16 November, 2018)


  75. Wood, D.A.: Thermal maturity and burial history modelling of shale is enhanced by use of Arrhenius time-temperature index and memetic optimizer. Petroleum 4, 25–42 (2018). https://doi.org/10.1016/j.petlm.2017.10.004


  76. Wood, D.A., Choubineh, A., Vaferi, B.: Transparent open-box learning network provides auditable predictions: pool boiling heat transfer coefficient for alumina-water-based nanofluids. J. Therm. Anal. Calorim. (2018). https://doi.org/10.1007/s10973-018-7722-9. (Published online: 20 pages)


  77. Yan, J., Li, K., Bai, E., Deng, J., Foley, A.: Hybrid probabilistic wind power forecasting using temporally local Gaussian process. IEEE Trans. Sustain. Energy 7, 87–95 (2016)


  78. Yang, X.-S.: Firefly Algorithms for Multimodal Optimization, in: Stochastic Algorithms: Foundations and Applications, SAGA, Lecture Notes in Computer Sciences, vol. 5792, pp. 169–178 (2009)

  79. Yang, X.-S., He, X.: Firefly algorithm: recent advances and applications. Int. J. Swarm Intell. 1(1), 36–50 (2013)


  80. Zamo, M., Mestre, O., Arbogast, P., Pannekoucke, O.: A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part II: probabilistic forecast of daily production. Sol. Energy 105, 804–816 (2014)


  81. Zeng, J., Qiao, W.: Short-term solar power prediction using a support vector machine. Renew. Energy 52, 118–127 (2013)



Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to David A. Wood.

Ethics declarations

Conflict of interest

The author declares no conflicts of interest related to the topics addressed in this study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 1583 kb)

Appendices

Appendix A: dataset analysed

The compiled, pre-processed hourly-average dataset analysed in this study is included as a supplementary file. The underlying data are published online and freely available in two distinct OPSD databases [52–54], but are not assembled there in the convenient hourly-averaged form for each variable as compiled and used in this study.
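For readers wanting to reproduce the basic data handling, a minimal Python sketch of loading the compiled hourly dataset is given below. The file name is a placeholder for a locally saved copy of the supplementary XLSX file, and the printed shape and statistics simply mirror the dataset description above; the actual column labels in the supplementary file may differ.

```python
# Minimal sketch (assumptions: pandas is available; the supplementary XLSX is
# saved locally under the illustrative name "germany_2016_hourly.xlsx").
import pandas as pd

df = pd.read_excel("germany_2016_hourly.xlsx")  # hypothetical local file name
print(df.shape)       # expected: 8784 hourly records; ~10 predictors plus MW
print(df.describe())  # per-variable summary statistics (cf. TOB Step 3)
```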

Appendix B: transparent open box algorithm calculation steps

The fourteen steps of the transparent open-box (TOB) algorithm [73, 74] are outlined here.

2.1 TOB Stage 1 (initial prediction applying unweighted data matching)

Step 1 Data is arranged in a 2-D array (M data records; N independent variables plus one dependent variable, i.e., the variable to be predicted: in this study, megawatts of power generated).

Step 2 Data records are sorted into ascending or descending order of the prediction variable’s values.

Step 3 Each variable is expressed in basic statistical terms to characterize the variable distributions involved in the dataset to be evaluated (e.g., Table 1).

Step 4 Each variable’s maximum and minimum values are used to normalize the dataset over a scale of − 1 to + 1 by applying Eq. (10).

$$ X_{i}^{*} = 2 \times \left[ \left( X_{i} - X_{\min} \right) / \left( X_{\max} - X_{\min} \right) \right] - 1 $$
(10)

where \( X_{i} \) = the ith data record value of variable X (one of the N + 1 variables), \( X_{\min} \) = minimum of variable X for the entire dataset, \( X_{\max} \) = maximum of variable X for all data records, and \( X_{i}^{*} \) = normalized value of variable X for the ith record.

Step 5 Verify the normalized dataset, confirming that all variable values lie within the normalized range, i.e., \( -1 \le X_{i}^{*} \le +1 \).
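A minimal Python sketch of Steps 4 and 5 (the Eq. (10) normalization plus the range check) is shown below; the function name and NumPy usage are illustrative rather than the code used in the study.

```python
import numpy as np

def normalize(data):
    """Min-max scale each column of a 2-D NumPy array to [-1, +1], Eq. (10)."""
    x_min = data.min(axis=0)
    x_max = data.max(axis=0)
    scaled = 2.0 * (data - x_min) / (x_max - x_min) - 1.0
    # Step 5 verification: every normalized value must lie in [-1, +1]
    assert ((scaled >= -1.0) & (scaled <= 1.0)).all()
    return scaled, x_min, x_max
```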

Step 6 Allocate all the data records in the dataset to one of a training subset, a tuning subset or a testing subset. This allocation is best not performed in a random manner, to ensure that the full variable range is sampled by the tuning and testing subsets. Arranging the data records in ascending or descending order of dependent-variable values (Step 2) enables the dataset to be sampled repeatedly at different offset intervals across the full value range when populating the tuning and testing subsets. Sensitivity analysis is required to establish the optimum allocations to each subset that provide the most accurate predictions. Typically, the training subset is likely to constitute more than seventy percent of the dataset records. The two-stage process of the TOB facilitates such sensitivity analysis: a series of TOB cases is run with different sizes of tuning subsets, and comparing the prediction performances of TOB stages 1 and 2 identifies the minimum number of data records needed in the tuning subset for the stage 2 TOB predictions to consistently outperform the stage 1 predictions.

Separate tuning and training subsets are required so that the optimizer can tune the weights applied to the squared errors of each independent variable across the records of the training subset. This tuning approach can provide accurate predictions from a representative, but relatively small, number of data records in the tuning subset. For large datasets (e.g., many thousands of data records and multiple variables), large tuning subsets (e.g., substantially greater than about 150 records) tend to increase the computational effort without providing any benefit to the optimization process of TOB stage 2. By focusing on relatively small tuning subsets, computational effort is reduced.
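The ordered, offset-sampled allocation described in Step 6 can be sketched as follows. The helper name and the subset sizes are illustrative assumptions; in practice the sizes are set by the sensitivity analysis described above.

```python
import numpy as np

def allocate_subsets(y, n_tune=100, n_test=100):
    """Sketch of Step 6: order records by the dependent variable (Step 2), draw
    the tuning and testing subsets at regular offsets across that ordering so
    the full value range is sampled, and leave the rest as the training subset."""
    order = np.argsort(y)                                   # value-ordered indices
    pick_tune = np.linspace(0, len(order) - 1, n_tune, dtype=int)
    keep = np.ones(len(order), dtype=bool)
    keep[pick_tune] = False
    tune_idx = order[pick_tune]
    rest = order[keep]                                      # still value-ordered
    pick_test = np.linspace(0, len(rest) - 1, n_test, dtype=int)
    keep_rest = np.ones(len(rest), dtype=bool)
    keep_rest[pick_test] = False
    return rest[keep_rest], tune_idx, rest[pick_test]       # train, tune, test
```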

Step 7 Compute the variable squared error (VSE) for each of J tuning-subset records versus the K training-subset records using Eq. (11):

$$ VSE(X)_{jk} = \left[ X_{k}(tr) - X_{j}(tu) \right]^{2} $$
(11)

where \( X_{k}(tr) \) = value of variable X for the kth training-subset record, \( X_{j}(tu) \) = value of variable X for the jth tuning-subset record, and \( VSE(X)_{jk} \) = variable-squared error (VSE) for variable X for the jth tuning-subset record versus the kth training-subset record. The weighted sum of the computed VSE values, \( \sum VSE_{jk} \), is obtained by applying Eq. (12):

$$ \sum VSE_{jk} = \sum\limits_{n = 1}^{N + 1} VSE(X_{n})_{jk} \times W_{n} $$
(12)

where \( VSE(X_{n})_{jk} \) = variable-squared error (VSE) for variable \( X_{n} \) for the jth tuning-subset record versus the kth training-subset record, and \( \sum VSE_{jk} \) = sum of the variable-squared errors across the N + 1 variables (including the dependent variable) for the jth tuning-subset record versus the kth training-subset record.

\( W_{n} \) = weight (\( 0 < W_{n} \le 1 \)) applied to the calculated VSE of each of the N + 1 variables involved in the prediction. Each weight is set to a constant value (0.5 is used in this study) in TOB stage 1. This avoids any bias being introduced into the ranked initial matches derived for the tuning versus training subsets.

Step 8 Rank the matching data records in the training subset versus each tuning-subset record. The training-subset record with the smallest calculated ∑VSE value is identified as the best-matching record for a specific tuning-subset data record. The top-Q-matching training-subset records, established for each tuning-subset record, are then selected for the initial TOB-stage-one prediction. Q = 10 has empirically been found to be sufficient to provide accurate TOB-stage-one predictions for a wide range of datasets. Increasing Q much above ten adds to the computational effort without providing sufficient improvement in prediction accuracy.
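A compact sketch of Steps 7 and 8 (Eqs. (11)–(12) plus the top-Q ranking) is shown below. The function name is illustrative, and the columns compared are simply whatever variable columns are supplied, so the caller decides whether the dependent variable is included, as the appendix describes for the tuning stage.

```python
import numpy as np

def rank_matches(train, tune_record, weights, q=10):
    """Weighted variable-squared errors between one tuning-subset record and
    every training-subset record, then the Q best (smallest-error) matches.
    `train` is (K, V), `tune_record` is (V,), and `weights` holds one Wn per
    column (all set to 0.5 in TOB stage 1)."""
    vse = (train - tune_record) ** 2            # Eq. (11), per variable
    sum_vse = (vse * weights).sum(axis=1)       # Eq. (12), weighted sum
    top_q = np.argsort(sum_vse)[:q]             # Step 8: rank and keep Q best
    return top_q, sum_vse[top_q]
```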

Step 9 The top ten best-matching records in the training subset for the jth tuning-subset record each contribute fractionally to the TOB-stage-one prediction for that record. The contribution fraction applied to each of those top-ten matching data records is established with Eqs. (13)–(15), calculated from the ∑VSE values for each of those training-subset records versus the jth tuning-subset record.

$$ f_{jq} = \frac{\sum VSE_{jq}}{\sum\limits_{r = 1}^{Q} \sum VSE_{jr}} $$
(13)

where q = the qth of the Q top-ranking training-subset records for the jth tuning-subset record, r = the rth of the Q top-ranking training-subset records for the jth tuning-subset record, and \( f_{q} \) = the contribution fraction calculated for the qth of the Q top-ranking records for the jth tuning-subset record.

Equation (14) constrains the \( f_{q} \) values to sum to 1.

$$ \sum\limits_{q = 1}^{Q} f_{q} = 1 $$
(14)

The best-matching training-subset record (i.e., the one with the lowest \( \sum VSE_{jk} \) value) should make the greatest contribution to the prediction of the dependent variable associated with the jth tuning-subset record. This is achieved by applying \( (1 - f_{q}) \) multipliers in Eq. (15).

$$ \left( X_{N + 1} \right)_{j}^{predicted} = \sum\limits_{q = 1}^{Q} \left[ \left( X_{N + 1} \right)_{q} \times \left( 1 - f_{q} \right) \right] $$
(15)

where \( \left( X_{N + 1} \right)_{q} \) = dependent-variable value for the qth training-subset record (i.e., one of the Q best-matching records), and \( \left( X_{N + 1} \right)_{j}^{predicted} \) = TOB-stage-one predicted dependent-variable value for the jth tuning-subset record.

The output from Step 9 represents the TOB-stage-one prediction. It is provisional because it applies equal weights (Wn) to the variable squared errors (VSE). This prediction is further refined in TOB stage 2 optimization.
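A sketch of the Step 9 blending (Eqs. (13)–(15)) follows. One stated assumption: the (1 − fq) multipliers are renormalized to sum to one so that the result is a weighted average of the Q matched dependent-variable values; the appendix does not spell out that detail, so treat this as one plausible reading rather than the study's exact implementation.

```python
import numpy as np

def stage1_predict(y_top, sum_vse_top):
    """Blend the dependent-variable values of the Q best-matching training
    records, giving the closest match (smallest error sum) the largest share.
    `y_top` and `sum_vse_top` come from the ranking sketch above."""
    f = sum_vse_top / (sum_vse_top.sum() + 1e-12)   # Eq. (13): error fractions
    w = (1.0 - f) / (1.0 - f).sum()                 # assumed renormalization of (1 - f_q)
    return float(np.sum(y_top * w))                 # Eq. (15)-style blended prediction
```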

Step 10 Various statistical metrics are used to assess the accuracy of the TOB-stage-one predictions (see Sect. 3.3).
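For illustration, accuracy metrics of the kind referred to here can be computed as in the short sketch below; RMSE, MAE and R² are common choices, while the paper's own metric set is defined in its Sect. 3.3, which is not reproduced here.

```python
import numpy as np

def accuracy_metrics(y_true, y_pred):
    """Standard prediction-error metrics for assessing TOB stage-one results."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "R2": r2}
```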

2.2 TOB Stage 2 (optimizing contributions from the best matching records)

Step 11 Optimized predictions are achieved by minimizing the root mean squared error (RMSE) metric (Eq. 3, Sect. 3.3). RMSE is computed using the squared prediction errors calculated for all J tuning-subset records. Three optimizers are run in this study to verify the results and generate a range of potentially optimum solutions (see Sect. 3.2).

Two TOB stage 2 minimization control parameters, Q and Wn, are employed (a minimal tuning sketch follows this list):

  1. The N input-variable weights (Wn) are allowed to vary across the fully constrained range (0 ≤ Wn ≤ 1), leaving the optimizer free to select the values that minimize RMSE. It is not unusual for quite small non-zero values to be assigned to certain Wn by the optimizer; however, these low weights often have significant impacts that improve the prediction accuracy of TOB stage 2.

  2. The optimizer is allowed to vary Q (2 ≤ Q ≤ 10) for the TOB stage 2 calculations conducted with Eqs. (13)–(15).
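The sketch below illustrates this stage 2 tuning loop. It reuses the rank_matches and stage1_predict sketches above, searches Q with an outer loop and the weights Wn with SciPy's differential_evolution as a generic stand-in; the study itself runs three specific optimizers (its Sect. 3.2), which are not reproduced here.

```python
import numpy as np
from scipy.optimize import differential_evolution

def tune_stage2(train_X, train_y, tune_X, tune_y, q_range=range(2, 11)):
    """Sketch of Step 11: minimize tuning-subset RMSE over the weights Wn
    (each in [0, 1]) and the number of contributing matches Q (2-10)."""

    def rmse_for(weights, q):
        preds = []
        for j in range(len(tune_X)):
            top_q, sum_vse_top = rank_matches(train_X, tune_X[j], weights, q)
            preds.append(stage1_predict(train_y[top_q], sum_vse_top))
        return np.sqrt(np.mean((np.asarray(preds) - tune_y) ** 2))

    best = None
    for q in q_range:                                   # outer search over Q
        res = differential_evolution(rmse_for,
                                     bounds=[(0.0, 1.0)] * train_X.shape[1],
                                     args=(q,), maxiter=20, seed=1)  # kept small
        if best is None or res.fun < best[0]:
            best = (res.fun, q, res.x)
    return best                                         # (RMSE, Q, weights Wn)
```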

Step 12 Statistical accuracy metrics for the TOB-stage-two predictions are computed and compared with those generated for the TOB-stage-one predictions. Performing sensitivity analysis by optimizing with different fixed values of Q (i.e., Q = 2 to 10) also generates a set of sub-optimal solutions that help to assess potential underfitting or overfitting issues with the dataset being evaluated.

Step 13 Compute TOB-stage-one and TOB-stage-two predictions for the independent testing-subset records, applying the optimum values of Wn and Q established in Step 11 by tuning against the tuning subset. Assess these predictions with the statistical accuracy metrics and compare them with those obtained for the tuning subset (e.g., Tables 3 and 5). They should be in comparable ranges; if they are not, the tuning process may be either underfitting or overfitting the dataset. Further verification of the prediction performance of the TOB stage 2 optimum solutions can be obtained by applying those solutions to all records in the dataset and assessing the prediction accuracy achieved (e.g., Table 6).

As part of this step it is often appropriate to audit the intermediate calculation steps to reveal which variable errors (VSE) are weighted to contribute the most to the TOB stage 2 predictions. Reviewing the intermediate calculations can also facilitate comprehensive outlier analysis (e.g., see Sect. 5.1), providing insight into why some data records lead to less-accurate predictions than the main trend. In addition, segmental analysis (see Sect. 4.3) identifies regions of the dependent-variable range for which the TOB method provides greater or lesser prediction accuracy.
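A simple way to screen for such prediction outliers, sketched below, is to flag records whose prediction error is unusually large; the three-sigma threshold used here is an illustrative assumption, not the criterion applied in the study.

```python
import numpy as np

def flag_outliers(y_true, y_pred, n_sigma=3.0):
    """Return indices of records whose prediction error deviates from the mean
    error by more than n_sigma standard deviations (illustrative threshold)."""
    err = y_pred - y_true
    return np.where(np.abs(err - err.mean()) > n_sigma * err.std())[0]
```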

Step 14 Compare the prediction accuracy provided by the TOB algorithm with other prediction methods (e.g., regression-based, machine learning algorithms and empirical methods). Such analysis for the renewable power generation dataset studied here will be presented in future studies.


About this article


Cite this article

Wood, D.A. German country-wide renewable power generation from solar plus wind mined with an optimized data matching algorithm utilizing diverse variables. Energy Syst 11, 1003–1045 (2020). https://doi.org/10.1007/s12667-019-00347-x

