Abstract
Prediction intervals in supervised machine learning bound the region in which the true outputs of new samples are expected to fall. They are needed to separate reliable predictions of a trained model from near-random guesses, to minimize the rate of false positives, and for other problem-specific tasks in applied machine learning. Many real problems have heteroscedastic stochastic outputs, which creates the need for input-dependent prediction intervals. This paper proposes to estimate the input-dependent prediction intervals with a separate extreme learning machine model, using the variance of its predictions as a correction term that accounts for model uncertainty. The variance is estimated from the model’s linear output layer with a weighted Jackknife method. The methodology is very fast, robust to heteroscedastic outputs, and handles both extremely large datasets and insufficient amounts of training data.
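The idea summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the exact weighting, the activation function, or how the two variance terms are combined, so all of those are assumptions here. The sketch fits one ELM to the targets, fits a second ELM to the squared residuals as an input-dependent noise estimate, and adds a leverage-weighted (jackknife-style) sandwich estimate of the prediction variance computed from the linear output layer. The function names, the `ridge` term, the tanh activation, and the Gaussian factor `z` are hypothetical choices made for illustration.

```python
import numpy as np


def hidden(X, W, b):
    # Fixed random hidden layer with tanh activation (assumed; not trained).
    return np.tanh(X @ W + b)


def fit_output_layer(H, y, ridge=1e-3):
    # Ridge-stabilized least squares for the linear output layer.
    G_inv = np.linalg.inv(H.T @ H + ridge * np.eye(H.shape[1]))
    return G_inv @ H.T @ y, G_inv


def prediction_variance(H_train, resid, G_inv, H_new):
    # Leverage of each training sample: h_i = x_i^T G_inv x_i.
    lev = np.einsum("ij,jk,ik->i", H_train, G_inv, H_train)
    # Leverage-weighted squared residuals (assumed jackknife-style weighting).
    w = resid**2 / np.clip(1.0 - lev, 1e-12, None)
    # Sandwich estimate of the covariance of the output weights.
    cov_beta = G_inv @ (H_train.T * w) @ H_train @ G_inv
    # Variance of each new prediction; clip tiny negative rounding errors.
    var = np.einsum("ij,jk,ik->i", H_new, cov_beta, H_new)
    return np.maximum(var, 0.0)


def elm_prediction_intervals(X_train, y_train, X_new, n_hidden=30, z=1.96, seed=0):
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]

    # First ELM: predicts the target.
    W1, b1 = rng.normal(size=(d, n_hidden)), rng.normal(size=n_hidden)
    H1, H1_new = hidden(X_train, W1, b1), hidden(X_new, W1, b1)
    beta1, G1_inv = fit_output_layer(H1, y_train)
    resid = y_train - H1 @ beta1
    y_hat = H1_new @ beta1

    # Second, separate ELM: predicts the squared residual as a function of
    # the input, i.e. an input-dependent noise variance (assumption).
    W2, b2 = rng.normal(size=(d, n_hidden)), rng.normal(size=n_hidden)
    H2, H2_new = hidden(X_train, W2, b2), hidden(X_new, W2, b2)
    beta2, _ = fit_output_layer(H2, resid**2)
    noise_var = np.maximum(H2_new @ beta2, 0.0)

    # Model-uncertainty correction from the linear output layer.
    model_var = prediction_variance(H1, resid, G1_inv, H1_new)

    # Gaussian-style interval; z = 1.96 is an assumed nominal 95% level.
    half_width = z * np.sqrt(noise_var + model_var)
    return y_hat, y_hat - half_width, y_hat + half_width
```

Calling `elm_prediction_intervals(X_train, y_train, X_new)` with two-dimensional input arrays and a one-dimensional target returns the point predictions together with per-sample lower and upper bounds. Because both variance terms come from closed-form linear algebra on the output layer, no resampling or retraining is needed, which is consistent with the speed claim in the abstract.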
Cite this article
Akusok, A., Miche, Y., Björk, KM. et al. Per-sample prediction intervals for extreme learning machines. Int. J. Mach. Learn. & Cyber. 10, 991–1001 (2019). https://doi.org/10.1007/s13042-017-0777-2
DOI: https://doi.org/10.1007/s13042-017-0777-2