Abstract
To improve decision performance using historical decision data, this paper proposes a data-driven decision model based on local two-stage weighted ensemble learning. The assessments of historical alternatives are collected within a multicriteria framework. For each new alternative, a set of similar historical alternatives is identified using the K-nearest-neighbor technique, and a set of base classifiers (BCs) is generated from the historical assessments. Based on the ensemble error and diversity of the BCs in predicting the new alternative's similar historical alternatives, a local two-stage weighted ensemble method is developed to learn the optimal BC weights for the new alternative. This learning process accounts for changes in the BCs' competence across different alternatives (instances) and avoids the dilemma of balancing the accuracy and diversity of the BCs. By combining the continuous outputs of the BCs with the learned BC weights, weighted ensemble outputs are obtained for the similar historical alternatives of the new alternative. From these outputs and the criterion assessments of those similar historical alternatives, a linear optimization model is constructed to learn criterion weights, with which an interpretable decision is made. The advantages of the proposed decision model over four traditional decision models are validated in a real case study on the diagnosis of thyroid nodules, and experiments on thirty real datasets compare the competence of the proposed weighted ensemble method with that of mainstream ensemble methods and combination rules.
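The local weighting pipeline described above can be illustrated with a minimal sketch. All names here are illustrative, and the paper's two-stage optimization over local ensemble error and diversity is replaced by a simple local-accuracy weighting, purely as a stand-in for the learned BC weights:

```python
# Minimal sketch of local weighted ensemble learning, assuming a simplified
# weighting rule (local accuracy) in place of the paper's two-stage
# error/diversity optimization. Dataset and base classifiers are arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

# Historical alternatives (assessments + decisions) and one new alternative.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_hist, y_hist, x_new = X[:-1], y[:-1], X[-1:]

# Base classifiers (BCs) generated from the historical assessments.
bcs = [
    LogisticRegression(max_iter=1000).fit(X_hist, y_hist),
    DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_hist, y_hist),
    GaussianNB().fit(X_hist, y_hist),
]

# Local region: the K most similar historical alternatives to the new one.
K = 15
nn = NearestNeighbors(n_neighbors=K).fit(X_hist)
idx = nn.kneighbors(x_new, return_distance=False)[0]
X_loc, y_loc = X_hist[idx], y_hist[idx]

# Stand-in for the learned BC weights: normalized accuracy on the local
# region (the paper instead learns weights from ensemble error and diversity).
acc = np.array([bc.score(X_loc, y_loc) for bc in bcs])
w = acc / acc.sum()

# Weighted ensemble of the BCs' continuous outputs for the new alternative.
proba = sum(wi * bc.predict_proba(x_new) for wi, bc in zip(w, bcs))
pred = int(np.argmax(proba))
```

Because the weights are recomputed from each new alternative's own local region, the ensemble adapts to changes in the BCs' competence across instances, which is the key idea the model builds on.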
Notes
Usually, K denotes the number of data subsets in cross validation. Because K has already been defined in this paper as the size of the local region, we use Z to denote the number of data subsets in cross validation for distinction.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Grant Nos. 72101074, 72171066, and 72071061), and the Fundamental Research Funds for the Central Universities (JZ2021HGTA0139 and JZ2021HGQA0203).
Cite this article
Xu, C., Chang, W. & Liu, W. Data-driven decision model based on local two-stage weighted ensemble learning. Ann Oper Res 325, 995–1028 (2023). https://doi.org/10.1007/s10479-022-04599-2