Abstract
Artificial intelligence, data mining and machine learning have grown rapidly over the last few years. Machine learning techniques are now used extensively for outcome prediction and classification across many areas of research, and they perform well in domains such as medicine, cyber security, banking fraud detection and drug discovery. In sports, however, and in badminton in particular, outcome prediction with the aid of artificial intelligence and machine learning remains unexplored; machine learning techniques for outcome prediction have been applied to only a limited number of games. This paper presents a machine learning based technique for badminton match outcome prediction that uses fewer input attributes. A supervised learning approach with feature reduction techniques is proposed. Raw data for the Australian Open, Malaysian Open, German Open and Singapore Open badminton tournaments from 2016 to 2019 were collected from internet sources (official websites and other websites). A CSV file was compiled from the scraped data, with thirty features in total for singles tournaments and thirty-four features for doubles tournaments. Five feature reduction techniques are employed to evaluate feature significance: the Correlation Feature Selection Method, the Info Gain Attribute Selection Method, the ReliefF Attribute Selection Method, the Probabilistic Significance Attribute Evaluation Method and the Symmetrical Uncertainty Attribute Evaluation Method. Fourteen significant features are selected as input predictors for three machine learning classifiers that predict the badminton match result. Classifier performance is evaluated in terms of accuracy, root mean square error, receiver operating characteristics and other confusion matrix parameters.
The results for each tournament with the reduced feature set are analysed and compared with those for the full-feature dataset. Naïve Bayes with correlation-based feature weighting performs notably better than the other classifiers considered for match outcome prediction on the reduced-feature dataset.
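The two ideas named in the abstract, ranking features by an attribute evaluator and weighting features inside a naive Bayes classifier, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete features, uses symmetrical uncertainty directly as the feature weight (the published correlation-based feature weighting filter derives its weights differently), and runs on a tiny made-up dataset whose column meanings (higher rank, won first game, court) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """Information gain IG(Y; X) = H(Y) - H(Y | X) for discrete X and Y."""
    n = len(labels)
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * IG(Y; X) / (H(X) + H(Y)), which lies in [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2.0 * info_gain(feature, labels) / denom


class WeightedNaiveBayes:
    """Categorical naive Bayes where feature i contributes w_i * log P(x_i | c)."""

    def fit(self, rows, labels, weights):
        self.weights = weights
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[i][c][v] = number of class-c rows whose feature i equals v
        self.counts = [defaultdict(Counter) for _ in weights]
        self.n_values = [len({r[i] for r in rows}) for i in range(len(weights))]
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[i][c][v] += 1
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)  # log prior
            for i, v in enumerate(row):
                # Laplace smoothing keeps unseen values from zeroing the product.
                p = (self.counts[i][c][v] + 1) / (n_c + self.n_values[i])
                score += self.weights[i] * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy data: (higher_rank, won_first_game, court) -> match result.
rows = [
    ("yes", "yes", "A"), ("yes", "yes", "B"), ("yes", "no", "A"), ("no", "no", "B"),
    ("no", "no", "A"), ("no", "yes", "B"), ("yes", "yes", "B"), ("no", "no", "A"),
]
labels = ["win", "win", "win", "loss", "loss", "loss", "win", "loss"]

# Assumption: the SU score of each feature is used as its weight.
weights = [symmetrical_uncertainty([r[i] for r in rows], labels) for i in range(3)]
model = WeightedNaiveBayes().fit(rows, labels, weights)
print(weights)
print(model.predict(("yes", "no", "B")))
```

In this toy data the first column fully determines the label and the third is independent of it, so the weighting lets the first feature dominate the prediction while the third is effectively ignored. A weight of zero acts like dropping the feature, which is how feature weighting generalises feature selection.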
Acknowledgements
The corresponding author acknowledges the Triveni Badminton Club (TBC) for its valuable inputs.
Funding
There is no funding source.
Ethics declarations
Conflict of interest
The corresponding author states that there is no conflict of interest.
Cite this article
Sharma, M., Monika, Kumar, N. et al. Naive bayes-correlation based feature weighting technique for sports match result prediction. Evol. Intel. 15, 2171–2186 (2022). https://doi.org/10.1007/s12065-021-00629-3
DOI: https://doi.org/10.1007/s12065-021-00629-3