Naive bayes-correlation based feature weighting technique for sports match result prediction

  • Research Paper
  • Evolutionary Intelligence

Abstract

Artificial intelligence, data mining and machine learning have grown rapidly in recent years. Machine learning techniques are now widely used for outcome prediction and classification across many research areas, and they perform well in domains such as medicine, cyber security, banking fraud detection and drug discovery. In sports, however, and in badminton in particular, outcome prediction with artificial intelligence and machine learning remains largely unexplored; such techniques have been applied to only a limited number of games. This paper presents a machine learning based technique for badminton match outcome prediction that uses fewer input attributes. A supervised learning approach with feature reduction techniques is proposed. Raw data for the Australian Open, Malaysian Open, German Open and Singapore Open badminton tournaments from 2016 to 2019 were collected from internet sources (official websites and other websites). A CSV file was compiled from the scraped data, with thirty features in total for singles tournaments and thirty-four features for doubles tournaments. Five feature reduction techniques are employed to evaluate feature significance: Correlation Feature Selection, Info Gain Attribute Selection, ReliefF Attribute Selection, Probabilistic Significance Attribute Evaluation and Symmetrical Uncertainty Attribute Evaluation. Fourteen significant features are selected as input predictors for three machine learning classifiers. Classifier performance is evaluated in terms of accuracy, root mean square error, receiver operating characteristics and other confusion matrix parameters. Results for each tournament with the reduced features are analysed and compared with those for the full feature dataset. On the reduced feature dataset, Naïve Bayes with correlation-based feature weighting performs notably better than the other proposed classifiers in match outcome prediction.
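To make the idea of a feature-weighted Naïve Bayes classifier concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the weights here are an assumption (a sigmoid of each feature's mutual information with the class minus its average mutual information with the other features). The feature names and the six toy matches are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def feature_weights(rows, labels):
    """Assumed weighting: sigmoid(relevance to class - mean redundancy)."""
    m = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    relevance = [mutual_information(col, labels) for col in cols]
    redundancy = [
        sum(mutual_information(cols[i], cols[j]) for j in range(m) if j != i)
        / max(m - 1, 1)
        for i in range(m)
    ]
    return [1.0 / (1.0 + math.exp(-(rel - red)))
            for rel, red in zip(relevance, redundancy)]


class WeightedNaiveBayes:
    """Categorical Naive Bayes where each feature's log-likelihood is scaled."""

    def fit(self, rows, labels, weights):
        self.weights = weights
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # Distinct values per feature, used for Laplace smoothing.
        self.values = [set(r[j] for r in rows) for j in range(len(rows[0]))]
        self.counts = Counter()
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, v, y)] += 1
        return self

    def log_scores(self, row):
        scores = {}
        for c in self.classes:
            # Laplace-smoothed class prior.
            s = math.log((self.class_counts[c] + 1) / (self.n + len(self.classes)))
            for j, v in enumerate(row):
                p = (self.counts[(j, v, c)] + 1) / (
                    self.class_counts[c] + len(self.values[j]))
                s += self.weights[j] * math.log(p)  # weight acts as an exponent
            scores[c] = s
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical toy data: (higher_ranked, won_first_game) -> match outcome.
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
        ("no", "no"), ("yes", "yes"), ("no", "no")]
labels = ["win", "win", "loss", "loss", "win", "loss"]

weights = feature_weights(rows, labels)
model = WeightedNaiveBayes().fit(rows, labels, weights)
print(weights)
print(model.predict(("yes", "no")))
```

In this sketch a weight of 1 for every feature recovers ordinary Naïve Bayes, while weights below 1 damp the influence of features that carry little class information or largely duplicate other features.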

Acknowledgements

The corresponding author acknowledges the Triveni Badminton Club (TBC) for its valuable inputs.

Funding

There is no funding source.

Author information

Corresponding author

Correspondence to Naresh Kumar.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sharma, M., Monika, Kumar, N. et al. Naive bayes-correlation based feature weighting technique for sports match result prediction. Evol. Intel. 15, 2171–2186 (2022). https://doi.org/10.1007/s12065-021-00629-3

  • DOI: https://doi.org/10.1007/s12065-021-00629-3
