Deep Transfer Learning-Based Foot No-Ball Detection in Live Cricket Match

Automation has become commonplace in every part of life because of the rapid advancement of technology, driven mostly by AI, and it has facilitated improved decision-making. Machine learning and deep learning, both subsets of AI, give machines the capacity to make judgments on their own by learning continuously from vast amounts of data. To reduce human error in critical decisions and to improve understanding of the game, AI-based technologies are now being adopted in numerous sports, including cricket, football, and basketball. Among the most popular games in the world, cricket has a stronghold on the hearts of its fans. A broad range of AI-driven technologies is being developed and employed in cricket to help on-field umpires make fair decisions, because cricket is an unpredictable game: anything may happen in an instant, and a bad judgment can dramatically shift the game. Hence, a smart system can end the controversy caused by such errors and create a healthy playing environment. To address this problem, our proposed framework provides automatic no-ball detection with 0.98 accuracy and incorporates data collection, processing, augmentation, enhancement, modeling, and evaluation. This study starts with collecting data and then keeps only the main portion of the bowler's end by cropping. Then, image enhancement techniques are applied to make the image data clearer and less noisy. After applying the image processing techniques, we trained and tested the optimized CNN. Furthermore, we increased the accuracy by using several modified pretrained models. In this study, VGG16 and VGG19 both achieved 0.98 accuracy, and we selected VGG16 as the proposed model because it performed better in terms of recall.


Introduction
Cricket is the second most popular sport [1], with 2.5 billion spectators, just behind soccer/football, which has 3.5 billion fans [2]. Two teams of eleven players each compete in cricket, and each team is focused on winning. Two umpires oversee the match, one assigned to each end of the field (the bowler's end and the striker's end). The umpire must guarantee that the game is played in line with the laws [3]. The umpire's decisions affect crucial rulings. In a game, anything can happen in a split second, and a poor decision might significantly change the outcome. If a bad decision has an impact on the game, it has an impact on the entire fan base. Several regulations exist to make the game of cricket fair to both batters and bowlers, one of which is the no-ball rule. No-ball rulings are the most difficult of all umpire judgments to call. Unlawful deliveries to a batter are known as "no balls" in cricket. There are many forms of no ball in cricket, including the front foot no ball, touching the return crease, a full toss no ball, an uninformed change in bowling style, throwing the ball before delivery, chucking, fielding-related no balls, wicket-keeping-related no balls, the bowler touching the wickets, and the ball not reaching the batter [4]. This decision could result in a victory for one team or a defeat for the opposing team. A no ball results in the addition of one run, or two under some rules, to the batting team's total score, and an additional ball must be bowled [5]. Therefore, it is vital to decide prudently in a no-ball situation [6]. Even with several umpires and a well-known videographer present, uncertain wickets have proved difficult to judge in a timely and precise manner. Of the limited research on no-ball detection, little has used CNN-based models to solve the problem.
In relation to this, the suggested system introduces a newly created dataset, since image data from the non-striker's end is the key source of information for front foot no-ball detection. The entire dataset was later cropped to keep just that portion. Both the quantity and the quality of the data affect the performance of ML and DL models; thus, we employed a variety of pertinent augmentation techniques to increase the data size and certain image enhancement approaches to improve the quality of the images. We then trained a CNN model tuned with Keras Tuner and a few well-known pretrained models (VGG19, VGG16, ResNet50, InceptionV3, and MobileNet). In this study, VGG16 presented better results in terms of accuracy (0.98), precision, and recall. Thus, the system is designed through data collection, processing, augmentation, modeling, and evaluation.
This study's research challenges are listed as follows:
(1) The main challenge was collecting the dataset.
(2) Resizing and tuning the images was the next challenge undertaken during the study.
(3) Modifying the models and finding the best model parameters.
The key contributions of this study are as follows:
(1) We created the dataset (https://drive.google.com/drive/folders/1_4PFP_zZEifYCnVvhkdBUuzwb03ioQMp?usp=sharing), as none was available for detecting no balls, using both manual and automated processes from various online sources, for instance, Google search, highlights, and live matches, and annotated it into two categories: "legal" and "no ball".
(2) We cropped the images to trim out unnecessary portions, as the foot no ball is detected at the non-striker's end.
(3) To enlarge the dataset, we used various augmentation techniques such as rotation, width and height shifting, shearing, and horizontal flip.
(4) We used some image enhancement techniques to make the images sharper and brighter.
(5) We applied five transfer learning models alongside optimized CNN models, all trained on the augmented and enhanced images; the optimized CNN models were obtained by hyperparameter tuning.
(6) Finally, we evaluated the models on performance metrics such as accuracy, precision, recall, and F1-score.
The rest of the paper is organized as follows: Section 2 reviews the literature, and Section 3 presents an overview of the game of cricket. Section 4 describes the methodology used in the whole process. Section 5 presents the experimental results and a discussion of them, and finally, Section 6 presents the conclusions and future work.

Related Works
In recent years, researchers have invented a number of technologies in order to build smart cities [7]. As a result of this idea, smart sports such as smart cricket have become very important. Our literature review found minimal deep learning research on cricket no-ball recognition. Rahman et al. [8] developed a unique approach for identifying the kind of delivery from a bowler's finger grip while the bowler is making a delivery. They correctly classified bowlers' grips with 0.9875 accuracy. To prevent umpires from making mistakes owing to human error, Iyer et al. [9] proposed an automatic system for deciding run outs and no-ball deliveries.
They tried a CNN-based model, VGG16, and a machine learning model, SVM, for classification; in both cases, VGG16 outperformed the SVM, which shows the power of CNNs in extracting features from image data compared to other structures. It detected no balls with an accuracy of 0.8923, and in the case of run outs, it achieved an accuracy of 0.8889. Kowsher et al. [10] presented an automated multidimensional visual system to avoid erroneous interpretations brought on by perspective mistakes. This work presented a technique for automated no-ball decision-making that distinguishes between virtual fault and true decision. It aims at a system that can eliminate bad interpretations caused by perspective errors by analyzing the team, player performance, and the playing environment before and after the match to gain insights about the game, players, strategy, and environment, using a computer-based graphics system and many dimensions to approximate a ball's movement and contrast the anticipated path with the main line. CNN models are used extensively to address image-related problems. Namburu et al. [11] investigated X-ray images with the help of CNNs based on several pretrained state-of-the-art models. Another investigation [12] used medical imaging to detect breast cancer with several ML algorithms, applying image processing and segmentation techniques to patients' mammogram data. As an extension, Harun-Ur-Rashid et al. [6] employed a CNN-based classification technique using Inception V3 to automatically distinguish and categorize waist-high no balls. Their approach achieves an overall average accuracy of 0.88 with a comparatively low cross-entropy value. A thorough examination of the cricket pitch [13] may be useful in forecasting winners and losers while also eliminating the necessity for manual pitch analysis. They investigated the impact of fractures in the cricket pitch on the ball bowled by bowlers.
Computational Intelligence and Neuroscience
Minhas et al. [14] investigated an effective shot categorization approach for on-field sports footage based on AlexNet convolutional neural networks (AlexNet CNN), which yielded 0.9407 accuracy. Khan et al. [15] introduced a deep convolutional neural network-based system to identify and categorize distinct batting strokes from cricket videos. The proposed model can recognize a shot with 0.90 accuracy. Sen et al. [16] introduced a hybrid deep neural network architecture for the classification of 10 distinct cricket batting strokes from offline footage, with a highest accuracy of 0.93. Tong et al. [17] demonstrated a unified framework for semantic shot categorization in ball-based field sports, which separates video frames based on three main factors: the diameter of the camera snap, the technology used for video creation, and the main topic in a scene. They outlined the framework, defined semantic shots, and worked through a few instances to provide detection based on the three properties. The results were tested using shot clustering and retrieval, temporal video segmentation, and semantic video analysis. Li et al. [18] extended this work, applying video shot detection to event detection in a specific game using some predefined rules. They used a visual words model to represent the main frame of each individual shot, which is passed to SVM and PLSA to classify the main frame and predict the type of shot; as the approach is not domain specific, it can be integrated with different types of games. Chowdhury et al. [19] used computer vision to detect foot no balls in a cricket match by applying an image subtraction method on pixel values to make a judgment.
Although several tracking methods for improved monitoring of cricket have been published in a number of articles, relatively few studies specifically address cricket no-ball tracking. Batra et al. [20] implemented augmented reality for ball tracking and automated no-ball detection, calculating the distances among the bowling crease point, the popping crease point, and the bowler's foot marks using a contour algorithm to label a delivery as a no ball or not. Therefore, this study aimed at building an automated system that considers only the video content of the bowler's end and the popping crease. Hence, it is capable of detecting no balls automatically without any devices beyond those already used in cricket matches.

Cricket Game Overview
Cricket is the world's second most popular game and is played between two teams. Experts agree that the history of cricket begins in the late 16th century. It started in the south-east of England, turned into a global sport in the 18th century, and expanded internationally during the 19th and 20th centuries. International matches have been played since the 19th century, with official Test matches dating back to 1877 [21]. Figure 1 captures a moment of that time. Cricket is governed internationally by the International Cricket Council (ICC). Cricket rules and laws, such as LBW, stumps, and bat width, were devised and eventually adjusted in 1774 [22]. As time passed, things shifted and evolved, and fresh regulations were added. These regulations and laws became visible on the cricket pitch following the introduction of umpires. Nowadays, cricket is not simply a game but also an expression of people's emotions. As a result, the impact of a match's outcome is not limited to the two countries playing. However, there are situations when umpires make mistakes that cause a lot of controversy, and classifying a delivery as a no ball is a critical decision. Several incidents of this type have resulted in significant debate. Once, Lasith Malinga bowled a no ball by a significant margin, but the umpire failed to call it [23]. Another instance involved Ben Stokes: England's fast bowlers delivered three consecutive no balls [24], but none of them was called by the on-field umpires.

Proposed Methodology
Detecting a no ball with an AI system is all about feeding the model with video captured from the left and right sides of the ground to make a judgment. In this research, we created a dataset and, after training the model on the preprocessed and cropped dataset, tested it with video frames. A step-by-step workflow diagram for detecting overstepping is given in Figure 7, and Algorithm 1 depicts the pseudocode.

4.1. Dataset.
Using the methods described (see Figure 8), we created a dataset of no-ball and legal-ball images: (1) Identifying the research objectives: The first step is to identify the research objectives (see Introduction section).

Data Collection.
Gathering the data is one of the major parts of the research. Like others, we collected data from Google and by capturing images from live cricket matches and highlights. The images came not only from international matches but also from domestic practice matches and practice sessions. A sample of raw data is shown in Figure 9, and Table 1 depicts the data distribution among the training, validation, and testing sets, which is almost balanced.

Data Cropping.
Only the picture region near the non-striker is significant in the process of identifying a no ball through overstepping. The image of the entire playground does not have to be considered during classification. As a result, all of the images had to be cropped immediately after data gathering in order to build this system. Figure 10 depicts the images after cropping [25].
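The cropping step can be illustrated with a minimal sketch. It assumes a frame is a grid of pixel values stored as a list of rows and that the region around the bowler's end is given as a bounding box; the function name `crop_region` and the box coordinates are hypothetical, since the study does not state the crop window it used.

```python
def crop_region(image, top, left, height, width):
    """Return the `height` x `width` sub-image whose top-left corner is (top, left).

    `image` is a list of rows, each row a list of pixel values.
    The crop box is a hypothetical input, not a value taken from the study.
    """
    if height <= 0 or width <= 0:
        raise ValueError("crop size must be positive")
    rows, cols = len(image), len(image[0])
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("crop box falls outside the image")
    # Slice the wanted rows first, then the wanted columns inside each row.
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 4x6 "frame" whose pixel value encodes its position (10 * row + column).
frame = [[10 * r + c for c in range(6)] for r in range(4)]
crease_patch = crop_region(frame, top=1, left=2, height=2, width=3)
```

In practice an array or imaging library would perform the same slicing on real frames; plain lists are used here only to keep the sketch dependency-free.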

Data Augmentation.
Data augmentation [26] is one of the go-to techniques for image processing problems in machine and deep learning. There are many reasons to adopt it in the system. It generates new images from the original images, thus enlarging the dataset and enabling models to learn from different perspectives, which further plays a big part in resolving the over-fitting problem [27]. Keeping all that in mind, several augmentation techniques were applied in this study to make the models robust, as shown in Table 2.
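Two of the augmentation techniques named in this study, horizontal flip and width shifting, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline used by the authors: images are lists of rows, the helper names are hypothetical, and the shift range and flip probability are made-up values rather than the settings in Table 2.

```python
import random


def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def shift_width(image, offset, fill=0):
    """Shift pixels right (offset > 0) or left (offset < 0), padding with `fill`."""
    width = len(image[0])
    offset = max(-width, min(width, offset))  # clamp so slicing stays valid
    if offset >= 0:
        return [[fill] * offset + row[:width - offset] for row in image]
    return [row[-offset:] + [fill] * (-offset) for row in image]


def augment(image, rng, max_shift=1):
    """Apply a random width shift, then a horizontal flip with probability 0.5."""
    out = shift_width(image, rng.randint(-max_shift, max_shift))
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out


# Generate a few augmented variants of one toy image with a seeded generator.
rng = random.Random(0)
sample = [[1, 2, 3], [4, 5, 6]]
augmented = [augment(sample, rng) for _ in range(4)]
```

Each variant keeps the original image size, so augmented images can be added to the training set without further resizing.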

Data Enhancement.
Image enhancement is a key component of strategies for improving image quality, such as emphasizing significant areas and eliminating or weakening distortion or noise in the image [28][29][30]. Smoothing and sharpening, noise reduction, deblurring, contrast adjustment, brightening, and gray-scale histogram equalization are some of the prominent techniques [31,32]. Furthermore, the best feasible combination of these is essential to keep the system running smoothly. Figure 11 demonstrates some of the effects of image enhancement.
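As an illustration of one technique from this list, the following sketch implements gray-scale histogram equalization using the standard cumulative-distribution mapping. It is not necessarily the enhancement applied in this study; the function name and the toy image are assumptions, and pixel values are assumed to be integers in the range 0 to `levels - 1`.

```python
def equalize_histogram(image, levels=256):
    """Spread gray levels so the cumulative histogram becomes roughly linear.

    `image` is a list of rows of integer gray values in [0, levels - 1].
    """
    pixels = [p for row in image for p in row]

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [0] * levels, 0
    for v in range(levels):
        running += hist[v]
        cdf[v] = running

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    if total == cdf_min:
        # Only one gray level is present: nothing to spread out.
        return [row[:] for row in image]

    # Look-up table mapping each old gray level to its equalized value.
    lut = [
        round(max(cdf[v] - cdf_min, 0) * (levels - 1) / (total - cdf_min))
        for v in range(levels)
    ]
    return [[lut[p] for p in row] for row in image]


# Toy low-contrast image: all values sit in a narrow dark band.
dull = [[10, 20], [30, 40]]
enhanced = equalize_histogram(dull)
```

After equalization the darkest level present maps to 0 and the brightest to `levels - 1`, which stretches the contrast of a dull image across the full range.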

Convolutional Neural Network.
The convolutional neural network (CNN or convnet) [33] is a significant subgroup of the three artificial neural network learning models (here with 100 × 100 dimension). A CNN is a type of network design for deep-learning algorithms that is mostly used for image classification, identification, localization, and detection tasks that involve handling pixel input. The convolutional layers are responsible for extracting the features that define the image; they are followed by the pooling layer [34] and end with a flatten layer [35] that converts the higher dimensional data into 1D. Figure 12 captures the functioning of a CNN. Table 3 presents the experimental parameters used while tuning the hyperparameters.
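The convolution, pooling, and flatten stages described above can be traced on a toy example. This is a minimal single-channel sketch with a hand-written 2 × 2 kernel, assumed purely for illustration; a real CNN learns many kernels per layer, and none of the values below come from the study.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), summing elementwise products.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    return [
        [max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def flatten(fmap):
    """Convert a 2D feature map into a 1D feature vector."""
    return [v for row in fmap for v in row]


# Toy 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright transitions

feature_map = relu(conv2d_valid(image, edge_kernel))  # 4x4 map
pooled = max_pool(feature_map)                        # 2x2 map
features = flatten(pooled)                            # length-4 vector
```

The flattened vector is what a dense classification layer would receive; the kernel responds only where the edge lies, which is the sense in which convolution "extracts features".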
Accuracy is the number of correctly predicted input samples divided by the total number of samples [36].
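Accuracy and the other metrics reported in this study (precision, recall, and F1-score) follow directly from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name is hypothetical and the example counts are made up for illustration, not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts with "no ball" treated as the positive class.
metrics = classification_metrics(tp=45, fp=5, fn=5, tn=45)
```

Recall is the metric that penalizes missed no balls (false negatives), which is why it can separate two models that have the same accuracy.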

Transfer Learning.
Transfer learning is a machine learning approach in which a model developed for one problem is used as the basis for a model for another [37,38]. It uses a previously trained model to solve a new problem. Shorter training times, better neural network performance (in most circumstances), and no need for a large quantity of data are the main strengths of transfer learning (TL). VGG19 [39], VGG16 [40], ResNet50 [41], which is based on the residual network [42], InceptionV3 [43], and MobileNet [44] are used as pretrained models in this study. Table 4 depicts the models' parameters.

Experimental Results and Analysis
This section presents the empirical results obtained by the hyperparameter-tuned CNN and a few pretrained models after being fed the training data. Although the dataset is almost balanced, the models were evaluated based on the full classification report [45] (accuracy, precision, recall, and F1-score). Table 5 shows the performance of the models. Figure 13 depicts the accuracy of the different models in a bar chart. All five pretrained models and the optimized CNN perform quite well in learning the features of the images and classifying them; the highest classification accuracy of 0.98 is achieved jointly by VGG19 and VGG16.
ResNet50 performed relatively poorly compared to the others. Table 6 depicts the empirical macro and weighted averages. All the models were trained for 50 epochs using the available GPU of Google Colab, and the training progress was tracked. The accuracy of each model increases as the epochs progress. Over-fitting is a common issue; it is usually indicated by a sizable gap between validation and training accuracy. Figure 14 demonstrates that VGG19 was slightly over-fitted [46] as the epochs progressed, while VGG16 appears smoother in terms of training and validation accuracy. Figure 15 shows the confusion matrix [47] of all the models, which is also a powerful tool for evaluating models based on their responses to the positive and negative categories. The value in the top right-hand corner box represents the false positives, whereas the value in the bottom left corner box denotes the false negatives. The FN value is quite low, ranging between 0 and 2, indicating that the models identify the no ball correctly but are a little confused in determining the legal ball. Overall, it would not be incorrect to state that the system works well and reliably enough to distinguish between legal and illegal deliveries. Finally, although there is currently little research in this area, the comparison with related works is summarized as follows:
(1) Chowdhury et al. [19] applied the image subtraction method to pixel values to make a judgment using computer vision and tested the system on 6 input frames, whereas our proposed system is tested on 262 test images.
(2) Batra et al. [20] introduced a no-ball detection system based on the distances among the bowling crease point, the popping crease point, and the bowler's foot marks using the contour algorithm. In contrast, deep learning models are applied as the classifier in our proposed method and have achieved reliable, satisfactory results.
(3) Harun-Ur-Rashid et al. [6] built a framework using CNN and Inception V3 models with image processing techniques to separate waist-high no balls from legal deliveries and achieved 0.92 accuracy. In comparison, our framework comprises data collection, augmentation, enhancement, modeling, and evaluation with test data and reached 0.98 accuracy.

Figure 13: Accuracy comparison among the models.

Algorithm 1 (pseudocode, partial):
    Select the best optimized model
  else
    (1) Modify the input layer of the transfer learning model;
    (2) Add some additional layers while keeping the top layer false;
    (3) Add the output layer;
  for i = 1 to Iter do
    (1) Train model with n number of batch size;
    (2) Feature extraction through hidden layers;
    (3) Forward propagation;
    (4) Backward propagation for updating weights;
    (5) Model validation with validation data to check over-fitting.
  Model Evaluation:
    (1) Evaluate the model with test data;
    (2) Store the model performance in the Acc variable.

Conclusions and Future Works
In this study, a deep learning-based approach was proposed to detect no balls in cricket. The study began with a thorough background study to gather domain information, followed by the creation of a dataset containing two classes, legal and no ball, which was then split into training, validation, and test sets. Various image augmentation techniques were applied to increase the dataset's size and create a robust model. Additionally, image enhancement techniques were used to make the images brighter and clearer. The performance of a fine-tuned CNN model and of modified state-of-the-art pretrained models was compared, and the selected VGG16 model achieved 0.98 accuracy. The findings of this study could potentially give cricket organizations and policymakers information on the effects of technology on the fan experience. Decision-makers should consider the potential effects on fan engagement, motivation, and loyalty before deciding whether to deploy no-ball detection systems or other similar technologies. Moreover, the study could aid in formulating plans to mitigate any negative effects on the fan experience and keep fans engaged despite technological progress. Despite the proposed system's success, it has some limitations concerning real-time identification from video frames, training models on large amounts of data, and the use of sophisticated image enhancement methods. These limitations suggest directions for future work: increasing the dataset's size and incorporating enhancement techniques such as histogram equalization, noise removal using a Wiener filter, linear contrast adjustment, median filtering, and unsharp mask filtering. Integrating this system with the video cameras of cricket grounds and incorporating a voice assistant system that takes immediate action as soon as a no ball is detected could also be explored.
Additionally, amalgamating evidential reasoning [48,49], adopting belief rule-based systems [50][51][52] with deep learning models, and employing explainable artificial intelligence (XAI) [53] technologies could potentially yield better outcomes.

Data Availability
The data used to support the findings of this study are available upon reasonable request to the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest.