Enhancing Badminton Game Analysis: An Approach to Shot Refinement via a Fusion of Shuttlecock Tracking and Hit Detection from Monocular Camera

Extracting the flight trajectory of the shuttlecock in a single turn in badminton games is important for automated sports analytics. This study proposes a novel method to extract shots in badminton games from a monocular camera. First, TrackNet, a deep neural network designed for tracking small objects, is used to extract the flight trajectory of the shuttlecock. Second, the YOLOv7 model is used to identify whether a player is swinging. Because both TrackNet and YOLOv7 are subject to missed and false detections, this study proposes a shot refinement algorithm to obtain the correct hitting moment. By doing so, we can extract the shots in each rally and classify their types. Our proposed method achieves an accuracy of 89.7%, a recall of 91.3%, and an F1 score of 90.5% on 69 matches, comprising 1582 rallies, from Badminton World Federation (BWF) match videos. This is a significant improvement over using TrackNet alone, which yields 58.8% accuracy, 93.6% recall, and a 72.3% F1 score. Furthermore, the accuracy of shot type classification at three different thresholds is 72.1%, 65.4%, and 54.1%. These results are superior to those of TrackNet, demonstrating that our method effectively recognizes different shot types. The experimental results demonstrate the feasibility and validity of the proposed method.


Introduction
Recently, the importance of scientific training methods and data analysis has notably increased in professional sports. Typically, the analyzed information includes important knowledge related to sports training; both coaches and athletes develop improved training methods through the acquired information, thereby achieving superior results in competitions. However, its effectiveness can be hard to observe because information gathering is a multifaceted domain. It requires a deep understanding of advanced training theories and methodologies from sports-dominant nations, the incorporation of innovative skills and tactics, cutting-edge training equipment, and the comprehension of opponents' characteristics and habits. If these requirements can be met, the provided information allows athletes to better realize the impact of their training intensity and effectiveness. Furthermore, it helps them understand and apply tactical skills that specifically match their opponents' traits [1].
To illustrate how data analysis impacts the sports domain, this study first discusses its applications across various sports fields. Data analysis has been successfully applied in Major League Baseball (MLB) for many years; for example, the Oakland Athletics, a team with a limited budget and no superstars, whose manager evaluated players based on their on-base percentage (OBP) rather than traditional statistics like batting average (BA) or on-base plus slugging (OPS). This strategy led them to win many games. However, such methods of collecting valuable information are labor-intensive and require an extensive review of large volumes of game data. Without this process, it is challenging for experts to summarize valuable insights into players' characteristics, strengths, and weaknesses. Even when information is collected, organizing these data to create effective training plans is time-consuming. Therefore, the need for accuracy, efficiency, and scalability drives the development of an automated information-gathering system, which is also the aim of this study.
To achieve the above goal, this study presents an automated analytical system that analyzes badminton games from a monocular camera. This system focuses on accurately identifying the rallies and extracting each shot between two players by tracking the shuttlecock and the players' movements on the court. Features such as the types of shots, the average movement distance, and the scoring area distribution can help players understand opponents' tactics. They can utilize this information as a reference to improve their on-court performance and increase their chances of winning.
The objective of this study is to provide a data-driven understanding of individual player abilities for both players and coaches, thereby propelling the development of training plans and competitive strategies. Leveraging computer vision and machine learning technologies, we aim to tackle the challenge of acquiring more accurate shots from regular competition videos. This approach allows us to conduct an in-depth analysis of individual playing styles, helping the coach understand each player's unique characteristics and enabling the customization of specific training plans. While this methodology is applied to badminton, it also holds the potential for transferability to a wide range of other sports disciplines.
The contributions of this study are as follows:
1. We propose a shot refinement algorithm that uses the results of shuttlecock tracking and action detection, demonstrating improved results compared to previous methods.
2. The extracted shots are more precise than those obtained using previous methods, particularly in circumstances with numerous missed detections of the shuttlecock. This leads to improved accuracy in shot-type classification.
3. The proposed method only analyzes videos captured from a monocular camera, making it applicable to existing badminton videos without any additional hardware.
The rest of this paper is organized as follows. Section 2 reviews related work in sports analytics, particularly focusing on computer vision approaches. Section 3 describes the proposed algorithm and the modules used in this study. Section 4 presents experiments performed to demonstrate the feasibility and superiority of the proposed method, followed by conclusions in Section 5.

Related Work

Sports Analytics with Computer Vision and Deep Learning
Numerous studies have leveraged computer vision technology to facilitate automated sports analysis. For example, a real-time basketball shooting spot analysis system was designed using deep neural networks (DNNs) [32]. The system analyzes basketball videos from a court view, tackling challenges such as camera panning and zooming, and is capable of accurately identifying the exact timepoint of a shot. The study also predicted whether a shot was a three-pointer, a two-pointer, or a free throw. Some studies utilize a Virtual Reality (VR) system to provide an immersive environment. One such system has been developed to enhance court vision for basketball athletes, a critical competency in competitive scenarios. Enhanced court vision enables athletes to effectively identify teammates with open shot opportunities, observe the positioning of defensive players around the open space, and promptly select appropriate passing paths. A practical example of this approach is VisCoach [33], which simulates the player's viewing perspective and provides high-quality visual training tasks to expedite the player's development process. By aligning the court situation with the player's eye gaze, the system can analyze vision-related behaviors to evaluate the effectiveness of the training.
DNNs have also found significant applications in the field of football, particularly in predicting match outcomes [27]. The employed dataset encompasses a comprehensive range of factors, including team rankings, past performances, and the results of previous international football matches, among other relevant variables. By deploying a DNN to explore and process this vast array of sports data, predictive values can be generated. The proposed DNN architecture demonstrated strong performance in predicting the outcomes of the FIFA 2018 World Cup matches, achieving an accuracy rate of 63.3%.
Wang et al. proposed an innovative system, Coach AI [34], which utilizes deep learning methodologies for the analysis of badminton game videos. This system effectively extracts vital information such as players' movements, positioning, and strategies. Moreover, it is capable of detecting the trajectory of the shuttlecock, thereby enabling the identification of the players' swing moments, the segmentation of each round, the summarization of points won and lost, and the categorization of shots. This system significantly aids coaches and athletes in formulating strategic policies for training practice, thereby enhancing the competitive performance of the players.

Human Pose Estimation
In visual-based motion analysis applications, human pose estimation is a frequently employed technique. Many high-accuracy techniques have been developed to address this complex task, each presenting its own methodologies and technical elements to manage the challenges inherent in human pose estimation.
A popular solution for human pose estimation is the OpenPose framework [35]. OpenPose is a well-known open-source library that uses convolutional neural networks (CNNs) to achieve powerful human pose estimation. Notably, the OpenPose framework is not only outstanding in estimating human poses, but it also excels at tracking facial expressions, the torso, limbs, and subtle finger movements in both single- and multi-person scenarios. The network structure of OpenPose is illustrated in Figure 1.

OpenPose takes an image as input, which is processed by CNNs to form comprehensive feature maps. These feature maps are then fed into two branches to obtain Confidence Maps and Part Affinity Fields. Bipartite matching is used to associate different body parts by connecting the corresponding joints of each individual, leveraging the Part Affinity Fields for high-precision matching. This process culminates in the merging of matches, yielding a holistic skeletal representation of a person. Additionally, OpenPose is able to tackle the multi-person estimation problem: it treats this challenge as a graph matching problem and solves it with the Hungarian algorithm, demonstrating excellent accuracy in sophisticated multi-person pose-estimation scenarios.
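The part-association step described above can be framed as an assignment problem: given affinity scores between candidate joints of two connected body parts, pick the pairing that maximizes the total score. The following is an illustrative sketch of that objective (not OpenPose's actual implementation); a brute-force search is used so the objective is explicit, whereas OpenPose solves the same problem efficiently with the Hungarian algorithm.

```python
from itertools import permutations

def best_assignment(score):
    """Exhaustively find the joint pairing that maximizes total affinity.

    score[i][j] is the Part-Affinity-Field score between the i-th candidate
    of one body part (e.g. elbows) and the j-th candidate of the connected
    part (e.g. wrists). Brute force is for clarity on tiny inputs only.
    """
    n = len(score)
    best_total, best_pairs = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_pairs = total, [(i, perm[i]) for i in range(n)]
    return best_pairs, best_total

# Two elbow candidates and two wrist candidates; the affinities favor
# pairing elbow 0 with wrist 1 and elbow 1 with wrist 0.
pairs, total = best_assignment([[0.1, 0.9], [0.8, 0.2]])
```

The same maximization over a complete bipartite graph is what the Hungarian algorithm computes in polynomial time for realistic numbers of candidates.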
Sensors 2024, 24, 4372 5 of 20

Another noteworthy approach is DensePose-RCNN [36], integrated within the Detectron2 framework [37]. DensePose-RCNN leverages three prominent deep learning models: DensePose, Mask-RCNN [38], and DenseReg [39]. The model is highly proficient in predicting dense correspondences of body parts and executing intricate pose estimations of individuals. Specifically, the model is capable of assigning each pixel to a specific part of the human body, such as the head, hands, or feet, and providing comprehensive posture information for these parts, including the rotation angle and position. This detailed, dense correspondence underscores the substantial application value of DensePose-RCNN. Beyond providing fundamental pose information, it aids in understanding the nuanced movements of various parts of the human body, thereby offering a wealth of potential applications in human behavior analysis and virtual try-on solutions, among others. Figure 2 depicts the network structure.

Multiple Object Tracking
Multi-object tracking techniques are typically used to address video-understanding problems. Currently, there are two types of solutions. The first type is the model-free method, which initially determines the positions of the objects to be tracked in a frame (usually the first frame) and then tracks these objects in subsequent frames. However, this method cannot track new or reappeared targets. The other is tracking-by-detection, which performs detection on every frame and then compares the detection results between frames to extract the correct trajectory of each object [40][41][42][43][44][45]. For example, Bewley et al. proposed the simple online and real-time tracking (SORT) framework [41]. This method first employs Faster R-CNN with VGG16 to detect objects in the video and sets all targets detected in the first frame as tracks. Then, a Kalman filter is employed to predict the possible bounding box of each track, and these predictions are matched with the bounding boxes produced by the detection model. Finally, SORT divides the matching results into three types: unmatched tracks, unmatched detections, and successful matches. A successful match means the detected bounding box and the predicted bounding box match each other, indicating that target tracking across two consecutive frames succeeded. Meanwhile, unmatched tracks are deleted, and unmatched detections are treated as new tracks.
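The matching step described above can be sketched as follows. This simplified version greedily matches raw boxes by intersection-over-union, whereas SORT proper matches Kalman-predicted boxes with the Hungarian algorithm; the 0.3 threshold is an assumed value, not one from the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_tracks(tracks, detections, thresh=0.3):
    """Greedy IoU matching, yielding SORT's three outcome classes:
    successful matches, unmatched tracks, and unmatched detections."""
    matches, unmatched_tracks = [], []
    free = set(range(len(detections)))
    for ti, t in enumerate(tracks):
        best_j = max(free, key=lambda j: iou(t, detections[j]), default=None)
        if best_j is not None and iou(t, detections[best_j]) >= thresh:
            matches.append((ti, best_j))
            free.remove(best_j)
        else:
            unmatched_tracks.append(ti)
    return matches, unmatched_tracks, sorted(free)
```

In a full tracker, matched pairs update the Kalman state, unmatched tracks are eventually deleted, and unmatched detections spawn new tracks.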
Due to recent advancements in deep learning techniques, tracking-by-detection methods are no longer time-consuming; thus, object-detection models are widely adopted. Apart from the Faster R-CNN used in the SORT method, another popular model is You Only Look Once (YOLO). Since speed and accuracy are often in tension, the development direction of YOLO is to trade toward speed while providing adequate detection accuracy [46][47][48][49][50][51]. In 2023, YOLOv7 [52] made significant progress in processing speed: with the same average precision as YOLOv5, its computational speed increased by 120%, making it more suitable for applications that require real-time computing. For example, Tan et al. proposed a YOLO-based multiple object tracking algorithm [53]. This algorithm uses YOLO to obtain a target's size and position, then extracts deep features through the VGG model and inputs them into an LSTM model, thus capturing the temporal relationship between frames. Finally, objects in adjacent frames are matched by calculating the Euclidean distance between different targets. YOLO has remarkable accuracy for general objects such as the human body, but for small objects like badminton shuttlecocks, its detection accuracy is unsatisfactory. To address this problem, Cao et al. [54] proposed two novel networks based on Tiny YOLOv2, named M-YOLOv2 and YOLOBR, to enhance shuttlecock recognition performance.


High-Speed Ball Tracking
For sports such as tennis, badminton, and baseball, the small size and high speed of the ball often result in blurred images in video frames, posing significant challenges for tracking its trajectory [55][56][57]. To address this problem, Huang et al. proposed a deep neural network called TrackNet [58], which is specifically designed for tracking small objects moving at high speed.
As shown in Figure 3, TrackNet adopts a convolutional neural network (CNN) architecture that considers a sequence of consecutive video frames to generate a probability map of the shuttlecock's position for each frame. This approach aims to learn not merely the visual features of the ball but also the distinct characteristics of its trajectory patterns in order to achieve precise positioning. The ultimate objective is to calculate the ball's flight path and accurately locate the shuttlecock in blurred images. The first 13 layers of TrackNet share the backbone of the VGG16 model and are utilized for feature extraction. The subsequent layers, numbered 14 to 24, are inspired by DeconvNet and generate predictions at the pixel level.

After obtaining the predicted coordinates of the shuttlecock, TrackNet introduces a trajectory smoothing algorithm to eliminate incorrect detections and compensate for missed detections. The entire process is divided into three steps. First, the algorithm calculates the distance between predicted coordinates in adjacent frames; if the distance exceeds 100 pixels, the prediction is considered erroneous and is removed. Next, a sliding window of size 7 is defined, and a quadratic curve is fitted to the coordinates within it whenever the window contains more than three predicted coordinates. At this stage, any detected coordinate more than 50 pixels away from the curve is removed. Finally, if the distance between a predicted coordinate and the adjacent quadratic curves is less than 5 pixels, a final curve is obtained, estimated from the predicted coordinates across 15 consecutive frames; this curve is used to compensate for the missing values in these 15 frames. After this step, TrackNet checks whether any coordinates are still missing in each frame; if so, it fits a quadratic curve through the six adjacent frames and fills in the missing value [59]. The algorithm is described in Algorithm 1.
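Using the thresholds stated above, the three-step smoothing can be sketched as follows. This is a condensed approximation of the described procedure, not the authors' Algorithm 1: the 15-frame curve-merging step is simplified into a single gap-filling pass over the nearest six detections.

```python
import math
import numpy as np

def smooth_trajectory(points):
    """Simplified sketch of TrackNet-style trajectory smoothing.
    `points` maps frame index -> (x, y), or None for a missed detection.
    """
    pts = dict(points)
    frames = sorted(pts)
    # Step 1: drop predictions that jump more than 100 px between frames.
    for f_prev, f_cur in zip(frames, frames[1:]):
        a, b = pts.get(f_prev), pts.get(f_cur)
        if a and b and math.dist(a, b) > 100:
            pts[f_cur] = None
    # Step 2: in each 7-frame window with more than three detections,
    # fit y = q(x) and drop detections farther than 50 px from the curve.
    for start in frames:
        win = [f for f in range(start, start + 7) if pts.get(f)]
        if len(win) > 3:
            xs = np.array([pts[f][0] for f in win], float)
            ys = np.array([pts[f][1] for f in win], float)
            if len(set(xs.tolist())) > 2:  # avoid degenerate fits
                q = np.poly1d(np.polyfit(xs, ys, 2))
                for f in win:
                    if abs(q(pts[f][0]) - pts[f][1]) > 50:
                        pts[f] = None
    # Step 3: fill each miss from quadratics (in the frame index) fitted
    # to the six nearest surviving detections.
    for f in frames:
        if pts.get(f) is None:
            near = sorted((g for g in frames if g != f and pts.get(g)),
                          key=lambda g: abs(g - f))[:6]
            if len(near) >= 3:
                t = np.array(near, float)
                fill = lambda vals: float(np.poly1d(np.polyfit(t, vals, 2))(f))
                pts[f] = (fill([pts[g][0] for g in near]),
                          fill([pts[g][1] for g in near]))
    return pts
```

On a synthetic parabolic trajectory, this removes an injected 100-plus-pixel outlier in step 1 and reconstructs both it and a deliberately blanked frame in step 3.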

Separate Shots in a Rally
In badminton, a rally refers to the exchange of shots while the shuttlecock is in play. Figure 4 shows partial shuttlecock flight paths during the match between Chou Tien Chen and Anders Antonsen at the 2019 China Badminton Open, with black dots marking the hitting events; other colors represent different shots. Observing multiple games, Huang et al. [58] found that most hit actions occur at a relatively low point of the shuttlecock's trajectory, specifically where the trajectory begins to rise, as shown in Figure 4. Therefore, they proposed two algorithms to identify shots: the "Peak Identification Method" and the "Direction Identification Method". The Peak Identification Method converts the shuttlecock position from (x, y) image coordinates to (f_i, y), where f_i represents the frame number. After this conversion, the shuttlecock's trajectories form a wave-like pattern, and the hit moments primarily occur at the peaks of this pattern; the frame of a hit can be identified by detecting those peaks. However, some trajectories do not exhibit obvious changes in y. To address this problem, TrackNet proposed the Direction Identification Method, which checks for sudden changes in the direction of the shuttlecock. If there is a significant change of direction, it is considered a hit that forced the shuttlecock to change its flight path.
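A minimal sketch of the Peak Identification Method follows. Because image coordinates grow downward, a hit near the lowest point of the shuttlecock appears as a local maximum of y over the frame index; the `min_gap` parameter is our addition for suppressing neighboring duplicate peaks, not a value from the paper.

```python
def find_hit_frames(y_by_frame, min_gap=5):
    """Return frame indices where y (in image coordinates) peaks,
    i.e. candidate hit moments in the (f_i, y) representation."""
    hits = []
    for i in range(1, len(y_by_frame) - 1):
        is_peak = (y_by_frame[i] >= y_by_frame[i - 1]
                   and y_by_frame[i] > y_by_frame[i + 1])
        if is_peak and (not hits or i - hits[-1] >= min_gap):
            hits.append(i)
    return hits
```

Trajectories with little vertical variation would defeat this detector, which is exactly the case the Direction Identification Method handles by checking for abrupt direction changes instead.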

Proposed Method
The vision-based approach proposed in this study is specifically designed for singles badminton matches. The system flowchart is illustrated in Figure 5. Initially, the Pose Estimation Module is deployed to acquire the body postures of the players in each frame. Subsequently, the players' footprints are recorded by transforming their ankle points to a plane coordinate system via perspective transformation. Meanwhile, the Badminton Tracking Module is employed to track the flight trajectory of the shuttlecock. The extracted trajectories are then processed by the Shot Refinement Module to cut out each shot. Lastly, the results derived from these modules are aggregated or classified to extract information such as shot types, players' hitting habits, causes of point loss, and serving positions. These analytical results are then graphically represented to facilitate rapid comprehension and interpretation by both players and coaches. The following subsections provide a detailed overview of each of these modules.



Pose Estimation
Identifying the stances of badminton players is crucial for distinguishing their offensive intentions and positioning within the match. In this study, we used the DensePose model, as discussed in Section 2.2, to estimate players' postures (see Figure 6a). Although OpenPose offers superior accuracy in human pose estimation, its computational intensity when detecting multiple individuals concurrently led us to consider DensePose a more practical alternative.

After extracting the ankle coordinates of the players in each frame, these coordinates are projected onto an aerial view using a perspective transformation (see Figure 6b). Although non-players, such as referees and spectators, can also be detected by the DensePose model, their information can be ignored because their ankle points fall outside the court. This approach ensures the accurate extraction of the players' actual ground movement and also helps identify different players.
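The perspective transformation of ankle points can be sketched with a direct linear transform computed from the four court corners. The pixel coordinates below are hypothetical values chosen only for illustration; in practice the image corners would come from court-line detection or manual annotation, and the destination plane here uses a 610 x 1340 cm singles-court rectangle.

```python
import numpy as np

def homography(src, dst):
    """Direct Linear Transform: the 3x3 homography mapping four source
    points to four destination points (the role played by OpenCV's
    getPerspectiveTransform)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null-space direction of A.
    _, _, vt = np.linalg.svd(np.array(A, float))
    return vt[-1].reshape(3, 3)

def project(H, pt):
    """Apply the homography to an image point, e.g. a player's ankle."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Hypothetical image corners of the court and their top-down targets (cm).
img_corners = [(320, 300), (960, 300), (1180, 700), (100, 700)]
court_corners = [(0, 0), (610, 0), (610, 1340), (0, 1340)]
H = homography(img_corners, court_corners)
```

Once `H` is computed for a fixed camera, every detected ankle point can be mapped with `project(H, ankle)` to obtain footprints on the court plane; points projecting outside the court rectangle can then be discarded as non-players.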
As mentioned in Section 2.3, since YOLO does not perform well in detecting small objects, this study only uses YOLO to detect the swinging actions of the players. For detecting the trajectory of the shuttlecock, this study adopted another model that targets fast-moving small objects, detailed in the next subsection. It is worth mentioning that this study does not track the moving players, so there is no need to apply object-tracking techniques across frames. The reason for using the object-detection model is that there should be a large number of positive detections before and after the moment a player hits the shuttlecock. Combined with the trajectory of the shuttlecock, the hit moment of the player can be determined more accurately so that we can obtain precise shots in play.
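The projection step above can be sketched as follows. This is a minimal, dependency-light illustration: the court-corner pixel coordinates are hypothetical, and in practice one would typically use OpenCV's `cv2.getPerspectiveTransform` rather than this hand-rolled DLT estimate.

```python
import numpy as np

def homography(src, dst):
    """Estimate the 3x3 perspective transform mapping src -> dst (4 point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (smallest singular value).
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def project(H, pts):
    """Apply homography H to an (N, 2) array of image points."""
    pts = np.asarray(pts, dtype=float)
    q = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return q[:, :2] / q[:, 2:3]

# Hypothetical pixel coordinates of the four court corners, mapped to an
# aerial view of a 6.1 m x 13.4 m court (1 unit = 1 cm).
img_corners = [(412, 355), (868, 355), (223, 671), (1057, 671)]
top_corners = [(0, 0), (610, 0), (0, 1340), (610, 1340)]
H = homography(img_corners, top_corners)

# An ankle point detected inside the court projects into [0, 610] x [0, 1340];
# a referee's ankle outside the court projects outside that range and can be ignored.
```

Under this mapping, filtering non-players reduces to a simple range check on the projected ankle coordinates.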

Shuttlecock Tracking
Shuttlecocks hold the world record for the fastest projectile in sports, with top speeds reaching 493 km/h, presenting significant challenges for tracking and capturing via standard optical cameras. The shuttlecock in badminton game videos is often very small and typically appears against a cluttered background, including advertising boards, court lines, and nets. Such a scenario increases the difficulty of accurately locating the shuttlecock within images. Moreover, the shuttlecock is frequently occluded by players' bodies, adding another layer of complexity to the tracking task. To address these challenges, this study adopts the TrackNet model mentioned in Section 2.4. Since this model is specifically designed for tracking the trajectories of fast-moving, small-volume balls, it serves as the backbone of the shuttlecock tracking module in this study. However, after implementing the TrackNet algorithm, we found that this shuttlecock-based tracking process can be destabilized when the shuttlecock detection is missing. Such circumstances may result in errors in determining the actual shots between players' swings (see Figure 7a). To address this, this study proposes a shot refinement algorithm (SRA) that integrates the detection results from the shuttlecock tracking module with the identified players in every hit to ascertain accurate shots (see Figure 7b). The details of the SRA are elaborated in the next subsection.
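TrackNet-style trackers predict a per-frame heatmap of the shuttlecock location. A common way to read a position off such a heatmap, sketched below, is to threshold the peak response; the threshold value and this post-processing are our simplification, not necessarily TrackNet's exact procedure.

```python
import numpy as np

def locate_shuttlecock(heatmap, threshold=0.5):
    """Return (x, y) of the heatmap peak, or None if no response clears
    the threshold (i.e., the shuttlecock is occluded or out of view)."""
    if heatmap.max() < threshold:
        return None
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (int(x), int(y))

# A synthetic 288 x 512 heatmap with a single Gaussian blob at (300, 120).
h, w = 288, 512
ys, xs = np.mgrid[0:h, 0:w]
heatmap = np.exp(-((xs - 300) ** 2 + (ys - 120) ** 2) / (2 * 5.0 ** 2))
print(locate_shuttlecock(heatmap))  # (300, 120)
```

Returning `None` on sub-threshold frames is exactly what produces the detection misses that the SRA later has to compensate for.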



Shot Refinement Algorithm
As mentioned in the last section, when the shuttlecock is occluded by the player's body or exits the field of view for a significant duration, detection errors in the shuttlecock's flight trajectory increase. Therefore, relying solely on the shuttlecock's flight trajectory to determine the precise timing of a stroke might lead to substantial inaccuracies. To alleviate these unstable detections, this study proposes the shot refinement algorithm. This module incorporates an action-detection model to locate the correct timing of a player's hit.
To train the action-detection model, we collected a set of videos and labeled each player's hit action manually. These videos feature multiple players performing strokes against various backgrounds, taken from official competition videos. The model used in this study was YOLOv7, as described in Section 2.3. A threshold is applied to the prediction outputs of the model, where a high confidence score in a specific frame implies that a player is performing a hit (see Figure 8). Notably, both the shuttlecock-tracking and action-detection modules are susceptible to false alarms or detection misses. The objective of the proposed SRA is to combine the results of both modules, thereby providing precise identification of the hit moment.
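The thresholding step can be sketched as below. The data layout (per-frame dictionaries of swing confidences per player) and the 0.6 cutoff are our assumptions for illustration, not values from the paper.

```python
def detect_hit_actions(frame_detections, conf_threshold=0.6):
    """Convert per-frame swing detections into HD-A labels.

    frame_detections: list of dicts, one per frame, each mapping a player id
    (1 or 2) to the highest swing-class confidence for that player, e.g.
    {1: 0.83} if only player 1 triggered the detector in that frame.
    Returns a list of (frame, player_id, confidence) HD-A entries.
    """
    hd_a = []
    for t, dets in enumerate(frame_detections):
        for k, conf in dets.items():
            if conf >= conf_threshold:
                hd_a.append((t, k, conf))
    return hd_a

frames = [{}, {1: 0.3}, {1: 0.7}, {1: 0.9}, {}, {2: 0.8}]
print(detect_hit_actions(frames))
# [(2, 1, 0.7), (3, 1, 0.9), (5, 2, 0.8)]
```

Frames that clear the threshold carry the player id forward, which is what later allows Hit Segments to be attributed to a specific player.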
Regarding the shuttlecock-tracking module, we follow the algorithm proposed by TrackNet, which includes the "Peak Identification Method" and "Direction Identification Method", as described in Section 2.5, to detect sudden direction changes of the shuttlecock. Given this method's reliance on the shuttlecock trajectory, the resulting detections are denoted as Hit Detection by Trajectories (HD-T_t), where t is the frame number. Meanwhile, the detection results derived from the YOLOv7 model yield an alternative series of players' hit actions. As this method depends on the results of detecting the swing action, its detection results are denoted as Hit Detection by Action (HD-A). An advantage of the action-detection module is that it can identify the player who performed the hit when considering the pose estimation results illustrated in Section 3.1. That is, each HD-A is equipped with the player's id, denoted as HD-A_t^(k), where t is the frame number, and k is 1 or 2, representing the player's id.
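A minimal sketch of direction-based hit candidates follows, assuming only the per-frame vertical position of the shuttlecock is available; TrackNet's actual Peak and Direction Identification Methods differ in detail (e.g., smoothing and 2D geometry), so treat this as an illustration of the idea, not the paper's implementation.

```python
def hit_candidates_by_direction(ys, min_gap=5):
    """Flag frames where the shuttlecock's vertical motion reverses.

    ys: per-frame vertical positions (None where detection missed).
    Returns HD-T frame indices; candidates closer than min_gap frames to
    the previous one are suppressed, since two hits cannot be adjacent.
    """
    hits = []
    prev_dy = 0
    last = None
    for t in range(1, len(ys)):
        if ys[t] is None or ys[t - 1] is None:
            continue  # missed detections contribute no direction information
        dy = ys[t] - ys[t - 1]
        if dy * prev_dy < 0:  # sign flip => sudden direction change
            if last is None or t - last >= min_gap:
                hits.append(t)
                last = t
        if dy != 0:
            prev_dy = dy
    return hits

# Descending then ascending trajectory: one direction change near the bottom.
ys = [100, 90, 80, 70, 60, 50, 45, 50, 60, 75, 90]
print(hit_candidates_by_direction(ys))  # [7]
```

The `None` entries show why trajectory-only hit detection is brittle: a run of missed detections around the true reversal can shift or suppress the candidate entirely, which is the failure mode the SRA targets.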
Initially, frames in a rally are annotated based on the HD-A results. Then, contiguous frames t with the same k are merged to produce Hit Segments (HS). By doing so, a video sequence is partitioned into three types: HS_i^(1), HS_i^(2), and HS_i^(None), where i denotes the sequence of the segment, 1 and 2 indicate which player is executing the hit on the shuttlecock within the specified segment, and None means no player id is assigned for this segment.
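The merging of contiguous frames into Hit Segments can be sketched as a simple run-length grouping; the list-of-ids input format is our own choice for illustration.

```python
def build_hit_segments(frame_ids):
    """Merge contiguous frames with the same player id into Hit Segments.

    frame_ids: per-frame player id (1, 2, or None when no swing detected).
    Returns a list of (start_frame, end_frame, k) segments, with k = None
    segments covering the gaps between detected swings.
    """
    segments = []
    start = 0
    for t in range(1, len(frame_ids) + 1):
        # Close the current run at the end of the list or when the id changes.
        if t == len(frame_ids) or frame_ids[t] != frame_ids[start]:
            segments.append((start, t - 1, frame_ids[start]))
            start = t
    return segments

ids = [None, None, 1, 1, 1, None, None, 2, 2, None]
print(build_hit_segments(ids))
# [(0, 1, None), (2, 4, 1), (5, 6, None), (7, 8, 2), (9, 9, None)]
```

Each resulting segment is one HS_i^(k); the SRA then cross-checks HD-T candidates against these segments.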
In badminton, a shot refers to an exchange sequence between two players. That is, two consecutive detected hits should naturally be performed by the players in turn. As mentioned, both HD-T and HD-A cannot avoid producing detection misses or false detections. Therefore, the intuition of the SRA is to combine the outputs of both modules and produce fine-grained results. For example, given two consecutive HD-Ts, denoted as HD-T_t and HD-T_t', if these hits are detected within the same HS, meaning that they were executed by the same player, it suggests a detection error, considered a false positive that necessitates corrective action. Conversely, if these hits correspond to different temporal segments, the detection is regarded as correct, and the trajectory from HD-T_t to HD-T_t' is subsequently considered a shot. Regarding all possible combinations of HD-T and HS, the SRA divides the shots into five distinct scenarios and finds the hit moments (HMs) as follows:
•
Case 1: Only one HD-T in HS_i^(k), k = 1 or 2. This scenario is considered a correct detection (see Figure 9). Given that a single swing action can only result in one hitting time point, HD-T_t is set as HM_t^(k), where k is the same as the k of HS_i^(k).

•
Case 2: More than one HD-T in HS_i^(k), k = None. This scenario is classified as a false positive (see Figure 10), as this is a time segment that should not contain any hitting action. Consequently, all HD-T occurrences within this segment are eliminated.

•
Case 3: More than one HD-T in HS_i^(k), k = 1 or 2. This scenario represents a concurrent occurrence of correct detection and false positives (see Figure 11), given that only a single hit moment can exist within a swing action. Therefore, the frame with the highest confidence score in HS_i^(k) is first identified. Subsequently, the HD-T_t closest to this frame is set as HM_t^(k), while the others are discarded.

•
Case 4: No HD-T in HS_i^(k), k = 1 or 2. This instance is classified as a detection miss, or false negative, because a swing action should result in a hit (see Figure 12). Therefore, within this segment, the frame exhibiting the highest confidence score within HS_i^(k) is selected as HM_t^(k) to account for the absent HD-T.

•
Case 5: No HD-T in HS_i^(k), k = None. This scenario is categorized as an accurate detection, denoted as a true negative (see Figure 13). During this time segment, the shuttlecock is observed to be in flight while no player is in the process of hitting, consequently resulting in the absence of HD-T. We summarize the proposed SRA and illustrate its flow in Algorithm 2.

Experimental Results and Discussions
In this section, several experiments are conducted to demonstrate the validity of the proposed approach. All experiments were conducted on a 10th-generation Intel Core i9 CPU with 32 GB of RAM and an RTX 3080 Ti GPU. The algorithm was implemented using Python 3.8 with TensorFlow 2.12.

Data Acquisition
This study uses two datasets for experiments. The first dataset was obtained from publicly accessible videos provided by the Badminton World Federation (BWF) on YouTube, comprising 32 matches and 602 rallies across 8975 images. The second dataset was from the public dataset of the National University Competition in Artificial Intelligence, sponsored by the Ministry of Education, Taiwan. This dataset comprised 98,675 images from 354 matches and 6855 rallies. Overall, the datasets included 16 players, with a gender distribution of 7 females and 9 males. Figure 14 shows some sample images from BWF official videos.

Comparison with State of the Art
In this study, we evaluated the shot extraction performance using two methods: trajectory-based detection, which is the method proposed by TrackNet, and our proposed method, the SRA. For the first method, we reimplemented the algorithm and adopted the same threshold settings as the original study. We adopted the temporal Intersection over Union (t-IoU) as the metric (see Equation (1)):

t-IoU = |[HM_i, HM_{i+1}] ∩ GT_{1↔2}| / |[HM_i, HM_{i+1}] ∪ GT_{1↔2}| (1)

where i is the ith hit moment, and GT is the set of groundtruth shots. The notation 1 ↔ 2 means a transition between different players, indicating that this temporal segment is a shot. Note that the shots are always continuous, so each predicted shot may intersect with two adjacent groundtruth shots. In that case, we only consider the groundtruth shot with the higher IoU. The precision and recall of the model are calculated using the metrics depicted in Equations (2) and (3).


Precision = (# of true detections) / (# of all detections) (2)

Recall = (# of true detections) / (# of all ground truths) (3)

A detected shot is considered a true detection if its t-IoU is above a threshold. As shown in Table 1, the trajectory-based method only yielded a precision of 0.588, a recall rate of 0.936, and an F1 score of 0.723 when the IoU threshold was set to 0.5. In contrast, the proposed SRA resulted in a significant improvement in extracting accurate shots, with the precision increasing to 0.843, a slightly lower recall rate of 0.882, and the F1 score reaching 0.862. Regardless of the chosen threshold, the proposed SRA exhibits better results than the TrackNet method. The primary objective of the algorithm is to eliminate false positives generated by HD-T with the assistance of HS. We observe that the HD-T results align more closely with the true hit moments compared to those obtained from HS. Therefore, the detection results from cases 2 and 3 of the SRA are likely more precise than those from case 4. In order to minimize the occurrence of case 4, we attempted to increase the number of detections of the HD-T model. Although this tuning decreased the overall performance of HD-T, the final results can be recovered by the processes in cases 2 and 3 of the SRA. To verify this approach, we conducted a comparative analysis of two combinations: HD-T (TrackNet) and HD-T (TrackNet) + SRA, as well as HD-T (tuned) and HD-T (tuned) + SRA. As shown in Table 1, the precision of HD-T (tuned) decreased to 0.524, and the recall also decreased to 0.897, but with the help of the SRA, the precision bounced back to 0.897, reflecting an increase of 37.3%. Additionally, both the recall rate and the F1 score exhibited remarkable improvements. The results demonstrate that the SRA performance can be further improved with a tuned HD-T. Because the proposed method includes an additional YOLO model, the overall processing time is increased. Even though the processing time of the SRA is longer than using TrackNet alone, we believe this trade-off is worthwhile.
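The evaluation protocol can be sketched as below. This is a simplified rendering under our assumptions: shots are (start, end) frame intervals, and each prediction is matched to the groundtruth shot with the highest t-IoU, mirroring the "higher IoU" tie-break described above.

```python
def t_iou(pred, gt):
    """Temporal IoU between two (start, end) frame intervals."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def precision_recall(pred_shots, gt_shots, threshold=0.5):
    """Count a predicted shot as a true detection if its best-matching
    groundtruth shot has t-IoU above the threshold."""
    true_det = sum(
        1 for p in pred_shots
        if max(t_iou(p, g) for g in gt_shots) >= threshold
    )
    return true_det / len(pred_shots), true_det / len(gt_shots)

gt = [(0, 30), (30, 60), (60, 90)]
pred = [(2, 31), (33, 58), (70, 75)]   # last prediction overlaps too little
print(precision_recall(pred, gt))      # precision = recall = 2/3 here
```

Note this sketch does not guard against two predictions claiming the same groundtruth shot; a full evaluation would enforce one-to-one matching.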
Figure 15 allows us to clearly see that the SRA can successfully eliminate false positives generated by HD-T. As shown in the first row of Figure 15, HD-T produced a false positive due to an error in detecting the shuttlecock's trajectory (indicated by the red segment). By using the SRA, we can confirm that no other player was performing a hitting action during that time, thereby successfully eliminating that false positive.

Shot Type Classification
An advantage of recognizing the shuttlecock's trajectory is that it can be used to distinguish the type of shots. The analysis of shot types is closely related to classifying a player's attack type. Therefore, this study also examined the effect of the SRA on recognizing different types of shots.
In this study, shots are categorized into seven primary types: clear, drive, smash, net, drop, lift, and push. Notably, this study considers only a few features yet still demonstrates the superiority of the proposed method. The seven-dimensional features used in this model are as follows:

•
Both players' ankle positions (x1, y1, x2, y2) at the hit moment, which is obtained by the method proposed in Section 3.1.

•
Flight duration of the shuttlecock, denoted as T, measured in the number of frames.

•
The displacement of the shuttlecock (Δx, Δy), quantified as the difference in its positions from the start to the end of the shot.
This study does not consider the shuttlecock positions in every frame due to the occurrence of detection misses. However, as a complete shot can still be obtained, the shuttlecock displacement can be considered reliable. These features provide an enhanced understanding of the dynamics of the game, thereby enabling accurate classification of shot types and differentiation of playing styles. Limited by the number of features and samples available for training, this study employs the XGBoost model for shot classification. The results are presented in Table 2, where we can see that the proposed SRA method achieves 72.1% accuracy when the IoU threshold is set to 0.95. When the threshold is loosened to 0.5, the classification accuracy drops to 54.1% but still outperforms TrackNet. This degradation happens because all features rely heavily on accurate hit moments, and a looser threshold implies less precise shots. Nevertheless, these results demonstrate that the SRA has a better ability to extract accurate shots than TrackNet. As shown in Figure 16, it is worth noting that some shot types, such as drives and smashes, exhibit similar feature values, so their classification accuracy is not as good as that of other shot types.
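Assembling the seven-dimensional feature vector can be sketched as follows; the numeric values are hypothetical. The resulting vectors would then be fed to a classifier such as `xgboost.XGBClassifier`, which we omit here to keep the sketch dependency-free.

```python
def shot_features(p1_ankle, p2_ankle, traj_start, traj_end, n_frames):
    """Build the seven-dimensional feature vector for one shot:
    both players' aerial-view ankle positions (x1, y1, x2, y2) at the hit
    moment, flight duration T in frames, and shuttlecock displacement
    (dx, dy) from the start to the end of the shot."""
    (x1, y1), (x2, y2) = p1_ankle, p2_ankle
    dx = traj_end[0] - traj_start[0]
    dy = traj_end[1] - traj_start[1]
    return [x1, y1, x2, y2, n_frames, dx, dy]

# Hypothetical values: a long, fast shot crossing most of the court.
feats = shot_features((210, 380), (405, 1010), (230, 360), (390, 1000), 18)
print(feats)  # [210, 380, 405, 1010, 18, 160, 640]
```

Using only endpoint displacement, rather than per-frame positions, is what makes the feature robust to intermediate detection misses, as the text argues.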



Conclusions
The advancement of an automated sports data analysis system signifies considerable progress in improving players' skills and refining coaching methodologies. In this study, videos captured from a monocular camera are employed for analyzing badminton games, thereby alleviating the necessity for players to wear any additional equipment for data collection.
Analyzing badminton shot types is beneficial in planning players' strategies. The correctness of shot classification relies on detecting precise shot trajectories. Previous studies have proposed shuttlecock-tracking technology to extract the trajectory of the shuttlecock and identify each shot based on trajectory alterations. Building upon their efforts, this study proposes an enhancement approach that combines shuttlecock trajectories, action detection, and pose estimation for refined hit detection. The proposed algorithm leverages the results of TrackNet, together with a YOLO-based hit-detection model, to mitigate the false positives and false negatives generated during the detection process. The experimental results demonstrate that the proposed method can more accurately determine each shot, and the results benefit subsequent advanced analyses, such as shot type classification. These precise results can be further utilized to analyze more abstract concepts, such as a player's hitting style or tactical flaws in gameplay.
Nevertheless, because only a monocular camera is utilized as the input, the proposed system is constrained to a single perspective, making it difficult to correctly project 2D image coordinates back to 3D real-world coordinates. This may affect the accuracy of the ankle coordinates and lead to misjudgments of the player's position, thereby impacting the analysis of shot types and player movement direction. For example, when a player jumps to perform a power strike, the inferred ankle position is incorrect.

Figure 4 .
Figure 4. Example of shuttlecock trajectories during a play. Different colors represent different shots. The black dots indicate the hit moments.


Figure 6 .
Figure 6. (a) Extracted results using OpenPose; (b) results of perspective transformation. Only the players' ankle points are retained. The red dots are the left ankle points and the blue dots are the right ankle points.


Figure 7 .
Figure 7. Visual comparison between TrackNet and the proposed SRA. The colored lines represent the extracted trajectories. (a) A wrong shot extraction due to a missed shuttlecock detection; (b) the proposed SRA, which can extract correct shots under severe detection misses; (c) the groundtruth trajectories, which contain three shots marked in different colors.



Figure 8 .
Figure 8. An example of a hit-detection result.


Figure 9 .
Figure 9. An example of case 1. The colored lines represent the extracted trajectories. There is only one HD-T in a hit sequence, so it is considered a true hit moment.


Figure 10 .
Figure 10. An example of case 2. The colored lines represent the extracted trajectories. HD-T occurs in a hit sequence that does not belong to any player, so it is considered a false positive and is removed.

Figure 11 .
Figure 11. An example of case 3. The colored lines represent the extracted trajectories. Multiple HD-Ts occur in a hit sequence that belongs to a player, so only one HD-T is retained.




Figure 12 .
Figure 12. An example of case 4. The colored lines represent the extracted trajectories. No HD-T exists in a hit sequence that belongs to a player. In this situation, the frame given the highest confidence by the detection model is considered the hit moment.



Figure 13 .
Figure 13. An example of case 5. No HD-T exists in a hit sequence that does not belong to any player. This situation is considered a true negative.

Algorithm 2: Shot Refinement Algorithm
Input: HD-T, HS
Output: True Hit Moment (THM) := {HM (p) t}
Initialize: i ← 0 (current HS sequence)
while i < total HS sequences do
    p ← k of HS (k) i;
    switch condition do
        case only one HD-T in HS (k) i, k = 1 or 2 do
            t ← t of HD-T t; add HM (p) t to THM;
        end
        case more than one HD-T in HS (k) i, k = None do
            THM is not assigned;
        end
        case multiple HD-Ts in HS (k) i, k = 1 or 2 do
            t ← the frame of the player with the highest confidence in HD-T t; add HM (p) t to THM;
        end
        case no HD-T in HS (k) i, k = 1 or 2 do
            t ← the frame with the highest confidence score in HS (k) i; add HM (p) t to THM;
        end
        case no HD-T in HS (k) i, k = None do
            do nothing;
        end
    end
    i ← i + 1;
end

This study includes a comparison of the accuracy of different methods to verify the effectiveness of the proposed SRA. The experimental results are discussed in the next section.
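As a concrete illustration, the case analysis above can be sketched in Python. The data structures below (`HDT`, `HitSequence` and their fields) are hypothetical stand-ins for the paper's HD-T detections and hit sequences, not the authors' implementation; the branch logic follows the five cases of Algorithm 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class HDT:
    """A hit detected by the object-detection model (HD-T)."""
    frame: int                # frame index of the detection
    player: Optional[int]     # owning player: 1, 2, or None
    conf: float               # detection confidence

@dataclass
class HitSequence:
    """A candidate hit sequence (HS) extracted from the shuttlecock trajectory."""
    player: Optional[int]                                       # k: 1, 2, or None
    frames: List[int] = field(default_factory=list)             # candidate frame indices
    frame_conf: Dict[int, float] = field(default_factory=dict)  # frame -> confidence
    hdts: List[HDT] = field(default_factory=list)               # HD-Ts inside this sequence

def shot_refinement(sequences: List[HitSequence]) -> List[Tuple[int, int]]:
    """Map each hit sequence to at most one true hit moment (player, frame)."""
    thm = []
    for hs in sequences:
        k = hs.player
        if k is None:
            continue  # cases 2 and 5: no player owns the sequence, nothing is assigned
        if len(hs.hdts) == 1:
            thm.append((k, hs.hdts[0].frame))           # case 1: the single HD-T is the hit
        elif len(hs.hdts) > 1:
            best = max(hs.hdts, key=lambda d: d.conf)   # case 3: keep the most confident HD-T
            thm.append((k, best.frame))
        else:
            # case 4: no HD-T at all; fall back to the most confident frame in the sequence
            best_frame = max(hs.frames, key=lambda f: hs.frame_conf.get(f, 0.0))
            thm.append((k, best_frame))
    return thm
```

The sketch keeps the algorithm's invariant that every sequence owned by a player yields exactly one hit moment, while unowned sequences are silently dropped.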

Figure 14. Some sample images from BWF official videos.

Figure 15. Trajectory-extraction results. Each column represents the extraction results of a method: (a) HD-T; (b) SRA; (c) ground truth. Black dots represent the hit moments, and the colored lines represent the extracted trajectories. The proposed SRA can effectively address the false positives produced by HD-T.


Figure 16. Comparison of shot-type classification results when the IoU threshold is set to 0.5. A darker color represents a higher value. (a) TrackNet; (b) SRA.

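To give a sense of how an IoU threshold can be applied when scoring shot classification, the sketch below computes a temporal IoU between frame intervals and counts one-to-one matches that reach the threshold. The interval representation and the greedy matching strategy are illustrative assumptions, not the paper's exact evaluation protocol.

```python
def temporal_iou(a, b):
    # IoU of two frame intervals given as (start, end) pairs.
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def count_matches(pred, gt, thr=0.5):
    # Count predicted shot intervals that can be matched one-to-one to a
    # ground-truth interval with temporal IoU at or above the threshold.
    used, matches = set(), 0
    for p in pred:
        best_j, best_iou = None, thr
        for j, g in enumerate(gt):
            if j not in used and temporal_iou(p, g) >= best_iou:
                best_j, best_iou = j, temporal_iou(p, g)
        if best_j is not None:
            used.add(best_j)
            matches += 1
    return matches
```

Raising `thr` (e.g. from 0.5 toward higher values, as in Table 2) makes matching stricter, which is why classification accuracy drops as the threshold increases.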
The system also provides high-quality visual training tasks to expedite the player's development process. By aligning the court situation with the player's eye gaze, the system can analyze vision-related behaviors to evaluate the effectiveness of the training.

Table 1. Comparison between the proposed SRA and TrackNet. Numbers in bold represent the highest value (higher is better).
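The metrics compared in Table 1 follow the standard detection definitions; notably, the reported F1 of 90.5% is the harmonic mean of 89.7% and 91.3%, which suggests the "accuracy" figure is computed as precision. A minimal sketch of these definitions, from true-positive, false-positive, and false-negative counts of detected hit moments:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    # Standard detection metrics from TP/FP/FN counts; guards against
    # division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```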

Table 2. Shot-type classification accuracy with different IoU thresholds.
