Real-Time Collection Method of Athletes’ Abnormal Training Data Based on Machine Learning

Real-time collection of athletes’ abnormal training data can improve the training effect of athletes. This paper studies a real-time collection method for athletes’ abnormal training data based on machine learning. The main motivation of this paper is to collect athletes’ abnormal training data in time, which helps to evaluate and improve the training effect. Four sensor nodes are arranged on the upper and lower limbs of athletes to collect the angular velocity, acceleration, and magnetic field strength data of athletes in the training state. The data are sent to the data transmission base station through wireless sensors, and the data transmission base station transmits the data to the data processing terminal. The data processing terminal calculates the difference between the sample values of each sensor to obtain the data dispersion of each sensor. After the data are divided according to the dispersion, the time domain and frequency domain features of each data dimension are extracted to construct 32-dimensional feature vectors, and the extracted feature vectors are input into the hidden Markov model. The forward algorithm is used to obtain the probability of the final observation sequence, so as to realize the collection of athletes’ abnormal training data. The experimental results show that the accuracy and recall rate of the abnormal data collected by this method are both higher than 98% and that the method requires less time than the compared methods.


Introduction
In competitive sports, the ultimate goal of sports training is to create excellent sports performance. The daily training of athletes is the most basic and controllable factor in improving competitive ability, and it is the basic way for coaches to understand athletes' sports conditions. Coaches need to analyze the athletes' sports situation [1], make clear the training situation of each athlete, evaluate the athletes' training status according to their own experience, and formulate corresponding training programs to further guide the training and improve the athletes' sports performance. With the accumulation of athletes' training data, it has become more and more difficult to manage and analyze these data by manual processing. The traditional data processing and database management functions of a computer can solve the problem of athletes' training management, help coaches to manage athletes, convert sports performance, manage historical data, and improve the efficiency of data processing [2,3]. However, the traditional data analysis and processing methods only analyze the local or surface characteristics of the data; they cannot describe the overall characteristics hidden behind the data or predict its development trend. The collection of athletes' abnormal training data focuses on the important information hidden behind the data. Data mining technology can extract valuable, unknown, hidden, and potentially useful knowledge from a large amount of original data.
In the process of athletes' training and competition, coaches need to make corresponding training plans according to different athletes' individual conditions in order to improve athletes' sports levels. The traditional training method is that coaches make training plans according to their own training theory and experience, combined with the skill level of athletes. This training mode is highly subjective [4]. Coaches need to spend a lot of time analyzing athletes' posture, and it is difficult to objectively evaluate the training effect of athletes. The core of modern sports training is accuracy and efficiency. If the coach can accurately grasp the abnormal data of training, the effect of sports training can be greatly improved. Collecting and analyzing athletes' training data and determining athletes' abnormal training data is a new research direction, which is of great significance for making coaches' training plans more scientific and improving athletes' training effects.
Artificial intelligence is a comprehensive subject that developed from computer science, cybernetics, information theory, and other disciplines; it is the science of understanding the internal mechanism of human intelligence and realizing it on a machine. Machine learning is the core content of artificial intelligence research [5][6][7]. It has been applied in all branches of artificial intelligence, such as natural language understanding, pattern recognition, computer vision, and intelligent robots [8]. Machine learning-related research began as early as the 1950s, mainly focusing on the connectionist learning of neural networks. From the 1950s to the 1970s, artificial intelligence research was in the "reasoning period," but as research developed, it became clear that machines with only logical reasoning ability cannot achieve artificial intelligence [9]. In the 1980s, machine learning became an independent discipline and began to develop rapidly. Michalski et al. divided machine learning research into categories such as "learning in problem solving and planning" and "learning from instruction." Feigenbaum divided machine learning technology into four categories in his famous Handbook of Artificial Intelligence, namely, "rote learning," "learning by instruction," "analogical learning," and "inductive learning." In the 21st century, machine learning has been applied in various fields. iFLYTEK's real-time speech recognition technology and the intelligent news recommendation system of Toutiao ("Today's Headlines") are both products of the rapid development of machine learning.
There are many studies on evaluation methods for athletes. An evaluation method based on benefit evaluation theory and regression analysis is proposed in reference [10]. Benefit evaluation theory and regression analysis are used to carry out the athletes' safety assessment, and the fusion analysis method is used to realize the monitoring and evaluation of physiological indicators. However, this method adapts poorly to the safety assessment of athlete training and has a large time cost. In reference [11], an athlete training safety evaluation model based on big data fusion feature analysis is proposed. A model of integrated information statistics for athlete training safety is constructed, and fuzzy association rule scheduling is used to evaluate the safety of athlete training. However, this method requires a large amount of calculation and has poor anti-interference performance. In reference [12], an evaluation method for athlete training safety based on rough set evaluation is proposed.
For the recognition of daily human behavior, inertial sensors such as gyroscopes and accelerometers are mainly used for algorithmic classification and pattern recognition of daily behaviors such as standing, walking, running, and lying [13]. Wang ... The improved algorithm has a high recognition rate; Atalla used wearable sensors to identify daily behaviors of different complexity and explored, through many experiments, the accuracy of different sensor installation positions under actions of different complexity. Some researchers tried to use sensors for sports data monitoring and technical evaluation to achieve the effect of auxiliary training. Body-worn sensors were used to detect the behavior characteristics of athletes, such as body posture, movement range, and speed, and, based on the analysis and mining of athletes' behavior data, technical weaknesses were found to help athletes improve their technical level; Qaisar et al. used multiple acceleration sensors and gyroscopes to identify a variety of bowling movements, analyzed the quality of the actions qualitatively and quantitatively, and provided technical evaluation and feedback on bowling posture in bowling training and teaching; King et al. designed golf clubs with embedded acceleration sensors. By receiving data and calculating important swing parameters, such as the top position, speed, and direction of the club, the data can be analyzed during the golf training of athletes or amateurs to feed back the quality of the user's swing, so as to achieve the effect of intelligent training. The basic idea of inertial sensor recognition is that athletes wear simple and light data collection sensors, which send the collected data to the processing terminal [14] in real time so that the athletes' posture can be identified from the various posture data.
This method can make up for the shortcomings of image-based collection and recognition, has low requirements for the use environment and high recognition efficiency, and has become a popular research method for motion attitude recognition. Real-time collection of abnormal data is an important branch of pattern recognition, which has received wide attention and developed considerably in recent years. With the rapid development of microelectronics technology, the use of inertial sensors to identify human posture has become a research hotspot. Many researchers apply wearable devices to human auto disturbance recognition. Sensors are used to collect human acceleration, angular velocity, body temperature, heart rate, and much other information. Extracting the time domain and frequency domain characteristics of athletes' training actions from the collected information makes it convenient to analyze athletes' abnormal training data [15]. Feature extraction analyzes the athlete's unit actions and passes the relevant attribute features as sample data to the machine learning classifier to realize the division of abnormal data.

The contributions of this paper are summarized as follows. This paper studies a real-time collection method for athletes' abnormal training data based on machine learning. Sensors are used to collect athlete training data, and the features used for real-time collection of athletes' abnormal training data are extracted from the time domain and the frequency domain, respectively. The extracted features are used to accurately collect athletes' abnormal training data with the hidden Markov model in machine learning. The experimental results verify the effectiveness of this method in real-time collection of athletes' abnormal training data.
This paper is organized as follows. Section 2 presents the materials and methods. In Section 3, experimental results are presented and analyzed. Finally, Section 4 draws some conclusions and gives some suggestions for future research topics.

Collection of Sensor Signal.
Collecting human movement posture data is the basic condition for accurately collecting athletes' abnormal training data. The system has four parts: the sensors, the transceiver, the processor, and the power supply. The inertial sensor is used to collect human motion posture data. Through the magnetic sensor, angular velocity sensor, and acceleration sensor fixed on the athlete's body, the data related to human movement are collected [16], and the collected data are transmitted to the terminal processing device for posture recognition through a wireless sensor network. The power supply provides power to the system. Data quality is the key factor affecting the accuracy of abnormal data collection in athletes' training. The hardware structure for collecting athletes' abnormal training data is shown in Figure 1. The hardware structure mainly covers data collection and data transmission, including four data collection nodes and one data transmission base station. Each data collection node is composed of the three-axis gyroscope MPU3050M and the three-axis accelerometer and magnetometer LSM303DLH, which collect the angular velocity and acceleration data of the human body, respectively. The core component of the data transmission base station is the wireless transceiver nRF24L01. The receiving node collects data and sends it to the data terminal through the wireless network. The core processing function of the data collection module is performed by the 32-bit ARM microcontroller STM32F103. The data collection module is powered by a 3.7 V lithium-ion battery.
The signal transmission for data collection includes two parts: first, the sensor node sends the collected human posture data to the data transmission base station; second, the data transmission base station sends the data to the processing terminal. The signal transmission between the sensor node and the data transmission base station is based on a wireless sensor network. The problem to be overcome is to reduce the data collision rate as far as possible [17], reduce data loss, and improve the accuracy of data collection. The signal transmission between the data transmission base station and the processing terminal is based on a star topology network and uses a time-division multiplexing protocol [18]. It is necessary to calibrate the clock deviation between different nodes to keep the time base uniform.
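As a rough illustration of the time-division idea and of clock-deviation calibration, the sketch below splits a repeating frame into equal, non-overlapping slots and maps a sender's local timestamp onto a common clock. The frame length, the sender identifiers, and the offset value are hypothetical, and the code does not model the nRF24L01 or any real protocol stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    sender_id: int
    start_ms: float
    end_ms: float


def build_tdm_frame(sender_ids, frame_ms):
    """Split one frame of length frame_ms into equal, non-overlapping slots."""
    slot_ms = frame_ms / len(sender_ids)
    return [
        Slot(sender, k * slot_ms, (k + 1) * slot_ms)
        for k, sender in enumerate(sender_ids)
    ]


def to_common_time(local_ms, offset_ms):
    """Map a local timestamp onto the common clock.

    offset_ms is the estimated deviation (local clock minus common clock);
    how that deviation is estimated is outside the scope of this sketch.
    """
    return local_ms - offset_ms


# Hypothetical setup: four senders sharing a 20 ms frame.
frame = build_tdm_frame([1, 2, 3, 4], frame_ms=20.0)
for slot in frame:
    print(slot)
print(to_common_time(105.0, offset_ms=5.0))
```

Because every sender transmits only inside its own slot, collisions are avoided as long as the clocks agree, which is why the deviation has to be calibrated.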
In order to accurately collect the abnormal data of athletes' training, it is necessary to accurately grasp the movement posture data of athletes' upper and lower limbs. The sensor layout and data collection topology are shown in Figure 2.
Four sensor nodes are used to collect the angular velocity, acceleration, and magnetic field strength data of the upper and lower limbs of the athletes, and the data are sent to the data transmission base station through wireless sensors. The data transmission base station transmits the data to the data processing terminal.

Feature Extraction of Athletes' Training Data.
After the human motion posture data are collected, the training data of athletes are first divided, and the features of the training data are then extracted from the divided data. The extracted features are sent to the machine learning classifier to realize the collection of abnormal training data of athletes.

Division of Athletes' Training Data.
The degree of dispersion is the difference between the values of the observed variables, and the difference between the sample values of the sensor signal is defined as the degree of dispersion. Taking the angular velocity as an example, $\omega_n^x$ represents the x-axis angular velocity data at time $n$, $\omega_{n-1}^x$ represents the x-axis angular velocity data at time $n-1$, and $d_n^x$ represents the difference between the x-axis angular velocity of the sensor at time $n$ and at the previous time. The formula of the dispersion $d_n^x$ is as follows:

$$d_n^x = \omega_n^x - \omega_{n-1}^x. \tag{1}$$

The movement data include angular velocity data and acceleration data [19]. In order to realize the accurate division of athletes' training data, it is necessary to comprehensively consider the characteristics of each sensor's data [20]. $D_n^a$ and $D_n^g$ are used to represent the dispersion of the acceleration and of the angular velocity over the axes, respectively. Then, $D_n^a$ and $D_n^g$ are obtained as in formula (2).

In the static state, the dispersions of the acceleration and the angular velocity stay below the thresholds $\lambda_a$ and $\lambda_g$, respectively; in the moving state, the sensor data change rapidly with the athletes' actions [21], and the dispersion reflects the degree of difference of the sensor data, so the athletes' moving state can be divided according to the dispersion. $c_n$ is used to represent the state of the athlete's limbs at the $n$-th moment; $c_n = 0$ means the static state, and $c_n = 1$ means the moving state. The formula is as follows:

$$c_n = \begin{cases} 0, & D_n^a < \lambda_a \text{ and } D_n^g < \lambda_g, \\ 1, & D_n^a \ge \lambda_a \text{ or } D_n^g \ge \lambda_g. \end{cases} \tag{3}$$
The data dispersion of each sensor is calculated, and the athletes' movement states can be divided by the thresholds.
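The division step can be sketched as follows. The per-axis difference and the threshold rule for $c_n$ follow the description above; how the three axes are combined into a single dispersion value is not spelled out here, so the Euclidean norm used below is an assumption, as are the function names, the threshold values, and the sample readings.

```python
import math


def dispersion(samples):
    """Return one dispersion value per time step n = 1 .. len(samples) - 1.

    samples is a list of (x, y, z) readings. The per-axis difference
    d_n = s_n - s_(n-1) follows the text; combining the three axes with a
    Euclidean norm is an ASSUMPTION of this sketch.
    """
    values = []
    for prev, cur in zip(samples, samples[1:]):
        values.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return values


def motion_states(acc, gyro, lam_a, lam_g):
    """Threshold rule: 0 (static) only if both dispersions are below threshold."""
    d_a = dispersion(acc)
    d_g = dispersion(gyro)
    return [0 if (a < lam_a and g < lam_g) else 1 for a, g in zip(d_a, d_g)]


# Hypothetical readings: at rest, at rest, sudden movement, at rest again.
acc = [(0.0, 0.0, 9.8), (0.0, 0.0, 9.8), (3.0, 4.0, 9.8), (3.0, 4.0, 9.8)]
gyro = [(0.0, 0.0, 0.0)] * 4
print(motion_states(acc, gyro, lam_a=1.0, lam_g=0.5))  # [0, 1, 0]
```

Runs of consecutive 1s in the returned list can then be treated as candidate unit actions.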

Extraction of Training Data.
After data division, the unit action data composed of acceleration and angular velocity are obtained. The acceleration vector sum and the angular velocity vector sum are represented by $a_n$ and $g_n$, respectively. The formula is as follows:

$$a_n = \sqrt{(a_n^x)^2 + (a_n^y)^2 + (a_n^z)^2}, \qquad g_n = \sqrt{(\omega_n^x)^2 + (\omega_n^y)^2 + (\omega_n^z)^2}, \tag{4}$$

where $a_n^x$, $a_n^y$, $a_n^z$ and $\omega_n^x$, $\omega_n^y$, $\omega_n^z$ are the three-axis accelerations and angular velocities at time $n$. The three-axis acceleration, three-axis angular velocity, combined acceleration, and combined angular velocity form an 8-dimensional vector. $N$ is used to represent the number of sampling points in each unit action, so there are $N$ samples in each dimension of the vector. If each unit action is taken as a sample, then each sample is an $N \times 8$ matrix. The data features of each dimension of each sample are calculated [22], and the extracted signal features include time domain features and frequency domain features. The time domain features are the mean value and the variance. $\mu_a$ and $\delta^2$ are used to represent the mean value and the variance of a component of the acceleration of a unit action, respectively, and the formula is as follows:

$$\mu_a = \frac{1}{N}\sum_{i=0}^{N-1} a_i, \qquad \delta^2 = \frac{1}{N}\sum_{i=0}^{N-1} (a_i - \mu_a)^2, \tag{5}$$

where $a_i$ is the $i$-th sample of a component $a$ of the acceleration. The frequency domain features are the peak value of the discrete Fourier transform and its corresponding frequency [23]. The discrete Fourier transform is used to transform the signal from the time domain to the frequency domain.
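The four per-channel features named in this subsection (mean, variance, DFT peak value, and the frequency of that peak) can be sketched as below. The DFT is evaluated directly from its definition to keep the block dependency-free; a practical implementation would use an FFT. Skipping the DC bin and searching only up to the Nyquist bin when locating the peak is an assumption of this sketch, and the function names and the synthetic input are hypothetical. The frequency of bin K is taken as K times the sampling frequency divided by N.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the DFT bins k = 0 .. N-1, computed from the definition (O(N^2))."""
    n = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        for k in range(n)
    ]


def channel_features(x, fs):
    """Return (mean, variance, DFT peak magnitude, peak frequency in Hz); needs N >= 2."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    mags = dft_magnitudes(x)
    # ASSUMPTION: ignore the DC bin (k = 0) and search up to the Nyquist bin.
    k_peak = max(range(1, n // 2 + 1), key=lambda k: mags[k])
    return mean, var, mags[k_peak], k_peak * fs / n


def unit_action_features(sample, fs):
    """sample: N rows of (ax, ay, az, gx, gy, gz). Returns 8 channels x 4 features."""
    channels = [list(col) for col in zip(*sample)]
    ax, ay, az, gx, gy, gz = channels
    # Combined acceleration and combined angular velocity (vector magnitudes).
    channels.append([math.sqrt(a * a + b * b + c * c) for a, b, c in zip(ax, ay, az)])
    channels.append([math.sqrt(a * a + b * b + c * c) for a, b, c in zip(gx, gy, gz)])
    features = []
    for channel in channels:
        features.extend(channel_features(channel, fs))
    return features


# Synthetic unit action: 16 samples at 16 Hz with a 2 Hz oscillation on the x-axis.
sample = [
    (math.sin(2 * math.pi * 2 * i / 16), 0.0, 9.8, 0.1, 0.0, 0.0)
    for i in range(16)
]
print(len(unit_action_features(sample, fs=16.0)))  # 32
```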
The Fourier transform result of the $n$-th sampling point is represented by $S_{DFT}(n)$, and the imaginary unit is represented by $j$. The formula is as follows:

$$S_{DFT}(n) = \sum_{i=0}^{N-1} a_i \, e^{-j 2\pi n i / N}. \tag{6}$$

According to the results of the Fourier transform, the peak value $S_{DFT}(K)$ is obtained. If the sampling point corresponding to the peak value of the Fourier transform is $K$, the corresponding frequency $f$ is as follows:

$$f = \frac{K}{N} f_s, \tag{7}$$

where $f_s$ is the sampling frequency of the sensor. The features of each dimension in the time domain and the frequency domain are obtained by feature calculation. Each of the 8 dimensions contributes four features (mean value, variance, Fourier peak value, and the corresponding frequency), so a 32-dimensional feature vector is constructed.

Collection of Athletes' Abnormal Training Data
The hidden Markov model is a probability model of time series. It describes the process in which a hidden Markov chain randomly generates an unobservable random sequence of states and each state then generates an observation, producing a random observation sequence [24]. The hidden Markov model is determined by the initial probability distribution $\pi$, the state transition probability distribution $A$, and the observation probability distribution $B$.
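Let $Q = \{q_1, q_2, \ldots, q_N\}$ be the set of all possible states and $V = \{v_1, v_2, \ldots, v_M\}$ be the set of all possible observations, where $N$ is the number of possible states and $M$ is the number of possible observations. Let $I = (i_1, i_2, \ldots, i_T)$ be a state sequence of length $T$ and $O = (o_1, o_2, \ldots, o_T)$ be the corresponding observation sequence. The formula of the state transition probability matrix $A$ is as follows:

$$A = [a_{ij}]_{N \times N}, \tag{8}$$

where $a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$, $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, N$, and $a_{ij}$ is the probability of transition from state $q_i$ at time $t$ to state $q_j$ at time $t + 1$. The formula of the observation probability matrix $B$ is as follows:

$$B = [b_j(k)]_{N \times M}, \tag{9}$$

where $b_j(k) = P(o_t = v_k \mid i_t = q_j)$, $k = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$, and $b_j(k)$ is the probability of generating observation $v_k$ when the model is in state $q_j$ at time $t$. $\pi = (\pi_i)$ is the initial state probability vector; $\pi_i = P(i_1 = q_i)$ is the probability of being in state $q_i$ at time $t = 1$.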
The hidden Markov model $\lambda$ can be represented by the following symbols, namely,

$$\lambda = (A, B, \pi). \tag{10}$$
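As a small illustration of the three parameters, the sketch below stores A, B, and π as nested lists and checks that every row is a valid probability distribution. The class name, the two-state layout, and all numbers are hypothetical and are not the parameters used in the experiments.

```python
from dataclasses import dataclass


@dataclass
class HMM:
    A: list   # N x N state transition probabilities a_ij
    B: list   # N x M observation probabilities b_j(k)
    pi: list  # N initial state probabilities

    def validate(self, tol=1e-9):
        """Check shapes and that every distribution sums to 1; return (N, M)."""
        n = len(self.pi)
        if len(self.A) != n or any(len(row) != n for row in self.A):
            raise ValueError("A must be N x N")
        if len(self.B) != n:
            raise ValueError("B must have N rows")
        m = len(self.B[0])
        if any(len(row) != m for row in self.B):
            raise ValueError("all rows of B must have length M")
        for dist in [self.pi, *self.A, *self.B]:
            if any(p < 0 for p in dist) or abs(sum(dist) - 1.0) > tol:
                raise ValueError("each distribution must be non-negative and sum to 1")
        return n, m


# Hypothetical model: 2 hidden states, 3 observation symbols.
model = HMM(
    A=[[0.9, 0.1], [0.3, 0.7]],
    B=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    pi=[0.8, 0.2],
)
print(model.validate())  # (2, 3)
```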
Firstly, the forward probability is defined. Given the model $\lambda = (A, B, \pi)$, the probability that the partial observation sequence up to time $t$ is $o_1, o_2, \ldots, o_t$ and the state at time $t$ is $q_i$ is defined as the forward probability, which is denoted as

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda).$$

The process of obtaining the observation sequence probability $P(O \mid \lambda)$ is as follows:

(1) Initial value: $\alpha_1(i) = \pi_i \, b_i(o_1)$, $i = 1, 2, \ldots, N$.

(2) For $t = 1, 2, \ldots, T - 1$: $\alpha_{t+1}(i) = \left[ \sum_{j=1}^{N} \alpha_t(j) \, a_{ji} \right] b_i(o_{t+1})$, $i = 1, 2, \ldots, N$.

(3) End: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.

The forward algorithm calculates the probability $P(O \mid \lambda)$ of the observation sequence from the known hidden Markov model $\lambda = (A, B, \pi)$ and the observation sequence $O = (o_1, o_2, \ldots, o_T)$. So, the probability of the final observation sequence is $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$. The forward algorithm is efficient because it uses the path structure of the model to propagate the forward probabilities recursively until the final probability $P(O \mid \lambda)$ is obtained. Each recursion directly reuses the calculation result of the previous time step [22], which avoids repeated calculation and reduces the time complexity of the algorithm from $O(TN^T)$ to $O(N^2T)$.
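The three steps of the forward algorithm can be sketched as follows, together with a brute-force sum over all state paths that makes the exponential cost of the naive approach visible. The three-state model and the observation sequence are small illustrative numbers, not parameters from the experiments, and observations are given as symbol indices.

```python
from itertools import product


def forward_probability(A, B, pi, obs):
    """Return P(O | lambda) for a sequence of observation symbol indices."""
    n = len(pi)
    # Initial value: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_(t+1)(i) = (sum_j alpha_t(j) * a_ji) * b_i(o_(t+1))
    for o in obs[1:]:
        alpha = [
            sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
            for i in range(n)
        ]
    # End: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


def brute_force_probability(A, B, pi, obs):
    """Sum over all N**T state paths; only feasible for tiny examples."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Illustrative model: 3 states, 2 observation symbols, T = 3.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

print(round(forward_probability(A, B, pi, obs), 6))      # 0.130218
print(round(brute_force_probability(A, B, pi, obs), 6))  # 0.130218
```

For long sequences the forward probabilities become very small, so practical implementations usually rescale them at each step or work with logarithms to avoid floating-point underflow.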

Results
In order to verify the effectiveness of the real-time collection method for abnormal data of athletes' training, 8 basketball players majoring in sports at a university are selected as the experimental subjects. They perform 9 training movements: walking, running, and jumping without the ball, and standing dribbling, walking dribbling, running dribbling, shooting, passing, and catching with the ball. The sensors are placed on the upper and lower limbs of the athletes. A total of 12000 samples are collected, including 6000 upper limb movements and 6000 lower limb movements of standing dribbling, walking dribbling, shooting, passing, catching, and running dribbling. During the sample collection process, the subjects completed the training according to the regulations. The specific contents of the samples are shown in Table 1.

Mobile Information Systems

Considering that the real-time collection of athletes' abnormal training data is a binary classification problem, the collection accuracy, collection recall rate, F1 value, mean square error, and AUC (area under the curve) value are used to measure the real-time collection performance. The collection accuracy is the proportion of instances assigned to the abnormal class by the classifier that are truly abnormal, which corresponds to the precision of the classifier. The recall rate is the proportion of instances of a given category that are correctly classified by the machine learning classifier. The F1 value is the harmonic mean of precision and recall. AUC is the probability that the classifier scores a randomly selected positive instance higher than a randomly selected negative instance. When AUC is close to 1, the collection accuracy is high. When AUC is close to 0.5, the classifier behaves like a random classifier, and the collection accuracy is poor.
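The evaluation metrics described above can be sketched as follows. The definition of collection accuracy in the text matches what is commonly called precision, so that name is used in the code. The labels and scores are hypothetical; 1 denotes an abnormal sample, and AUC is computed directly from its pairwise-ranking interpretation, counting ties as one half.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (abnormal = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_square_error(y_true, scores):
    """Mean squared difference between the labels and the predicted scores."""
    return sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true)


# Hypothetical labels, hard predictions, and scores for 8 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
print(auc(y_true, scores))                  # 0.9375
print(mean_square_error(y_true, scores))
```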
The BP neural network and the support vector machine are selected as comparison methods and are compared with the method in this paper. The BP neural network method used in the experiment is proposed in reference [25], and the support vector machine method used in the experiment is proposed in reference [26]. The peak signal-to-noise ratios of the athletes' training data samples collected by the three methods are compared in Table 2. The experimental results in Table 2 show that the peak signal-to-noise ratio of the sample signals collected by the proposed method is higher than 30 dB, that of the sample signals collected by the BP neural network method is less than 30 dB, and that of the sample signals collected by the support vector machine method is less than 29 dB. The experimental results show that the quality of the action signal collected by the proposed method is higher than that of the other two methods. The high quality of the signal collected by this method helps to accurately extract the training characteristics of athletes and provides a basis for the accurate collection of abnormal data of athletes' training.

The comparison results of the accuracy of collecting athletes' abnormal training data by the three methods are shown in Figure 3. The experimental results in Figure 3 show that the accuracy of the proposed method is significantly higher than that of the other two methods. The accuracy of the abnormal data collected by the proposed method is higher than 98.5%, while the accuracy of the abnormal data collected by the BP neural network method and the support vector machine method is lower than 96%. The support vector machine method performs worst. According to Figure 3, the accuracy of abnormal data collection of the method proposed in this paper is on average 4.3% higher than that of the BP neural network method and 6.5% higher than that of the support vector machine method.
So, we can see that the proposed method has high accuracy of abnormal data collection in athlete training and can be applied to abnormal data collection in actual athlete training. The comparison results of the recall rate of athletes' abnormal training data collected by the three methods are shown in Figure 4.
As can be seen from the experimental results in Figure 4, the recall rate of abnormal data collected by the proposed method is significantly higher than that of the other two methods. The recall rate of abnormal training data collected by the proposed method is higher than 98%, while the recall rate of abnormal training data collected by the BP neural network method and the support vector machine method is lower than 97%. The support vector machine method performs worst. According to Figure 4, the recall rate of abnormal data collection of the method proposed in this paper is on average 4.7% higher than that of the BP neural network method and 6.1% higher than that of the support vector machine method. So, we can see that the proposed method has a high recall rate of abnormal data collection and superior collection performance. The comparison results of the F1 value of athletes' abnormal training data collected by the three methods are shown in Table 3.
The F1 value is an important index that combines the accuracy and the recall rate of abnormal data collection. The closer the F1 value of abnormal data collection is to 1, the better the collection performance is. The experimental results in Table 3 show that the F1 value of athletes' abnormal training data collected by the proposed method is significantly higher than that collected by the other two methods. The F1 value of abnormal data collected by the proposed method is higher than 0.93, while the F1 value of abnormal data collected by the BP neural network method and the support vector machine method is lower than 0.86. The results show that the proposed method has high accuracy, high recall, and high reliability, which can provide a basis for coaches to make training plans. The comparison results of the mean square error (MSE) of athletes' abnormal training data collected by the three methods are shown in Figure 5.
The experimental results in Figure 5 show that the mean square error of the abnormal data collected by the proposed method is significantly lower than that of the other two methods. The mean square error of abnormal data collected by the proposed method is lower than 0.04, while the mean square error of abnormal data collected by the BP neural network method and the support vector machine method is higher than 0.05. The results show that the proposed method has a low mean square error of abnormal data collection and high reliability. The AUC comparison results of athletes' abnormal training data collected by the three methods are shown in Figure 6.
As can be seen from the experimental results in Figure 6, the AUC value of abnormal data collected by the proposed method is very close to 1, while the AUC value of abnormal data collected by the BP neural network method and the support vector machine method is very close to 0.5. The results show that the accuracy of the proposed method in collecting abnormal data of athletes' training is high, whereas the BP neural network and support vector machine methods behave largely like random classification. The abovementioned experimental results verify that the method in this paper has high accuracy, high reliability, and high performance in collecting abnormal data of athletes' training and can be applied in practical athlete training.

In order to further verify the real-time ability of this method, the time cost of collecting athletes' abnormal training data under different sample numbers is compared for the three methods in Table 4. Table 4 shows that the time cost of using the proposed method to collect athletes' abnormal training data is the lowest under different sample numbers. The comparison results show that the proposed method can collect athletes' abnormal training data in a short time. This method not only has high accuracy but also needs less time to collect abnormal data. It can quickly obtain accurate abnormal data and has high practicability.
In this paper, we propose a real-time collection method of athletes' abnormal training data based on machine learning. Reference [27] proposed an HMM-based asynchronous $H_\infty$ filtering method for fuzzy singular Markovian switching systems with retarded time-varying delays. The computational complexity of the method proposed in this paper mainly depends on the characteristics of the training network, while the computational complexity of the method proposed in [27] mainly depends on the HMM method. So, we conclude that the method proposed in this paper performs better.

Conclusions
With the development of wireless sensor networks and microelectronic equipment technology, the collection of athletes' abnormal training data has received wide attention in various fields. Sensor equipment is used to collect the movement state signals of athletes' upper and lower limbs and extract the athletes' training characteristics, and the hidden Markov model from machine learning is used to complete the effective collection of athletes' abnormal training data. With basketball players selected as the experimental subjects, abnormal data collection for athletes' training is realized in the field of basketball. The experimental results verify that the method is highly effective in collecting abnormal data of athletes' training. The research results provide a new collection scheme for abnormal data of sports training. The collected dataset will be made publicly available to other researchers. In use, the new system is large and not easy to carry, so its size must be reduced in order to facilitate large-scale use. In addition, data processing can be sped up.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares no conflicts of interest.