Analysis of an Infectious Disease Vaccination Prediction System Based on the MF-Conv LSTM Model

Infectious diseases can seriously threaten people's lives and have a serious impact on social stability. Improving society's stability during infectious disease outbreaks and ensuring people's safety are therefore important goals. Based on the characteristics of human daily activity behavior, a personnel flow feature extraction model based on Multi-Feature Convolutional Long Short-Term Memory (MF-Conv LSTM) is designed to improve the accuracy of simulating and predicting infectious disease transmission under vaccination. When multi-feature ensemble analysis was used to extract human daily activity features as input to the infectious disease simulation and prediction model, the learner's prediction score for the reproduction number of the infectious disease reached 0.8705. When single-feature ensembles were used, the prediction scores did not exceed 0.85. The designed infectious disease transmission prediction model with vaccination can accurately simulate the transmission behavior of infectious diseases. This provides direction for developing strategies to interrupt their spread, reducing their harm to people's personal safety and improving social stability during large-scale outbreaks.


Introduction
Infectious diseases are caused by pathogens and can be transmitted between humans or animals. The main transmission routes include direct contact, airborne transmission, food and water transmission, and vector organisms such as mosquitoes [1][2]. Once pathogens enter the host's body, they can rapidly reproduce and trigger an immune response, leading to clinical symptoms ranging from mild discomfort to severe illness and even death. The harm caused by infectious diseases is widespread and far-reaching: they not only threaten individual health but can also cause social panic and economic fluctuations, and even affect national security [3][4]. Large-scale outbreaks such as the COVID-19 pandemic place enormous pressure on the global health system and lead to shortages of medical resources. They also deal a significant blow to economic activities, affecting industries such as international trade and tourism [5]. In addition, infectious diseases may exacerbate social inequality and cause greater harm to vulnerable groups. Timely and effective prevention and control measures, such as vaccination, public health education, disease monitoring, and rapid response systems, can slow the spread of infectious diseases. This reduces their harm, protects the health of the population, and maintains social stability and economic development [6]. Therefore, to reduce the harm of infectious diseases to human society, a predictive model for infectious disease transmission based on Multi-Feature Convolutional Long Short-Term Memory (MF-Conv LSTM) is established.
The innovation of this research lies in fusing multi-scale features through the Bagging method when simulating and predicting the spread of infectious diseases. The experiment covers the mobility patterns of individuals in different regions and considers information such as socioeconomic factors, population structure, and behavioral habits. The main contribution is to simulate and predict the transmission behavior of infectious diseases through multi-angle, multi-dimensional analysis, which captures the dynamic process of transmission more comprehensively and improves the accuracy of infectious disease simulation and prediction.
The rest of the paper is organized as follows. First, the research status of infectious disease transmission models and prediction models such as Long Short-Term Memory (LSTM) is reviewed. Second, the personnel flow characteristic model and the infectious disease transmission prediction model are established. Third, both models are verified experimentally. Finally, the research is summarized.

Literature Review
The spread of large-scale infectious diseases mostly relies on airborne transmission. Buckee C et al. argued that social and cultural forces influence the spread of, and response to, infectious diseases, and that these forces must be understood to understand transmission and response capabilities. They held that social, economic, and cultural forces shape the dynamics of outbreaks, and that new data sources make it possible to study disease transmission behavior [7]. In response to COVID-19, mathematical modeling has been widely used to track and predict the spread of the disease. James L. P et al. discussed the differences and limitations among the predictions of these models and argued that improving the effectiveness of model inference was crucial for current and future epidemic decision-making [8]. Ghanbari B used fractional derivatives with the Mittag-Leffler kernel to study the internal steady-state stability of a predator-prey disease model with susceptible and infected prey. The existence and uniqueness of solutions for the fractional-order model were established. The new operator captured the model's theoretical characteristics and was more suitable for describing real-world phenomena than integer-order equations [9]. Zhu Y et al. proposed a statistical transmission model based on case data and analyzed its sensitivity to different assumptions in order to estimate the early transmission of COVID-19 in China. The propagation model's simulation performed best when the parameter values were between 2.7 and 4.2 [10]. Bedson J et al. reviewed infectious disease modeling methods for addressing the spread of diseases and identified challenges and opportunities for integrating them with social science research and risk communication and community engagement (RCCE) practices. Interdisciplinary collaboration and comprehensive disease models were found to be crucial for reducing disease transmission [11].
LSTM is a common prediction model. Bi J et al. proposed a hybrid LSTM prediction method based on Savitzky-Golay (SG) filtering and temporal convolutional networks (TCN) to accurately predict network traffic. By combining the advantages of the SG filter, TCN, and LSTM, this method outperformed state-of-the-art algorithms in prediction accuracy and is applicable to multiple fields [12]. Saeedi A et al. proposed a deep learning framework based on electroencephalography for the early diagnosis and prevention of severe depression. The framework combined brain connectivity analysis with a deep learning architecture, and a one-dimensional Convolutional Neural Network (CNN) achieved an accuracy of 99.24% in diagnosing severe depression [13]. Huang F et al. proposed an improved LSTM for text sentiment analysis, addressing the tendency of existing deep learning models to ignore sentiment modulation and lower-level abstraction. The model integrated emotional intelligence and attention mechanisms into an emotion-enhanced LSTM and introduced a topic-level attention mechanism, which clearly improved sentiment classification performance on real datasets [14]. Ramaraj P proposed a CNN-based LSTM face detection technique for face recognition under unconstrained conditions, using self-channel and self-spatial attention blocks for feature extraction. This method outperformed existing methods on difficult datasets such as Kanpur [15]. Chen et al. used LSTM to predict monthly rainfall and compared it with random forests to address the uncertainty of rainfall prediction. LSTM outperformed random forests at both sites, improving prediction accuracy, and was suitable for monthly rainfall forecasting under global climate conditions [16].
In summary, predicting the spread of infectious diseases can help reduce transmission rates and improve people's ability to prevent infection. However, current simulation schemes perform poorly for large-scale infectious diseases, which leads to poor prevention of such diseases. LSTM is a common data prediction model; in this study, it is improved with a multi-feature fusion CNN, and the improved model is used to analyze the characteristics of personnel flow. These characteristics are then combined with a compartmental infectious disease model to simulate and predict the spread of infectious diseases.

Establishment of a personnel flow characteristic model based on MF-Conv LSTM
The movement of people within a fixed area is usually related to their life and work. Based on people's general living and working habits, their activities can be divided into six categories: retail and entertainment, supermarkets and pharmacies, parks, public transportation, workplaces, and residences [17]. Before establishing the personnel mobility characteristic model, data on people's mobility must be collected. Because the raw data cannot be used directly, they are standardized according to equation (1).
$$x' = \frac{x - \bar{x}}{\sigma} \tag{1}$$
In equation (1), $x'$ refers to the standardized data, $x$ refers to the specific feature value of a certain type of behavior, $\bar{x}$ refers to the mean feature value, and $\sigma$ refers to the standard deviation. After standardization, the data still need to be partitioned. The sliding window method is used for this, which maintains the temporal continuity of the data and ensures that the samples used for model training are temporally correlated. By setting different window and step sizes, the sliding window method can flexibly control the granularity and number of samples. Here the window size is 7 and the step size is 1, so consecutive samples of each class share 6 of their 7 time steps, giving an overlap ratio of 6/7 ≈ 85.7%. When constructing the model of personnel mobility behavior characteristics, LSTM is used as the basis, and multiple features are integrated to analyze the characteristics of personnel mobility. Three types of features are extracted: time-domain, frequency-domain, and other features [18]. Time-domain features are extracted using principal component regression analysis and include the maximum, minimum, mean, standard deviation, and median. Frequency-domain features are extracted using the discrete Fourier transform, represented by equation (2).
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \tag{2}$$
In equation (2), $N$ refers to the number of samples in the sampling period, $x(n)$ refers to the time-domain signal, $e^{-j2\pi kn/N}$ refers to the complex exponential factor of the Fourier transform, and $k$ refers to the frequency-domain index. Other features are extracted using polynomial curve fitting, which maps the data to a high-dimensional space and captures complex patterns and trends through the polynomial coefficients. This method is suitable for capturing nonlinear relationships and gives an intuitive picture of how the data vary. The feature extraction is represented by equation (3) [19].
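The preprocessing and feature extraction steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the time-domain features are computed directly as summary statistics (the principal component regression step is not reproduced), the frequency-domain features are taken as the magnitudes of `numpy.fft.rfft` (the DFT of equation (2) restricted to the non-negative frequencies of a real signal), and the polynomial degree of 2 for the "other" features is an assumed value.

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score standardization per column, as in equation (1)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (x - mean) / std

def sliding_windows(x: np.ndarray, size: int = 7, step: int = 1) -> np.ndarray:
    """Split a (T, C) series into overlapping windows of shape (n, size, C)."""
    starts = range(0, x.shape[0] - size + 1, step)
    return np.stack([x[s:s + size] for s in starts])

def window_features(w: np.ndarray, poly_degree: int = 2) -> np.ndarray:
    """Concatenate time, frequency, and polynomial features over all channels."""
    t = np.arange(w.shape[0])
    feats = []
    for c in range(w.shape[1]):
        s = w[:, c]
        # Time-domain statistics: max, min, mean, std, median
        time_f = [s.max(), s.min(), s.mean(), s.std(), np.median(s)]
        # Frequency-domain: DFT magnitudes for k = 0 .. N//2
        freq_f = np.abs(np.fft.rfft(s))
        # Polynomial curve-fitting coefficients (highest degree first)
        poly_f = np.polyfit(t, s, poly_degree)
        feats.append(np.concatenate([time_f, freq_f, poly_f]))
    return np.concatenate(feats)

# Usage on synthetic data: 30 days x 6 activity categories
rng = np.random.default_rng(0)
raw = rng.normal(size=(30, 6))
windows = sliding_windows(standardize(raw))  # window 7, step 1
X = np.stack([window_features(w) for w in windows])
print(windows.shape, X.shape)  # (24, 7, 6) (24, 72)
```

Each channel contributes 5 statistics, 4 DFT magnitudes (7 // 2 + 1), and 3 polynomial coefficients, so six activity categories give 72 features per window.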
Because personnel flow has regional characteristics, local features must be extracted for each region. A 2D Convolutional Neural Network (2D-CNN) is constructed to extract these local features and build a classifier, as shown in Figure 1.
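The Figure 1 description (an input layer, four feature layers, a flattening layer, a fully connected layer, and an output layer, with a time step of 7) suggests a structure like the following PyTorch sketch. The class name `LocalFeatureCNN` is hypothetical, and the input layout (one channel of 7 time steps by the 6 activity categories), the 3x3 kernels, the channel widths, the hidden size of 64, and the number of outputs are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class LocalFeatureCNN(nn.Module):
    """Hypothetical 2D-CNN: input -> 4 conv feature layers -> flatten -> FC -> output."""

    def __init__(self, time_steps: int = 7, n_activities: int = 6, n_outputs: int = 1):
        super().__init__()
        chans = [1, 16, 32, 32, 64]  # assumed channel widths
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            # padding=1 with a 3x3 kernel keeps the 7 x 6 spatial size unchanged
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU()]
        self.features = nn.Sequential(*layers)
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(chans[-1] * time_steps * n_activities, 64)
        self.out = nn.Linear(64, n_outputs)  # class logits (or a regression score)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, time_steps, n_activities)
        h = self.features(x)
        h = torch.relu(self.fc(self.flatten(h)))
        return self.out(h)

model = LocalFeatureCNN()
x = torch.randn(8, 1, 7, 6)  # 8 synthetic windows
logits = model(x)
print(logits.shape)  # torch.Size([8, 1])
```

Keeping the spatial size fixed through the four feature layers makes the flattened size easy to compute (64 x 7 x 6); pooling layers would be an equally plausible choice that the text does not rule out.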

A predictive model for infectious disease transmission based on a vaccination compartmental transmission model
After the characteristics of personnel flow are analyzed with MF-Conv LSTM, the output serves as input data for the infectious disease transmission prediction model. The Susceptible-Infected-Recovered (SIR) compartmental model is a classic model used to describe the transmission of infectious diseases in populations. It was proposed by Kermack and McKendrick in 1927. The SIR model divides the total population into three groups: susceptible, infected, and recovered. Establishing this model requires the following assumptions. First, the total population remains unchanged; factors such as birth, death, and migration are not considered. Second, the number of new infections per unit time is proportional to the numbers of both susceptible and infected individuals; the proportionality coefficient is the infection rate. Finally, the number of infected individuals who recover or are removed per unit time is proportional to the number of infected individuals; the proportionality coefficient is the recovery rate. The SIR model is represented by equation (4) [20].
$$\begin{aligned} \frac{dS}{dt} &= -\frac{\beta S I}{N} \\ \frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I \\ \frac{dR}{dt} &= \gamma I \\ N &= S + I + R \end{aligned} \tag{4}$$
In equation (4), $N$ refers to the total population, $\beta$ refers to the infection rate, $\gamma$ refers to the recovery rate, $S$ refers to the number of susceptible individuals, $I$ refers to the number of infected individuals, and $R$ refers to the number of recovered individuals. When constructing the vaccination-aware prediction model for infectious disease transmission, this study takes COVID-19 as the simulation and prediction target. Based on its transmission characteristics and the later administration of various vaccines, an improved SIR model is designed. Some people infected with COVID-19 are infected but asymptomatic; this group is referred to as the exposed (Expose) population. Because COVID-19 can cause death within a short time, a death (Death) compartment is also added. Vaccination clearly intervenes in the spread of the epidemic, so vaccines must be considered when building the new model. Figure 4 shows the structure of the new model.
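The classic SIR system of equation (4) can be integrated numerically to see how the three compartments evolve. The sketch below uses a fourth-order Runge-Kutta scheme in NumPy; the function names, the initial population, and the values $\beta = 0.3$ and $\gamma = 0.1$ are hypothetical, not fitted values from this study. For this form of the model the basic reproduction number is $R_0 = \beta / \gamma$, so the example corresponds to $R_0 = 3$.

```python
import numpy as np

def sir_rhs(y, beta, gamma, n):
    """Right-hand side of equation (4) for state y = (S, I, R)."""
    s, i, r = y
    new_inf = beta * s * i / n
    return np.array([-new_inf, new_inf - gamma * i, gamma * i])

def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate equation (4) with classic RK4; returns daily states, shape (days + 1, 3)."""
    n = s0 + i0 + r0
    y = np.array([s0, i0, r0], dtype=float)
    steps_per_day = int(round(1 / dt))
    out = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = sir_rhs(y, beta, gamma, n)
            k2 = sir_rhs(y + 0.5 * dt * k1, beta, gamma, n)
            k3 = sir_rhs(y + 0.5 * dt * k2, beta, gamma, n)
            k4 = sir_rhs(y + dt * k3, beta, gamma, n)
            y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(y.copy())
    return np.array(out)

# Hypothetical example: 10,000 people, 10 initially infected
traj = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
print(traj.shape)  # (161, 3)
```

Because the three derivatives sum to zero, $S + I + R$ stays equal to $N$ throughout, matching the closed-population assumption above.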

The COVID-19 vaccines are divided into Class I and Class II according to vaccine type. Infected people are divided into mild and severe patients according to their condition. Severe patients require hospitalization; however, some may not be able to receive inpatient treatment because of limited medical resources. The parameters of the improved SIR model considering vaccine administration, shown in Figure 4, are given in Table 1.
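The transition structure of Figure 4 and the values in Table 1 are not reproduced in the text, so the following is only one hypothetical wiring of the compartments named above (susceptible, Class I and Class II vaccinated, Expose, mild, severe, recovered, and death), written as an extension of equation (4). All transition paths, parameter names, and values are assumptions for illustration: susceptible people are vaccinated at rates `nu1` and `nu2`; vaccinated people can still be infected at a rate reduced by efficacies `eff1` and `eff2`; a fraction `asym` of the Expose group recovers without symptoms while the rest become mild; mild patients recover or become severe; and severe patients recover or die. Forward Euler with 10 substeps per day is used for simplicity.

```python
import numpy as np

# Compartment order for this hypothetical extended model
NAMES = ["S", "V1", "V2", "E", "M", "H", "R", "D"]

def extended_rhs(y, p):
    """Hypothetical vaccination-aware extension of equation (4)."""
    s, v1, v2, e, m, h, r, d = y
    n_alive = y[:-1].sum()  # everyone except the Death compartment
    lam = p["beta"] * (e + m + h) / n_alive  # force of infection (assumed form)
    inf_s = lam * s
    inf_v1 = lam * (1 - p["eff1"]) * v1
    inf_v2 = lam * (1 - p["eff2"]) * v2
    return np.array([
        -inf_s - (p["nu1"] + p["nu2"]) * s,                        # S
        p["nu1"] * s - inf_v1,                                     # V1 (Class I)
        p["nu2"] * s - inf_v2,                                     # V2 (Class II)
        inf_s + inf_v1 + inf_v2 - p["sigma"] * e,                  # E (Expose)
        (1 - p["asym"]) * p["sigma"] * e - (p["rho"] + p["gamma_m"]) * m,  # M (mild)
        p["rho"] * m - (p["gamma_h"] + p["mu"]) * h,               # H (severe)
        p["asym"] * p["sigma"] * e + p["gamma_m"] * m + p["gamma_h"] * h,  # R
        p["mu"] * h,                                               # D
    ])

def simulate(y0, p, days, substeps=10):
    """Forward Euler integration; returns daily states, shape (days + 1, 8)."""
    y = np.asarray(y0, dtype=float)
    dt = 1.0 / substeps
    out = [y.copy()]
    for _ in range(days):
        for _ in range(substeps):
            y = y + dt * extended_rhs(y, p)
        out.append(y.copy())
    return np.array(out)

# Hypothetical parameters (per day) and initial state for 100,000 people
params = {"beta": 0.4, "nu1": 0.005, "nu2": 0.003, "eff1": 0.7, "eff2": 0.5,
          "sigma": 0.2, "asym": 0.4, "rho": 0.05, "gamma_m": 0.1,
          "gamma_h": 0.07, "mu": 0.01}
y0 = [99_000, 0, 0, 800, 150, 50, 0, 0]
traj = simulate(y0, params, days=200)
daily_deaths = np.diff(traj[:, NAMES.index("D")])
```

Every outflow of one compartment is an inflow of another, so the total population is conserved; the daily death series is the difference of the cumulative Death compartment, which is the quantity compared against observed daily deaths later in the paper.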

Experimental environment setting
An experimental environment for simulating and testing MF-Conv LSTM was established to validate the feasibility and effectiveness of the constructed model. The operating system is 64-bit Windows 11, the programming language is Python 3.7.3, and the framework is PyTorch. When training MF-Conv LSTM, multi-feature integration analysis was conducted using the personnel flow characteristics of X region as an example. X region has 42 administrative regions, each containing 3-4 levels. COVID-19 is the infectious disease with the largest scale of transmission and the deepest social impact in recent years, so it is taken as the object of simulation analysis. For the multi-feature integrated analysis of personnel mobility, personnel mobility data from X region in 2020 were used as the training set (503 records), and data from 2021 were used as the test set (93 records). The training set for simulating and predicting infectious disease transmission consisted of real COVID-19 transmission data from February 2020 to July 2021, and the test set consisted of real transmission data from July to November 2021. Table 2 provides details of the experimental environment and the training and test datasets.
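Both datasets are split chronologically, with earlier data for training and later data for testing, which keeps future observations out of training. A minimal sketch of such a split follows, assuming daily records stored as (date, value) pairs; the function name is hypothetical, and the cutoff of 1 July 2021 is an assumption because the text gives only the months on either side of the boundary.

```python
from datetime import date, timedelta

def chronological_split(records, cutoff):
    """Return (train, test): records dated before `cutoff` train, the rest test."""
    ordered = sorted(records, key=lambda r: r[0])  # guarantee time order
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test

# Synthetic daily records from 1 Feb 2020 to 30 Nov 2021 (values are placeholders)
start, end = date(2020, 2, 1), date(2021, 11, 30)
records = [(start + timedelta(days=d), 0.0) for d in range((end - start).days + 1)]
train, test = chronological_split(records, cutoff=date(2021, 7, 1))
```

A random shuffle-based split would instead mix days from late 2021 into training, letting the model see the period it is later asked to predict.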

Integrated analysis of multiple characteristics of personnel mobility
MF-Conv LSTM was trained and tested on personnel flow characteristic data from X region from February 2020 to February 2021 (Figure 5). During training, the highest accuracy was 0.94; in testing, the highest accuracy decreased slightly but remained around 0.9. To validate the effectiveness of MF-Conv LSTM, ablation experiments were designed in which the learner's prediction scores for the reproduction number of the infected population were compared across feature extraction methods (Table 3). As Table 3 shows, learners using a single-feature ensemble consistently scored lower than learners using a multi-feature ensemble. With frequency-domain features only, the learner's highest prediction score was only 0.6682. With time-domain features only or local flow features only, the score did not exceed 0.8. With deep learning feature ensembles, all prediction scores reached 0.8 or above. With MF-Conv LSTM for feature integration, the prediction score reached 0.8705, whereas the other two deep learning ensemble methods did not exceed 0.85.

Analysis of prediction results for the infectious diseases spread
After training and testing, MF-Conv LSTM was used to simulate and predict the spread of infectious diseases (Figure 7). From February 2020 to August 2020, the model fitted the daily newly confirmed cases well, with an almost complete fit, indicating a good training effect during this period. From September 2020 to October 2020, the fitted curve showed a significant error, with the error in daily confirmed cases around 10000. The fit returned to a high level in the subsequent period. Figure 7 (b) shows the simulated prediction results for the spread of the disease. The overall fit was good; however, at the beginning of the prediction there was a significant error of up to around 8000 people, while the error for the rest of the period did not exceed 2000 people. The daily numbers of new infections and deaths under vaccine intervention were further analyzed (Figure 8).
During training, MF-Conv LSTM converged to a loss value of approximately 0.85. Its highest accuracy in testing was around 0.92. With multi-feature ensemble simulation analysis, the learner's highest prediction score for the reproduction number of the infectious disease was 0.8705; with single-feature ensembles, the highest prediction score did not exceed 0.85. When simulating and predicting the transmission behavior of the infectious disease, the largest prediction error was about 8000 people. The designed simulation and prediction model, which takes the characteristics of personnel flow into account, can accurately simulate the transmission behavior of infectious diseases. This provides direction for decisions on social intervention in infectious disease transmission and enhances society's ability to defend against large-scale outbreaks. However, the model involves high-dimensional data and complex computation when extracting personnel flow characteristics. Future research will further optimize the extraction and analysis of these characteristics to reduce computational complexity.

Figure 1 Local feature extraction structure of the 2D-CNN
The constructed 2D-CNN includes an input layer, four feature layers, a flattening layer, a fully connected layer, and an output layer. Based on human daily habits, the time step is set to 7 when extracting local features from different regions. To further ensure the personnel flow in time-series

Figure 4 SIR model considering vaccination

Figure 5 Ablation experimental analysis of the MF-Conv LSTM multi-feature ensemble approach
Figure 5 (a) shows the loss values of MF-Conv LSTM during training and testing. As the number of iterations increased, the loss values decreased steadily. During training, the model converged at the 10th iteration, with a loss value of approximately 0.85 after convergence. During testing, convergence was slightly slower and was reached at the 75th iteration, with a loss value of around 1.05 after convergence. Figure 5 (b) shows the accuracy of MF-Conv LSTM during training and testing. As the number of training steps increased, the accuracy in both training and testing improved. During training, the highest accuracy was 0.94; in testing, it decreased slightly but remained around 0.9. To validate the effectiveness of MF-Conv LSTM, ablation experiments were designed, comparing the learner's prediction scores for the reproduction number of the infected population under different feature extraction methods (Table 3).

Figure 7 Simulation analysis of infectious disease prediction combining MF-Conv LSTM with an improved SIR model
Figure 7 (a) shows the model training results. From February 2020 to August 2020, the model fitted the daily newly confirmed cases well, with an almost complete fit, indicating a good training effect during this period. From September 2020 to October 2020, the fitted curve showed a significant error, with the error in daily confirmed cases around 10000. The fit returned to a high level in the subsequent period.

Figure 8 Simulation results of infectious disease transmission under vaccination intervention (legend: Forecast, Real)

Figure 8 (a) shows the simulated prediction of the daily number of new infections. With vaccination intervention, new infections peaked twice during transmission. The first peak occurred between October 2020 and November 2020, with a maximum of 38000 new infections per day. The second peak occurred from December 2020 to January 2021, with approximately 68000 new infections per day. Figure 8 (b) shows the simulated prediction of the daily death toll. The model predicted that the daily death toll would peak in mid-January 2021 at around 1400 people, while the actual daily death toll peaked in late January 2021 at around 1700 people. After February 2021, the predicted daily death toll dropped below 400.
The constructed MF-Conv LSTM personnel flow characteristic analysis model consists of two parts. The first part is multi-feature integration training for personnel mobility, which extracts frequency-domain features, time-domain features, dimensional features, and local characteristics from the personnel mobility data, and constructs and trains a model for each type of feature. The second part tests and integrates the various features.
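The introduction names Bagging as the method used to fuse features. As a hedged illustration of the two parts described above, the sketch trains one bagged model per feature group (bootstrap resampling with replacement, then averaging the estimators' predictions) and integrates the groups by averaging their outputs. The ordinary least-squares base learner, the column ranges in `GROUPS`, the number of estimators, and the equal-weight averaging across groups are all assumptions standing in for the paper's deep feature models and integration rule; all function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with a bias column; returns the weight vector."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def bagging_fit(X, y, n_estimators=10, seed=0):
    """Bagging: fit each estimator on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)
        models.append(fit_linear(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Average the predictions of all bootstrap estimators."""
    return np.mean([predict_linear(w, X) for w in models], axis=0)

# Hypothetical feature groups: column ranges in a 12-column feature matrix
GROUPS = {"time": slice(0, 5), "frequency": slice(5, 9), "polynomial": slice(9, 12)}

def fit_multi_feature(X, y):
    """Part 1: train one bagged model per feature group."""
    return {name: bagging_fit(X[:, cols], y) for name, cols in GROUPS.items()}

def predict_multi_feature(group_models, X):
    """Part 2: integrate the groups by equal-weight averaging (an assumption)."""
    preds = [bagging_predict(m, X[:, GROUPS[name]]) for name, m in group_models.items()]
    return np.mean(preds, axis=0)

# Usage on synthetic, noise-free data
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 12))
y = X @ rng.normal(size=12) + 1.0
group_models = fit_multi_feature(X, y)
y_hat = predict_multi_feature(group_models, X)
```

Bootstrap resampling gives each estimator a slightly different view of the data, so averaging reduces variance; a learned weighting or a stacked meta-model would be alternative integration rules.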

Table 1
Parameter information of the improved SIR model considering vaccine administration
EAI Endorsed Transactions on Pervasive Health and Technology | Volume 10 | 2024 | Y. Wang

Table 2
Details of the experimental environment settings

Table 3
Ablation experimental analysis of the MF-Conv LSTM multi-feature ensemble approach