A Novel Prediction Method Based on Artificial Intelligence and Internet of Things for Detecting Coronavirus Disease (COVID-19)

School of Information Management, Nanjing University, Nanjing 210023, China
Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing 210023, China
School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 210023, China
Center for Data Management, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing 210023, China
Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing 210023, China


Introduction
According to the statistics on the coronavirus disease 2019 (COVID-19) pandemic reported by the World Health Organization (WHO), there were already more than 56 million confirmed cases and 1.35 million deaths as of 15:59 Central European Time (CET), November 20, 2020, which indicates a very serious global epidemic situation. The number of COVID-19-infected patients has exceeded 1 million in many countries, including the United States, India, Brazil, and France. The United States, in particular, has over 10 million confirmed cases of COVID-19. Thus, it is critical to conduct a status analysis and research on the effects of epidemic prevention and control measures based on the different epidemic situations in countries worldwide.
Mathematical models are often used by researchers to derive the conditions under which infectious diseases do not spread and to predict and analyze the trends of epidemics and infected populations, so that relevant strategies can be developed accordingly. One of the currently used epidemic prediction models is the Malthusian growth model [1]. However, this model still has a long way to go before being applied in the real world. A logistic regression model [2], also known as the SI model (S = susceptible, I = infected), has been proposed to distinguish between infected and uninfected individuals. The SI model's predictions are also unrealistic because cure factors are not taken into account. Furthermore, the SIS model (S = susceptible, I = infected, and R = recovered) was learned by comparing the behaviors in different regions. We propose the SEIR epidemic models (S = susceptible, E = exposed, I = infected, and R = recovered) to predict when the cured population without immunity is more vulnerable to reinfections. In contrast, the classic SIR model is commonly selected for a cured population with strong immunity. This classic SIR model is widely used to describe the overall trends of epidemics owing to its ease of operation and clear and concise structure. It was used, for example, to analyze the 2003 SARS epidemic, describing the epidemic's evolution and the overall spread patterns of the disease [3]. Based on this, the SEIR model introduces the exposed population (class E), which considers that only part of the people who are easily infected and have had contact with infected people are infectious, which makes the transmission cycle of the disease longer. However, more detailed factors are not considered in the process of epidemic prediction.
To complete the task of epidemic prediction, we developed a novel prediction method based on artificial intelligence (AI) and the Internet of Things (IoT). Building on the existing models, many machine learning algorithms can be employed for the prediction method. We therefore surveyed related work and analyzed the common algorithms used in medical prediction, aiming to find an optimal algorithm with good convergence characteristics and efficiency to complete the prediction. A new IoT platform for epidemic prediction was built using existing models. Relevant experiments were conducted, and the algorithm's benefits were validated. The remainder of this paper is organized as follows. We discuss the related work in Section 2 and the algorithm model in Section 3. In Section 4, we present the design of the prediction platform. We perform the simulation work in Section 5. Finally, we conclude the paper.

Related Work
COVID-19 is still not under effective control worldwide. There are about 170 million diagnosed cases in the world. Previously, there were 445539 newly diagnosed cases in a single day. The total number of deaths in the world is about 3500000, with the number of new deaths in a single day being 10000. There are 28 countries or regions in the world with more than 1 million diagnosed cases.
The epidemic model originated from research on vaccination against smallpox in a paper presented by Daniel Bernoulli in 1760. Mathematical modeling research began in the early twentieth century. In 1927, Kermack and McKendrick proposed the SIR compartment model while studying the Black Death epidemic in London. In the analysis of infectious diseases, the SIR mathematical model has preconditions. First, population births and deaths may affect the population size, but their impact is assumed to be minimal. Second, the SIR model assumes that the susceptible and the infected populations have certain mobility and that the susceptible population moves into the infected population at a certain rate. Finally, the SIR model assumes that the infected population enters the removed population with a fixed proportion coefficient and that this transition is irreversible. SIR is an effective simulation model of infectious diseases. By dividing the population into three groups, namely, susceptible, infected, and removed, we can simplify the transmission law of infectious diseases and describe it more accurately. The traditional SIR epidemic prediction model divides the population into three categories: those who are not sick but are likely to be infected (S), those who have been infected and can infect others (I), and those who have been healed or have died (R). However, factors affecting population dynamics, such as birth, death, and migration, are not considered; the population is treated as a constant.
Compared with the SIR model, the SEIR model introduces the incubation period and adds the exposed population (E), who are in this latent period of infection. A healthy person who has come into contact with a patient does not become ill right away but, as a carrier of the pathogen, enters class E. This mechanism is very consistent with the behavior of the novel coronavirus.
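The compartment flow described above (S to E to I to R with a constant population) can be sketched numerically. The following is a minimal illustration using forward Euler steps; the function name `simulate_seir`, the step size, and all parameter values are assumptions chosen for the example and are not taken from this paper.

```python
def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    beta  : transmission rate (S -> E)
    sigma : 1 / incubation period (E -> I)
    gamma : removal rate (I -> R)
    All parameter values passed in are illustrative assumptions.
    """
    n = s0 + e0 + i0 + r0  # total population, treated as a constant
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_exposed = beta * s * i / n * dt   # susceptible people who become exposed
        new_infectious = sigma * e * dt       # exposed people who become infectious
        new_removed = gamma * i * dt          # infectious people who recover or die
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        history.append((step * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    trajectory = simulate_seir(9990, 0, 10, 0, beta=0.5, sigma=0.2, gamma=0.1, days=160)
    peak_time = max(trajectory, key=lambda row: row[3])[0]
    print(f"infectious compartment peaks around day {peak_time:.0f}")
```

Setting `sigma` very large makes the exposed stage negligible, which brings the behavior close to that of the SIR model.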
Various teams at home and abroad have studied epidemic trend prediction using statistical models, the SEIR model, modified SEIR models, and machine learning models; however, the prediction results fluctuate widely. Scholars have used the SEIR model to predict the inflection point and peak value. The classic SEIR model can be applied to any type of epidemic situation, but infection caused by the flow of people must be taken into account. A modified SEIR model was then used to fit and analyze the prediction of COVID-19. However, a significant difference was observed between the prediction result and the number of cases reported by the National Health Commission because the impact of the prevention and control measures on the flow of people was not taken into account. Based on existing epidemic prevention and control measures, industry insiders incorporated the flow of people into the SEIR model to forecast the epidemic situation and also concluded that the tourism ban was effective. Researchers from Southeast University published a paper on medRxiv that evaluates the epidemic trend and risk of the COVID-19 outbreak using a modified SEIR model. Because there are many uncertain factors in the personnel network, both the SEIR and modified SEIR models must analyze a large number of parameters, including R0 and the removal rate. In this regard, many researchers use machine learning methods for data prediction and random forest methods to divide cities into different prevention and control levels, which provides a good reference for epidemic prevention and control [4].
At the same time, foreign scholars have conducted much research on the epidemic situation. The main research aims and findings of this paper are as follows:
(1) This paper aims to examine and compare existing machine learning algorithms in the medical field and to find the best AI-based algorithm model for creating a prediction platform to prevent and control epidemics.
(2) This paper aims to combine machine learning with the IoT to build an IoT platform for the prevention and control of COVID-19, based on the traditional epidemic model.
(3) This paper aims to introduce the designed COVID-19 prediction platform in the order of data collection, data cleaning, machine learning training model, and front-end and back-end frameworks.
(4) The simulation results show that the predictions made by the random forest model designed in this paper are more accurate than those made by the logistic regression and support vector machine (SVM) algorithms.
Security and Communication Networks

Algorithm Model Analysis
This section reviews and compares the existing machine learning algorithms used in the medical field, with the aim of identifying the most suitable algorithm model for creating a prediction platform to prevent and control epidemics.

Logistic Regression.
Logistic regression is a classification algorithm commonly used to solve binary classification problems. This algorithm has been extensively studied in the industrial and medical fields because of both its simplicity and its strong interpretability. The essence of logistic regression is to use maximum likelihood estimation to approximate the parameters of a given distribution [5]. To date, logistic regression has been applied in many fields, including many medical scenarios [6]. Unfortunately, although it has broad prospects for health care and has been widely studied in disease diagnosis, this algorithm cannot solve the nonlinear problems in the medical field. Epidemic predictions are typically not linear, which limits the use of the logistic regression algorithm in epidemic prediction. Because of the simplicity of its form, its accuracy cannot be guaranteed, which makes it difficult to fit the true data distribution. As a result, the logistic regression algorithm is not recommended for predicting epidemics [7].
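To make the maximum likelihood idea concrete, the following minimal sketch fits a one-feature logistic regression by gradient ascent on the mean log-likelihood. The function names, learning rate, number of epochs, and toy data are assumptions for illustration only and do not reproduce the models trained in this paper.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(xs, ys, w, b):
    """Bernoulli log-likelihood of labels ys under p(y=1|x) = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Estimate (w, b) by gradient ascent on the mean log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-likelihood: residual (y - p) times the input.
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(w * x + b)) for x, y in zip(xs, ys)) / n
        w += lr * grad_w
        b += lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy, non-separable data: larger x values are more often labeled 1.
    xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
    ys = [0, 0, 0, 1, 0, 1, 1, 1]
    w, b = fit_logistic(xs, ys)
    print("estimated parameters:", w, b)
```

The fitted decision boundary is the single point where `w * x + b = 0`, which illustrates why the model struggles when the classes are not linearly separable.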

Support Vector Machines.
Before the widespread application of deep learning algorithms, SVMs were considered the optimal method for small-sample classification problems. SVM is a classification (not clustering) technique that measures the distance of samples from a separating hyperplane. Once the parameters of a given model have been fitted on a training set, SVM classification depends only on the support vectors and not on the data dimension. Because the support vectors are a subset of the sample set, the computational time and storage requirements are reduced [8]. SVM is a maximum-margin classifier: the margin between the different classes is its primary criterion. The hyperplane positions can be used to obtain constraints. In other words, SVM is mainly used to ensure the correct classification of two data types in the future, while the constraint requires a maximum distance between the classification line and the nearest points within the maximum acceptable error. Furthermore, SVM is a kernel-based technique that allows conversion to a higher-dimensional space using kernel functions. Finding the SVM hyperplane amounts to solving a quadratic programming problem that meets the requirements of duality and convexity. Because the problem is a convex optimization problem, the optimal plane can be determined according to strong duality [9].

Random Forest.
A decision tree uses object attributes and their relationships to build tree-like diagrams that can be obtained using probability analysis. Each branch of the decision tree represents a prediction direction in those diagrams, and each leaf node denotes the final prediction result. The decision tree prediction model classifies data into different classes using different object categories, allowing decision-related information to be intuitively displayed in the decision-making process [10]. However, a single decision tree is prone to overfitting, resulting in poor generalization ability. When a decision tree performs feature selection, the classification is accomplished based on the single most suitable feature [11], whereas most classification decisions should be based on a set of features rather than one particular feature. As a result, when dealing with epidemic data that are large in scale and have many features, the classic decision tree algorithm is ineffective [12]. Random forest was proposed in 2001 and has since been widely employed in classification and regression. To improve the classification efficiency, random forest creates the model using the bagging method combined with decision trees [13]. Because each decision tree is trained on an independent random sample with random attribute selection, the correlations between trees are low. When a new sample needs to be classified, every tree in the forest votes, and the class with the most votes is selected as the sample's class [14]. Figure 1 presents the random forest classification method.
Assuming that the model output vector has length 3, the probability vectors are first obtained from the multiple decision trees trained in the model, and the final output is then determined by averaging these probability vectors. The random forest algorithm can also be employed in regression problems. Overfitting of a single decision tree can be a barrier; by increasing the number of decision trees, random forest mitigates model overfitting. Meanwhile, random forest classifiers can deal with missing data, making the method suitable for the analysis of big data in epidemic environments where data collection is difficult. Therefore, the epidemic prediction model designed in this paper is based on the random forest algorithm.
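The averaging of per-tree probability vectors can be sketched in a few lines. The function names and the three example vectors below are hypothetical and serve only to illustrate the aggregation for an output vector of length 3.

```python
def average_probabilities(tree_outputs):
    """Average the class-probability vectors produced by the individual trees."""
    n_trees = len(tree_outputs)
    n_classes = len(tree_outputs[0])
    return [sum(vec[c] for vec in tree_outputs) / n_trees for c in range(n_classes)]


def predict_class(tree_outputs):
    """Return the index of the class with the highest averaged probability."""
    averaged = average_probabilities(tree_outputs)
    return max(range(len(averaged)), key=averaged.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of three trees for one sample with three classes.
    outputs = [
        [0.7, 0.2, 0.1],
        [0.1, 0.6, 0.3],
        [0.5, 0.3, 0.2],
    ]
    print(average_probabilities(outputs), predict_class(outputs))
```

Note that averaging probabilities and majority voting can disagree: in the example, a hard vote would be split between classes 0 and 1 less clearly than the averaged probabilities suggest.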
As presented in Table 1, SVM is limited to small samples, and its efficiency is low when there are too many observation samples.
The logistic regression algorithm is sensitive to multicollinearity among the model's independent variables. To reduce the correlation between candidate variables, representative independent variables must be selected using factor analysis or variable cluster analysis. Random forest can process high-dimensional data (i.e., data with many features) and does not require feature selection. It is robust to noise and resistant to overfitting. In conclusion, for epidemic input data of a large order of magnitude, the random forest algorithm is the most suitable.
The random forest algorithm is a supervised learning algorithm that uses bagging to combine many decision trees and classifies by a voting mechanism. It has the advantages of fast training speed, strong generalization ability, and good classification performance. The following introduces the decision tree first and then the random forest algorithm as the mathematical foundation of this article.
(1) Decision trees: Decision trees, also known as classification and regression trees (CART), can be used to describe the different classes or values output after inputting a set of features. A decision tree has a tree structure in which each internal node denotes an attribute test, each branch denotes a test output, and each leaf node denotes a final test result. Suppose that $X$ is an input vector containing $m$ features, $Y$ is the output value, and $S_n$ is a training set containing $n$ observations:

$$S_n = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}, \quad X \in \mathbb{R}^m, \; Y \in \mathbb{R}. \tag{1}$$

In the training process, the algorithm divides the input on each node. First, the CART algorithm recursively divides the input space $X$ into two different branches using a feature index $j$ and a threshold $d$:

$$R_{\text{left}}(j, d) = \{X \mid X_j \le d\}, \qquad R_{\text{right}}(j, d) = \{X \mid X_j > d\}. \tag{2}$$

For better division, $(j, d)$ should minimize the cost function, usually the variance of the child nodes. The variance of the node $p$ is defined as follows:

$$\operatorname{Var}(p) = \frac{1}{n_p} \sum_{i:\, X_i \in p} \left(Y_i - \bar{Y}_p\right)^2, \tag{3}$$

where $n_p$ is the number of observations in node $p$ and $\bar{Y}_p$ denotes the mean value of $Y_i$ in node $p$. The child nodes are then divided in the same way. The tree stops growing when the maximum number of levels is reached or the number of observations contained in a node falls below a predetermined number. At the end of the training, a prediction function $h(X, S_n)$ based on $S_n$ is established:

$$h(X, S_n) = \sum_{l=1}^{L} \bar{Y}_l \, \mathbf{1}\{X \in R_l\}, \tag{4}$$

where $R_l$ ($l = 1, 2, \ldots, L$) are the regions of the leaf nodes and $\bar{Y}_l$ is the mean output of the training observations in $R_l$.

(2) Random forest: The random forest algorithm uses the Bootstrap sampling method to extract multiple samples from the original samples. It creates a decision tree model based on each Bootstrap sample. The predictions of the multiple decision trees are then combined, and the final result is determined by voting. Random forest regression is a strong predictor that combines many weak predictors. A Bootstrap sample is obtained by randomly drawing $n$ observations with replacement from the original dataset $S_n$. The random forest algorithm selects several Bootstrap subdatasets $(S_n^{1}, \ldots, S_n^{q})$, applies CART to these subdatasets, constructs the trees, and obtains a prediction function for each of them, as in (4).
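The split search in the CART step can be illustrated with a small sketch that tries every feature index j and every midpoint threshold d and keeps the pair with the lowest size-weighted variance of the two child nodes. The function names and toy data are assumptions; the size-weighted form of the cost is one common choice rather than a detail confirmed by this paper.

```python
def node_variance(ys):
    """Variance of the outputs in a node (0.0 for an empty node)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def best_split(rows, ys):
    """Return (cost, j, d) minimizing the size-weighted variance of the children.

    rows is a list of feature vectors and ys the matching outputs.
    Returns None when no feature has two distinct values to split on.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            d = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, ys) if row[j] <= d]
            right = [y for row, y in zip(rows, ys) if row[j] > d]
            cost = (len(left) * node_variance(left)
                    + len(right) * node_variance(right)) / n
            if best is None or cost < best[0]:
                best = (cost, j, d)
    return best


if __name__ == "__main__":
    rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
    ys = [10.0, 10.0, 20.0, 20.0]
    print(best_split(rows, ys))
```

In this toy dataset the first feature separates the two output levels perfectly, so the search selects feature 0 with a threshold between 2.0 and 3.0. Applying `best_split` recursively to the two resulting subsets yields a full regression tree.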
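Suppose that the training set is drawn from independent and identically distributed random vectors $(X, Y)$, where $X$ denotes the input vector and $Y$ represents the output; then, the mean square generalization error of the predicted output $h(X)$ is as follows:

$$E_{X,Y}\left(Y - h(X)\right)^2. \tag{5}$$

The prediction output of random forest regression is obtained by averaging $k$ decision trees $h(X, \theta_k)$, for which the following theorem holds.

Theorem 1. As the number of trees $k \to \infty$, almost surely

$$E_{X,Y}\left(Y - \operatorname{av}_k h(X, \theta_k)\right)^2 \to E_{X,Y}\left(Y - E_{\theta} h(X, \theta)\right)^2. \tag{6}$$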
Denote the right-hand side of (6) as $PE^*(\text{forest})$, which is the generalization error of the random forest. The average generalization error of each decision tree can be defined as follows:

$$PE^*(\text{tree}) = E_{\theta} E_{X,Y}\left(Y - h(X, \theta)\right)^2. \tag{7}$$

Theorem 2. If, for all $\theta$, $E[Y] = E_X[h(X, \theta)]$, then

$$PE^*(\text{forest}) \le \rho \, PE^*(\text{tree}). \tag{8}$$

In the equation, $\rho$ is the weighted correlation coefficient of the residuals $Y - h(X, \theta)$ and $Y - h(X, \theta')$, and $\theta$ and $\theta'$ are independent of each other. Theorem 2 provides the conditions for obtaining an accurate regression forest: low correlation between residuals and low-error decision trees. The random forest regression algorithm reduces the average error of the decision trees through the weighted correlation coefficient $\rho$.
The steps of the random forest regression algorithm can be summarized as follows. Let $\theta$ be a random parameter vector, and let the corresponding decision tree be $T(\theta)$. Let $B$ be the domain of $X$; that is, $X: \Omega \to B \subseteq \mathbb{R}^p$, where $p \in \mathbb{N}$ denotes the dimension of the independent variable. Each leaf node of the decision tree corresponds to a rectangular region. Denote the rectangular region of each leaf node as $R_l \subseteq B$ ($l = 1, 2, \ldots, L$). For each $x \in B$, exactly one leaf node satisfies $x \in R_l$; denote this leaf node of the decision tree $T(\theta)$ as $l(x, \theta)$.
Step 1. Use the Bootstrap method to resample; randomly generate $k$ training sets $\theta_1, \theta_2, \ldots, \theta_k$; and use each training set to generate the corresponding decision tree $T(\theta_1), T(\theta_2), \ldots, T(\theta_k)$.
Step 2. Assuming that the feature space has $M$ dimensions, randomly extract $m$ features from the $M$-dimensional features as the split feature set of the current node, and split the node using the best split among the $m$ features. Generally speaking, the value of $m$ remains unchanged during the growth of the forest.
Step 3. Each decision tree is grown to its maximum size without pruning.
Step 4. For new data $X = x$, the prediction of a single decision tree $T(\theta)$ is obtained by averaging the observed values in the leaf node $l(x, \theta)$. If an observation $X_i$ belongs to the leaf node $l(x, \theta)$, its weight is nonzero; let the weight $\omega_i(x, \theta)$ be

$$\omega_i(x, \theta) = \frac{\mathbf{1}\{X_i \in R_{l(x, \theta)}\}}{\#\{j : X_j \in R_{l(x, \theta)}\}}. \tag{9}$$

The sum of the weights is equal to 1.
Step 5. The prediction of a single decision tree is obtained as the weighted average of the observed values of the dependent variable $Y_i$ ($i = 1, 2, \ldots, n$). The predicted value of a single decision tree can be obtained using the following equation:

$$\hat{\mu}(x, \theta) = \sum_{i=1}^{n} \omega_i(x, \theta) \, Y_i. \tag{10}$$

Step 6. Use (11) to obtain the weight of each observation by averaging the decision tree weights $\omega_i(x, \theta_t)$ ($t = 1, 2, \ldots, k$):

$$\omega_i(x) = \frac{1}{k} \sum_{t=1}^{k} \omega_i(x, \theta_t). \tag{11}$$

Then, the predicted value of random forest regression can be written as follows:

$$\hat{\mu}(x) = \sum_{i=1}^{n} \omega_i(x) \, Y_i. \tag{12}$$

The flowchart of the random forest algorithm is presented in Figure 2.
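Steps 4 to 6 can be illustrated with a small sketch that takes, for each tree, the leaf index of every training observation and the leaf index of the query point, and then forms the per-tree weights, the averaged forest weights, and the weighted prediction. The function names and the two hand-made leaf assignments are hypothetical; a real forest would obtain the leaf indices from its fitted trees.

```python
def tree_weights(train_leaves, query_leaf):
    """Per-tree weights: 1 / (leaf size) for observations sharing the query's leaf."""
    in_leaf = [1.0 if leaf == query_leaf else 0.0 for leaf in train_leaves]
    size = sum(in_leaf)
    if size == 0:
        raise ValueError("the query leaf contains no training observation")
    return [flag / size for flag in in_leaf]


def forest_weights(train_leaves_per_tree, query_leaf_per_tree):
    """Forest weights: the average of the per-tree weights over all k trees."""
    k = len(train_leaves_per_tree)
    n = len(train_leaves_per_tree[0])
    totals = [0.0] * n
    for train_leaves, query_leaf in zip(train_leaves_per_tree, query_leaf_per_tree):
        for i, w in enumerate(tree_weights(train_leaves, query_leaf)):
            totals[i] += w
    return [total / k for total in totals]


def weighted_prediction(weights, ys):
    """Prediction as the weighted average of the observed outputs."""
    return sum(w * y for w, y in zip(weights, ys))


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 4.0]
    # Hypothetical leaf index of each training observation in two trees.
    train_leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]
    # Hypothetical leaf reached by the query point x in each tree.
    query_leaves = [0, 1]
    weights = forest_weights(train_leaves, query_leaves)
    print(weights, weighted_prediction(weights, ys))
```

In this example the first tree predicts the mean of the first two outputs and the second tree the mean of the last three; the forest prediction equals the average of these two per-tree predictions, and the forest weights sum to 1.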

Prediction Platform Design
In view of the global spread of the COVID-19 epidemic in early 2020, we designed an epidemic prevention and control platform based on machine learning. The applications of this project include patient data analysis, early detection and warning of epidemic situations, rapid screening of suspected patients, and remote diagnosis and treatment. The patient data analysis service extracts the patient's sign data for relevant detection, and the platform analyzes the data through the knowledge map to assist diagnosis in many aspects, significantly expediting virus diagnosis. The early detection and early warning service conducts regular, fixed-point investigations of disease data in various regions, performs in-depth analysis using the knowledge map, and accurately captures epidemic warning signs. The rapid screening service for suspected patients collects data from urban infrastructure sensors and fixed-point sensors. The platform matches the data to the early symptoms of the virus, and people who are exposed to the virus and work in high-risk jobs are given special attention so that resources are allocated according to these weights. With the remote diagnosis and treatment service, home isolation personnel, particularly suspected or close-contact personnel, can easily and quickly enter their information, which enables efficient, real-time remote diagnosis and control of the epidemic situation. Ordinary patients can also be seen at home via the remote diagnosis and treatment function, which avoids hospital visits and reduces the risk of infection.
The epidemic prevention and control platform based on machine learning designed in this paper integrates the investigation, early warning, diagnosis, and treatment processes that follow the outbreak of an epidemic, achieves the unified analysis of various data, and has the characteristics of accuracy, intelligence, and learnability. The platform architecture is presented in Figure 3.
During the overall integration of the system, the internal integration of each construction unit is done first, and the different construction units are then integrated according to their interface definitions, in order from strong to weak coupling or according to operation constraints. The interface definition method used in the overall design also applies to the integration of the various systems within each construction unit, so that the interfaces of the subsystems within the unit are clearly defined. On this basis, integration within a unit and between different units can follow either of two integration sequences, which can cross and coexist: ordering by strength of coupling and ordering by operation prerequisites. Decomposition integration can be employed for integration between specific units. That is, the parts of a unit that must be integrated with other units are decomposed and integrated with the relevant units as decomposition units, which reduces the complexity of integration and makes problems easier to locate. After each decomposition unit has completed its integration with its specific integration objectives, the overall integration test between the unit and the relevant units must be conducted. At the same time, equipment and software products with good interconnection and interoperability must be selected. Moreover, during the development of application software, attention must be paid to the interaction with other products to maintain consistency. In particular, the selected database must connect seamlessly with heterogeneous databases. The integrated system shall be easy to expand to meet increased demand in the future.

Data Sources.
The data sources involved in this project include only the hardware accessed by the platform, the data entered by the client, and the data of the original hospital systems (HIS, EMR, PACS, and RIS), whereas the types of transmitted data include users' physical sign data, basic information data, information interaction data, medical data, and publicity data of the medical knowledge map. The medical knowledge map contains at least 5000 common disease data and 1000 virus-related data. This project involves structured, semistructured, and unstructured data. From the perspective of medical data storage, the overall data storage capacity in the medical industry is mainly 1-50 TB, and there are significant differences among medical institutions. As for the time cycle of medical data, medical records are generally retained for a long time, and the requirements for online time are higher than those in other industries. The retention time of outpatient and emergency records shall not be less than 15 years, and the retention time of inpatient medical records shall be longer (about 30 years).
The medical records of some celebrities are kept indefinitely. Hundreds of images must be stored and accessed during a single diagnosis of a patient. In general, clinical electronic medical record data use an XML file format that conforms to the standards, but the file format will continue to evolve. Medical data from two sources are stored in the medical database: one is the data acquired and input by the underlying hardware, and the other is the medical data generated by the data analysis center. These data are stored in a structured format, and their retrieval is subject to permission-based access control. After access is granted, the system also records visitor information to ensure the privacy, security, and traceability of medical data.

Data Access.
Using mobile medical perception technology, the lower sensor layer is composed of medical sensing equipment, terminal equipment, and information operation and maintenance equipment, and it is used for data collection and input of the prediction platform. The technologies involved include wireless sensing technology, body area network technology, communication technology, terminal pass-through technology for telemedicine, positioning technology, monitoring network chip technology, and physiological signal acquisition and processing technology. In general, the problem of unstructured medical data is more serious and may affect the storage quality of the database. The platform described in this article can handle hardware devices from a variety of ecosystems, and its output data are structured using algorithms. Because of the diversity of mobile medical terminals, the prediction platform we designed supports a variety of access technologies and networking methods. By gathering multiple existing mobile medical terminals to form an enhanced virtual terminal, the prediction platform can automatically select a suitable terminal device to access a specific wireless network based on the environment in which the user is located, and it forms a multiterminal collaborative medical terminal system through the virtual terminals formed by multiple terminals. The coordination of access, connection, transmission, and management of multiple heterogeneous network resources is referred to as multinetwork coordination.
Because different medical systems use different heterogeneous networks for transmission and use multiple wireless access technologies, it is important to overcome the limitations that a single network imposes on the multiple existing mobile medical terminals in order to achieve more accurate and timely infectious disease prediction.

Multiplatform, Multisystem Data Normalization Processing, and Intelligent Analysis.
Before data are input to the random forest-based prediction platform designed in this research, they need to be preprocessed to facilitate the training and prediction of the neural network. Data normalization primarily refers to mapping the experimental sample data, which come from multiplatform, multisystem, and heterogeneous health big data, into the interval [0,1] or [−1,1], so that the data become dimensionless and can be analyzed. When collecting experimental data samples, this platform will encounter singular sample data (singular sample data refers to sample vectors that are very large relative to the other input sample data). Through normalization, the problem of gradient explosion and the subsequent decrease in the learning rate can be avoided. According to the data input requirements of the AI model and the neural network, an appropriate normalization method for health big data should be selected among the three commonly used normalization methods: min-max standardization, Z-score standardization, and Z-score simplification. Moreover, chronic disease data should be analyzed intelligently.
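Two of the normalization methods named above can be sketched as follows. The function names and sample values are assumptions for illustration; the third method mentioned in the text (Z-score simplification) is not shown because its exact definition is not given here.

```python
import statistics


def min_max_scale(values, low=0.0, high=1.0):
    """Map values linearly onto [low, high]; constant input maps to low."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]
    span = vmax - vmin
    return [low + (v - vmin) * (high - low) / span for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    # Hypothetical raw measurements of one feature.
    raw = [2.0, 4.0, 6.0, 8.0, 10.0]
    print(min_max_scale(raw))             # scaled to [0, 1]
    print(min_max_scale(raw, -1.0, 1.0))  # scaled to [-1, 1]
    print(z_score(raw))
```

Min-max scaling is sensitive to singular samples because a single extreme value stretches the range, whereas Z-score standardization limits that effect only partially; which method suits a given feature is a modeling decision.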

Data Security and Privacy Protection Methods.
The security architecture of this prediction platform mainly includes application layer security, transport layer security, and perception layer security. The perception layer's security policies primarily include device authentication, data encryption, security coding, security protocols, and access control. Security strategies such as vulnerability scanning, active defense, security protocols, network filtering, and authorization management are mostly applied in the transport layer. The application layer mainly includes security policies and methods such as security auditing, intrusion detection, hot machine disaster recovery, virtual isolation, cloud antivirus, user permissions, and security management. The platform security architecture is presented in Figure 4.

Regression Validation
The newly designed COVID-19 prediction platform in this paper includes the following five sections: data collection, data cleaning, machine learning training model, Bootstrap + Vue front-end framework, and Django back-end framework.

Data Collection and Data Cleaning.
Before model training, data collection and data cleaning are performed. The dataset contains 43 characteristic values, including monocyte percentage, monocyte count, lymphocyte count, and platelet distribution width, together with a list of label values. The dataset is randomly divided into an 80% training set and a 20% test set using the NumPy matrix operation library and the pandas library, which is built on NumPy, for data processing. The process is shown in Figure 5.
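The random 80%/20% division can be sketched as follows. The paper performs this step with NumPy and pandas; the sketch below uses only the Python standard library so that it runs without dependencies, and the function name, seed, and toy records are assumptions.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (train, test) with a reproducible shuffle."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # seeded so the split can be repeated
    n_test = int(round(len(rows) * test_fraction))
    test_indices = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test


if __name__ == "__main__":
    # Hypothetical records: a patient identifier and a binary label.
    records = [{"patient": i, "label": i % 2} for i in range(10)]
    train, test = train_test_split_rows(records)
    print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the split stable across runs, which makes accuracy comparisons between models meaningful because every model sees the same test rows.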

Machine Learning Training Models.
The training models are created using the logistic regression algorithm, SVM, and random forest. Both the training and test sets have parameter passing settings [15,16]. The primary goal of machine learning is to obtain a prediction model by mining the inherent patterns in historical training data and then applying the model to similar data situations [17,18]. The general workflow diagram is presented in Figure 6.
In Section 6, the model training mechanisms and prediction accuracies are simulated and compared.

Bootstrap + Vue Front-End Framework.
After model training is complete, model visualizations are compared. The sign-in and registration pages are shown in Figure 7. For legality verification, the e-mail address and password must be entered and sent back to the server. The registration information from the registration page is sent to the back end via a POST request and saved in the database using the Django ORM model. By default, a newly registered account is a regular user who can only perform the detection function. After the user enters the system, an interface appears that lists the nonrepeated patient numbers and presents the prediction information of all patients. After users click the detection button in the upper left corner, the detection model box appears. Users must enter patient information, such as whether the patient has a fever or COVID-19, as well as routine blood test results. The data are sent to the back end for prediction processing, and the results are returned.

Django Back-End Framework.
The Django back end is primarily used to provide the request interface for the front end and to return the values required by the front-end templates, while also managing user permissions.
Data, including user models, user details, and front-end homepage display information, are stored in the database through the ORM model and returned to the front end (Figure 7).

Model Prediction Simulation
The simulation in this paper is based on the epidemic dataset, which contains more than 40 kinds of characteristic data, including lymphocyte percentage. The task is to classify whether a patient has the disease. Five training models, including random forest and a neural network, are set up, their accuracy and other characteristics are compared, and conclusions are drawn.

Simulation Analysis.
Once the models are complete, the parameters required by each algorithm, such as the number of trees and the maximum depth in the model, need to be tuned. Figures 8 and 9 present the influence of these two parameters in the random forest training model on the test set accuracy, respectively. In the actual tuning process, the two parameters are tuned simultaneously to select the best test set accuracy. The same tuning process is applied to the other training models: for each algorithm, the parameters with the greatest influence are tuned to select the optimal accuracy, and the models with the best parameter values are saved as local models. Finally, the simulation results predicted from the datasets that use real patient information are shown in Figures 10-12. As these figures illustrate, random forest is selected as the best method for prediction analysis in this scenario when compared with the other methods, each of which has advantages and disadvantages. In Figures 10-12, the red lines indicate the positive probability, and the black lines represent the negative probability.
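A joint search over the number of trees and the maximum depth, keeping the combination with the best test set accuracy, can be sketched as follows. The sketch assumes scikit-learn and synthetic data; the candidate values in `tree_counts` and `max_depths` are hypothetical, and `pickle.dumps` stands in for however the platform actually stores its local models.

```python
from itertools import product
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient dataset (assumption).
X, y = make_classification(n_samples=300, n_features=43,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

tree_counts = [10, 50, 100]   # hypothetical candidate numbers of trees
max_depths = [3, 6, None]     # None lets each tree grow until leaves are pure

best_acc, best_params, best_model = -1.0, None, None
for n_trees, depth in product(tree_counts, max_depths):  # tune both together
    model = RandomForestClassifier(
        n_estimators=n_trees, max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    if acc > best_acc:
        best_acc, best_params, best_model = acc, (n_trees, depth), model

saved_bytes = pickle.dumps(best_model)  # stand-in for saving the local model
print(best_params, round(best_acc, 3))
```

Selecting parameters directly on the test set follows the procedure described above; holding out a separate validation split or using cross-validation is a common way to keep the final test accuracy unbiased.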

The Simulation Principle of Random Forest.
This algorithm uses random sampling with replacement to select the training sets and builds the classifiers accordingly. Multiple decision trees are established and merged for more accurate and stable predictions. Finally, the best classification result is selected by voting. The principle of the random forest algorithm is presented in Figure 13.
The foundation of random forest is Bootstrap. That is, many new samples of the same size and usability are generated from a sample, and similar samples are generated again from those that have already been created. Bootstrap is also known as a self-help method, as it does not use any other sample data [19]. This method is considered useful when the sample size is small. If the traditional method of splitting off data for verification is used, the training sample becomes even smaller, resulting in a larger deviation and a nonoptimal solution [20]. The self-help method not only avoids reducing the training sample size but also leaves a validation set.
The random forest algorithm is the integration of bagging and decision trees. After multiple rounds of sampling, some samples are never drawn from the training set. These unsampled data are called out-of-bag (OOB) data. OOB data are not added to the training set used to fit the model, which makes them suitable for assessing the generalization ability of the model [21,22].
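The sampling and voting steps can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the paper's implementation: the helper names `bootstrap_indices` and `majority_vote` are hypothetical. For a training set of n rows, a given row is missed by one bootstrap draw with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows, so roughly a third of the rows end up out of bag for each tree.

```python
from collections import Counter

import numpy as np

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = rng.integers(0, n_samples, size=n_samples)
    out_of_bag = np.setdiff1d(np.arange(n_samples), in_bag)  # never drawn
    return in_bag, out_of_bag

def majority_vote(predictions):
    """Return the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = np.random.default_rng(0)
n = 10_000
in_bag, oob = bootstrap_indices(n, rng)
oob_fraction = len(oob) / n
print(f"OOB fraction: {oob_fraction:.3f}")  # close to 1/e for large n

# Five hypothetical trees vote on one patient: three say positive (1).
print(majority_vote([1, 0, 1, 1, 0]))  # 1
```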

Conclusions
In this paper, the COVID-19 prediction algorithms based on artificial intelligence were compared. Based on considerations of various characteristic constraints and prediction result accuracies, a prediction platform was established.
Through simulation, it was found that random forest has significant advantages in epidemic prediction over logistic regression and the support vector machine, and it is expected to perform well when applied to the medical platform designed in this paper. In related work, Singh proposed using blockchain-based unmanned aerial vehicles [23] to achieve contactless transmission in the COVID-19 environment [24,25]. Similar application scenarios such as [26] will be the next development direction of the platform, which will strive to support more application development in the epidemic environment based on prediction analysis [27].

Data Availability
The patient data used to support the findings of this study are restricted by The First Affiliated Hospital of Nanjing Medical University in order to protect patient privacy. The data are available from The First Affiliated Hospital of Nanjing Medical University for researchers who meet the criteria for access to confidential data.

Figure 12: Random forest training model.
Figure 13: Random forest algorithm principle diagram.