An Emergency Measure Completeness Evaluation Method Based on UGC Data

Complete emergency measures are one of the important guarantees of effective emergency response. An evaluation method based on User-Generated Content (UGC) data is proposed to assess the completeness of emergency measures. To implement this method, UGC data containing users' emergency needs and emergency plans containing emergency measures for a specific emergency were first collected using a crawler program. A feature template was then established to identify relationships between different features, and a Conditional Random Field (CRF) model was used to extract emergency measures from the emergency plan and emergency needs from the UGC data. The Siamese network was then applied to compute the similarity between the emergency measures and the emergency needs. The missing emergency measures were obtained based on the similarities, and a quantitative method to calculate the completeness was defined. Finally, using province A as a case study, the emergency measure completeness was evaluated and the emergency measures that need to be strengthened and improved were identified.


Introduction
An emergency refers to unforeseen circumstances such as natural disasters, accidents, public health incidents, and social security incidents that occur suddenly, cause or may cause serious social harm, and therefore require emergency measures to deal with them [1]. The COVID-19 pandemic is an example of a major public health emergency with a fast transmission rate, complex and multiple variants of the virus, a wide infection range, and great difficulty in prevention and control [2]. To minimize the loss of people's lives and property and to maintain social stability in the disaster areas, it is of great importance to ensure efficient emergency measures after the occurrence of COVID-19 [3].
An emergency plan is a set of emergency measures made in advance to deal with possible emergencies, and its emergency measures should be as complete as possible [4]. The completeness of emergency measures means that the emergency plan should include all the emergency measures that need to be carried out after the occurrence of an emergency, so as to meet all the emergency needs of the public [5]. Therefore, the completeness of emergency measures must be evaluated to ensure that emergency plans can achieve the intended emergency disposal effects. Incomplete emergency measures will not only fail to meet practical needs but also lead to many emergency dilemmas [6]. For example, in the early stages of the COVID-19 pandemic, there were many problems in pandemic prevention and control, such as imperfect emergency plans and insufficient analysis of emergency needs [7]. Some methods exist for evaluating emergency measure completeness, but they have specific disadvantages such as high cost. With the development of Web 2.0 technology, social platforms have become the first choice for people to express their personal opinions and suggestions. Thus, when an emergency occurs, people may also choose these platforms to publish their personal emergency needs, and these needs reflect the completeness of emergency measures to a certain extent [8]. Therefore, this paper proposes evaluating the completeness of emergency measures in the emergency plan based on UGC data. The contributions of this paper are as follows:
(1) The paper proposes evaluating the completeness of emergency measures from the perspective of UGC data, which is more economical than traditional methods. At the same time, it defines a quantitative method for computing the completeness of emergency measures.
(2) A Chinese corpus is constructed to evaluate the completeness of emergency measures, which can be shared with other researchers.
(3) The Siamese network similarity calculation model is applied to match emergency needs and emergency measures, and the completeness of emergency measures is quantified. This makes the completeness of emergency measures directly visible and also identifies the specific aspects that need to be strengthened and improved.

Related Work
Much recent research has addressed emergency plans. Public emergency responses in recent years show that evaluating the emergency measures of emergency plans is of great practical significance. Previous studies have suggested that the evaluation can be conducted in terms of completeness (demand), emergency responsibility matrix (departmental responsibility), and operability (task) [9]. The completeness of emergency measures focuses on evaluating the correspondence between emergency needs and emergency measures in the emergency process [10]. There are two main methods of emergency measure completeness evaluation: one is simulation or plan drills and the other is evaluation index systems. For the first method, researchers usually conducted various large-scale exercise experiments with the support of government agencies that provided a large amount of real and effective data for evaluating the effectiveness and completeness of emergency measures in the plan [11]. Curnin et al. [12] took the Indonesian Tsunami and Hurricane Katrina as examples to analyze the factors that may hinder the implementation of an emergency plan during the emergency response process. Kaji et al. [13] applied a medical drill tool to evaluate the emergency drills of several hospitals in Los Angeles and used statistical analysis of the drill results to determine whether the emergency measures of the hospitals were comprehensive. Anna et al. [14] took the agricultural drought in Guatemala as an example and discussed how emergency drills can help identify problems in an emergency plan. Although the operation and implementation of response measures can be observed during drills, not all emergency plans are suitable for drills, because the very high cost of drills often makes them impractical.
Other researchers mainly use simulation software or establish an evaluation index system for one emergency plan or a class of emergency plans and then combine evaluation methods to build an evaluation model. An et al. [15] constructed an evaluation index system from five dimensions of the plan (legality, scientificity, comprehensiveness, cost, and operability) and evaluated the emergency plan for production safety accidents based on the hesitant fuzzy set method. Sun [16] built a fuzzy evaluation model using the fuzzy comprehensive evaluation method to evaluate and analyze the overall quality of the Xi'an subway fire emergency plan. Chen et al. [17] proposed a virtual-reality-based emergency simulation system for multiperson cooperation in urban rail stations, which can mitigate the problem of insufficient emergency response. Although evaluation index systems and simulation methods save costs, their data sources are relatively weak. In addition, these evaluation methods usually fail to point out the deficiencies of the emergency plan content and cannot provide specific and targeted suggestions for revising and improving the emergency plan. The purpose of this paper is to evaluate the emergency measure completeness based on UGC data.
This article holds that UGC data provide more realistic emergency needs from people and best reflect the completeness of emergency measures. Therefore, emergency-related UGC data are collected first and then matched with the emergency measures in the emergency plan to evaluate the completeness of the emergency measures and find the defects in the emergency plan.

Overall Framework
Traditional evaluation methods consume a lot of human and material resources, so this paper proposes a solution based on UGC data to evaluate the completeness of emergency measures. The evaluation process is decomposed into three steps, as shown in Figure 1.
As displayed in Figure 1, the framework proposed in this paper mainly includes the following three parts: (1) Data Acquisition. The data used in this paper include two parts: one is user comment data (UGC data) obtained from microblogs and the other is emergency plans. For the comment data, a crawler program was designed to retrieve and collect relevant comments on Sina Weibo. For emergency plans, this paper collected the emergency plans for public health emergencies issued by some provinces. The final preprocessing mainly includes two steps. The first is to divide the comments into sentences and to extract sentences of the form "emergency agency-emergency measures" from the emergency plan. The second comprises deduplication, Chinese word segmentation, dependency analysis, and other operations.
(2) Need and Measure Extraction. This article regarded emergency measures or emergency needs as the opinion target in sentiment analysis research and used the CRF model to extract them. Need and measure extraction is mainly divided into three steps: the first is to select training features according to the characteristics of emergency measures, the second is to design the feature template, and the third is to use the CRF++ toolkit for model training in order to obtain the emergency needs in the UGC data and the emergency measures in the emergency plan.
(3) Need and Measure Matching. The main task here is to match the extracted emergency needs and emergency measures using the text-similarity calculation model and to find the missing emergency measures. The paper defines a comprehensive score of emergency measures according to the matching degree as an evaluation index for the completeness of emergency measures, given by equations (1) and (2), where $C_{\text{completeness}}$ is the completeness rate of emergency measures; $C_{\text{incompleteness}}$ is the incompleteness rate of emergency measures (the lower the value, the better the completeness of emergency measures); $n$ is the number of missing emergency measures found through similarity calculation; and $m$ is the number of emergency measures provided in the emergency plan.
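Equations (1) and (2) are not reproduced in this text. One reading that agrees with the definitions of $n$ and $m$ and with the case-study figures reported later (6 missing measures, 69 measures in the plan, an incompleteness rate of 8%) is sketched below. The denominator $n + m$ is an assumption inferred from those numbers, not a formula confirmed by the source.

```latex
C_{\text{incompleteness}} = \frac{n}{n + m},
\qquad
C_{\text{completeness}} = 1 - C_{\text{incompleteness}}
```

Under this reading, $n = 6$ and $m = 69$ give $6/75 = 0.08$, whereas the alternative $n/m$ would give roughly $0.087$.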

Extraction Method of Emergency Measures and Emergency Needs Based on CRF
As shown in Figure 1, this paper needs to extract emergency measures from the emergency plan and emergency needs from UGC data and then calculate the similarity between them to obtain the completeness score of the emergency measures. This article regards emergency measures and emergency needs as opinion targets and extracts them with an opinion target extraction method.
Currently, CRF achieves good performance in Chinese word segmentation, part-of-speech tagging, named entity recognition, and other natural language processing tasks, especially in opinion target extraction [18]. This paper uses the CRF model to identify emergency needs and emergency measures, treating the identification as a maximum-probability sequence labeling problem. The extraction method mainly includes the following work.
Syntactic structure analysis makes it possible to summarize rules and build a rule template for extracting specific information [19]. Therefore, the paper used the dependency parsing interface of LTP (Language Technology Platform) to analyze the dependency relationships between the components in a sentence and to reveal its syntactic structure, such as the phrase structures "subject-predicate-object" and "attribute-adverbial-complement." The core verb in a sentence is the central component that governs the other components but is not itself governed by any other component [20]. An example and its annotations are shown in Figure 2.
It can be seen from the analysis results that the subject of the sentence is "the government," the core predicate is "regulate," and the object is "market prices." It can be concluded that emergency agencies generally appear as the subject of the sentence, while the emergency measures are generally expressed by the core predicate and its object. That is, the text from the "HED" node to the end of the sentence is used as the task description text of the department. Therefore, the emergency measures or emergency needs in the text can be extracted by selecting features such as dependency relationships. Table 1 lists all the features used in the experiment.
By analyzing the characteristics of the text, this paper chose the word-based BIEO tagging system to complete the feature tagging work. After the manual feature labeling, this article designed a feature template to establish the relationship between each feature and obtain the optimal model of sequence marking. Finally, CRF++ is trained and tested based on the collected data.
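The word-based BIEO labeling step can be illustrated with a minimal sketch. It assumes that B, I, and E mark the beginning, inside, and end of a target span and that O marks every other word; the function name `bieo_tags`, the example tokens, and the treatment of one-word spans are illustrative assumptions, not details taken from the paper's corpus.

```python
def bieo_tags(tokens, start, end):
    """Tag tokens[start:end] as one target span with B/I/E; all other tokens get O.

    Assumption: a one-word span is tagged "B", since the source does not say
    how single-word targets are labeled.
    """
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must be a non-empty slice of tokens")
    tags = ["O"] * len(tokens)
    tags[start] = "B"                      # first word of the target
    for i in range(start + 1, end - 1):    # words strictly inside the target
        tags[i] = "I"
    if end - start > 1:                    # last word of a multi-word target
        tags[end - 1] = "E"
    return tags


# Hypothetical segmented sentence; the measure runs from the core predicate
# ("regulate") to the end of the sentence.
tokens = ["the", "government", "should", "regulate", "market", "prices"]
tags = bieo_tags(tokens, 3, 6)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Each (word, tag) pair would then be written as one row of the training file, alongside the other feature columns.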

Task Description.
The matching task is to compare the extracted emergency needs and emergency measures, which has three possible results, as shown in Figure 3: the content appears in both the plan and the comments, only in the plan, or only in the comments. The first result means that the emergency measures already cover the emergency needs; the second indicates that the emergency needs and the emergency measures are independent of each other; the last means that the existing emergency measures cannot meet the emergency needs.
This paper mainly focuses on the third result and judges the completeness of emergency measures based on the matching results.

Similarity Computation.
With the rapid development of deep learning technology, especially after the emergence of Word2Vec in 2013, the calculation of semantic text similarity has made breakthrough progress, and, therefore, the calculation method of semantic text similarity based on deep learning has gradually become the mainstream method in this field [21]. Currently, the classic similarity calculation method in deep learning technology is based on the framework of the Siamese network. Figure 4 shows the general process of the whole network [22].
One of the main characteristics of the Siamese network is that weights are shared across its subnetworks, which not only reduces the number of parameters but also reduces the tendency to overfit. First, two sentences are fed to the left and right subnetworks, respectively, and each is mapped to a representation vector through two identical embedding layers. After each sentence has been processed by an LSTM (Long Short-Term Memory) network, the distance between the two output vectors is calculated by a distance measure, as shown in equation (3). The distance measures commonly used in text similarity calculation include Euclidean distance, cosine distance, Manhattan distance, and Chebyshev distance [23]. In general, the smaller the distance is, the more similar the two texts are. The goal of training is to minimize the distance between similar sentence pairs in the embedding space and maximize the distance between dissimilar sentence pairs. Finally, the similarity score of the two texts to be matched is obtained.
where $h_T$ is the last hidden state of the model, representing the final representation of the sentence. Thus, $h^{(a)}_{T_a}$ is the representation of Sentence 1 and $h^{(b)}_{T_b}$ is the representation of Sentence 2.
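The four distance measures named above can be sketched in plain Python for two sentence vectors. Equation (3) is not reproduced in this text, so the `manhattan_similarity` mapping exp(-L1) is an assumption borrowed from common Siamese LSTM setups rather than the paper's confirmed formula; all function names and the example vectors are illustrative.

```python
import math


def euclidean(u, v):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v (both non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def manhattan_similarity(u, v):
    """Assumed mapping from L1 distance to a score in (0, 1]."""
    return math.exp(-manhattan(u, v))


# Two parallel vectors of different length: the angle-based measure sees them
# as identical, while the L1 distance still separates them.
h_a, h_b = [1.0, 2.0], [2.0, 4.0]
print(cosine_distance(h_a, h_b), manhattan(h_a, h_b))
```

The example uses parallel vectors because an angle-only measure cannot separate them, which is one way to see why a measure that also reflects vector length can behave differently.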
As shown in Figure 5, the LSTM sequentially updates a hidden state representation, and these updates also depend on a memory unit containing four components (real-valued vectors): the memory state $c_t$, the output gate $o_t$ that determines how the memory state affects other units, and an input (and forget) gate $i_t$ ($f_t$) that controls the memory content according to each new input and the current state. The LSTM updates are parameterized by weight matrices $W_i$, $W_f$, $W_c$, $W_o$, $U_i$, $U_f$, $U_c$, and $U_o$ and bias vectors $b_i$, $b_f$, $b_c$, and $b_o$.
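The LSTM update equations referred to as (4)-(11) are not reproduced in this text. For orientation, the standard LSTM formulation is given below, assuming the paper follows the usual definition with input gate $i_t$, forget gate $f_t$, candidate state $\tilde{c}_t$, memory state $c_t$, output gate $o_t$, and hidden state $h_t$. The grouping and numbering of the equations in the original may differ.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= i_t \odot \tilde{c}_t + f_t \odot c_{t-1} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.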
The parameter update formula of the gradient descent method is as follows:
$$\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}$$
where $\eta$ is the learning rate that determines the step size; $\theta$ denotes the model parameters, such as those mentioned above; and $L$ is the loss function. Specifically, each sentence (represented as a sequence of word vectors) $x_1, \ldots, x_T$ is passed to the LSTM, which updates its hidden state at each sequence index through equations (4)-(11) and uses its final hidden state as the vector representation of the sentence. The similarity between these vector representations is then used as a predictor of semantic similarity, and, finally, the similarity score is obtained. If the score is greater than a certain threshold, the two sentences are considered similar; otherwise, they are not.

Experiment of Emergency Measure and Emergency Need Extraction
There were two kinds of data used in this experiment: the first was the user comment text about COVID-19 prevention and control, and the second was the public health emergency plans of some provinces. The details of the data are shown in Table 2.
In the experiment, LTP was selected to carry out language processing tasks such as word segmentation, part-of-speech tagging, and dependency parsing on the text and to transform the text into the data format required by the CRF model. Table 3 gives a labeling instance.

Experimental Settings.
In this experiment, CRF++ 0.53 was used to implement the Conditional Random Field model, and the model parameter values, such as -c and -f, were set manually based on experience. The window size for words and parts of speech was 3, and only the current position was used for the other features. In order to establish the internal relationships between the four features above (word, PoS, head, and relationship), 18 different feature templates were designed from different combinations of the four features.
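As an illustration of what such a feature template can look like, the sketch below generates a small template in the CRF++ `%x[row,col]` syntax. The column order (0 = word, 1 = PoS, 2 = head, 3 = relationship), the helper name `build_crfpp_template`, and the single combined feature are assumptions for illustration. This is not one of the paper's 18 templates, which are not listed in this text.

```python
def build_crfpp_template(windowed_cols=(0, 1), current_only_cols=(2, 3), half_window=1):
    """Return CRF++ template text.

    windowed_cols get offsets -half_window..+half_window (window size 3 when
    half_window is 1); current_only_cols use only the current row.
    """
    lines = []

    def add(macro):
        # CRF++ unigram features are written as "U<id>:<macro>".
        lines.append(f"U{len(lines):02d}:{macro}")

    for col in windowed_cols:
        for offset in range(-half_window, half_window + 1):
            add(f"%x[{offset},{col}]")
    for col in current_only_cols:
        add(f"%x[0,{col}]")

    # Hypothetical combined feature: current word together with its
    # dependency relationship.
    add("%x[0,0]/%x[0,3]")

    # "B" asks CRF++ to also use bigram features over adjacent output tags.
    lines.append("B")
    return "\n".join(lines)


template = build_crfpp_template()
print(template)
```

Varying which columns are windowed and which are combined is one way a family of alternative templates could be produced and compared.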

Experimental Results.
For the results of opinion target extraction, the experiment adopted exact-match evaluation. That is, the extraction result is correct only when the extracted opinion target and the annotated answer match exactly. Precision, recall, and F1 value were used as the evaluation criteria.
The calculation is shown in formulas (13)-(15):
$$\text{Precision} = \frac{TP}{TP + FP} \tag{13}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{14}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$
where TP (true positive samples) represents the number of labels correctly identified in the test set; FP (false positive samples) represents the number of misidentified labels; and FN (false negative samples) represents the number of labels that are not identified in the test set.
Precision indicates the proportion of predicted positive cases that are real positives. Recall is the proportion of actual positive cases that were correctly predicted. F1 is the harmonic mean of precision and recall.
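The exact-match evaluation can be sketched as follows. The representation of each extracted target as a (sentence id, target text) pair, the function name `exact_match_prf`, and the toy data are assumptions for illustration only.

```python
def exact_match_prf(predicted, gold):
    """Precision, recall, and F1 where a prediction counts only on an exact match."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # extracted and annotated
    fp = len(predicted - gold)   # extracted but not annotated
    fn = len(gold - predicted)   # annotated but not extracted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: (sentence id, target text). The second prediction differs from the
# annotation by one word, so under exact matching it is a miss.
gold = [(0, "regulate market prices"), (1, "set up a reporting phone"), (2, "update guidelines")]
predicted = [(0, "regulate market prices"), (1, "set up a phone"), (2, "update guidelines"), (3, "resume work")]
precision, recall, f1 = exact_match_prf(predicted, gold)
print(precision, recall, f1)
```

A near-miss therefore counts both as a false positive and as a false negative, which makes exact matching a strict criterion.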
The experimental results are shown in Table 4. It can be seen that adding dependency features significantly improves the model, which can then effectively extract the emergency needs and emergency measures.

Experimental Data and Environment.
The experiment implemented the text similarity calculation based on the Siamese network. Since there is no publicly available corpus, this paper manually sorted and labeled 5165 pairs of sentences about emergency response extracted by the CRF model. The labeled samples used in training the Siamese network are sentence pairs (sentence 1, sentence 2), each with a label similar_score that takes the value 0 or 1: similar_score = 0 means that the two sentences are not similar, and similar_score = 1 means that they are similar. Python 3 was used in the experiment, and Keras, a deep learning framework based on TensorFlow, was selected for the construction of the Siamese network.

Experimental Design
(1) Word Embedding Learning. In the word embedding learning module, the paper used a 300-dimensional Word2Vec model pretrained on an external corpus to map each word to a fixed-size vector. The more than 30,000 pieces of data used in word vector training are from Chinese Wikipedia, and the training was performed after preprocessing steps such as word segmentation. The basic training settings are shown in Table 5. When an unregistered (out-of-vocabulary) word appeared, it was initialized to a 300-dimensional zero vector.
(2) Parameter Setting. The Adadelta optimizer [24] was selected as the optimization algorithm. In its update rule, $\Delta\theta_t$ is the parameter update vector; $g_t$ is the gradient of the parameters at the $t$-th iteration; and $RMS[g]_t$ is the root mean square of the gradient at time $t$.
In addition, gradient clipping was carried out with a threshold value of 1.25. EarlyStopping was added to each layer to prevent overfitting. To solve the problem of sample category imbalance, class_weight was used to map each class index to a weight value, which was used to weight the loss function. The number of hidden nodes of the LSTM was set to 50. The batch size was set to 64 and the number of training rounds was set to 20. Through the similarity threshold selection method described above, the similarity threshold was set to 0.5. The loss function is MSE (Mean Square Error), and the formula is as follows:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $n$ is the number of sentence pairs, $y_i$ is the true label of the $i$-th pair, and $\hat{y}_i$ is its predicted similarity score.

Experimental Result.
To evaluate the performance of different distance measures on the similarity prediction task, this paper used the Matplotlib library to plot the accuracy of the model, as shown in Figures 6 and 7. The final experiment adopted Manhattan distance, because Figures 6 and 7 show that cosine distance performs poorly, while Manhattan distance performs better. The reason may be that cosine distance only considers the angle between the two vectors, whereas Manhattan distance also preserves their length information.

Mobile Information Systems
This experiment also selected three other models for comparison: a benchmark method (calculating cosine similarity between average word vectors), Word Mover's Distance, and Smooth Inverse Frequency. The Pearson correlation coefficient (R) and the Spearman correlation coefficient (P) were selected as the evaluation indexes of the model. The calculation is shown in formulas (19) and (20):
$$R = \frac{\sum_{i=1}^{n}\left(y'_i - \bar{y}'\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(y'_i - \bar{y}'\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}} \tag{19}$$
$$P = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)} \tag{20}$$
where $n$ is the number of sentence pairs; $y'_i$ and $y_i$ are the predicted and true similarity scores of sentence pair $i$; $\bar{y}'$ represents the average of the predicted similarity scores for all sentence pairs; $\bar{y}$ represents the average of the true similarity scores for all sentence pairs; and the rank difference $d_i$ represents the difference between the predicted ranking of sample $i$ and its real ranking. R measures the correlation between the predicted label and the true label, and P measures the correlation between the rank variables. The comparison of the results of each model is shown in Table 6. The comparison shows that the Siamese network used in this paper outperforms the other three models on both evaluation indexes.
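Both correlation coefficients can be computed with a few lines of plain Python. This is a sketch under the assumption that the scores contain no ties, since the rank-difference form of the Spearman coefficient is exact only in that case; the function names and the example scores are illustrative, not experimental data.

```python
import math


def pearson(pred, true):
    """Pearson correlation between predicted and true scores."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(pred, true):
    """Spearman correlation from squared rank differences (no ties)."""
    n = len(pred)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(true)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Illustrative scores for four sentence pairs.
pred = [0.9, 0.1, 0.6, 0.3]
true = [1.0, 0.0, 0.2, 0.8]
print(pearson(pred, true), spearman(pred, true))
```

With binary labels, ties are unavoidable, so a tie-aware rank correlation would be needed in practice; the sketch only illustrates the two definitions.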

Case Study
In order to verify more intuitively the feasibility of the proposed evaluation method for emergency measure completeness, the emergency plan for public health emergencies in province A was taken as an example, and the completeness of its emergency measures was evaluated according to the process described above.
First of all, 24 emergency needs and 69 emergency measures were obtained through the CRF model. Then, the similarity calculation model was used to match them and obtain the similarity score of each sentence pair. For example, the matching score of "ensuring price stability in closed managed areas" and "ensuring price stability in market supply in closed areas" was 0.97. The matching score of "set up report telephone to strengthen prevention and control" and "set up temporary traffic health quarantine at the entry and exit ports of traffic stations" was 0.101. These matching scores are consistent with manual judgment and provide a more objective basis for judging the completeness of emergency measures. Therefore, the emergency needs were matched, one by one, with all emergency measures in the plan. If all the similarity scores for an emergency need are less than 0.5, that emergency need is determined to be a missing emergency measure in the plan. Thus, this paper found that six emergency measures are missing, as shown in Table 7.
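The one-by-one matching rule can be sketched as follows. The `token_overlap` function is a toy stand-in (Jaccard overlap of words) used only so that the example runs; it is not the Siamese network, and its scores differ from the 0.97 and 0.101 reported above. The function names are illustrative assumptions.

```python
def token_overlap(a, b):
    """Toy similarity: Jaccard overlap of lowercase word sets (stand-in only)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)


def find_missing_needs(needs, measures, similarity, threshold=0.5):
    """Return the needs whose best match among all measures is below threshold."""
    missing = []
    for need in needs:
        best = max((similarity(need, measure) for measure in measures), default=0.0)
        if best < threshold:
            missing.append(need)
    return missing


needs = [
    "ensuring price stability in closed managed areas",
    "set up report telephone to strengthen prevention and control",
]
measures = [
    "ensuring price stability in market supply in closed areas",
    "set up temporary traffic health quarantine at the entry and exit ports of traffic stations",
]
missing = find_missing_needs(needs, measures, token_overlap)
print(missing)
```

In the paper's pipeline, the trained similarity model would take the place of `token_overlap`, with the same 0.5 threshold.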
Finally, according to formula (1), the incompleteness rate of the emergency measures of the plan was calculated to be 8%; that is, the completeness reaches 92%.
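The arithmetic behind these figures can be checked with a short sketch. Since formula (1) is not reproduced in this text, the denominator n + m used here is an assumption: it is the reading that gives exactly 8% for 6 missing measures and 69 measures in the plan, whereas n / m would give about 8.7%. The function name is illustrative.

```python
def incompleteness_rate(n_missing, m_in_plan):
    """Assumed form of the incompleteness rate: n / (n + m)."""
    return n_missing / (n_missing + m_in_plan)


rate = incompleteness_rate(6, 69)   # 6 missing measures, 69 measures in the plan
completeness = 1 - rate
alternative = 6 / 69                # the n / m reading, shown for comparison

print(f"incompleteness = {rate:.2%}, completeness = {completeness:.2%}")
print(f"n / m reading  = {alternative:.2%}")
```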

Conclusion
A UGC-based method to evaluate the emergency measure completeness has been proposed in this paper. The method begins with collecting emergency plans and user comments about public health emergencies, followed by extracting emergency needs and emergency measures using a CRF model. The CRF model is trained to extract emergency needs and emergency measures by selecting features based on designed feature templates. A similarity calculation model was then established to calculate the similarity between the emergency needs and the emergency measures. Finally, the emergency measure completeness or incompleteness of the emergency plan was quantified according to a defined formula. In order to verify the feasibility of the proposed evaluation method, province A was used as a case, and the experimental results demonstrate the effectiveness of the proposed evaluation method.
This article regards the emergency measures taken by various departments as a single set. For the emergency needs in the comments, the emergency measures that are missing from this set are found, rather than studying the emergency situations of specific departments; that is, the set of emergency departments and the set of emergency measures do not correspond to each other, so it is impossible to locate the emergency measures that specific departments lack. In the future, further studies can be carried out on specific departments to improve the responsibilities of each department. Research on the sentiment orientation toward emergency measures in the plans and comments can also be conducted based on UGC data, so as to provide support for improving the effectiveness of emergency measures.

Table 7: Emergency measures missing in emergency plan for public health emergencies in province A.
Serial number | Emergency measures
1 | Strengthen the management of the recovery of materials for pandemic prevention and control
2 | Set up a reporting phone to strengthen prevention and control
3 | Strengthen the safety services for the production enterprises of pandemic prevention and control materials
4 | Hierarchical management and control gradually realize the management of healthy returning workers
5 | Carry out familiarization drills and training for enterprises that resume work and resume production
6 | Timely update emergency prevention and control guidelines

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.