Article

Microblog Topic-Words Detection Model for Earthquake Emergency Responses Based on Information Classification Hierarchy

1
School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
2
Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China
3
College of Land Science and Technology, China Agricultural University, Beijing 100083, China
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2021, 18(15), 8000; https://doi.org/10.3390/ijerph18158000
Submission received: 4 June 2021 / Revised: 24 July 2021 / Accepted: 26 July 2021 / Published: 28 July 2021

Abstract

Social media data are constantly updated, numerous, and characteristically prominent. To quickly extract the needed information from the data to address earthquake emergencies, a topic-words detection model of earthquake emergency microblog messages is studied. First, a case analysis method is used to analyze microblog information after earthquake events. An earthquake emergency information classification hierarchy is constructed based on public demand. Then, subject sets of different granularities of earthquake emergency information classification are generated through the classification hierarchy. A detection model of new topic-words is studied to refine and complete the sets of topic-words. Furthermore, the validity, timeliness, and completeness of the topic-words detection model are verified using 2201 messages obtained after the 2014 Ludian earthquake. The results show that the information acquisition time of the model is short. The validity of the whole set is 96.96%, and the average and maximum validity of single words are 78% and 100%, respectively. In the Ludian and Jiuzhaigou earthquake cases, the number of new topic-words added for each earthquake reaches only single digits. Therefore, the experiments show that the proposed model can quickly obtain effective and pertinent information after an earthquake, and that the completeness of the earthquake emergency information classification hierarchy can meet the needs of other earthquake emergencies.

1. Introduction

Since 1980, China has been among the top five countries most frequently affected by damaging earthquakes [1]. After an earthquake, the affected areas are usually chaotic [2]. Therefore, an instantaneous efficient emergency management is required to develop plans and operations aiming to decrease casualties and losses [3]. Incorrect and inappropriate emergency responses can cause greater losses than the disaster itself [4]. The scientific nature and timeliness of the earthquake emergency decision-making depends on the acquisition and management of the disaster information, the emergency rescue information, the supply and demand information in the earthquake emergency support, and the social public opinion information [5,6].
Social media information can supplement the seismic monitoring data in earthquake emergencies [7]. In previous studies, researchers found that social information played a more important role than traditional methods in disaster awareness and determination [8,9,10]. Nearly real-time disaster information can be obtained from social media platforms, such as Twitter and Weibo (the largest Chinese microblog sites). Integrating social media information into earthquake decision-making can increase efficiency. The decision-making based on this information that comes from the public can provide public benefit and improve the public’s ability to prevent and mitigate disasters [11].
The earthquake-related information contained in microblogs is interactive, collected in real-time, and socially relevant. In recent years, data mining using data extracted from Weibo has been an interesting research topic, since short blogs contain minimal information and can cover both large and sparse areas [12,13,14]. Many studies on social media data collection, extraction, and analysis have been conducted to meet the requirements of natural disaster management, including earthquakes, floods, and typhoons [8,9,10,15,16,17]. On the Internet, earthquake disaster information is complex, randomly expressed, quickly disseminated, and spread by diverse carriers [18]. Automatic information acquisition from the Internet is the first step in the organization and management of earthquake emergency information. This process can be divided into manual extraction, semiautomatic extraction, and fully automatic extraction [19]. A web crawler is an automatic network information acquisition technology that can process as many network information pages as possible in a short time frame, but many problems exist, such as miscellaneous information, large system resource requirements, and excessive time consumption [20]. Topic crawler technology has been proposed to address these problems. A crucial part of a topic crawler is the description of the topic content, which can improve the information processing efficiency by describing the type of earthquake emergency from different angles according to the category. Relevant topics have been discovered by several methods, such as VSM (Vector Space Model) [21] and ULW-DMM (an extended method combining Dirichlet multinomial mixture and user-LDA topic model) [22]. To address the earthquake characteristics, temporal extraction rules, location trigger dictionaries, and attribute trigger dictionaries have been created [23].
Analyzing earthquake-related social media information usually requires separating buildings, green plants, transportation systems, water sources, and other classes to evaluate losses [16]. The classification of earthquake emergency information is mainly based on the object that is reflected by the information [24], the demand for information reporting [25,26], and computer storage and use [18,27]. The classification of seismic information is based on the usage of information. According to these studies, classification is the first step in applying data analysis [28]. However, few studies have been reported from the perspective of information collection. Crawling earthquake-related information with only a few words cannot accurately reflect the subject needs. In a crawler strategy, earthquake information needs to be searched based on earthquake-related keywords [29]. Therefore, we need to perform the classification not only from the perspective of analysis, but also from the perspective of information collection. However, an exhaustive literature review shows that classification and information extraction from the perspective of public demand in the earthquake emergency response process have not yet been reported.
This paper focuses on topic detection after earthquakes and uses a cross-validation method to construct an information classification system for earthquake emergencies based on Sina microblog data. A subject word detection model is built to determine the different granularities of the word set of the earthquake emergency information classification subject. Microblog data following an earthquake are then taken as an example to verify the validity, timeliness, and application value of the model. Finally, we expect the model to improve the efficiency of earthquake emergency information processing and achieve efficient organization and management.
The remainder of the paper is organized as follows: Section 2 presents the data sources and data pre-processing. Section 3 details the classification hierarchy of the earthquake emergency information construction. Section 4 details the topic-words detection model based on the classification hierarchy. Section 5 applies the model and the hierarchy. Finally, conclusions are given in Section 6.

2. Data Sources and Data Preprocessing

2.1. Data Sources

The data in this paper comprise two parts: earthquake cases containing epicenters, occurrence times, magnitudes, and other additional parameters, which are labelled by the province where the epicenter is located and the serial number of the text; and microblog messages published after an earthquake by the public. The data were crawled with the keyword “earthquake” (in Chinese) on Sina Weibo. Sina Weibo has the same functionality as Twitter, expressing what the public sees and thinks in a timely manner [4]. With the development of network technology and the popularity of mobile terminals, a large number of messages appear almost instantaneously after an earthquake. In this study, raw messages are obtained using keywords so as to quickly extract the targeted information from this large volume. The obtained microblog messages include user ID, release time, microblog content (including pictures or videos), publishing location, microblog address, equipment source, etc. Information on the earthquake cases and microblog messages is shown in Table 1.
As mentioned earlier, the study addresses a topic-words detection model based on an earthquake information classification hierarchy. The research is divided into three parts: the establishment of the hierarchy, the construction and verification of the detection model, and the application of the model. These three parts follow one another chronologically. Therefore, in what follows, all the information is divided into three data sets: data set I is historical data used in the first part, data set II is used in the second part, and data set III is newly gathered seismic data used to apply the model in practice. First, the 01YG, 02SC, and 03XJ microblog information (hereinafter referred to as data set I) is used for the research and construction of the earthquake emergency information classification hierarchy. Second, the 04YN microblog information (hereinafter referred to as data set II) is used to verify the efficiency of the earthquake emergency keyword detection model. Third, the 05SC microblog information (hereinafter referred to as data set III) is used to apply the model and the classification hierarchy.

2.2. Data Preprocessing

Data preprocessing includes preliminary screening, text format normalization, and feature word tagging. In the preliminary screening phase, messages that belong to earthquake news and topics initiated by the public are filtered out. In the format normalization phase, elements that make word segmentation difficult are treated: for instance, traditional Chinese is converted to simplified Chinese and emojis are converted to text. In the feature word tagging phase, messages are segmented using the ICTCLAS Chinese lexical word segmentation system (http://ictclas.nlpir.org/, accessed on 10 September 2012). Microblog messages are divided into words or short phrases and tagged as nouns, verbs, or phrases that can better express the subjects [30]. Stop words are filtered out, and the remaining tagged words or short phrases are referred to as feature words. The data preprocessing flowchart is illustrated in Figure 1.
In Sina Weibo messages, the sentences and words that are put between the ‘#’ symbol define the topic, which is equivalent to the use of hashtags on Twitter [31]. In this study, messages containing the symbol ‘#’ were filtered out. The principles of preliminary screening and cleaning are shown in Table 2.
Microblog messages related to an earthquake are interactive, real-time, and social, and they contain a lot of hidden value. Although each text is short, the messages together form a large amount of poorly standardized data. Because of the public’s habits and other reasons, traditional Chinese characters often appear in microblogs, which causes difficulties in the word segmentation process. In this study, ChineseConverter.dll in the Visual Studio International Pack is used to make this conversion. In addition, emoticons are often used to express feelings more conveniently. Emojis are therefore converted in the preprocessing phase using the correspondence between emoji and Unicode, as well as between Unicode and characters. After word segmentation, the data are traversed and stop words are filtered out according to the stop word list. The nouns, verbs, quantifiers, numerals, and time words are then retained according to the part-of-speech tagging.
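The screening and filtering steps described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the messages have already been segmented and tagged by an external tool, and the stop-word list, the part-of-speech codes, and the function names are all hypothetical.

```python
# Minimal preprocessing sketch. Assumptions: tokens arrive already segmented
# and tagged as (word, pos) pairs; STOP_WORDS and KEPT_POS are illustrative.
STOP_WORDS = {"the", "of", "a"}
KEPT_POS = {"n", "v", "q", "m", "t"}  # noun, verb, quantifier, numeral, time word


def is_topic_message(text):
    """Return True when the message carries a '#' topic marker."""
    return "#" in text


def feature_words(tagged_tokens):
    """Keep tokens that are not stop words and whose tag is in KEPT_POS."""
    return [word for word, pos in tagged_tokens
            if word not in STOP_WORDS and pos in KEPT_POS]


def preprocess(messages):
    """messages: list of (text, tagged_tokens). Drops '#' messages and
    returns the feature words of every remaining message."""
    return [feature_words(tokens)
            for text, tokens in messages
            if not is_topic_message(text)]


sample = [
    ("#earthquake# stay safe", [("earthquake", "n")]),
    ("the house collapsed today",
     [("the", "u"), ("house", "n"), ("collapsed", "v"), ("today", "t")]),
]
print(preprocess(sample))
```

Traditional-to-simplified conversion and emoji conversion are left out of the sketch because they depend on external conversion tables.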

3. Classification Hierarchy of Earthquake Emergency Information Construction

3.1. Process of Establishing the Classification Hierarchy

Drawing on the recent literature on earthquake information classification, this paper takes into account the basic needs of people in earthquake emergencies. These needs consist of basic life necessities, self-rescue, mutual rescue, timely disaster information, psychological counseling, and other information about the earthquake. All of this information needs to be supported by location and time information. Therefore, the information is divided into location, time, disaster information, rescue, support, social opinion, etc. Considering scientific soundness, applicability, and expansibility, the first edition of the earthquake emergency information classification hierarchy is constructed from the feature words of data set I. The cross-validation method is then used to extract and classify the feature information. By repeatedly verifying the classification results, the threshold for the proportion of classified records is determined, and the classification hierarchy is corrected and the records are reclassified according to the classification results. K-fold cross-validation divides the whole fitting data set into k subsets, each of which serves in turn as the test data set while the remaining subsets form the training data set. The fivefold cross-validation strategy can effectively reduce the computational cost in the modeling process, accelerate the sampling speed, and improve the modeling efficiency [32]. To ensure the scientific accuracy of the hierarchy, a fivefold cross-validation method is used. The establishment process of the earthquake emergency classification hierarchy is shown in Figure 2.
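As an illustration of the fivefold strategy, the following sketch splits a list of records into five folds and yields each training/test pair in turn. It is a generic k-fold splitter written for this explanation; the function name, the shuffling, and the seed are assumptions, and the paper does not state how its folds were drawn.

```python
import random


def k_fold_splits(records, k=5, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.

    Records are shuffled once (an assumption) and dealt into k folds;
    each fold is used exactly once as the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


for train, test in k_fold_splits(range(10), k=5):
    print(len(train), len(test))
```

With ten records and k = 5, every round trains on eight records and tests on two.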

3.2. Earthquake Emergency Information Classification Hierarchy

Information classification strongly affects the earthquake emergency system, and Chinese researchers have made many efforts in this field. For instance, Guiwu Su classified the information into 17 categories [24], Dan Zhu classified the information into 9 categories in view of the application of short messages to report an earthquake disaster [33], and Man Dong classified the information from the perspective of the command and decision-making of the emergency headquarters of the China Earthquake Administration [5]. In this study, by combining these classifications with the “Regulations on the Emergency Preplans for Destructive Earthquakes”, eight first-level indicators were obtained. Data set I was then classified with repeated verification, and the percentage of unclassifiable records was calculated. A percentage of unclassifiable records higher than 10% means that more than one tenth of the messages cannot be classified. In that case, the second-level indicators are corrected and the messages are reclassified, and this is repeated until the percentage falls below 10%, at which point the final version of the emergency information classification hierarchy is obtained. Following this process, the classification hierarchy of the earthquake emergency information is finally constructed with eight first-level indicators, 23 second-level indicators, and 29 third-level indicators. The classification hierarchy summarizes the main categories of earthquake emergency information found in microblog messages. The second and third levels are completed from the perspective of establishing the scheduling relationship between the public’s demand and the emergency services’ supply. All the indicators and the meanings of the lowest-level indicators are given in Table 3.
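The 10% stopping rule can be expressed as a small check. The sketch below assumes each message has been assigned a (possibly empty) list of categories; the function names and the data layout are hypothetical and serve only to make the rule concrete.

```python
def unclassifiable_share(labels):
    """labels: one list of assigned categories per message.
    An empty list means the message could not be classified."""
    if not labels:
        return 0.0
    return sum(1 for assigned in labels if not assigned) / len(labels)


def hierarchy_needs_revision(labels, threshold=0.10):
    """Revise the hierarchy until the share drops below the threshold."""
    return unclassifiable_share(labels) >= threshold


# Hypothetical example: 3 of 20 messages are unclassifiable (15%).
labels = [["time"]] * 17 + [[]] * 3
print(unclassifiable_share(labels), hierarchy_needs_revision(labels))
```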

4. Earthquake Emergency Information Hierarchy Topic-Words Detection Model

4.1. Topic-Words Detection Model Construction

In the data preprocessing phase, feature words are obtained from microblog messages by word segmentation and stop word filtering. Word frequency statistics are computed before the model is built. Afterwards, the feature words are clustered upward according to Table 3 to obtain more concise topic-words, which yields the coarse-grained and fine-grained topic-word sets. Figure 3 shows the data process used to detect the topic-words. The model construction includes three key steps: feature word standardization, aggregation, and inter-category verification.

4.1.1. Aggregating Feature Words in the Same Category

Feature word aggregation is an upward aggregation process for words belonging to the same category of the earthquake emergency information classification hierarchy. Assume that α and β are two feature words belonging to category C1, that Qα and Qβ are their respective word frequencies, and that αi (i = 0 to n) and βj (j = 0 to m) are the words or units constituting α and β. The word aggregation is performed as in Equation (1). If two different feature words contain the same unit and their meanings do not deviate greatly, the repetitive one is filtered out, while the other is kept for the next aggregation. Kα-β is one of the results obtained using Equation (1). Aggregation verification is then performed with the next feature word γ in the same category C1, and the loop continues downward until all the feature words in C1 have been treated. Finally, the feature word set C1′ is the result of aggregating C1.
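One possible reading of this aggregation step is sketched below. It is a simplification under stated assumptions: each feature word is given as a tuple of units, words with a frequency of 20 or more are left untouched, and two low-frequency words that share a unit are both replaced by that shared unit. The check that the meanings "do not deviate greatly" is a human judgment and is omitted; all names and sample data are hypothetical.

```python
FREQ_LIMIT = 20  # words at or above this frequency are not aggregated


def shared_unit(units_a, units_b):
    """Return the first unit that appears in both words, or None."""
    for unit in units_a:
        if unit in units_b:
            return unit
    return None


def aggregate_category(freqs, units):
    """freqs: word -> frequency; units: word -> tuple of constituent units.
    Returns the aggregated word list for one category (simplified)."""
    kept = []  # frequent words, left as they are
    low = []   # low-frequency entries as [label, units]
    for word in sorted(freqs, key=freqs.get, reverse=True):
        if freqs[word] >= FREQ_LIMIT:
            kept.append(word)
            continue
        for entry in low:
            common = shared_unit(entry[1], units[word])
            if common is not None:
                entry[0] = common      # both words collapse to the shared unit
                entry[1] = (common,)
                break
        else:
            low.append([word, units[word]])  # nothing shared: keep the word
    return kept + [label for label, _ in low]


freqs = {"earthquake": 120, "house collapse": 8, "wall collapse": 5, "landslide": 3}
units = {
    "earthquake": ("earthquake",),
    "house collapse": ("house", "collapse"),
    "wall collapse": ("wall", "collapse"),
    "landslide": ("landslide",),
}
print(aggregate_category(freqs, units))
# ['earthquake', 'collapse', 'landslide']
```

In the example, "house collapse" and "wall collapse" share the unit "collapse" and are merged, while the frequent word "earthquake" is kept unchanged.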

4.1.2. Checking between Categories

The third level of the classification hierarchy of the earthquake emergency information is the lowest level. After the feature words under each third-level category are aggregated, different categories may contain the same feature words. Therefore, these feature words are checked between the categories using Equation (2).
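$$
(Q_\alpha < 20) \land (Q_\beta < 20)
\begin{cases}
= 1, & \alpha_x = \beta_y\ ?
  \begin{cases}
  \text{Y}: & \alpha_x = \beta_y = K_{\alpha\text{-}\beta},\ \text{let } \alpha = \beta = K_{\alpha\text{-}\beta} \\
  \text{N}: & \text{aggregate the next word } \gamma \text{ in frequency order}
  \end{cases} \\
= 0, &
  \begin{cases}
  Q_\alpha \ge 20: & \alpha \text{ undergoes no aggregation} \\
  Q_\beta \ge 20: & \beta \text{ undergoes no aggregation}
  \end{cases}
\end{cases}
\qquad (0 \le x \le n,\ 0 \le y \le m) \tag{1}
$$

$$
\theta_x^{C_1'} = \theta_y^{C_2'}\ ?
\begin{cases}
\text{Y}: & \theta_x^{C_1'} = \theta_y^{C_2'} = \omega
  \begin{cases}
  \omega \text{ with mark}: & \text{recover and mark, then return to the set} \\
  \omega \text{ without mark}: & \text{classify } \omega \text{ into the fuzzy classification}
  \end{cases} \\
\text{N}: & \text{end the checking}
\end{cases}
\qquad (0 \le x \le n,\ 0 \le y \le m) \tag{2}
$$

In Equation (2), C1′ and C2′ are feature word sets of third-level classes in the earthquake emergency information classification hierarchy. $\theta_i^{C_1'}$ (i = 0 to n) and $\theta_j^{C_2'}$ (j = 0 to m) are the feature words of sets C1′ and C2′, respectively. After checking all the feature words between sets C1′ and C2′ based on Equation (2), the new sets C1″ and C2″ are formed. The inter-category verification of the other third-level classes is then carried out in the same way until the interclass verification of all the classification feature word sets is completed.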

4.1.3. Constructing Coarse-Grained and Fine-Grained Feature Word Sets

Information collection and information management are the two main applications of subject words. Efficient information collection needs to ensure data comprehensiveness, which requires a large granularity of subject words. Information management requires accurate information classification, accurate data, and thus detailed and comprehensive granularity of subject words. Therefore, this study divides the feature words into different granularities according to their characteristics.
In Section 3, the feature words are divided according to the lowest level of earthquake emergency information classification hierarchy, which is a fine-grained set. Therefore, to meet the requirement of information collection, this paper focuses on a construction method for coarse-grained feature word sets. All kinds of fine-grained subject word sets, under the first-level category, are first merged together to form a new subject word set. The words with particularly low theme relevance are then eliminated, while aggregation processing is carried out on the same first-level category using the method shown in Equation (1). The verification steps are described in detail as follows:
First, A1 and A2 are two feature word sets under the first-level classification of earthquake emergency information. $\vartheta_i^{A_1}$ (i = 0 to n) and $\vartheta_j^{A_2}$ (j = 0 to m) are the feature words of sets A1 and A2, respectively.
If $\vartheta_i^{A_1} = \vartheta_j^{A_2} = \sigma$ ($0 \le i \le n$, $0 \le j \le m$), then $\vartheta_i^{A_1}$ and $\vartheta_j^{A_2}$ are filtered out of sets A1 and A2, respectively, to obtain sets A1′ and A2′. The feature words σ are denoted as $\sigma^{A_1 A_2}$ and classified into the fuzzy feature word sets. All the words in sets A1 and A2 undergo the same verification.
If $\vartheta_i^{A_1} \ne \vartheta_j^{A_2}$ ($0 \le i \le n$, $0 \le j \le m$), the verification between sets A1 and A2 ends. The next round of inter-category verification then continues until the verification of all the first-level classification feature word sets is completed.
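The pairwise verification between first-level sets can be sketched with plain Python sets. This is an illustrative reading of the steps above rather than the authors' code: every word that occurs in two first-level sets is removed from both and recorded in a fuzzy set together with the categories it came from. The function name and the sample words are hypothetical.

```python
from itertools import combinations


def split_fuzzy(category_sets):
    """category_sets: category name -> set of feature words.

    Returns (cleaned, fuzzy): cleaned holds each category without the
    shared words; fuzzy maps each shared word to the categories it
    appeared in.
    """
    cleaned = {name: set(words) for name, words in category_sets.items()}
    fuzzy = {}
    for a, b in combinations(sorted(category_sets), 2):
        # Compare the original sets so that a word shared by three
        # categories is recorded with all three of them.
        for word in category_sets[a] & category_sets[b]:
            cleaned[a].discard(word)
            cleaned[b].discard(word)
            fuzzy.setdefault(word, set()).update({a, b})
    return cleaned, fuzzy


sets = {
    "A1": {"collapse", "injured"},
    "A2": {"rescue", "injured"},
    "A3": {"donate"},
}
cleaned, fuzzy = split_fuzzy(sets)
print(cleaned)
print(fuzzy)
```

Here "injured" appears in both A1 and A2, so it leaves both sets and lands in the fuzzy set for later reclassification.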

4.1.4. Fuzzy Feature Word Set Processing

The processing of the fuzzy classification feature word set follows the steps given in Table 4. The fuzzy classification feature word set is generated by the inter-category checking (Equation (2)) and by the construction of the coarse-grained and fine-grained feature word sets (Section 4.1.3). Its words need to be reclassified according to the definitions of the classification indicators of the earthquake emergency information.

4.2. Model Validation

The validity P of the topic-word detection model is verified by comparing the timeliness and accuracy of information collection experiments without and with the classified topic-words. In what follows, the information collection experiments without and with classified topic-words are referred to as ‘the former’ and ‘the latter’, respectively. The timeliness T and accuracy R are determined from the two information collection experiments using Equations (3) and (4), where Tb and Rb are the time and accuracy of the former, and Ta and Ra are the time and accuracy of the latter. The accuracy of each experiment is calculated from the number of all information records r and the number of effective information records q, based on Equation (5). P is then calculated using Equation (6). When the timeliness and accuracy of the latter experiment are both better than those of the former, the proposed model is considered effective for actual application in earthquake emergencies.
$$T = T_b - T_a \tag{3}$$
$$R = R_a - R_b \tag{4}$$
$$R_i = q_i / r_i \times 100\% \quad (i = a, b) \tag{5}$$
$$P = T \times R \tag{6}$$

5. Case Analysis and Discussion

In the case analysis, information collection, information classification, and model verification tests are performed using the Python programming language.

5.1. Coarse-Grained and Fine-Grained Word Sets

Earthquake emergency information classification coarse-grained and fine-grained topic-word sets are formed by the proposed model, based on data set I. Part of the fine-grained and coarse-grained topic-word sets are shown in Table 5 and Table 6, respectively.
In the fine-grained topic-word sets, the first-level category “disaster investigation” contains 140 words across its five sub-levels. In the coarse-grained topic-word sets, the same first-level category contains 93 words. The characteristics of an earthquake emergency vary from one earthquake to another. Collecting information with a fixed set of subject words may miss information that is important at the current stage and cannot follow the dynamic changes of the event [34]. Thus, the topic-word set should be updated in real time to meet the needs of earthquake emergency information collection and classification. In addition, mining characteristic subject words for information collection helps to expand the data set and to capture earthquake information more accurately. This can provide a reference for scientifically formulating earthquake emergency plans. Therefore, the validity of the topic-words is tested from three aspects: the validity of information classification, the timeliness of information collection, and the completeness of the topic-word set.

5.2. Analysis of the Information Classification Validity

Information classification should be accurate and detailed. An information classification experiment is carried out using the fine-grained set and data set II to test the efficiency of the earthquake emergency information classification hierarchy. A single microblog record often contains several categories of information. To collect comprehensive information for the different categories, the study uses a multiple classification method: if a record contains multiple categories of information, it is assigned to all of the corresponding categories at once. Therefore, one message may be counted more than once in the validity records. The results of the experiment are shown in Table 7.
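The multiple classification rule can be illustrated with a short sketch. It assumes a message belongs to a category whenever it contains any topic-word of that category; plain substring matching, the category names, and the sample messages are assumptions made only for this example.

```python
from collections import Counter


def classify(message, topic_words):
    """Return every category whose topic-words occur in the message.

    topic_words: category name -> set of topic-words. Substring
    matching is an assumption of this sketch.
    """
    return [category for category, words in topic_words.items()
            if any(word in message for word in words)]


def category_counts(messages, topic_words):
    """Count records per category; a message may be counted several times."""
    counts = Counter()
    for message in messages:
        counts.update(classify(message, topic_words))
    return counts


topic_words = {
    "emergency rescue": {"evacuate", "rescue team"},
    "location": {"Ludian"},
}
messages = ["rescue team reached Ludian", "residents evacuate", "good morning"]
print(category_counts(messages, topic_words))
```

The first message is counted under both categories, which is why the category totals can exceed the number of messages.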
Table 7 shows that the proportion of disaster investigation (disaster situations), emergency support (traffic), location information, disaster investigation (abnormal phenomena), and time information is more than 50%, which is the largest proportion. The second largest proportion consists of social public opinion, emergency rescue, and emergency support. In other words, the disaster situation, location, time, and traffic information attract the most concern, followed by emergency support, rescue, and social public opinion. Tracing back to the microblog data set confirms that the public pays the most attention to the earthquake location, time, disaster situation, and other circumstances, while emergency support and rescue also receive considerable attention. In addition, news propaganda and social public opinion information published by the major official media occupy a certain proportion. Using the set of classified topic-words as the standard for information classification is therefore feasible. All the messages in data set II were assigned to categories of the classification hierarchy. This approach can implement the classification of earthquake emergency information and support accurate information classification management.

5.3. Analysis of the Information Collection Timeliness

The coarse-grained subject word set is selected for the information collection experiment, considering both data comprehensiveness and accuracy. Taking emergency rescue information, the most sensitive information in earthquake emergency work, as an example, the general topic crawler method b and the topic crawler method a, which is based on classified topic-words, are used to capture the messages published within one week after an earthquake. All other conditions of the two comparative experiments are the same. The Ludian earthquake was taken as the example for information collection. In method b, microblog messages were crawled with the keyword “earthquake”, with message times between 2014-08-03 00:00:00 and 2014-08-11 00:00:00, a window that covers the time of the Ludian earthquake and the seven days that followed. In method a, microblog messages were crawled using the first-level A5 words of the coarse-grained word sets as keywords. The message validity is judged based on the earthquake rescue information. The experiment results are shown in Table 8. The daily numbers of microblog messages obtained by the two methods are shown in Figure 4. Earthquake emergencies are time-critical and of great importance. Therefore, timeliness and accuracy are required for earthquake emergency information collection.
According to Equation (6), P is greater than 0: the information collection time of the latter is shorter than that of the former, and its information validity is higher. Therefore, we can conclude that the proposed model is effective. More precisely, the collection time Tb and effectiveness Rb of the experiment without classified topic-words were 2470 s and 12.77%, respectively. With classified topic-words, the collection time Ta and effectiveness Ra were 1913 s and 96.96%, respectively. Note that Ra is calculated from the number of effective information records q and the number of all information records r. Although using more keywords in method a costs time in the information collection phase, the keywords narrow the scope of the collection. In terms of timeliness, the latter experiment therefore has a small advantage over the former. The number of information records collected by method a is much smaller than the number collected by method b, because the keywords focus the collection on relevant information. In terms of accuracy, the latter experiment clearly outperforms the former. Overall, the results verify the effectiveness of the proposed model.
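The reported figures can be plugged into Equations (3), (4), and (6) as a quick check. The sketch assumes that T and R are differences (time saved and accuracy gained), which is the reading under which "P is greater than 0" is informative, and it expresses R as a fraction rather than a percentage; both choices, and the function name, are assumptions of this example.

```python
def model_validity(t_before, t_after, r_before, r_after):
    """Return (T, R, P) with T = Tb - Ta, R = Ra - Rb, P = T * R.

    Times are in seconds; accuracies are fractions between 0 and 1
    (an assumption; the paper reports them as percentages).
    """
    timeliness = t_before - t_after
    accuracy = r_after - r_before
    return timeliness, accuracy, timeliness * accuracy


# Values reported for the Ludian experiment: Tb = 2470 s, Ta = 1913 s,
# Rb = 12.77%, Ra = 96.96%.
T, R, P = model_validity(2470, 1913, 0.1277, 0.9696)
print(T, round(R, 4), round(P, 2))
# 557 0.8419 468.94
```

Both T and R are positive, so P is positive, which matches the conclusion drawn in the text.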

5.4. Analysis of the Validity of Information Collection Based on Topic-Words

The previous experiment shows that the validity of a whole category is high. In this section, the validity of a single word is analyzed. The use of appropriate topic-words in collecting information helps to improve the effectiveness and accuracy of subject word extraction [35]. Message and image analyses can provide effective support to the government and rescue organizations [36]. Therefore, a validity experiment is carried out using single topic-words. For instance, the topic-word “evacuate” is used to crawl microblog messages. The crawler returned 28 records, of which 27 contain information about evacuation. Thus, the validity of the topic-word “evacuate” is 96%. The effective records of all 25 words are shown in Table 9.
It can be seen in Table 9 that the average validity of the 25 topic-words is 78%. The validity of several topic-words is over 80%, while some of them reach 100%. The information collection experiment shows that the overall cumulative validity of the set is 96.96%. Therefore, the emergency rescue topic-words of the coarse-grained set are effective.
However, some topic-words have low validity: “resume classes”, “victory”, “accident”, and “find” each have a validity not exceeding 30%. The analysis shows that the total number of data records containing the words “resume classes”, “victory”, and “accident” is less than 15. These words with fewer records also have lower validity. In the Ludian earthquake, these topics did not attract much attention, and the messages that contain these words do not carry the intended meanings. Therefore, these words could be removed from the word sets to fit the Ludian earthquake. However, for comprehensive information acquisition in general, these words should not be removed from the word sets. In addition, the number of data records containing “find” is 100, but its validity is only 23%. This indicates that such a word does not fit the corresponding earthquake, and the result can be used to decide whether to omit it. The experiment is carried out on 04YN, the Yunnan Ludian earthquake, with the coarse-grained rescue topic-word set in order to verify the effectiveness of single words. The validity of other words can be tested using the same method. We can deduce from the tested and evaluated results that, for each earthquake, the topic-word set should be slightly revised to meet the requirements of that earthquake emergency. In this case, words with many records but low validity should be removed from the word sets to fit the Ludian earthquake. Moreover, topic-words having zero effective records, such as “resume classes”, should be removed. Finally, a small number of effective records with a high validity indicates that the public is giving less attention to the corresponding topics. The corresponding topic-words are nevertheless effective for collecting information, and they demonstrate the effectiveness of the word sets from another perspective.
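The single-word validity and the 30% screening discussed above can be computed as follows. The counts for “evacuate” (27 of 28) are taken from the text, and the count for “find” (23 of 100) is derived from its reported 23% validity over 100 records; the function names and the use of 30% as a cut-off parameter are assumptions of this sketch.

```python
def word_validity(effective, total):
    """Validity of one topic-word in percent (0 when nothing was collected)."""
    if total == 0:
        return 0.0
    return 100.0 * effective / total


def low_validity_words(counts, threshold=30.0):
    """counts: word -> (effective records, all records).
    Returns the words whose validity does not exceed the threshold."""
    return sorted(word for word, (effective, total) in counts.items()
                  if word_validity(effective, total) <= threshold)


counts = {"evacuate": (27, 28), "find": (23, 100)}
print(round(word_validity(*counts["evacuate"])))  # 96
print(low_validity_words(counts))                 # ['find']
```

Whether a flagged word is actually dropped remains a per-earthquake decision, as the text explains.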

5.5. Analysis of the Topic-Word Set Completeness

The completeness of the topic-word set is judged by the number of new words that have to be added for a given earthquake. After segmenting the messages with the ICTCLAS system and computing word frequency statistics on data set II, the top 20 words in order of word frequency are obtained, as shown in Table 10.
The words in the recommendation list are further treated after topic-word detection. New words that accurately express the meaning of a category are added to the corresponding category.
A word with a specific seismic attribute is added to a specific set to expand the data set. Words that carry no actual meaning and cannot accurately express the theme are added to a secondary stop word set for filtering invalid data. An uncertain word is added to the set of words to be processed and is examined further in light of other earthquakes. The top 20 words of the 04YN earthquake are processed and classified following this method. In the fine-grained word sets, the third-level earthquake situation category has 13 words, the social mood-positive category has 57 words, and the non-emergency-support category has 7 words. After the treatment, 2 words are added to each of these categories. The results are shown in Table 11.
Table 11 shows that topic-word detection can extract new topic-words that express actual meanings in a short time. This approach can mine seismic characteristic words as well as new unclassified words, which point to subjects that deserve attention. At the same time, words without actual meaning are added to the stop word set so that they can be filtered out. According to the results, the number of newly added topic-words is 6, and they can be added in a short time. Adding new words to the fine-grained word sets improves and completes these sets. As the data set continuously improves, topic-word collection becomes more effective.
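The recommendation-and-triage step described in this section can be sketched in Python as follows. The sketch assumes the messages are already segmented into tokens (the paper uses ICTCLAS, which is not called here), and it assumes that words already in the word sets or stop word sets are skipped before ranking. The function names and treatment labels are hypothetical, and the treatment of each recommended word is supplied by a human reviewer rather than computed.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Labels for the four treatments described in the text (names are illustrative).
TREATMENTS = ("fine-grained", "specific", "stop-word", "to-be-processed")


def recommend_new_words(messages: Iterable[Sequence[str]],
                        known_words: Set[str],
                        stop_words: Set[str],
                        top_n: int = 20) -> List[Tuple[str, int]]:
    """Rank tokens not yet covered by the word sets by frequency, keep the top N."""
    counts = Counter(
        token
        for tokens in messages
        for token in tokens
        if token not in known_words and token not in stop_words
    )
    return counts.most_common(top_n)


def group_by_treatment(decisions: Dict[str, str]) -> Dict[str, List[str]]:
    """Group manually reviewed words by the treatment chosen for each of them."""
    grouped: Dict[str, List[str]] = {t: [] for t in TREATMENTS}
    for word, treatment in decisions.items():
        if treatment not in grouped:
            raise ValueError(f"unknown treatment: {treatment!r}")
        grouped[treatment].append(word)
    return grouped
```

A reviewer would pass the output of `recommend_new_words` through manual inspection and record the outcome as a word-to-treatment mapping, for example `{"depth": "fine-grained", "Ludian": "specific", "century": "stop-word", "tap": "to-be-processed"}`, which mirrors the rows of Table 11.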

5.6. Hot Topic-Word Application

Topic-words can help the head of an emergency operations center lead teams to efficiently coordinate emergency management responses [37] and reasonably allocate emergency resources [38]. In this study, we use the topic-word detection model to discover hot topic-words that supply data support to the heads of emergency operations centers and their teams.
On 8 August 2017, the magnitude 7.0 Jiuzhaigou earthquake hit Sichuan Province, China. Its focal depth was 20 km. This earthquake case is called 05SC in this paper. The earthquake affected more than 170,000 people, and many people inside and outside the disaster area wrote what they saw, heard, and felt on microblogs. The microblog message volume per hour is shown in Figure 5.
We crawled the messages released on the Sina platform during the 72 h after the 05SC earthquake occurred. The coarse-grained word sets were used to collect the earthquake emergency information. There were 16,166 related messages in total. The topic-word cloud and the high-frequency words detected from the microblog information of 05SC are shown in Figure 6.
In the word cloud, words with large font sizes and central positions are the primary focus. Events that are not in the basic classification hierarchy can also be found in time. “Slight injury” (in Chinese, “轻伤”) is the word with the highest frequency (1444). Some words that indicate rescue teams and organizations also appear with a high frequency, such as “rescue teams”, “policeman”, “firefighter”, and “soldiers”. At the same time, more words with a positive social mood, such as “emotional stability” and “in good order”, can be found. The word cloud reflects the public’s sensitivity and observations after the earthquake. In the 05SC earthquake, the earthquake emergency measures and plans had been improved thanks to the experience gained from previous major earthquakes.
In the 72 h after the Jiuzhaigou earthquake occurred, the words “food”, “tent”, “medical materials”, and “communication equipment” also had a high frequency. These major rescue resources are usually deployed to the hardest-hit areas following a major earthquake [39], which may cause other important areas to be missed or served late. The topic-word frequency can help the earthquake emergency decision-making department quickly focus on public needs and formulate a scientific emergency plan.
Ranked by the frequency of the hot topic-words, the top 10 words are all in the word sets. Among the top 20 words, only “rescue car” and “sniffer dog” are absent from the fine-grained word sets. These two words can be added to the word sets and classified under the emergency support level. The hot topic-words of the Jiuzhaigou earthquake thus demonstrate that the classification hierarchy has a high completeness: there is no need to add new categories, and only two words need to be added to the existing word sets. Finally, the word sets generated according to the model are also applicable to this earthquake.
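The completeness check described here amounts to comparing a ranked list of hot topic-words with the union of the existing word sets. A minimal Python sketch of that comparison follows. The function names are hypothetical, and any category names or word lists passed in are supplied by the caller rather than taken from the paper's actual fine-grained sets.

```python
from typing import Dict, List, Sequence, Set


def missing_hot_words(ranked_words: Sequence[str],
                      word_sets: Dict[str, Set[str]],
                      top_n: int = 20) -> List[str]:
    """Return the top-N hot words that appear in none of the existing word sets."""
    known: Set[str] = set().union(*word_sets.values())
    return [w for w in ranked_words[:top_n] if w not in known]


def coverage(ranked_words: Sequence[str],
             word_sets: Dict[str, Set[str]],
             top_n: int = 20) -> float:
    """Fraction of the top-N hot words already covered by the word sets."""
    top = list(ranked_words[:top_n])
    if not top:
        return 1.0
    return 1.0 - len(missing_hot_words(top, word_sets, top_n)) / len(top)
```

Under this reading, the result reported for the Jiuzhaigou earthquake corresponds to a list of two missing words among the top 20, that is, a coverage of 18/20.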

6. Conclusions

To address the time-consuming collection and heavy processing load of information during an earthquake emergency response, this study classifies and organizes information according to the actual needs of the response. The paper constructs an earthquake emergency information classification hierarchy that includes 8 first-level and 29 third-level indicators. Based on this classification hierarchy and on timely microblog earthquake emergency information, a topic-word detection model is proposed, and coarse-grained and fine-grained topic-word sets are then built.
Taking the M6.5 Ludian earthquake of 3 August 2014 and the M7.0 Jiuzhaigou earthquake of 8 August 2017 as examples, only a single-digit number of new words needs to be added to the existing word sets. The experiment shows that the classification hierarchy and the topic-word sets constructed in this paper are relatively complete.
The rapid acquisition of earthquake information after an earthquake occurs is the key to earthquake emergency rescue [40]. The proposed method was compared with a method that takes only “earthquake” as a keyword. The results show that the proposed method is faster than the keyword-only method and yields a higher share of effective information. This experiment verifies the effectiveness of the topic-word detection model. By applying the research model to the Lushan earthquake, the high-frequency topic-words after the earthquake can be obtained, which can provide data support for specific earthquake emergency rescue efforts. Social media data can also be used to extract disaster-related tweets for earthquake emergency relief services [36].
However, different words can have the same meaning: the public can use different words to express the same sense on social platforms. For instance, “trapped” can be written as “受困” or “被困” in Chinese, both describing people who cannot get out or escape, while “food” and “eat” are different words that express the same need. In word frequency statistics, such words need to be unified. In the aggregation phase they are already treated as one semantic unit, but this operation is manual. A thesaurus can be constructed to automate it, and it should cover all of these manual operations to ensure the integrity of the topic-word set.
Word semantics are therefore a key research topic in the application of subject words. In future work, we aim to study and analyze word semantics. A synonym dictionary will then be constructed according to the characteristics of earthquake management so that words with the same meaning are merged as far as possible in the statistics of the top words. This will help reduce the mistakes caused by neglecting different expressions. Deeper research on word semantics can also identify the most needed resources in the emergency phase to guide emergency work clearly and concretely.
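The thesaurus idea outlined above can be illustrated with a small Python sketch that maps synonymous tokens to one canonical form before counting frequencies. The mapping below contains only the two examples named in the text, and both the mapping and the function names are hypothetical, since the synonym dictionary itself is left to future work.

```python
from collections import Counter
from typing import Dict, Iterable

# Illustrative thesaurus: variant token -> canonical topic-word.
# Only the examples mentioned in the text are included.
THESAURUS: Dict[str, str] = {
    "受困": "trapped",
    "被困": "trapped",
    "eat": "food",
}


def canonical(token: str, thesaurus: Dict[str, str]) -> str:
    """Map a token to its canonical form, leaving unknown tokens unchanged."""
    return thesaurus.get(token, token)


def unified_frequencies(tokens: Iterable[str],
                        thesaurus: Dict[str, str]) -> Counter:
    """Count word frequencies after merging synonymous expressions."""
    return Counter(canonical(t, thesaurus) for t in tokens)
```

With such a mapping, the two Chinese variants of “trapped” contribute to a single count, so neither expression is overlooked when the top words are ranked.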

Author Contributions

Conceptualization and methodology, X.S., X.Q., F.C. and X.Z.; investigation and data curation, X.S. and X.Q.; writing (original draft preparation), X.S., X.Q. and F.C.; writing (review and editing), X.S., S.M., J.S. and F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2018YFC1508901) and the Fundamental Research Funds for the Central Universities (No. BLX2013034).

Institutional Review Board Statement

Not applicable; the study did not involve humans or animals.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data, models, and code generated or used during the study appear in the submitted article.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (No. 2018YFC1508901) and the Fundamental Research Funds for the Central Universities (No. BLX2013034). The authors thank Sina Weibo for providing the microblog messages of the earthquake cases. The authors also wish to thank Haoqing Shen at Beijing Forestry University for the word cloud suggestion and application.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Centre for Research on the Epidemiology of Disasters (CRED). EM-DAT, The International Disaster Database. Available online: https://www.emdat.be/ (accessed on 1 April 2020).
  2. Lu, X.; Cheng, Q.; Xu, Z.; Xu, Y.; Sun, C. Real-Time city-scale time-history analysis and its application in resilience-oriented earthquake emergency responses. Appl. Sci. 2019, 9, 3497.
  3. Xu, Z.; Zhang, H.; Sugumaran, V.; Choo, K.-K.R.; Mei, L.; Zhu, Y. Participatory sensing-based semantic and spatial analysis of urban emergency events using mobile social media. EURASIP J. Wirel. Commun. Netw. 2016, 2016, 1–9.
  4. Xing, Z.; Su, X.; Liu, J.; Su, W.; Zhang, X. Spatiotemporal change analysis of earthquake emergency information based on Microblog Data: A case study of the “8.8” Jiuzhaigou earthquake. ISPRS Int. J. Geo-Inf. 2019, 8, 359.
  5. Dong, M.; Yang, T. Discussion of earthquake emergency disaster information classification. Technol. Earthq. Disaster Prev. 2014, 4, 937–943.
  6. Xia, C.; Nie, G.; Fan, X.; Zhou, X.; Pang, X. Research on the application of mobile phone location signal data in earthquake emergency work: A case study of Jiuzhaigou earthquake. PLoS ONE 2019, 14, e0215361.
  7. Jung, J.; Moro, M. Multi-level functionality of social media in the aftermath of the great east Japan earthquake. Disasters 2014, 38, 123–143.
  8. Yates, D.; Paquette, S. Emergency knowledge management and Social Media technologies: A case study of the 2010 Haitian earthquake. Int. J. Inf. Manag. 2011, 31, 6–13.
  9. Camponovo, M.E.; Freundschuh, S.M. Assessing uncertainty in VGI for emergency response. Cartogr. Geogr. Inf. Sci. 2014, 41, 440–455.
  10. Liu, S. Crisis crowdsourcing framework: Designing strategic configurations of crowdsourcing for the emergency management domain. Comput. Supported Coop. Work. (CSCW) 2014, 23, 389–443.
  11. Sui, S.; Elwood, S.; Goodchild, M. Crowdsourcing Geographic Knowledge; Springer: Berlin/Heidelberg, Germany, 2013; pp. 15–120.
  12. Ren, F.; Zhang, Q. An emotion expression extraction method for Chinese microblog sentences. IEEE Access 2020, 8, 69244–69255.
  13. Dong, R.; Li, L.; Zhang, Q.; Cai, G. Information diffusion on social media during natural disasters. IEEE Trans. Comput. Soc. Syst. 2018, 5, 265–276.
  14. Su, X.; Zhang, X.; Hu, C.; Zou, Z.; Qiu, X. Research on the extraction of earthquake’s hot topic-words from microblog based on improved TF-PDF algorithm. Geogr. Geo-Inf. Sci. 2018, 34, 90–95.
  15. Zhao, Q.; Chen, Z.; Liu, C.; Luo, N. Extracting and classifying typhoon disaster information based on Volunteered Geographic Information from Chinese Sina Microblog. Concurr. Comput. Pract. Exp. 2018, 31, e4910.
  16. Yu, J.; Zhao, Q.; Chin, C. Extracting typhoon disaster information from VGI based on machine learning. J. Mar. Sci. Eng. 2019, 7, 318.
  17. Haworth, B. Implications of Volunteered Geographic Information for disaster management and GIScience: A more complex world of Volunteered Geography. Ann. Am. Assoc. Geogr. 2017, 108, 226–240.
  18. Zhang, F.; He, H.; Lv, J.; Deng, S.; Bai, F.; Dong, X. Classification and coding of the earthquake-disaster information based on the internet and their preliminary application. J. Seismol. Res. 2016, 39, 664–672.
  19. Ao, J.; Zhang, P.; Cao, Y. Estimating the locations of emergency events from Twitter streams. Procedia Comput. Sci. 2014, 31, 731–739.
  20. Fang, S.; Li, L.; Zhang, X. The research of topic crawler for Macro-anomalies of earthquake. Technol. Earthq. Disaster Prev. 2013, 8, 475–480.
  21. Li, Q.; Wei, J.; Hai, Y. Microblog hot topics detection based on VSM and HMBTM model fusion. IEEE Access 2019, 7, 120273–120281.
  22. Yu, J.; Qiu, L. ULW-DMM: An effective topic modeling method for Microblog short text. IEEE Access 2019, 7, 884–893.
  23. Han, X.; Wang, J. Earthquake information extraction and comparison from different sources based on web text. ISPRS Int. J. Geo-Inf. 2019, 8, 252.
  24. Su, G.; Nie, G.; Gao, J. The characteristics, classifications and the functions of the information for earthquake emergency response. Earthquake 2003, 23, 27–35.
  25. Wang, Y.; Zhu, Y.; Su, Q. Ethnic groups differences in domestic recovery after the catastrophe: A case study of the 2008 magnitude 7.9 earthquake in China. Int. J. Environ. Res. Public Health 2017, 14, 590.
  26. Bai, X.; Li, Y.; Chen, J.; Dai, Y.; Cao, K.; Cao, Y.; Zhao, H.; Gong, Q. Research on earthquake spot emergency response information classification. J. Seismol. Res. 2010, 33, 111–118.
  27. Jiao, C. Research on Earthquake Disaster Acquisition and Information Classification. Master’s Thesis, Chengdu University of Technology, Chengdu, Sichuan, 2011.
  28. Bai, X.; Liu, X.; Lu, S.; Zhang, X.; Su, W.; Su, X.; Li, L. SEPM: Rapid seism emergency information processing based on social media. Nat. Hazards 2020, 104, 659–679.
  29. Cheng, Y.; Liao, W.; Cheng, G. Strategy of focused crawler with word embedding clustering weighted in Shark-Search algorithm. Comput. Digit. Eng. 2018, 46, 144–148.
  30. Chang, P.; Ma, H. Efficient short texts keyword extraction method analysis. Comput. Eng. Appl. 2011, 47, 126–128.
  31. Gao, Q.; Abel, F.; Houben, G.F.; Yu, Y. A comparative study of users’ microblogging behavior on Sina Weibo and Twitter. In Proceedings of the 20th International Conference on User Modeling, Adaptation, and Personalization, Montreal, QC, Canada, 16–20 July 2012; pp. 88–101.
  32. Li, Z.; Peng, S.; Wang, T. A sequential sampling method of surrogate model based on k-fold cross validation. Chin. J. Comput. Mech. 2021, 38, 1–8.
  33. Zhu, D.; Xu, J. SMS-based spatio-temporal information collection and management of earthquake disaster. Sci. Surv. Mapp. 2011, 36, 172–174.
  34. Lin, H. Research of Weibo Text Clustering Algorithm Based on K-Means. Master’s Thesis, Hainan University, Haikou, China, 2016.
  35. Onan, A.; Korukoğlu, S.; Bulut, H. Ensemble of keyword extraction methods and classifiers in text classification. Expert Syst. Appl. 2016, 57, 232–247.
  36. Gao, W.; Li, L.; Zhu, X.; Wang, Y. Detecting disaster-related tweets via multi-modal adversarial neural network. IEEE Multimed. 2020, 4, 28–37.
  37. Huggins, T.J.; Prasanna, R. Information technologies supporting emergency management controllers in New Zealand. Sustainability 2020, 12, 3716.
  38. Avvenuti, M.; Cresci, S.; Vigna, F.D.; Tesconi, M. Impromptu crisis mapping to prioritize emergency response. Computer 2016, 49, 28–37.
  39. Liu, X.; Tang, B.; Yang, H.; Liu, Y.; Chen, X.; Zhang, L. The technical efficiency of earthquake medical rapid response teams following disasters: The case of the 2010 Yushu earthquake in China. Int. J. Environ. Res. Public Health 2015, 12, 4991.
  40. Xu, J.; Nie, G.; Liu, W.; Han, Y. Multiple and heterogeneous earthquake disaster information classification and code. J. Catastrophology 2010, 25, 286–290.
Figure 1. Data preprocessing flow chart.
Figure 2. Construction process of the earthquake emergency information classification hierarchy.
Figure 3. Data process used to detect the topic-words.
Figure 4. Daily amount of microblog messages: (a) using method a; (b) using method b.
Figure 5. Message volume per hour of 05SC earthquake.
Figure 6. Word cloud of 05SC earthquake. (a) In Chinese. (b) In English.
Table 1. Earthquake cases and microblog information table.
| Earthquake Cases | Epicenter Location | Date of Occurrence | Magnitude | Focal Depth (km) | Start Time a | End Time a | Messages b |
|---|---|---|---|---|---|---|---|
| 01YG | Yunnan-Guizhou border | 2012-09-07 | 5.7 | 14 | 2012-09-07 11:19 | 2012-09-14 11:19 | 1515 |
| 02SC | Lushan, Sichuan | 2013-04-20 | 7.0 | 13 | 2013-04-20 08:02 | 2013-05-20 08:02 | 83,456 |
| 03XJ | Hetian, Xinjiang | 2014-02-12 | 7.3 | 12 | 2014-02-12 17:19 | 2014-02-19 17:19 | 176 |
| 04YN | Ludian, Yunnan | 2014-08-03 | 6.5 | 12 | 2014-08-03 16:30 | 2014-08-10 16:30 | 2201 |
| 05SC | Jiuzhaigou, Sichuan | 2017-08-08 | 7.0 | 20 | 2017-08-08 21:19 | 2017-08-11 21:19 | 16,166 |
a The information is crawled, for each earthquake, by setting “earthquake” as the topic-word, the content of “start time” as the start time, and the content of “end time” as the end time; b The column “Messages” denotes the number of messages from the start time to the end time.
Table 2. Cleaning principles and processing methods for microblog information.
| Objects | Operation | Reason |
|---|---|---|
| the messages contain ‘[]’ | filter out | the symbol ‘[]’ is used to identify a news headline in microblog |
| the messages contain ‘#’ | filter out | microblog’s title is usually placed between two ‘#’s to start a topic and attract users |
| characters such as line breaks and spaces | replace | no practical meaning and interfere with word segmentation |
| information such as ‘Pupil Earthquake’ | delete | similar but unrelated message |
| short text (less than 50 characters) | delete | no practical meaning for emergency work because of its length |
| duplicate information | filter out | different accounts reproduce exactly the same repeated information |
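The cleaning rules of Table 2 can be sketched as a single Python filter. Several details are assumptions made for illustration only: whitespace is replaced with an empty string, the 50-character limit is applied after whitespace removal, the rules are applied in the order shown, and unrelated phrases such as ‘Pupil Earthquake’ are passed in by the caller. The function name is hypothetical.

```python
import re
from typing import Iterable, List, Sequence

MIN_LENGTH = 50  # Table 2: texts shorter than 50 characters are deleted


def clean_messages(messages: Iterable[str],
                   unrelated_phrases: Sequence[str] = ()) -> List[str]:
    """Apply the Table 2 cleaning rules and return the surviving messages."""
    seen = set()
    kept: List[str] = []
    for raw in messages:
        # News headlines are marked with '[]' and topic titles with '#'.
        if ("[" in raw and "]" in raw) or "#" in raw:
            continue
        # Similar but unrelated content, e.g. 'Pupil Earthquake'.
        if any(phrase in raw for phrase in unrelated_phrases):
            continue
        # Line breaks and spaces interfere with word segmentation
        # (assumption: they are replaced with nothing).
        text = re.sub(r"\s+", "", raw)
        # Short texts carry no practical meaning for emergency work.
        if len(text) < MIN_LENGTH:
            continue
        # Different accounts may repost exactly the same text.
        if text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

Removing whitespace entirely suits Chinese text, where spaces do not separate words; for other languages a single-space replacement would be the more natural choice.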
Table 3. Earthquake emergency information classification hierarchy.
| First-Level (A) | Second-Level (B) | Third-Level (C) | Meanings of the Third-Level |
|---|---|---|---|
| location information | | | longitude, latitude, and region |
| time information | publishing time | | message publication time |
| time information | event time | | time described in the message |
| disaster investigation | disaster situation | earthquake situation | description information such as magnitude and epicenter |
| disaster investigation | disaster situation | sense of earthquake | feelings during an earthquake |
| disaster investigation | disaster situation | casualties | casualty information |
| disaster investigation | disaster situation | destruction | damage caused by the earthquake |
| disaster investigation | abnormal phenomenon | | abnormal phenomenon information accompanied by the earthquake |
| social public opinion | news propaganda | news notification class | news and notice information during earthquake |
| social public opinion | news propaganda | news propaganda class | education and public information of earthquake emergency knowledge |
| social public opinion | social mood | positive | information that contains positive opinion |
| social public opinion | social mood | medium | pertinent emotional information published by people during earthquake emergency |
| social public opinion | social mood | negative | negative remarks during the earthquake |
| social public opinion | supervisory information | | information on supervision, reporting and suggestions of relevant measures |
| emergency rescue | emergency | | information about people in danger, including requests for and implementation of assistance |
| emergency rescue | disaster relief | | relevant information on the treatment of secondary disasters and accompanying disasters |
| emergency rescue | emergency rescue situation | | information about disaster relief progress and decision-making |
| emergency support | other material | | material information that is difficult to classify |
| emergency support | warm | clothes | information about clothes |
| emergency support | warm | cotton quilt | information of materials used against cold |
| emergency support | living | accommodations | information about materials related to accommodation, such as dispatching and accommodation locations |
| emergency support | living | shelters | information about public shelters, resettlement sites, etc. |
| emergency support | traffic | traffic information | road conditions, traffic control situations, dispatchable vehicle information, etc. |
| emergency support | food | ready to eat | information about food that is difficult to transport |
| emergency support | food | storable food | information about food that can be stored and carried |
| emergency support | medical | medical staff | information about professional doctors and nurses |
| emergency support | medical | medical equipment | information about medical and aid devices |
| emergency support | medical | medicine | the status of medicines used for rescue |
| emergency support | medical | blood bank | information related to rescue blood bank, blood donations, etc. |
| emergency support | medical | injury situation | injury location, injury cause, etc. |
| emergency support | medical | epidemic prevention | information related to disease protection |
| emergency support | community | electricity | information about electricity |
| emergency support | community | communication | information about the community |
| emergency support | safety and security | | information about public security issues and security work |
| emergency support | rescue team | professional team | firefighters, etc. |
| emergency support | rescue team | trained | information about NGOs, voluntary groups |
| emergency support | rescue team | non-professional | unorganized and spontaneous rescue team information |
| emergency support | seek | searching | information for finding people and objects |
| emergency support | seek | services | provides information about finding services |
| emergency support | psychological | | information about psychological counselling and abnormal psychology |
| non-emergency | support | | non-emergency donation or supply information |
| non-emergency | other | | other non-emergency information |
| comprehensive classes | | | fuzzy classification but of great significance to the information acquisition |
Table 4. The proposed method for fuzzy classification feature word set processing.
Steps of the Method
1: $\mu$ is the marked feature word in the fuzzy classification; $E$ is one of the categories under the earthquake emergency classification hierarchy; $Q_\mu$ is the word frequency of $\mu$, and $\delta$ is the threshold for the coarse-grained and fine-grained sets;
2: if $\mu \in E$, then classify $\mu$ into category $E$;
3: trace the word frequency of the words retained in the fuzzy classification and compare $Q_\mu$ with $\delta$; if the threshold condition on $Q_\mu$ and $\delta$ holds, then classify $\mu$ to the comprehensive category of the fine-grained feature word sets; otherwise classify $\mu$ to the comprehensive category of the coarse-grained feature word set;
4: finally, coarse-grained and fine-grained feature word sets are formed.
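The steps of Table 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: each category E is modeled as a set of words it is known to contain, and the comparison between the word frequency Qμ and the threshold δ is passed in as a parameter (`goes_to_fine`), because the direction of that comparison is not fixed by this sketch. All function and parameter names are hypothetical.

```python
import operator
from typing import Callable, Dict, Iterable, Set, Tuple


def process_fuzzy_words(
    fuzzy_words: Iterable[str],
    categories: Dict[str, Set[str]],
    frequencies: Dict[str, int],
    delta: int,
    goes_to_fine: Callable[[int, int], bool],
) -> Tuple[Dict[str, str], Set[str], Set[str]]:
    """Assign fuzzy words to a category, or to a comprehensive fine/coarse set."""
    assigned: Dict[str, str] = {}        # step 2: word -> category E
    fine_comprehensive: Set[str] = set()    # step 3: threshold condition holds
    coarse_comprehensive: Set[str] = set()  # step 3: otherwise
    for word in fuzzy_words:
        home = next(
            (name for name, members in categories.items() if word in members),
            None,
        )
        if home is not None:
            assigned[word] = home
        elif goes_to_fine(frequencies.get(word, 0), delta):
            fine_comprehensive.add(word)
        else:
            coarse_comprehensive.add(word)
    return assigned, fine_comprehensive, coarse_comprehensive


# Example call; operator.ge (frequency >= delta) is only an illustrative choice.
example = process_fuzzy_words(
    ["injured", "scene", "moment"],
    {"casualties": {"injured"}},
    {"injured": 40, "scene": 12, "moment": 3},
    delta=10,
    goes_to_fine=operator.ge,
)
```

Passing the comparison as a function keeps the sketch neutral: swapping `operator.ge` for `operator.lt` reverses which comprehensive set a retained word joins without changing the rest of the procedure.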
Table 5. Part of the fine-grained topic-word sets.
| Coding for Category | Category Name | Topic-Words |
|---|---|---|
| A3B01C01 | earthquake situation | magnitude, earthquake, strong earthquake, record, aftershock, disaster situation, disaster area, photo, earthquake situation, epicentre, disaster-stricken area |
| A3B01C02 | sense of earthquake | tremble, degree, duration, sustained, fall, move, feel, wake up, shake, violent, shake, sharp, obvious, dazzle, crash, break, strong, intense, sound wave, sound, dizziness, fright, rattle, heartbeat, shock, silly, suffocation |
| A3B01C03 | casualties | reports, deaths, destruction, population, numbers, killed, casualties, wounded, injured, lives, missing, died, damaged, dead, sacrifice, crush, earthquake death |
| A3B01C04 | destruction | pulling down, scrapping, collapse, tragic situation, secondary disaster, subsidence, falling, breaking, ruins, landslide, destruction, in danger, boulder, collapse, lycopodium, crack, leakage of rain, falling objects, debris flow, broken, wall, gap, mountain, stone, damaged, rubble, tiles, dangerous houses, dangerous buildings, barrier lakes, smashing, serious disaster, earthquake damage, shatter, shock-off, destroy severely |
| A3B02C00 | abnormal phenomena | wind, dog, flood, cooling, barking, thunder, visibility, temperature, precursor, frogs, lightning, moisture, cave-in, the sky, weather, temperature difference, fog, abnormal signs, fish, rain, cloud |
Table 6. Part of the coarse-grained topic-word sets.
| ID | Category Name | Topic-Words |
|---|---|---|
| A3 | disaster investigation | magnitude, pulling down, scrapping, reports, collapse, tragic situation, tremble, degree, duration, sustained, fall, subsidence, move, tremble, breaking, ruins, wind, feel, dog, flood, landslide, wake up, shake, destruction, cooling, boulder, violent, barking, lycopodium, thunder, deaths, sharp, crack, obvious, dazzle, visibility, debris flow, crash, broken, temperature, precursor, strong, wall, frogs, gap, population, numbers, killed, mountain, lightning, lives, sound wave, sound, missing, moisture, stone, record, died, damaged, dead, the sky, weather, dizziness, rubble, tiles, dangerous, temperature difference, fog, sacrifice, subsidence, rattle, barrier lakes, abnormal signs, fish, cloud, smashing, photo, suffocation, destroy severely |
Table 7. Results of the information classification validity experiment.
| Categories | Records | Percentage |
|---|---|---|
| A1 location information | 1870 | 85% |
| A2 time information | 1267 | 58% |
| A3 disaster investigation - B1 disaster situation | 2201 | 100% |
| A3 disaster investigation - B2 abnormal phenomenon | 1347 | 61% |
| A4 social public opinion - B1 news propaganda | 604 | 27% |
| A4 social public opinion - B2 social mood | 791 | 36% |
| A4 social public opinion - B3 supervisory information | 2 | 0% |
| A5 emergency rescue - B1 emergency | 573 | 26% |
| A5 emergency rescue - B2 disaster relief | 19 | 1% |
| A5 emergency rescue - B3 emergency rescue situation | 866 | 39% |
| A6 emergency support - B1 other material | 193 | 9% |
| A6 - B2 warm | 47 | 2% |
| A6 - B3 living | 756 | 34% |
| A6 - B4 traffic | 1924 | 87% |
| A6 - B5 food | 484 | 22% |
| A6 - B6 medical | 293 | 13% |
| A6 - B7 community | 444 | 20% |
| A6 - B8 safety and security | 4 | 0% |
| A6 - B9 rescue team | 540 | 25% |
| A6 - B10 seek | 21 | 1% |
| A6 - B11 psychological | 88 | 4% |
| A7 non-emergency - B1 support | 127 | 6% |
| A7 non-emergency - B2 other | 10 | 0% |
| A8 comprehensive classes | 294 | 13% |
Table 8. Results of the information collection timeliness experiment.
| Method | Keywords | Time (s) | r a | q b |
|---|---|---|---|---|
| b | earthquake | 2470 | 2201 | 281 |
| a | trapped, deploy, evacuate, standby, in place, resume classes, recover, urgent, investigate, clear, rescue, repair, victory, accident, search, advance, orderly, find, support, execute, helicopter, command post | 1913 | 230 | 223 |
a r represents the number of all the information records; b q represents the number of the effective information records.
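Table 9. Results of the single topic-word validity experiment.
| Topic-Word | All the Records | Effective Records | Validity |
|---|---|---|---|
| trapped | 18 | 18 | 100% |
| deploy | 7 | 6 | 86% |
| evacuate | 28 | 27 | 96% |
| standby | 18 | 18 | 100% |
| in place | 11 | 3 | 27% |
| resume classes | 5 | 0 | 0% |
| recover | 25 | 10 | 40% |
| urgent | 91 | 90 | 99% |
| investigate | 4 | 4 | 100% |
| clear (in Chinese “排危”) | 1 | 1 | 100% |
| clear (in Chinese “抢通”) | 27 | 27 | 100% |
| rescue | 46 | 46 | 100% |
| command post | 50 | 50 | 100% |
| repair | 21 | 21 | 100% |
| victory | 15 | 4 | 27% |
| accident | 10 | 2 | 20% |
| trapped | 6 | 5 | 83% |
| evacuate | 3 | 2 | 67% |
| search | 5 | 4 | 80% |
| advance | 6 | 5 | 83% |
| orderly | 29 | 28 | 97% |
| find | 110 | 25 | 23% |
| support | 87 | 87 | 100% |
| execute | 4 | 4 | 100% |
| helicopter | 27 | 24 | 89% |
| total | 654 | 511 | 78% |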
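Table 10. Top 20 words in the recommended list of the Ludian earthquake.
| Rank | Word | Part of Speech | Frequency |
|---|---|---|---|
| 1 | Ludian | NS | 1298 |
| 2 | tap | N | 114 |
| 3 | degree | Q | 106 |
| 4 | the Communist Youth League | NT | 95 |
| 5 | the dead | N | 78 |
| 6 | determination | V | 74 |
| 7 | formal | AD | 73 |
| 8 | depth | NS | 70 |
| 9 | huode | N | 59 |
| 10 | name | N | 58 |
| 11 | rest | V | 58 |
| 12 | pray | V | 57 |
| 13 | publish | V | 52 |
| 14 | fund | N | 50 |
| 15 | donation | V | 49 |
| 16 | sibling | N | 46 |
| 17 | zhaoyang | N | 45 |
| 18 | hand | N | 45 |
| 19 | help | V | 44 |
| 20 | century | N | 43 |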
Table 11. Decision of new topic-words in the Ludian earthquake.
| Treatment | New Topic-Words and Their Determination |
|---|---|
| added to the fine-grained word sets | determination (classify to disaster investigation-earthquake situation), depth (classify to disaster investigation-earthquake situation), rest (classify to social public opinion-social mood-positive), pray (classify to social public opinion-social mood-positive), fund (classify to non-emergency-support), present |
| added to a specific set | Ludian |
| added to the secondary stop word set | degree, formal, name, publish, sibling, hand, help, century |
| to be processed | tap, the Communist Youth League, the dead, huode, zhaoyang |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Su, X.; Ma, S.; Qiu, X.; Shi, J.; Zhang, X.; Chen, F. Microblog Topic-Words Detection Model for Earthquake Emergency Responses Based on Information Classification Hierarchy. Int. J. Environ. Res. Public Health 2021, 18, 8000. https://doi.org/10.3390/ijerph18158000

Chicago/Turabian Style

Su, Xiaohui, Shurui Ma, Xiaokang Qiu, Jiabin Shi, Xiaodong Zhang, and Feixiang Chen. 2021. "Microblog Topic-Words Detection Model for Earthquake Emergency Responses Based on Information Classification Hierarchy" International Journal of Environmental Research and Public Health 18, no. 15: 8000. https://doi.org/10.3390/ijerph18158000
