Mining Twitter Data to Understand the Human Sentiment on Hurricane Florence

ORIGINAL ARTICLE

Introduction: Most studies have used quantitative methods to analyze how natural disasters severely affect a single region during the disaster period. This study aimed to show how Hurricane Florence affected human life and society across US states over several periods by employing both qualitative and quantitative methods. Method: This study developed a new app called "Twitgis," collected 1,433,032 tweets posted between 08-21-2018 and 10-01-2018, and used the 57,842 tweets that remained after filtering for Hurricane Florence. Results: First, this study showed that the spatial patterns of tweets differ by period. Tweets are more concentrated in the South in the pre-hurricane period, heavily concentrated in the Southeast in the hurricane period, and located more in the Northeast in the post-hurricane period. Second, the most retweeted tweet shows that, in online communication, human sentiment plays a more important role in disaster information than news of the hurricane does. The top-ranked tweet has about twice as many retweets as the second- through twentieth-ranked tweets combined. Third, this study found that people actively use Twitter to share emotions, opinions, and information about Hurricane Florence. For instance, about one-fifth of the tweets in the sentiment analysis express emotions about the hurricane. Conclusion: Governments and policymakers should monitor Twitter data to understand the effects of natural disasters on people and the human environment.


Introduction
Natural disasters are among the most serious catastrophes in the world. Since the 2000s, a total of 7,344 natural disasters had occurred worldwide as of 2019 (1). Natural disasters have caused more than 8 million deaths and seven trillion US dollars in economic losses since the start of the 20th century (2). Therefore, coping with natural disasters has been one of the main tasks for governments and urban planners (3)(4)(5)(6)(7)(8).
Scholars have tried to understand how natural disasters affect human life (9)(10)(11)(12)(13)(14)(15). For instance, the average natural disaster leads to a fall in growth of 1% of gross domestic product (GDP) upon impact, and a cumulative loss to GDP of 2.6% (16). Natural disasters reduce per capita GDP by up to 6.8% on impact (17).
With the development of information technology since the 2010s, many authors have used Social Network Systems (SNS) to analyze the effects of natural disasters on human behavior and the environment (18)(19)(20)(21). For example, individuals actively shared information, communicated with each other, and updated information during the 2017 flood in Louisiana (22). Active players on Twitter and their effectiveness played an important role in disseminating critical information during the 2010-2011 Australian floods (23).
However, most studies have analyzed how natural disasters severely affect a single region (22)(23)(24)(25)(26). In other words, prior studies have barely examined the relationship between natural disasters and human behavior across regions at the national level. Analyzing the impact of natural disasters across regions is important because natural disasters affect not only people in the damaged region but also those elsewhere in the same country. People in other regions have friends, family, and cousins in the affected area, or are worried about others, animal life, economic loss, and their country. For example, an analysis of Twitter logs from the 2010 Philippines typhoon, the 2011 Brazil flood, and the 2011 Japan earthquake showed that people who do not live geographically close to a natural disaster still express concern about it (27).
In addition, prior studies have focused heavily on quantitative analyses since Twitter provides big data from numerous accounts (28)(29)(30). However, qualitative analyses of tweets can also play an important role in developing natural disaster policies, since they allow scholars to fully understand the complex and nuanced content of tweets (31).
Therefore, this study examines how natural disasters affect human behavior and the environment across US states over three periods (the pre-hurricane period, the hurricane period, and the post-hurricane period), based on qualitative and quantitative analyses of Twitter data for Hurricane Florence in 2018. Hurricane Florence was a long-lived category 4 hurricane with maximum sustained winds of 130 mph (215 km/h) that made landfall along the southeastern US (see Figure 1). Florence caused devastating freshwater flooding across the southeastern US, resulting in 22 direct deaths and 30 indirect fatalities (7).

Materials and Methods
This study used Twitter to explore human sentiment about Hurricane Florence across US states in different periods. Twitter is a real-time microblogging platform on which users post and interact with messages known as tweets. According to official Twitter statistics, as of 2018, 326 million people were active on Twitter, 67 million of them in the US. Twitter provides an Application Programming Interface (API), an interface between a client and a server on which client software can be built. The official Twitter API offers three tiers of search APIs: Standard, Premium, and Enterprise. The Twitter developer platform lets developers publish and analyze tweets, optimize ads, and create unique customer experiences. This study created its own application because the official Twitter API has a limitation: it provides access only to the past seven days of Twitter data for the standard service, or the past 30 days for the premium and enterprise services. This study therefore developed a new app called "Twitgis" on the Twitter developer platform, coded in the R language, to collect tweets older than 30 days. Twitgis is a reliable and accurate app since it obtains client/server authentication from Twitter (see Figure 2): it accesses the official Twitter API using an API key and secret and a set of access tokens authorized by Twitter.
R is a language and environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. It is one of the most popular programming languages among statisticians and data miners for developing statistical software and analyzing data. This study employed RStudio, an integrated development environment (IDE) for R. This study used three keywords ("Florence," "Hurricane," and "Storm") to explore how the hurricane affected human sentiment.
This study collected 1,433,032 tweets between 08-21-2018 and 10-01-2018 (six weeks), a span that consists of the pre-hurricane period (2 weeks), the hurricane period (2 weeks), and the post-hurricane period (2 weeks). This study set the following selection criteria for the tweet data. First, the tweets should be written in English. Second, the tweets should contain a keyword (Florence, Hurricane, or Storm) in the text. Third, the tweets should be posted in the US. Fourth, the tweets should have geotagged information so that spatial reactions can be explored. After filtering, this study used 57,842 samples, about 4.0% of the raw data (see Table 1). This study does not show the Twitter ID or the specific location of tweets, for privacy protection. Instead, each tweet was given a random ID, and only the state from which it was posted and its number of favorites are shown. The present study also did not correct the wording or grammar of tweets, in order to show the users' original content.

The characteristics of the Tweets
Figure 3 shows the number of tweets per day. The number of tweets for each keyword fluctuates considerably from day to day. For example, the keyword "Florence" reaches its highest number of tweets on September 11, whereas the keyword "Storm" reaches its highest number on September 12. Overall, the total number of tweets across keywords increases sharply from September 6 and decreases from September 17, which is the end date of the hurricane period.
Table 2 shows that the proportion of tweets by US state differs by period. For instance, in the pre-hurricane period, Hawaii places first with a proportion of 11.8, followed by Florida (9.9), Texas (8.9), California (7.6), and New York (5.2). All of the top five states are coastal states, which are susceptible to hurricanes. In the hurricane period, North Carolina and South Carolina take first and second (21.4 and 10.2, respectively), ahead of Virginia (8.4), Florida (6.2), and Texas (5.1).
This is because North Carolina and South Carolina were the regions most damaged by Hurricane Florence during the hurricane period. In the post-hurricane period, North Carolina still ranks first (19.3), whereas New York places second with a value of 7.1. The next states are California (6.7), Florida (6.4), and South Carolina (6.1).
Figure 5 demonstrates that the spatial patterns of tweets differ by period. In the pre-hurricane period, tweets are more concentrated in the South, such as Texas and Florida. In the hurricane period, tweets are heavily concentrated in the Southeast, especially North Carolina, South Carolina, and Virginia. In the post-hurricane period, tweets are located more in the Northeast and California.
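The four selection criteria and the per-period state proportions described in this section can be illustrated with a minimal Python sketch. It is not the Twitgis implementation, which was written in R and is not shown in this article. The `Tweet` record, its field names, and the function names are hypothetical, and the period boundaries are an assumption: three consecutive two-week windows over 08-21-2018 to 10-01-2018, with the hurricane period ending on September 17.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

KEYWORDS = ("florence", "hurricane", "storm")

# Assumed boundaries: three consecutive two-week windows (inclusive dates).
PERIODS = {
    "pre-hurricane": (date(2018, 8, 21), date(2018, 9, 3)),
    "hurricane": (date(2018, 9, 4), date(2018, 9, 17)),
    "post-hurricane": (date(2018, 9, 18), date(2018, 10, 1)),
}


@dataclass
class Tweet:
    """Hypothetical record; field names are illustrative only."""
    text: str
    lang: str                # language code, e.g. "en"
    country: Optional[str]   # country code, e.g. "US"
    state: Optional[str]     # None when the tweet is not geotagged
    posted: date


def keep(tweet: Tweet) -> bool:
    """Apply the four criteria: English, keyword, posted in the US, geotagged."""
    # Simple case-insensitive substring match; it would also match words
    # that merely contain a keyword, such as "brainstorm".
    has_keyword = any(k in tweet.text.lower() for k in KEYWORDS)
    return (
        tweet.lang == "en"
        and has_keyword
        and tweet.country == "US"
        and tweet.state is not None
    )


def period_of(day: date) -> Optional[str]:
    """Return the name of the period containing `day`, or None if outside all."""
    for name, (start, end) in PERIODS.items():
        if start <= day <= end:
            return name
    return None


def state_shares(tweets) -> dict:
    """Percentage of retained tweets per state within each period."""
    counts = {name: Counter() for name in PERIODS}
    for tweet in filter(keep, tweets):
        period = period_of(tweet.posted)
        if period is not None:
            counts[period][tweet.state] += 1
    shares = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        shares[name] = (
            {s: round(100 * n / total, 1) for s, n in counter.items()}
            if total else {}
        )
    return shares
```

Calling `state_shares` on a list of `Tweet` records returns one dictionary of state percentages per period, which is the shape of comparison that Table 2 reports.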

Retweet analyses
This study analyzed human behavior in response to Hurricane Florence based on the number of times tweets were retweeted. A retweet is a re-posting of a tweet, which helps people quickly share valuable information and interesting news with others. The number of retweets can represent the importance of a piece of information, since people retweet a tweet when they think the information is valuable to others. This study sorted all tweets from the hurricane week in descending order of retweets and, after reading the text of every tweet, selected those closely related to Hurricane Florence. Table 3 shows that many of the top 20 retweeted tweets are located in North Carolina: eight of the 20 tweets were posted in that state. South Carolina has the second-highest share, with three tweets. This is because these two states were the most damaged by Hurricane Florence among the US states.
The most retweeted tweet shows that people were most concerned about animals left behind in the hurricane. The tweet shows many dogs left in the hurricane-damaged region (see Figure 6). It has about 3.5 times as many retweets as the second-ranked tweet, meaning that people were highly interested in animals affected by the hurricane and abandoned by their owners. In fact, many people posted and retweeted tweets about these animals and criticized those who abandoned their pets during the hurricane period. The results showed that people spread messages of commiseration more than forecasts and reports of the hurricane.
The second most retweeted tweet expresses surprise that Waffle House had a storm center to deal with the hurricane (see Figure 7). The tweet shows that companies actively plan for hurricanes to reduce the loss of profits and manage their business. It has about 7.6 times as many retweets as the third-ranked tweet.
Next, people were highly interested in information about Hurricane Florence and its effects (tweets 3, 14, 17, 18, and 20). Many people also prayed to be kept safe from the hurricane (tweets 5, 9, 13, and 15). In contrast, other people showed a good sense of humanity in response to the hurricane (tweets 4, 10, and 16). Many people also posted their opinions on the hurricane situation (tweets 6, 7, 8, 11, 12, and 19).
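The comparisons used in this section (how many times larger the top tweet's retweet count is than the next tweet's, and than the rest of the top 20 combined) are simple ratios. The sketch below shows the arithmetic in Python. The function name is hypothetical and the retweet counts in the usage note are made-up illustrative values, not the counts from Table 3.

```python
def retweet_dominance(retweet_counts, top_n=20):
    """Compare the most retweeted tweet with the rest of the top `top_n`.

    Returns the ratio of the first-ranked count to the second-ranked count,
    and the ratio of the first-ranked count to the sum of ranks 2..top_n.
    """
    ranked = sorted(retweet_counts, reverse=True)[:top_n]
    if len(ranked) < 2:
        raise ValueError("need at least two retweet counts")
    first, rest = ranked[0], ranked[1:]
    if rest[0] == 0:
        raise ValueError("second-ranked count must be positive")
    return {
        "first_to_second": first / rest[0],
        "first_to_rest_sum": first / sum(rest),
    }
```

For example, with the illustrative counts `[1200, 400, 100, 50, 50]`, the function returns a first-to-second ratio of 3.0 and a first-to-rest ratio of 2.0. A first-to-rest ratio near 2 is the kind of result summarized as the top tweet having about twice the retweets of the other top-20 tweets combined.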

Human sentiment for Hurricane Florence
This study analyzed the sentiment of the 1,000 most retweeted tweets in the hurricane period by reading and interpreting each tweet manually. Some programs, such as Linguistic Inquiry and Word Count (LIWC), OpenText Sentiment Analysis, and SAP HANA Sentiment Analysis, can be used for sentiment analysis. However, they cannot interpret subtle human expressions, such as metaphor or irony; they may contain bugs; and they may place tweets in the wrong categories because they are programmed to analyze keywords rather than interpret the nuance of the whole text. In contrast, this study carefully interpreted all 1,000 tweets. This study divided the tweets into five sentiment categories: strongly positive, positive, neutral, negative, and strongly negative.
After the sentiment analysis, the tweets were categorized as follows: strongly positive tweets (28), positive tweets (91), neutral tweets (727), negative tweets (51), and strongly negative tweets (103) (see Figure 8). Strongly positive tweets tend to express thanks to people who helped the author or others. For example, "My hero!!!! Noah's Ark except it's a school bus: Truck driver rescues 64 dogs and cats from floods of Hurricane Florence." Positive tweets are closely related to the cancellation of school or work and to beautiful scenery created by the hurricane. For example, "Beautiful storm near Shorewood, IL. #ILwx @NWSChicago." Neutral tweets tend to include objective statements, such as breaking news, forecasts, reports, information, and notifications. For instance, "#BREAKING: #HurricaneFlorence is now a CAT.4 MAJOR hurricane.... winds are now up to 130 MPH with an SLP of 953mb. #Florence looks remarkable on satellite. It is forecast to remain a MAJOR HURRICANE for the majority of the next five days." Negative tweets tend to describe anxiety about the hurricane or damage caused by it. For example, "It ain't good when you look up and a CNN hurricane reporter is standing on the major road in your hometown..." Strongly negative tweets tend to express anger at people who left their pets in the hurricane area. For instance, "As everyone is evacuating for this storm PLEASE don't forget about your pets! Don't leave them behind!!!! If you don't feel safe to stay at home why would you make them!"
Next, this study categorized the tweets into 15 groups based on the content of the text (see Table 4). After categorization, tweets in the emotion category showed the highest number [...] [25], and scenery [20].
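The five sentiment counts reported above sum to the 1,000 tweets analyzed, so each category's share follows directly. The short Python sketch below shows the arithmetic; the dictionary and function names are illustrative, and the "neutral" key denotes the category of objective statements (727 tweets).

```python
# Counts per sentiment category as reported in the text (see Figure 8).
SENTIMENT_COUNTS = {
    "strongly positive": 28,
    "positive": 91,
    "neutral": 727,   # objective statements: news, forecasts, notifications
    "negative": 51,
    "strongly negative": 103,
}


def percentage_shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = percentage_shares(SENTIMENT_COUNTS)
```

The resulting shares are 2.8% strongly positive, 9.1% positive, 72.7% neutral, 5.1% negative, and 10.3% strongly negative, so roughly three-fourths of the analyzed tweets are objective statements. The same arithmetic gives 18.1% for the 181 emotion tweets reported in the Conclusions.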

Conclusions
This study provided the following important findings. First, the spatial patterns of tweets differ by period. Tweets are more concentrated in the South in the pre-hurricane period, heavily concentrated in the Southeast in the hurricane period, and located more in the Northeast and California in the post-hurricane period.
Second, more than half of the top 20 retweeted tweets [11] are located in North Carolina and South Carolina. This is because these two states were the most damaged by Hurricane Florence among the US states. The most retweeted tweet shows that people were highly interested in pets left behind in the damaged region. The top-ranked tweet has about twice as many retweets as the second- through twentieth-ranked tweets combined.
Third, this study found that roughly three-fourths of the tweets express neutral sentiment. This is because people post important information, such as notifications and forecasts, on Twitter. In particular, people retweet posts to let others know about schedule changes caused by Hurricane Florence or to share information about how to prepare for hurricane damage. The next largest categories are strongly negative tweets (10.3%), positive tweets (9.1%), negative tweets (5.1%), and strongly positive tweets (2.8%). This study also showed that the emotion category has the highest number of tweets [181], meaning that people actively expressed their feelings during the hurricane period, followed by the information category [...]

Discussion
This study suggests the following important implications. First, people in indirectly affected regions, as well as those in directly damaged regions, are highly interested in the hurricane event. Therefore, governments should manage natural disaster information at the national level to relieve people's anxiety. Moreover, releasing natural disaster information to other regions would let people there help those in the damaged regions with materials such as disaster relief supplies and daily necessities, based on the information network and the magnitude of the damage.
In addition, this study presents an interesting finding: people are more interested in messages of commiseration than in forecasts and reports of the hurricane. In other words, human sentiment plays a more important role in disaster information than news of the natural disaster does. Therefore, governments and policymakers should attend to people's emotions as well as provide forecasts and reports of the hurricane. Relieving negative feelings would be one of the main roles of governments and urban practitioners during natural disaster periods.
Since people use Twitter to share opinions, information, and emotions, governments and policymakers should monitor Twitter data before, during, and after a natural disaster to understand its effects on people and property. For instance, people post tweets when a natural disaster causes a serious accident or damage, so governments and emergency responders could detect such incidents by monitoring Twitter, helping to save lives and reduce disaster damage.