Tracking Flooding Phase Transitions and Establishing a Passive Hotline With AI-Enabled Social Media Data

Flood management requires collecting real-time onsite information widely and rapidly. As an emerging data source, social media provides timely, rich data in the form of texts and photos and can be used to improve flooding situation awareness. The present study shows that social media data, enriched with information extracted by Artificial Intelligence (AI) techniques, can be effectively used to track flooding phase transitions and locate emergency incidents. To track phase transitions, we train a computer vision model that classifies images embedded in social media data into four categories, preparedness, impact, response, and recovery, which reflect the phases of disaster event development. To locate emergency incidents, we use a deep learning-based natural language processing (NLP) model to recognize locations from the textual content of tweets. Geographic coordinates are assigned to the recognized locations by searching a dedicated local gazetteer rapidly compiled for the disaster-affected region from the GeoNames gazetteer and US Census data. By combining image and text analysis, we filter the tweets that contain both images of the ''Impact'' category and high-resolution locations to obtain the most valuable situation information. A complementary manual examination step further strengthens the AI-processed results, supporting comprehensive situation awareness and establishing a passive hotline to inform search and rescue activities. The developed framework is applied to the flood of Hurricane Harvey in the Houston area.


I. INTRODUCTION
Urban flooding is becoming a national threat, causing billions of dollars of losses every year. It can be attributed to multiple drivers, including riverine, coastal, pluvial (flooding due to overwhelming precipitation), and nuisance flooding (flooding due to sea-level rise, also called ''sunny day flooding'') [1]-[3]. Compared to traditional basin-scale flooding analysis, analyzing and predicting urban flooding require data of high spatial and temporal resolution. Such high-quality geo-tagged urban flooding data are challenging to collect because remote sensing is limited by certain weather conditions, such as cloud coverage; sensor networks are costly to install and maintain in urban areas; insurance reports are usually inaccessible and delayed; and government surveying is limited in coverage and accuracy [4]. (The associate editor coordinating the review of this manuscript and approving it for publication was Anna Visvizi.)
Social media is an emerging data source with great potential to meet the rising data challenge of urban flooding. Two main types of data can be collected through social media, images and texts, both of which have been used in previous studies. Jongman et al. [5] analyzed the text of flood-related tweets sent by disaster response organizations to understand the location, timing, causes, and impacts of floods. They found that tweets, combined with satellite data, can reasonably outline the affected locations and provide one to several days of early warning. They also found that social media data can detect unexpected or controversial flood events such as dam breaks. Reference [6] used social media photos to estimate the water extent and depth for the June 2013 flood event in Dresden, Germany. They developed a tool called ''PostDistiller'' to filter geolocated posts that contain links to photos. Combined with a manual picking step, they were able to identify flood-relevant tweets for a flood extent estimate. A recent study [7] developed a data fusion scheme that combines texts and photos to improve flood estimates. More complete surveys of crowdsourcing/citizen science advances can be found in [8] and [9], which review crowdsourcing-based data acquisition methods across seven domains of geophysics: weather, precipitation, air pollution, geography, ecology, surface water, and natural hazard management. Social media-based disaster surveillance has several advantages over traditional methods. First, the data stream provides continuous coverage of wide geographic areas with high resolution.
Second, this data source is far more cost-effective: a typical social survey at the city level can require several years of dedicated resource investment [10], while a social media-based survey at the household level requires only limited resources and can be completed within months [11]. Third, social media data provide real-time monitoring, so emergency managers can follow the evolving situation and its transitions to make informed decisions [12].
Although social media data can be downloaded and stored in real time, extracting information in real time is challenging: it requires a rapid and automatic processing tool to handle the high volume, fast velocity, and strong heterogeneity (variety), the 3 V's of big data [6]. Artificial Intelligence (AI) has been introduced to address this issue. Reference [4] created a new method using AI to automate this process, in which Natural Language Processing (NLP) techniques are applied to textual data collected from Twitter. They used a topic modeling scheme to filter relevant tweets and Named Entity Recognition (NER) to extract flooding locations, and found that social media-based flood monitoring can complement existing means of flood data collection. Reference [13] used a topic modeling approach, Term Frequency-Inverse Document Frequency (TF-IDF), to automatically filter flooding topics, but location names and geospatial distributions of the relevant tweets were not extracted. Reference [7] developed an automatic tool for mining both visual and textual information. They trained a Convolutional Neural Network (CNN) model to classify the visual content of flooding pictures and used keywords from the texts to refine the classification, reporting that the new strategy can enhance the accuracy of flood estimates without sacrificing the recall rate. In a more general context, new methods have been developed to detect events from social media data: [14] developed a fuzzy scheme to detect evolving events at multiple time scales, and [15] developed an exemplar-enhanced supervised learning scheme to reduce uncertainties in social media data classification. Generally speaking, the application of AI methods to flood observation is still in its infancy and has great potential to revolutionize flood monitoring capabilities, especially in underdeveloped countries where monitoring infrastructure is lacking [16].
Disaster phase terms, coined in the 1930s, have long been used to describe, examine, and understand disasters and to organize the practice of emergency management in the US [17]. Researchers have developed several category lists to apply the concept to disaster-related social media data. For example, [18] coded social media messages into ''Caution, Advice, Fatality, Injury, Offers of Help, Missing, and General Population Information'', and [19] used a list of ''Caution and Advice, Casualty and Damage, Donation and Offer, and Information Source''. A remarkable work by [20] developed a detailed framework to classify social media texts into stages that follow the disaster management scheme of ''preparedness, emergency response, and recovery''. They skipped the ''Mitigation'' category because it relates to long-term activities and does not reflect the real-time changes of social media data. They manually examined more than 10,000 tweets from natural disaster events to develop a keyword-based classifier. To our knowledge, such classification schemes have only been applied to textual content from social media, and no study has attempted to classify images from social media following the disaster management phases. Since image data may contain much richer information than texts, a visual classification can help recognize the phase transitions in disaster management. Our study aims to fill this gap.
Another type of urgently needed information is the geographic location of flooding incidents. During flooding events, locations are frequently mentioned in social media messages, and accurately extracting them can help understand the emergency situation on the ground. Two strategies have been developed to achieve this task. First, geographic locations can be obtained from geo-tagged social media messages (i.e., geotagging). This strategy is straightforward and relies on users' voluntary contributions, but since Twitter stopped its precise geo-tagging feature in June 2019, it has become less useful than before [21]. The second strategy is to recognize the locations mentioned in the message content and use a gazetteer to convert them to coordinates. Reference [22] referred to these two strategies as tweet-from locations and tweet-about locations. While tweet-from locations usually come in a structured format of longitude and latitude, tweet-about locations are embedded in natural language text and can be difficult to extract. The tweet-about strategy involves a critical step called geoparsing, which is the process of recognizing place names, or toponyms, from text and identifying their corresponding locations [23]-[25]. A number of geoparsing tools, or geoparsers, have been developed, such as GeoTxt [26] and TopoCluster [27]. However, existing geoparsers often use a traditional machine learning model, such as Conditional Random Fields (CRF), for recognizing locations from texts, and there is limited research, such as [28], on applying deep learning models to extract locations from social media.
This study presents a deep learning-based framework for flood monitoring that combines Computer Vision (CV) and NLP techniques to extract information from both the visual and textual content of social media. The major contributions and novelty of the paper are: 1) a new protocol to classify social media images into disaster phase-informed categories using AI; 2) a deep learning-based scheme to locate emergency incidents from the textual content of social media; and 3) an original framework that mines both photos and texts in tweets to capture disaster-related information for search and rescue activities. The developed framework is applied to the flooding event of Hurricane Harvey, which devastated the Houston area in 2017. This case study provides a testing bed to explore the best use of social media data for flooding-related research. Different from [7], our study provides a four-category classification of the transition of disaster phases, compared to their flood/no-flood binary classification, and we focus on extracting street-level locations of flooding incidents.

II. METHOD
An innovative framework is developed to extract information from Twitter (see Figure 1). It consists of two routes of data processing: texts and photos. The text processing involves two steps: (1) toponym (or place name) recognition and (2) toponym resolution. For toponym recognition, we use a deep learning-based NER tool, NeuroNER [29], which trains a long short-term memory (LSTM) model, a variant of the recurrent neural network, on the CoNLL 2003 dataset. It has been shown that neural network-based models generally outperform traditional models such as CRF in the task of NER [30], [31].
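To illustrate the recognition step, sequence-labeling NER models of this kind typically emit token-level BIO tags (e.g., B-LOC, I-LOC, O), which must be decoded into place-name spans. A minimal sketch of that decoding follows; the tokens, tags, and helper name are hypothetical illustrations, not NeuroNER's actual output format.

```python
def decode_bio(tokens, tags):
    """Collect contiguous B-LOC/I-LOC tag spans into place-name strings."""
    places, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-LOC":                 # a new place name starts here
            if current:
                places.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:   # continuation of the open span
            current.append(token)
        else:                              # an O tag closes any open span
            if current:
                places.append(" ".join(current))
            current = []
    if current:
        places.append(" ".join(current))
    return places

tokens = ["Flooding", "near", "Solomon", "Rd", "in", "Houston", "now"]
tags   = ["O", "O", "B-LOC", "I-LOC", "O", "B-LOC", "O"]
print(decode_bio(tokens, tags))  # ['Solomon Rd', 'Houston']
```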
Step (1) recognizes place names mentioned in tweets such as ''Houston'' and ''Solomon Rd'', while Step (2) assigns geographic coordinates to the recognized place names. GeoNames is the most comprehensive open gazetteer which contains over 25 million place names throughout the world. However, GeoNames primarily covers city and town names, and does not contain detailed information about roads whose names are also mentioned in social media. Therefore, we make use of the TIGER data from the US Census which provide the most authoritative road network data in the United States. The recognized place names are located to the center of a city (e.g., ''Houston'') or the center of a road segment (e.g., ''Solomon Rd''). GeoNames and TIGER road data are downloaded and organized in a local PostgreSQL database with PostGIS extension. Spatial indexing is enabled to speed up the geolocating process.
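The resolution step above can be sketched as a lookup that prefers street-level TIGER road matches over GeoNames city centers. The names and coordinates below are toy stand-ins for the PostGIS-backed local gazetteer, not actual GeoNames or TIGER records.

```python
# Toy stand-ins for the gazetteer tables; real coordinates would come from
# GeoNames (city/town centers) and TIGER (road segment centers) in PostGIS.
GEONAMES = {"houston": (29.7604, -95.3698)}
TIGER_ROADS = {"solomon rd": (29.83, -95.45)}  # hypothetical segment center

def resolve(place_name):
    """Assign coordinates: try road segments first, then city/town names."""
    key = place_name.lower().strip()
    if key in TIGER_ROADS:           # street-level match (higher resolution)
        return TIGER_ROADS[key], "road"
    if key in GEONAMES:              # fall back to the city/town center
        return GEONAMES[key], "city"
    return None, "unresolved"

coords, level = resolve("Solomon Rd")
```

In the actual pipeline this lookup is a spatially indexed SQL query rather than a dictionary, but the fallback order from road segment to city center is the same.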
In processing photos, we develop a TensorFlow-based classification scheme. Two deep learning architectures were adopted to classify the images, i.e., Convolutional Neural Networks (CNN) and Residual Neural Networks (ResNet).
A CNN (also known as a ConvNet) is composed of groups of layers organized by functionality: convolutional layers, pooling layers, and fully-connected layers. ResNet is a neural network scheme that allows skip connections between layers in the network structure. The present study used the CNN and ResNet v2 architectures developed in [32] to perform photo classification. The training data are also augmented using re-scaling, rotation, horizontal shifting, zooming, and flipping. Keras [32], a Python-based deep learning library, was employed to implement the architectures with the CUDA libraries on the GPUs (Graphics Processing Units) of a computer cluster hosted at the School of Engineering, Rutgers University.
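The augmentation operations named above can be illustrated with a minimal pure-NumPy sketch. In the actual pipeline these transforms would be handled inside Keras; the shift range and flip probability here are illustrative assumptions, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Apply a subset of the augmentations used in training: rescale pixel
    values to [0, 1], randomly flip horizontally, and shift horizontally."""
    img = image.astype(np.float32) / 255.0       # re-scaling
    if rng.random() < 0.5:                       # horizontal flipping
        img = img[:, ::-1, :]
    shift = int(rng.integers(-4, 5))             # horizontal shift in pixels
    img = np.roll(img, shift, axis=1)
    return img

# A fake batch of eight 32 x 32 RGB images, matching the low-resolution case.
batch = rng.integers(0, 256, size=(8, 32, 32, 3))
augmented = np.stack([augment(im) for im in batch])
```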
A new classification method for social media images is developed based on the CV algorithms to support the understanding of phase transitions and inform decision makers. Following the four phases of disaster management [11], we classify the images into four categories: preparedness, impact, response, and recovery. Similar to [11], the ''Mitigation'' category is not used because it refers to long-term activities that cannot reflect the changes of real-time data in the course of a disaster. We create the ''Impact'' category to capture first-hand onsite witness evidence, which provides exclusive information that other media sources, such as newspapers, do not cover. Different from textual classification, which relies on strict definitions of keywords, we adopt an empirical protocol to classify the images.
In general, we use the following rules to label the training dataset: 1) Impact: first-hand onsite witness of flood scenes that can be found in social media only; 2) Preparedness: flood warnings, preparedness tips, and forwarded weather forecasts; 3) Recovery: activities about community rebuilding and flood cleanup; 4) Response: forwarded media reports about flooding events and rescue activities. We randomly selected 6,542 images from a Twitter database (specified below) to prepare a training dataset, in which the Impact, Preparedness, Recovery, and Response categories contain 4,617, 772, 79, and 1,074 images, respectively. Sample images are shown in Figure 2.
After training, the best CV model (specified in the results section) was selected to process all 151,979 images collected in the Twitter dataset. The processed texts and classified images are merged into a converging database for detailed manual filtering to achieve the quality required for practical applications (specified in the results section).
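The merging of the two tracks can be sketched as a join on tweet ID, so that only tweets carrying both a geoparsed location and an image label enter the converging database. The record structure and IDs below are hypothetical.

```python
# Hypothetical outputs of the two tracks, keyed by tweet ID: geoparsed
# locations from the NLP route and category labels from the CV route.
geoparsed = {
    "t1": {"place": "Solomon Rd", "coords": (29.83, -95.45)},
    "t2": {"place": "Houston", "coords": (29.76, -95.37)},
}
classified = {"t1": "Impact", "t3": "Preparedness"}

def merge(texts, images):
    """Inner-join the two tracks on tweet ID into one converging table."""
    return {
        tid: {**rec, "category": images[tid]}
        for tid, rec in texts.items()
        if tid in images
    }

merged = merge(geoparsed, classified)  # only t1 has both a location and a label
```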
The present method is applied to the flooding event of Hurricane Harvey in the Houston area. Hurricane Harvey was a Category 4 storm that began on August 17, 2017 and ended on September 2, 2017, making landfall in Texas and Louisiana. With $125 billion in damage from catastrophic flooding, and many deaths, it tied with 2005's Hurricane Katrina as the costliest tropical cyclone on record [33]. Many affected areas recorded more than 40 inches of precipitation, ranking Harvey the wettest tropical cyclone in the recorded history of the United States.
A Twitter dataset for Hurricane Harvey [34] was obtained through the Libraries of the University of North Texas. The dataset was collected from Twitter using a comprehensive keyword list and covers the period of August 18, 2017 to September 22, 2017. It contains a total of 7,041,794 tweets and retweets, with a total file size of 43.2 GB in JSON format. Of the original tweets, only 7,537 are geo-tagged [35]. To compare with the Twitter data, modeling results of HAND (Height Above Nearest Drainage) were downloaded from CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc., https://www.cuahsi.org/) [36]. HAND represents the relative elevation of a land surface cell above the cell in the river/stream into which it flows. This method is a digital terrain-based flow analysis that does not require detailed cross-sectional surveys [37], [38].

III. RESULTS

A. COMPUTER VISION TRAINING AND PROCESSING
Three metrics are used to quantify and compare the performance of the image classifiers: precision, recall, and F1-score. Precision is the fraction of retrieved documents that are relevant to the query; recall, used alongside precision, is the fraction of the relevant documents that are successfully retrieved; and the F1-score is a function of precision and recall that balances the two, which is considered a more comprehensive measure of data extraction performance. They can be expressed mathematically as Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2 × Precision × Recall / (Precision + Recall), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives for a given category.

Every image is rescaled to two different levels of resolution: 32 × 32 and 256 × 144 (144p). For each resolution, we apply the metrics to cross-validate the CV processing with the two deep learning architectures, CNN and ResNet. The results are shown in Table 1. In general, we do not find a major difference between the two levels of resolution. This result is a little surprising, because the high-resolution processing, which takes more memory and a higher computational cost, is expected to outperform the lower level. Regarding the architectures, ResNet is generally better than CNN, an expected result since ResNet extends the CNN with a more flexible structure. Among the categories, the best classification performance is found in the ''Impact'' category using the ResNet classifier at the 32 × 32 resolution, which reaches the highest F1-score of 0.88. This can be attributed to the large size of its training data. In contrast, the ''Recovery'' category has the worst performance due to its low number of training records. Based on the analysis of Table 1, we conclude that ResNet at the 32 × 32 resolution is the best scheme, and we therefore choose it to process all the remaining images from the Twitter database.
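The per-category metrics above can be sketched directly from paired true and predicted labels. The labels in this example are illustrative toy data, not the actual Table 1 evaluation set.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one category, per the formulas above."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels: three true "Impact" images, one missed, one false alarm.
y_true = ["Impact", "Impact", "Response", "Impact", "Recovery"]
y_pred = ["Impact", "Response", "Response", "Impact", "Impact"]
p, r, f1 = per_class_metrics(y_true, y_pred, "Impact")
```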

B. PHASE TRANSITION
The selected CV scheme (ResNet at the 32 × 32 resolution) is applied to the Twitter database. A total of 22,390 images are classified: 12,829 to the ''Impact'' category, 5,064 to ''Preparedness'', 43 to ''Recovery'', and 4,454 to ''Response''. Their daily volume is shown in Figure 3. We observe that the ''Preparedness'' category peaks on Aug 25, two days ahead of ''Impact''. Both the ''Recovery'' and ''Response'' categories reach their maximums on Aug 29, two days after the ''Impact'' peak. The order of the peaks is consistent with the three phases of disaster management, preparedness-response-recovery, and it demonstrates the reliability of the four-category classification. The daily volume clearly shows phase transitions, which can be generated in real time to inform the priorities of disaster management and used to coordinate response activities.
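Reading off the peak day per category from the daily volumes can be sketched as follows. The dates and counts here are toy values, not the actual Figure 3 series.

```python
from collections import Counter

# Hypothetical (date, category) stream from the classified tweets.
records = [
    ("08-25", "Preparedness"), ("08-25", "Preparedness"), ("08-26", "Impact"),
    ("08-27", "Impact"), ("08-27", "Impact"), ("08-29", "Response"),
    ("08-29", "Response"), ("08-29", "Recovery"),
]

def peak_day(records, category):
    """Return the date with the most tweets of the given category."""
    daily = Counter(day for day, cat in records if cat == category)
    return max(daily, key=daily.get) if daily else None

# The ordering of the per-category peaks traces the phase transition.
transition = [peak_day(records, c)
              for c in ("Preparedness", "Impact", "Response")]
```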

C. SPATIAL DISTRIBUTION OF GEOLOCATED TWEETS
The daily spatial distribution of the geolocated tweets from Aug 28 to Sep 3, 2017 is shown in Figure 4. The geoparsed tweets, represented by white dots, concentrate in the urban area. The geoparsed tweets containing links to images are a small portion of the total volume and are sparsely distributed over the city area. Furthermore, the ''Impact'' tweets lie closer to major roads than those of the other categories. Note that the ''Recovery'' category is omitted from this map due to its extremely low count.
The distribution of the geoparsed tweets is compared with the flood extent predicted by the HAND model [36]. The results show that the tweet distribution is random across the city except in the flood patches, e.g., the Addicks-Barker Reservoir area. Through visual examination, we did not find any strong correlation between the geoparsed tweet distribution and the flood extent of the HAND model.

D. ESTABLISHING A PASSIVE HOTLINE
The CV classifier applies a rapid topic filter to the noisy, high-volume social media data and provides a processed data stream for further use. This AI-based algorithm still involves considerable uncertainty, which can be reduced by a manual processing step. Here we demonstrate two examples in which such manual processing refines the quality of the filtered data and increases its value for practical applications.
The first demonstration targets onsite, first-hand flooding scene evidence. In this demonstration, images of the ''Impact'' category are first filtered using the CV classifier and then manually processed; 698 images are identified. This procedure is similar to the manual examination in [39], except that the role of ''PostDistiller'' is played here by a deep learning filter. The manually selected dataset captures special situation awareness information that can complement the general understanding of flooding development; in particular, it highlights a series of issues that official media may ignore. For example, we find visual witnesses of flooding incidents in nursing homes, flooding-trapped animals (e.g., horses and dogs), unusual animal presences in cities (such as crocodiles, snakes, and fire ants), sewer manhole emergencies, road damage, and indoor floods. These issues are usually not disaster management priorities but can help inform emergency responders. According to informatics theory, this easy-to-ignore information adds an ''orthogonal'' dimension to the conventional data sources.
The second demonstration targets tweets of street-level resolution among the first-hand onsite witness photos in the ''Impact'' category. These photos are provided through the amateur photography of Twitter users. The targeted tweets are valuable because they can contribute an ''orthogonal'' dimension to the mainstream information of official sources such as FEMA and NOAA and of public media such as local TV stations and newspapers. To achieve this goal, the place names recognized from tweets are compared with a local collection of city and town names. Tweets with place names that differ from state, city, or town names are filtered, and those containing photos of the ''Impact'' category are further extracted from the filtered group (Figure 1). From the Hurricane Harvey dataset, we ultimately identified 13 tweets that meet the criteria; samples are shown in Figure 5. These photos are useful for emergency response teams because they provide street-level locations and rich onsite information to guide search and rescue activities. The developed scheme has great potential to establish a passive hotline, which is more cost-effective and wider in coverage than a traditional active hotline that relies on dedicated, specially trained staff to receive and respond to contributed information.
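The street-level filtering step described above can be sketched as follows, assuming hypothetical tweet records with recognized place names and CV labels; the name collection and tweet contents are toy stand-ins.

```python
CITY_TOWN_NAMES = {"houston", "texas", "katy"}  # toy local name collection

def street_level_impact(tweets):
    """Keep tweets with an 'Impact' photo and at least one place name that
    is not a state/city/town name, i.e., a likely street-level location."""
    keep = []
    for t in tweets:
        fine = [p for p in t["places"] if p.lower() not in CITY_TOWN_NAMES]
        if fine and t.get("image_category") == "Impact":
            keep.append(t["id"])
    return keep

tweets = [
    {"id": "t1", "places": ["Solomon Rd"], "image_category": "Impact"},
    {"id": "t2", "places": ["Houston"], "image_category": "Impact"},
    {"id": "t3", "places": ["Solomon Rd"], "image_category": "Response"},
]
selected = street_level_impact(tweets)  # only t1 passes both criteria
```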

IV. DISCUSSION

A. WHAT'S THE BEST USE OF AI-ENABLED SOCIAL MEDIA DATA?
The present study uses novel deep learning methods for high-accuracy temporal-spatial social media data mining. The AI-enabled data processing framework extracts information from high-volume and fast-updating social media sources to provide an ''orthogonal'' dimension to flood situation awareness. However, compared to conventional flood information sources, AI-enabled social media processing is still challenged by high uncertainties. A key question arises: what is the best use of AI-enabled social media data for flood mitigation? The present study attempts to answer this question by examining three potential applications: 1) establishing a passive hotline to guide emergency search and rescue activities, 2) tracking phase transitions to inform disaster management priorities, and 3) supporting the validation and calibration of numerical models.

First, establishing a passive hotline is a promising direction in which AI-enabled social media data may reach the level of practical application. As past studies have indicated that social media data are useful for emergency detection, AI can enhance this application by capturing, geolocating, and classifying social media data. With the new photo classifier (potentially coupled with textual classification such as in [11]), there is hope of building an exclusive and unique data source for passive situation surveillance. Thanks to the developed automatic filter, akin to ''PostDistiller'', this new framework will also help reduce the cost and time of the manual refining process when it is necessary.
Second, AI-enabled social media data shows good accuracy in reflecting the general trend of disaster development, thanks to its high temporal resolution. Using the newly developed photo classifier and text geolocator, one can track phase transitions in real time. Accordingly, a disaster management team could shift its mitigation priorities in time. For example, if the peak of the Preparedness category has been observed and the Impact category is found to be rising, the management team can quickly shift from preparedness to search and rescue missions. In addition, a topic clustering model can be built to ensure the level of accuracy. In particular, if only a large-scale spatial distribution is needed, such as at the city or state level, the AI-enabled processing can deliver a promising result.
Third, it is still not practical to use AI-enabled social media data as a standalone data source to support the validation and calibration of numerical models. Since social media provides a wide and continuous monitoring capability, the research community has long wished to use this unconventional data source to calibrate and validate flood forecasting models (e.g., [4]). However, given the relatively high uncertainty and the numerous challenges in the completeness and accuracy of geolocating, this task is still too demanding for the existing level of data processing. In addition, social media data only partially sample social activities, so they cannot completely represent the whole field. This problem may become more severe as data privacy issues grow more urgent. A remarkable milestone is the recent announcement from Twitter that its high-accuracy GPS geotagging service is suspended. Hu and Wang [21] assessed the impact of this change on geospatial Twitter analysis and discussed several potential responses to it.

AI is a fast-developing field, and every year numerous new models are developed with improved data mining performance. We expect that new AI techniques will improve the level of accuracy in the near future and that rapid adoption of new algorithms can help advance the field. In addition, we share our thoughts on a few directions that could contribute to flood applications with AI-enabled social media data.
First, data fusion that combines social media, remote sensing, and other data sources is an emerging research opportunity to provide quality data products for flood management. This scheme takes advantage of each data source to create a more complete monitoring picture. Combined with high-fidelity data sources such as ground sensor networks, it may become possible to meet the data quality and accuracy demands of flood model validation and calibration. An effort in this direction has been made in the literature [40].
Second, using social media data for operational real-time forecasting remains a challenge. Nevertheless, the rich information collected from social media could provide a lively, first-hand visual and textual stream to inform forecasting end users. For example, real-time witness photos and tweet texts could be pushed through mobile phone apps to show the onsite situation and reflect the concerns of affected communities.
Third, data originating from social media are also valuable for training automatic systems. For example, smart infrastructure and autonomous driving systems all need first-hand data to train their models to perform designated tasks. Social media data coming from first-hand witnesses are hence valuable for helping developers adjust their models to real-world situations.
Last but not least, misinformation is often found in social media streams and has been observed in the context of disasters [41]. Such misinformation can mislead disaster response efforts and waste limited resources by spreading rumors or creating panic. Accordingly, we need methods that can automatically detect fake and spam social media messages [42]. Those methods can be integrated with information extraction models to form a complete pipeline that derives true and actionable knowledge from social media data.

V. CONCLUSIONS
The present study develops an original framework of social media data mining for flood monitoring. The framework consists of two tracks of Twitter data processing: CV-based image classification/topic filtering and NLP-based geolocating. Both tracks use novel deep learning algorithms and are shown to be valuable for flood monitoring. This study demonstrates the feasibility of developing an image classifier to track disaster phase transitions, of using a local gazetteer with the LSTM NER model to improve text-based geolocation, and of combining photo-based classification with text-based geolocation to extract useful tweets. Manual examination of the automatically filtered tweets proves promising in enhancing the data mining results and can be conveniently used to establish a passive hotline to guide search and rescue activities. The best use of social media data for disaster management is discussed, and promising future directions are outlined.