Review article: A systematic review and future prospects of flood vulnerability indices

Abstract. Despite the increasing body of research on flood vulnerability, a review of the methods used in the construction of vulnerability indices is still missing. Here, we address this gap by providing a state-of-the-art account of flood vulnerability indices, highlighting worldwide trends and future research directions. A total of 95 peer-reviewed articles published between 2002 and 2019 were systematically analyzed. An exponential rise in research effort is demonstrated, with 80% of the articles being published since 2015. The majority of these studies (62.1%) focused on the neighborhood scale, followed by the city scale (14.7%). Min-max normalization (30.5%), equal weighting (24.2%), and linear aggregation (80.0%) were the most common methods. With regard to the indicators used, a focus was given to socio-economic aspects (e.g. population density, illiteracy rate, gender), whilst components associated with the citizens' coping and adaptive capacity were only marginally covered. Gaps in current research include a lack of sensitivity and uncertainty analyses (present in only 9.5% and 3.2% of papers, respectively); inadequate or nonexistent validation of the results (present in 13.7% of the studies); lack of transparency regarding the rationale for weighting and indicator selection; and use of static approaches, disregarding temporal dynamics. We discuss the challenges associated with these findings for the assessment of flood vulnerability and provide a research agenda for attending to these gaps. Overall, we argue that future research should be more theoretically grounded while at the same time considering validation as well as the dynamic aspects of vulnerability.


Introduction
Floods affect billions of people worldwide (Zarekarizi et al., 2020). Indeed, according to the Emergency Events Database (CRED, 2019), around 50,000 people died and approximately 10% of the world population was affected by floods between 2009 and 2019. Due to population growth and climate change, more frequent and widespread floods are anticipated (Hirsch and Archfield, 2015; Leung et al., 2019). Therefore, flood risk management is required for mitigating potential damages.
Nowadays there is a consensus that risk (i.e. the potential for adverse impacts) is not driven solely by natural hazards (e.g. floods, droughts), but depends on the interactions between hazards, exposure, and vulnerability (IPCC, 2012, 2014). In this regard, vulnerability plays an important role in flood risk assessment. It encompasses multiple social, economic, physical, cultural, environmental and institutional characteristics which influence the susceptibility of the exposed elements to the impact of hazards (Birkmann et al., 2013; UNDRR, 2017). Due to its importance, the need to understand and assess flood vulnerability has been highlighted by international initiatives such as the Sendai Framework for Disaster Risk Reduction 2015-2030 (UNISDR, 2015).
In response to this, numerous studies have been undertaken to better understand flood vulnerability. Nevertheless, both the terminology and methodology used in these assessments are still a subject of discussion (Aroca-Jiménez et al., 2020; Kelman, 2018). In fact, some consider vulnerability as a function of exposure and susceptibility (Balica et al., 2009; IPCC, 2001; Turner et al., 2003; UNDP, 2014), while others separate these concepts (Dilley et al., 2005; Fedeski and Gwilliam, 2007), as it is possible to be exposed to a hazard without being vulnerable. For instance, a person may live in an area prone to natural hazards, but have sufficient alternatives to modify the structure of their house to prevent potential losses (Cardona et al., 2012).
A wide range of approaches have been proposed for assessing flood vulnerability. The most commonly used methods are: stage-damage functions (Papathoma-Köhle et al., 2012, 2017; Tarbotton et al., 2015); damage matrices (Bründl et al., 2009; Papathoma-Köhle et al., 2017); and vulnerability indices (Birkmann, 2006; de Brito et al., 2017; Kappes et al., 2012; Moreira et al., 2021). The first two methods assess only the physical vulnerability of structures, neglecting the social vulnerability of their inhabitants (Koks et al., 2015). However, the capacity of households to cope, adapt and respond to hazards is equally important to assess the potential impacts of floods (de Brito et al., 2018). Therefore, given the importance of holistic studies on vulnerability to ensure better representation of reality, the use of vulnerability indices is recommended (Balica et al., 2013; Birkmann et al., 2013; Fuchs et al., 2011; Nasiri et al., 2016). Indices serve as a summary of complex and multidimensional issues to assist decision-makers, to facilitate the interpretation of a phenomenon, and to increase public interest through a summary of the results.
Flood vulnerability indices are, therefore, a tool to measure the degree of vulnerability through the aggregation of several indicators or variables. Despite their advantages, indices can convey misleading messages if they are poorly constructed or misinterpreted. Hence, a clear understanding of the normalization, weighting and aggregation methods used to build an index is required (Moreira et al., 2021).
Over the past years, several review articles about flood vulnerability have been published. For instance, Rufat et al. (2015) reviewed 67 articles to identify the leading drivers of social vulnerability to floods. Rehman et al. (2019) and Fatemi et al. (2017) reviewed different methodologies used for assessing flood vulnerability. Jurgilevich et al. (2017) systematically reviewed 42 climate risk and vulnerability assessments. More recently, Diaz-Sarachaga and Jato-Espino (2020) evaluated 72 articles related to the appraisal of vulnerability to different types of hazards in urban areas. Some studies also analyzed different methods and index construction designs to understand which decisions have the greatest influence on the vulnerability outcomes. For instance, Nasiri et al. (2016) compared damage curves, computer modeling and indicators to evaluate flood vulnerability. Similarly, Schmidtlein et al. (2008) and Tate (2012, 2013) examined the sensitivity of the results to changes in the construction of the vulnerability index.
Notwithstanding these advances, to the best of our knowledge, no study has conducted a systematic review of flood vulnerability indices with a focus on the different stages involved in their construction. The methods used for normalization, weighting, aggregation and validation, and the implications of each choice for vulnerability assessment, have received little attention so far. In addition, even though there have been recent advancements in the field (e.g. Cutter and Derakhshan, 2020), the temporal dynamics of flood vulnerability have not been tackled by the existing reviews. This is particularly important given that certain adaptation policies and strategies may reduce short-term risk probability, but increase long-term vulnerability and exposure (Cardona et al., 2012). Therefore, a better understanding of the methods used in each step of the index construction, the temporal dynamics of vulnerability (e.g. pre- and post-flood), and the uncertainty involved is needed for advancing research on flood vulnerability assessment.
Considering the aforementioned gaps and given the proliferation of methods for building vulnerability indices, it is pertinent to review the development of this field. Hence, we carried out a systematic literature review of indices used to assess flood vulnerability. A focus is given to urban and riverine floods. The following questions guided the analysis: (1) Which spatial scale was considered? (2) Which indicators were most commonly used to measure flood vulnerability? (3) How were the temporal dynamics of vulnerability addressed (e.g. pre- or post-flood event)? (4) Which methods were most commonly applied in each stage of the index building process (i.e. normalization, weighting, aggregation)? (5) To what extent did these studies conduct validation and apply uncertainty and sensitivity analysis? In addition to highlighting existing challenges, we also point out directions for further research.

Overview of indicators and indices
In general, indicators consist of various pieces of data capable of synthesizing the characteristics of a system. When these indicators are aggregated, the result is called an index or composite indicator (Saisana and Tarantola, 2002). Overall, the construction of an index comprises seven steps (Fig. 1). First, the phenomenon to be measured is defined, so that the index results can provide a clear understanding of this phenomenon (Nardo et al., 2008). Second, the indicators used to measure the phenomenon are selected. This should be done carefully, as the results reflect the quality of the selected indicators. In the third step, the relationships between the selected indicators are identified. Indicators with similar characteristics can be grouped to reduce the number of variables. To this end, statistical analysis (e.g. principal component analysis, PCA) or expert knowledge can be used to decide whether the indicators are sufficient or appropriate to describe the phenomenon (Nardo et al., 2008). In the fourth step, the selected indicators are normalized to a common scale before being aggregated into an index, as they usually have different units of measurement (see Table 1 for the main normalization methods). By doing so, problems with outliers can also be reduced (Jacobs et al., 2004).

Table 1. Main normalization methods.
Method | Description | Reference
Min-max | Rescales values between 0 (worst rank) and 1 (best rank). It subtracts the minimum value and divides the result by the range (maximum value minus minimum value). | Jha and Gundimeda (2019)
Distance from the group leader | Rescales values between 0 and 1. It is defined as the ratio of the value of the indicator to its maximum value. | Munyai et al. (2019)
Division by total | It is defined as the ratio of the value of the indicator to the total value for the indicator. | Garbutt et al. (2015)
Note: y_in is the transformed value of x_in for indicator i and unit n; P_i is the i-th percentile of the distribution of the indicator x_in; and p is an arbitrary threshold around the mean.
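The normalization methods described above can be illustrated with a minimal Python sketch. The function names and the population density values are hypothetical and serve only as an example; the z-score variant uses the population standard deviation, which is an assumption, since studies may also use the sample standard deviation.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale to [0, 1]: subtract the minimum and divide by the range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max is undefined when all values are equal")
    return [(x - lo) / (hi - lo) for x in values]

def distance_from_leader(values):
    """Ratio of each value to the maximum value of the indicator."""
    top = max(values)
    return [x / top for x in values]

def division_by_total(values):
    """Ratio of each value to the total value of the indicator."""
    total = sum(values)
    return [x / total for x in values]

def z_score(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical population densities for four spatial units.
density = [100.0, 250.0, 400.0, 1000.0]
for method in (min_max, distance_from_leader, division_by_total, z_score):
    print(method.__name__, [round(v, 3) for v in method(density)])
```

Note that min-max and distance from the group leader are both driven by the extreme values of the indicator, so a single outlier can compress the remaining units into a narrow range.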
The fifth step comprises the weighting and aggregation of the indicators. Weights can be assigned to indicators to reflect their importance in relation to the studied phenomenon (see Table 2 for the main weighting methods). Given that it may be difficult to find an acceptable weighting scheme, equal weights are often used, which implies that all criteria are "worth" the same (de Brito et al., 2018). Alternatively, an equal weighting scheme could be a result of a lack of knowledge about the indicators' importance. After the indicators are weighted, they are aggregated. The most common aggregation methods are linear and geometric. The linear method consists of the weighted sum of the normalized indicators, whereas the geometric aggregation is the product of the normalized indicators, each raised to the power of its assigned weight (Nardo et al., 2008).
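In symbols, the two aggregation rules can be written as follows. This is a sketch under stated assumptions: y_in denotes the normalized value of indicator i for unit n, while the weights w_i (assumed to sum to one), the number of indicators I, and the index symbols V_n are notation introduced here for illustration.

```latex
V_n^{\mathrm{lin}} = \sum_{i=1}^{I} w_i \, y_{in},
\qquad
V_n^{\mathrm{geo}} = \prod_{i=1}^{I} y_{in}^{\,w_i},
\qquad
\sum_{i=1}^{I} w_i = 1 .
```

The geometric form requires strictly positive normalized values, since a single zero score sets the whole index to zero.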
The sixth step consists of sensitivity and uncertainty analyses (see Table 3 for the main uncertainty and sensitivity methods). The former evaluates the contribution of each individual source of uncertainty to the variance of the results, while the latter focuses on how the uncertainty of each indicator propagates through the index structure and affects the outputs (Saisana et al., 2005; Saisana and Tarantola, 2002).
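As an illustration of uncertainty analysis, the following minimal sketch propagates uncertainty in the indicator weights through a linear index using Monte Carlo simulation. The scores, weights, the relative perturbation range (`spread`) and the uniform distribution are all hypothetical assumptions, not values taken from the reviewed studies.

```python
import random
from statistics import mean, pstdev

def linear_index(scores, weights):
    """Weighted sum of normalized indicator scores."""
    return sum(w * s for w, s in zip(weights, scores))

def monte_carlo_index(scores, base_weights, spread=0.2, runs=2000, seed=42):
    """Perturb each weight uniformly by up to +/- spread (relative),
    renormalize the weights to sum to 1, and recompute the index."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        perturbed = [w * rng.uniform(1 - spread, 1 + spread) for w in base_weights]
        total = sum(perturbed)
        weights = [w / total for w in perturbed]
        results.append(linear_index(scores, weights))
    return mean(results), pstdev(results)

# Hypothetical normalized scores of one spatial unit and its baseline weights.
scores = [0.9, 0.2, 0.5]
weights = [0.5, 0.3, 0.2]
mu, sigma = monte_carlo_index(scores, weights)
print(f"index mean = {mu:.3f}, standard deviation = {sigma:.3f}")
```

The standard deviation of the simulated index values gives a simple measure of how much the result depends on the assumed weights.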
The final step comprises the validation of the index results. This is crucial to verify whether they are consistent with the real system and have a satisfactory precision range. Validation can be achieved by using independent secondary data that refer to observable outcomes. Since vulnerability is not a directly observable phenomenon, its validation requires the use of proxies such as mortality and built environment damage (Schneiderbauer and Ehrlich, 2006), post-event surveys (Fekete, 2009), number of disasters (Debortoli et al., 2017) and emergency service requests (Kontokosta and Malik, 2018).

Table 2. Main weighting methods.
Method | Description | Reference
Multi-criteria decision-making (MCDM) | A set of methods based on multiple criteria and objectives for structuring and evaluating alternatives. | de Brito et al. (2018)

Table 3. Characteristics of the main methods for uncertainty and sensitivity analysis used for building indices.
Method | Description | Reference
One-at-a-time sensitivity analysis | Input parameters are changed one at a time, while all other parameters remain constant, to verify how these perturbations affect the results. | de Brito et al. (2019)
Monte Carlo simulation | Computational algorithm based on a probabilistic method that uses repeated random sampling. | Feizizadeh and Kienberger (2017)
Statistical tools | Use of statistical tools such as regression, correlation analysis and cross-validation. | Moreira et al. (2021), Nazeer and Bork (2019)

Methods
A bibliographic search was performed by focusing on studies that constructed flood vulnerability indices. The Web of Science (WoS) database was used to identify peer-reviewed articles published since 1945, using the following keywords: (("flood" OR "flooding") AND ("index" OR "composite indicator") AND "vulnerability" NOT "coast*"). Only the abstract, title, and keywords were searched. This narrowed the search space substantially.
These queries returned over 348 articles published between January 2002 and December 2019. At first, the title, abstract, and keywords were screened manually to exclude articles that were not relevant to the purpose of the present study. After this preselection, the full text of 84 selected papers was reviewed in detail. An additional 11 key articles were included. They were not found in our original search even though they built vulnerability indices. This occurred because the keywords "index" or "composite indicator" were not mentioned in the article's abstract, title or keywords. Hence, this limitation should be acknowledged, as other relevant articles may have been disregarded.
A complete list of the reviewed papers is presented in the Supplementary Material S1.

Flood vulnerability indices at a glance
An increasing number of studies that built flood vulnerability indices can be observed in recent years, with about 80% (n=76) of the articles being published since 2015 (Fig. 2a), the year in which the Sendai Framework for Disaster Risk Reduction (UNISDR, 2016) was agreed among several Member States. Therefore, the growing number of publications may result from the increasing awareness of flood disaster prevention and reduction policies. The increasing number of vulnerability index studies could also be attributed to the ease of using indices to address complex and multidimensional issues such as flood vulnerability, in contrast to methods that demand more data (e.g. damage curves), which are often not suited for considering the social components of vulnerability. Alternatively, this increase may just match a general rise in published papers. To investigate this, we calculated the increase of flood vulnerability studies in relative terms, based on a normalization according to the number of all flood publications in the WoS database. Results show that the increase in research on flood vulnerability indices is significantly greater than the increase in published flood articles (Appendix A, Fig. A1).
Overall, most of the assessments were conducted in Asia (45.3%), followed by America (24.2%), encompassing 38 countries (Fig. 2b). This was expected as, according to the EM-DAT statistics, between 2002 and 2019 Asia showed the highest number of deaths caused by floods (1027 deaths) (CRED, 2019). The studies are highly concentrated in a few countries, namely China (n=14), Brazil (n=8), India (n=6), Pakistan (n=6), and the United States (n=6). Meanwhile, there were fewer studies in East and West Africa, despite the frequent occurrence of floods and the high mortality they cause across these regions.
In terms of spatial scale, most of the studies were conducted at the neighborhood scale (62.1%), followed by city (14.7%), household (12.6%), group of cities (7.4%), various scales (2.1%), and state (1.1%). Similar outcomes were obtained by Diaz-Sarachaga and Jato-Espino (2020), who found that vulnerability studies at national and regional scales are infrequent.
The neighborhood scale was the dominant scale in all continents (Fig. 3), as it is the smallest unit for which data are available for large areas, generally through census data. Only 8 studies (8.4%) were conducted at the basin level (i.e. group of cities) and few articles (n=2) conducted assessments across various scales (e.g. Balica et al., 2009).
Around 40.0% of the studies were applied to urban areas, 15.8% to rural areas and 44.2% to both. The high prevalence of studies that consider both urban and rural areas is related to data availability, as census tracts usually encompass the entire perimeter of a municipality. At the neighborhood scale, most studies considered only urban areas (53.4%) (Fig. 4).
Conversely, studies at the household scale were developed mainly in rural areas (58.3%). This can be explained by a lower availability of detailed geospatial data in rural areas worldwide (Zhang and Zhu, 2018; Zielstra and Zipf, 2010). Therefore, in these cases, it is necessary to collect data via household surveys.

Coping capacity indicator | Number of studies (%)
Early warning system | 11 (11.6%)
Past flood experience | 7 (7.4%)
Emergency committee | 5 (5.3%)
Flood insurance | 5 (5.3%)

The studies used a median of 16 indicators. Although 32.6% (n=31) of the studies used more than 20 indicators (e.g. Sam et al., 2017), most of them (58.0%) did not apply any method for reducing the number of variables. Among the studies that conducted such a reduction, the most used technique was PCA, which was applied to 35.5% (n=11) of the indices that used more than 20 indicators (e.g. Aroca-Jimenez et al., 2017; Grosso et al., 2015; Török, 2018). In addition to PCA, some studies used other approaches, for example, based on expert opinion (e.g. de Brito et al., 2018) or on the identification of indicators with a high Pearson correlation (e.g. Kotzee and Reyers, 2016).
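A correlation-based reduction of the indicator set can be sketched as follows. This is a minimal illustration, not the procedure of any reviewed study: the greedy filter, the threshold of 0.9, and the indicator names and values are all hypothetical assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_correlated(indicators, threshold=0.9):
    """Greedy filter: keep an indicator only if its absolute correlation
    with every indicator kept so far does not exceed the threshold."""
    kept = []
    for name, values in indicators.items():
        if all(abs(pearson(values, indicators[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical normalized indicator values for five spatial units.
indicators = {
    "population_density": [0.1, 0.4, 0.5, 0.9, 1.0],
    "building_density":   [0.2, 0.4, 0.6, 0.8, 1.0],
    "illiteracy_rate":    [0.9, 0.1, 0.7, 0.2, 0.5],
}
print(drop_correlated(indicators))  # ['population_density', 'illiteracy_rate']
```

Because the filter is greedy, the result depends on the order in which indicators are listed; in practice, the choice of which redundant indicator to keep should follow conceptual rather than purely statistical considerations.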

Temporal dynamics
To identify whether the articles included the temporal dynamics of vulnerability, the indices were classified into: pre-event (before), event (during) and post-event (after) (Kobiyama et al., 2006). Most of the studies focused on assessing past vulnerability trends (88.4%) and only 12.6% explored post-event flood vulnerability (e.g. Carlier et al., 2018; Miguez and Veról, 2017). None focused on the vulnerability during the event or elaborated projections for future vulnerabilities.
The indicators used differ according to the temporal scale considered. Post-event indices encompassed human, economic and material damages to quantify the flood vulnerability (Table 5). Variables such as mitigation, damages and coping behavior after experiencing a flood were often considered (Abbas et al., 2018). For instance, Rogelis et al. (2016) compared the results of the most vulnerable areas by ranking the basins according to the observed impacts, from highest to lowest damage, in terms of: number of fatalities, injured people, evacuated people, and number of affected houses.

Indicator normalization, weighting and aggregation
Concerning the normalization of the indicators, the most used approach was min-max (30.5%), followed by z-score (12.6%) and distance from the group leader (12.6%) (Table 6a). Five studies applied other methods. For example, Aroca-Jimenez et al. (2017; 2018) expressed the indicators' values in percentage or per capita value, and de Brito et al. (2018) used fuzzy functions to normalize the indicators. It is important to note that some papers (11.6%) did not specify the normalization method used, which limits the reproducibility of the study results.
Among the weighting approaches, statistical methods were the most applied (30.5%), especially PCA (21.1%). The high use of PCA can be attributed to the pioneering work by Cutter et al. (2003), which recommended the use of a factor analytic approach. Other less common statistical methods include dividing the indicator values by the total or maximum value (Okazawa et al., 2011), hot spot analysis (Kubal et al., 2009) and the unequal weighting method (Kablan et al., 2017).
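One simple way of deriving weights from PCA can be sketched as follows. This is an illustrative variant only, assumed here for clarity: weights are taken as the squared loadings of the first principal component of the correlation matrix, computed by power iteration. Studies differ in how many components they retain and how they rotate and combine them, and the indicator values below are hypothetical.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long indicator columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mu = sum(col) / n
        dev = [x - mu for x in col]
        norm = sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]

def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric matrix via power iteration.
    Assumes the start vector is not orthogonal to that eigenvector."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def pca_weights(columns):
    """Weights proportional to the squared loadings on the first component.
    The squared entries of a unit vector already sum to one."""
    loadings = first_component(correlation_matrix(columns))
    return [value * value for value in loadings]

columns = [
    [0.1, 0.4, 0.5, 0.9, 1.0],  # hypothetical population density
    [0.2, 0.4, 0.6, 0.8, 1.0],  # hypothetical building density
    [0.9, 0.1, 0.7, 0.2, 0.5],  # hypothetical illiteracy rate
]
weights = pca_weights(columns)
print([round(w, 3) for w in weights])
```

In this example the two strongly correlated indicators receive the largest weights, which illustrates that PCA-based weights reflect shared variance in the data rather than the conceptual importance of an indicator.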
Many authors recommend the use of participatory methods for weighting the indicators, such as multi-criteria decision-making (MCDM) tools (Evers et al., 2018). It is assumed that, if practitioners and experts are involved in creating an index that they find useful, it is more likely that they will trust its results (Oulahen et al., 2015). Furthermore, participation is believed to be a key component in fostering effective disaster risk reduction (Fekete et al., 2021). In the present study, the analytic hierarchy process (AHP) was the most common MCDM technique, which corroborates the results by de Brito and Evers (2016). These authors attributed this preference to the fact that AHP is a straightforward and flexible method. This method was applied separately in 10 papers and together with other methods in 5 papers, totaling approximately 16% of the reviewed articles. Among the less common MCDM methods, PROMETHEE (Daksiya et al., 2017) and the analytic network process (ANP) (de Brito et al., 2018) are worth mentioning.
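The core AHP computation can be sketched in a few lines. The sketch below uses the row geometric mean approximation of the priority vector rather than an exact eigenvector calculation, and checks the consistency of the judgements with Saaty's consistency ratio. The pairwise comparison matrix is hypothetical.

```python
from math import prod

# Saaty's random consistency index (RI) for matrix sizes 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(matrix):
    """Priority vector approximated by normalized row geometric means."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI, with lambda_max estimated as the
    mean of (A w)_i / w_i. Values below 0.1 are usually deemed acceptable."""
    n = len(matrix)
    if n < 3:
        return 0.0
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]

# Hypothetical expert judgements for three indicators on Saaty's 1-9 scale:
# indicator 1 is moderately more important than 2 and strongly more than 3.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
w = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, w)
print([round(x, 3) for x in w], round(cr, 4))
```

When the consistency ratio exceeds the usual 0.1 threshold, the pairwise judgements are typically revisited with the experts before the weights are used.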
A total of 7 articles used other weighting methods, including the entropy method (Lianxiao and Morimoto, 2019), the Delphi technique (Yang et al., 2018b), and expert scoring (Wu et al., 2015). Furthermore, about one-fourth (24.2%) of the papers attributed equal weights and 6.3% did not specify the weighting method used (Table 6b). Some authors preferred not to weight indicators because they assumed that these indicators are equally important for vulnerability calculation (Yoon, 2012), whereas others pointed out that there is insufficient evidence to attribute importance to one factor over another (Fekete, 2011).
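For illustration, a generic form of entropy weighting can be sketched as below. This is an assumption about the general technique based on Shannon entropy, not necessarily the exact variant used in the cited studies; the indicator values are hypothetical and must be non-negative.

```python
from math import log

def entropy_weights(columns):
    """Entropy weighting: indicators whose values vary more across units
    (lower normalized Shannon entropy) receive larger weights.
    columns holds one list of non-negative values per indicator."""
    m = len(columns[0])  # number of spatial units, must exceed 1
    diversities = []
    for col in columns:
        total = sum(col)
        shares = [x / total for x in col]
        entropy = -sum(p * log(p) for p in shares if p > 0) / log(m)
        diversities.append(1.0 - entropy)
    total_diversity = sum(diversities)
    return [d / total_diversity for d in diversities]

# Hypothetical indicator values for four spatial units.
columns = [
    [0.24, 0.25, 0.25, 0.26],  # almost uniform: carries little information
    [0.05, 0.10, 0.25, 0.60],  # strongly varying: carries more information
]
weights = entropy_weights(columns)
print([round(w, 3) for w in weights])
```

As with PCA-based weights, the result is driven entirely by the data distribution, so an indicator that is conceptually important but spatially homogeneous receives a low weight.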
In terms of aggregation, the majority of the articles (80.0%) used linear aggregation, followed by geometric aggregation (10.5%) and mixed methods (4.2%). The linear method is useful when all indicators have the same unit or after they are normalized. The geometric aggregation is preferred when the interest is to limit the degree of compensation between the indicators. In linear aggregation, compensation is constant, while in geometric aggregation the compensation is lower for indices with low values (Nardo et al., 2008). Nevertheless, the geometric method remains limited in that indicators with very low scores can still be partly compensated by indicators with high scores (Gan et al., 2017). Other aggregation methods were also used (5.3%). For instance, Abebe et al. (2018) used a Bayesian belief network, which is formed by a graphical network representing the cause-effect relationships between the different indicators (Pearl, 1988). Yang et al. (2018a, 2018b) applied the Shannon entropy method. More recently, Amadio et al. (2019) used a non-compensatory aggregation method, in which the low performance of one indicator cannot be offset by the higher performance of another indicator. Finally, Chiu et al. (2014) used the Fuzzy Comprehensive Evaluation Method to aggregate the indicators.
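The difference in compensation between the two main aggregation methods can be seen in a small numerical sketch. The scores and equal weights are hypothetical.

```python
from math import prod

def linear(scores, weights):
    """Weighted arithmetic mean of normalized indicator scores."""
    return sum(w * s for w, s in zip(weights, scores))

def geometric(scores, weights):
    """Weighted geometric mean; requires strictly positive scores."""
    return prod(s ** w for s, w in zip(scores, weights))

weights = [0.5, 0.5]
balanced = [0.5, 0.5]
unbalanced = [0.9, 0.1]  # same arithmetic mean as the balanced unit

print(linear(balanced, weights), linear(unbalanced, weights))
print(geometric(balanced, weights), geometric(unbalanced, weights))
```

Both units obtain the same linear index (0.5), because the high score fully offsets the low one. Under geometric aggregation the unbalanced unit drops to about 0.3, so the low score is only partly compensated.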

Uncertainty, sensitivity and validation
Each step of the construction of flood vulnerability indices carries uncertainty (Saisana et al., 2005), which adds to the uncertainty associated with the randomness of flood events (Merz et al., 2008). Therefore, to ensure a better index quality and verify the robustness of the results, uncertainty analysis should be conducted. Despite its importance, only 3 (3.2%) of the reviewed papers performed uncertainty analysis: Nazeer and Bork (2019) observed variations in the final results when changing input variables, while Feizizadeh and Kienberger (2017) and Guo et al. (2014) applied Monte Carlo simulation and set pair analysis, respectively.
De Brito et al. (2019) performed a spatial sensitivity analysis (SA) by computing the vulnerability class switches when the indicator weights were changed. Only Feizizadeh and Kienberger (2017) performed a global sensitivity analysis.
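The idea of counting class switches under a one-at-a-time change of weights can be sketched as follows. This is a simplified illustration and not the procedure of the cited study: the class breaks, the rescaling of the remaining weights, and the indicator scores are all hypothetical assumptions.

```python
def classify(value, breaks=(0.33, 0.66)):
    """Map an index value to class 0 (low), 1 (medium) or 2 (high)."""
    return sum(value > b for b in breaks)

def index_values(units, weights):
    """Linear index for each spatial unit (one list of scores per unit)."""
    return [sum(w * s for w, s in zip(weights, scores)) for scores in units]

def class_switch_share(units, base_weights, position, new_weight):
    """Set one weight to new_weight, rescale the others so that all weights
    still sum to 1, and return the share of units whose class changes."""
    rest = sum(w for i, w in enumerate(base_weights) if i != position)
    scale = (1.0 - new_weight) / rest
    changed = [new_weight if i == position else w * scale
               for i, w in enumerate(base_weights)]
    before = [classify(v) for v in index_values(units, base_weights)]
    after = [classify(v) for v in index_values(units, changed)]
    switches = sum(b != a for b, a in zip(before, after))
    return switches / len(units)

# Hypothetical normalized scores of four spatial units on three indicators.
units = [[0.95, 0.1, 0.5], [0.2, 0.8, 0.4], [0.6, 0.6, 0.6], [0.1, 0.2, 0.9]]
base = [1 / 3, 1 / 3, 1 / 3]
share = class_switch_share(units, base, position=0, new_weight=0.6)
print(share)  # 0.5: two of the four units change class
```

Mapping which units switch class, rather than only counting them, shows where the vulnerability map is least robust to the weighting choice.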
Although the number of flood vulnerability studies has increased, few studies attempted to validate the obtained outcomes (Fekete, 2009). Of the reviewed articles, only 11 (11.6%) validated the results, mostly using maps of flooded areas (n=7), flood damage (n=3), and expert opinion (n=1).
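A simple form of validation against observed impacts is to check whether the ranking of spatial units by the index agrees with their ranking by an independent damage proxy. The sketch below computes Spearman's rank correlation for this purpose; the index values and the damage counts are hypothetical, and rank correlation is only one of several possible validation measures.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical index values and observed damaged houses for five units.
index = [0.82, 0.35, 0.60, 0.15, 0.47]
damaged_houses = [120, 30, 95, 10, 25]
rho = spearman(index, damaged_houses)
print(round(rho, 2))  # 0.9
```

A high rank correlation indicates agreement with the chosen proxy only; since observed damage also depends on hazard intensity and exposure, it should not be read as direct proof that the index measures vulnerability.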

Persisting gaps and future research
Despite the increasing amount of research on flood vulnerability indices since 2015, a series of persistent knowledge gaps of a methodological nature were found in our systematic review. Here, we summarize these gaps and provide a research agenda with needs that should be addressed in the future.
The first challenge refers to the spatial scale, as vulnerability is not only site-specific but also scale-dependent (Ciurean et al., 2013). The choice of the spatial scale in the reviewed articles was mostly driven by data availability, and hence most of them were applied at the neighborhood level using census tracts. Despite the availability of census data at the country level, there were no studies at the national level and only 8 papers (8.4%) constructed vulnerability indices using data at the basin scale. Nevertheless, these scales are often used for flood risk management and are necessary to enable the comparability of regions and to define hot-spot areas where intervention is needed (Balica et al., 2009; Fekete et al., 2010). Conversely, studies at the household level were rare in our sample (n=12). Yet, aspects related to the citizens' coping capacities can only be captured at this spatial scale.
An additional issue is the problem of down- or up-scaling, which implies different levels of generalization. Hence, multi-level and cross-scale studies are needed. They allow for a better understanding of scale implications, including their benefits and drawbacks. A better understanding of urban-rural linkages is also required, instead of studying urban and rural areas in isolation. To this end, the framework proposed by Jamshed et al. (2020) could be used. This framework considers, either qualitatively or quantitatively, how rural-urban linkages can influence the occurrence of floods and how these shape the vulnerability of rural households. It considers rural areas not as secluded units, but rather as interlinked with cities.
A further gap is that indicators related to the population's coping and adaptive capacity were rarely used. A focus was given to social indicators that increase people's vulnerability. Similar to the scale choice, the preference for these indicators is driven by data availability, as social indicators (e.g. age, gender) are easily accessible. Nevertheless, the capacity of people to anticipate, cope with, resist and recover from disasters is equally important to understand the risk. In fact, even poor and vulnerable people have capacities (Wisner et al., 2012). Therefore, when dealing with flood vulnerability, other relevant indicators such as risk perception (Carlier et al., 2018) and past flood experience (Beringer and Kaewsuk, 2018) are important.
However, data on these are often not readily available, thus requiring local research, which demands time, financial resources and a multidisciplinary team. Indeed, information on citizens' adaptive behavior and perception requires longitudinal or quasi-experimental studies that allow behavioral dynamics to be captured over time (Kuhlicke et al., 2020). For instance, recent advancements have been made by applying geostatistical methods to psychosocial survey data (Guardiola-Albert et al., 2020).
As an alternative, people's risk perception could be derived from widely available data sources, including, for instance, Google Trends (e.g. Kam et al., 2019) and Twitter statistics (Dyer and Kolic, 2020). Nevertheless, it should be noted that such approaches can be considered only where the use of social media and search engines is prevalent across society. In countries where the use of digital technologies is not widespread, there is the risk that the marginalized population is left out of the analysis.
Still with regard to the indicators used, many studies used variables that thematically overlap with each other. Some indices used more than 20 indicators to measure flood vulnerability and did not apply any technique (e.g. PCA, expert-based) to reduce this number. This can decrease the explanatory power of the index. Besides PCA, potential exists to apply dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE) (Anowar et al., 2021). A further issue is that the reason for variable selection was often not given, or it was justified based on previous studies, without taking into consideration the specificities of the study area or conceptual frameworks. Due to the difficulty and time involved in developing indicators, low-quality databases are normally used (Freudenberg, 2003). For an adequate selection of indicators, their analytical soundness, measurability, relevance to the phenomenon being measured and relationship to each other (e.g. interrelationships and feedback loops) should be taken into account. Furthermore, more theoretically grounded research is needed to generate robust evidence for selecting the input indicators.
All of the vulnerability indices reviewed here are static and represent a snapshot of vulnerability. Hence, they do not capture the complexities and temporal dynamics of vulnerability. Few studies focused on measuring flood vulnerability post-event. Nevertheless, the drivers of vulnerability can vary considerably over time. Results by Kuhlicke et al. (2011) and Reiter et al. (2018) show that the exposed citizens (e.g. elderly and children) may be less vulnerable during the preparatory phase of a flood but highly vulnerable during the recovery phase. Hence, incorporating the phase of the flood disaster is key to improving the validity of vulnerability indices (Rufat et al., 2015). To account for temporal context, the indicators can be differentiated according to the different phases of a disaster: preparedness, response and recovery. Such a phase-oriented approach could inform variable selection and weighting. In addition to this, there is a need for research looking into future vulnerabilities, as a forward-looking perspective is needed for preventive flood risk reduction (Birkmann et al., 2013; Garschagen and Kraas, 2010). Such studies could make use of, for instance, population growth projections or tools such as qualitative futuring techniques (Hoffman et al., 2021). It is important to note that this can further increase the uncertainty of the vulnerability modelling outcomes. Still, exercises like this can provide a glimpse of plausible futures.
Similar to the selection of the indicators, several articles did not indicate why a specific normalization and weighting technique was chosen. Additionally, some did not explicitly specify any normalization (11.6%) or weighting (6.3%) method. Nevertheless, the use of arbitrary techniques without testing different methods and their assumptions increases the subjective judgement error. Hence, it is imperative for further studies to be more rigorous and present the reasoning behind such choices. Furthermore, there was an over-reliance on the AHP weighting method, and studies comparing different normalization and weighting techniques were rare (7.4%). Future research should tackle this by exploring different alternatives for evaluating indicator weights (e.g. expert-based, MCDM, statistical approaches) and comparing the findings by means of validation and sensitivity analyses.
A final persisting gap is that few studies conducted any sort of validation, sensitivity, or uncertainty analysis.
Less than 14% of the studies validated the obtained results. To this end, impact data was often used (e.g. Rezende et al., 2019).
Only 9.5% conducted a sensitivity analysis and 3.2% an uncertainty analysis. This can lead to vulnerability outputs that are incoherent with the local reality, either over- or underestimating the vulnerability. This, in turn, has direct implications for flood risk management. In this regard, Fekete (2009) points out the difficulty of finding empirical evidence about vulnerability, because vulnerability is multidimensional and not directly observable. Thus, further research is needed on the validation of vulnerability outcomes (including technical and user validation) and on the sensitivity of the obtained results to the contribution of individual indicators. Potential exists to apply global sensitivity analysis, which is already widely applied for building composite indicators in other fields of study (Luan et al., 2017; Saisana and Saltelli, 2008).
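One simple way to explore how uncertain weights propagate to index results is a Monte Carlo experiment, sketched below: weights are drawn at random, the index is recomputed, and the range of ranks obtained by each unit is recorded. This is a basic uncertainty analysis on the weights only, not a full variance-based global sensitivity analysis; the indicator values, the number of runs, and the sampling scheme (uniform over all positive weight vectors summing to 1) are assumptions made for illustration.

```python
import random

# Hypothetical min-max normalized indicators for four units (1 = most vulnerable).
indicators = {
    "unit_1": [0.9, 0.4, 0.5],
    "unit_2": [0.4, 0.8, 0.5],
    "unit_3": [0.1, 0.3, 0.2],
    "unit_4": [0.7, 0.7, 0.9],
}


def random_weights(n, rng):
    """Draw n positive weights summing to 1 (normalized exponential draws)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def rank_units(indicators, weights):
    """Rank units by their linearly aggregated score (rank 1 = most vulnerable)."""
    scores = {
        unit: sum(w * x for w, x in zip(weights, values))
        for unit, values in indicators.items()
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {unit: position for position, unit in enumerate(ordered, start=1)}


def rank_uncertainty(indicators, n_runs=1000, seed=42):
    """Return the (best, worst) rank of each unit over random weight draws."""
    rng = random.Random(seed)
    n_indicators = len(next(iter(indicators.values())))
    history = {unit: [] for unit in indicators}
    for _ in range(n_runs):
        ranking = rank_units(indicators, random_weights(n_indicators, rng))
        for unit, rank in ranking.items():
            history[unit].append(rank)
    return {unit: (min(r), max(r)) for unit, r in history.items()}


rank_ranges = rank_uncertainty(indicators)
for unit, (best, worst) in rank_ranges.items():
    print(f"{unit}: rank between {best} and {worst}")
```

Units whose rank range is narrow are robust to the weighting choice, whereas wide ranges flag results that should be interpreted with caution or examined with a more formal sensitivity analysis.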
Besides the aforementioned methodological gaps, it is important to emphasize that the theoretical framework adopted influences the methodological choices that are made when constructing vulnerability indices. Even though we have not analyzed the theoretical constructs used by each study, when reading the articles it became clear that several of them do not specify how they conceptualize vulnerability. Furthermore, there are ambiguities in how vulnerability is understood (Kelman, 2018). For instance, some authors consider coping and adaptive capacity as components of flood vulnerability (e.g. de Brito et al., 2018; Feizizadeh and Kienberger, 2017). Others include flood hazard characteristics or exposure (e.g. Carlier et al., 2018; Chaliha et al., 2012) as part of vulnerability. Hence, we argue that a stronger theoretical underpinning is needed for producing scientifically rigorous and comparable research. Within this context, future work could investigate how different terminologies and theoretical constructs are defined and applied across different flood vulnerability case studies.
Future reviews could also look into the methodology used to collect information on vulnerability indicators (e.g. surveys, public databases), as this influences the choices that can be made at each stage of the index construction.

Conclusions
The present study reviewed 95 articles from 38 countries that constructed flood vulnerability indices. In summary, despite the increasing number of studies and the advances made, the review has revealed and re-confirmed a number of persistent knowledge gaps. The temporal dynamics of vulnerability were often disregarded. Only 11.6% of studies focused on indicators that address post-event conditions related to flood damage and consequences, and none of them investigated future vulnerabilities.
Coping and adaptive capacity indicators were frequently ignored, as obtaining this data demands time and effort. Most studies did not apply sensitivity (90.5%) or uncertainty analyses (96.8%), nor did they validate their results (86.3%). This limits the reliability of these indices. It is clear from the literature that the challenge for further research is to foster the development of dynamic vulnerability assessments that consider the coping capacity of citizens and take into account the uncertainty involved in all steps of the index building process, including the selection of indicators, normalization, weighting, and aggregation. This is required in order to advance our understanding of flood vulnerability and support pathways towards flood risk reduction.

Fig. 1. Overview of the different steps involved in constructing an index.
[...] for factor extraction. The weights are obtained from the rotated factor matrix, since the area of each factor represents the proportion of the total unit of the variance of the indicators that is explained by the factor. Gu et al. (2018)
Entropy method: Weights are assigned based on the degree of variation of the indicator values.
[...] the contribution of each indicator for the studied problem. Shah et al. (2018)
Public opinion: They focus on the notion of people's concern about certain problems measured by the indicators.

Fig. 2. Flood vulnerability index studies: (a) temporal distribution from 2002 to 2019 (for the number of articles standardized by the total number of publications in the WoS database, see Appendix A, Fig. A1); and (b) geographical distribution.

Fig. 4. Classification of studies in terms of rural and urban areas and spatial scale.
Fig. A1. Normalized number of flood vulnerability indices and flood articles according to the Web of Science database. For the flood articles search, the keyword "flood*" was used.

Table 4. Most commonly used flood vulnerability indicators. Only indicators used in at least 5 articles are shown here. This cut-off point was defined for clarity, as more than 600 different indicators were mentioned in the 95 reviewed articles.
(Schneiderbauer and Ehrlich, 2006) [...] indicators grouped into social, economic, physical, and coping capacity dimensions. In [...] Table 4). This is similar to the results obtained by Rufat et al. (2015), who found that the most used indicators are poverty and deprivation, per capita income, unemployment rate, elderly and children. Nevertheless, indicators widely used by other authors were not identified or were rarely used in our sample. These include, for example, stress and mental health, hygiene and sanitation, social networks, and experience with floods (Schneiderbauer and Ehrlich, 2006).