The utility of using Volunteered Geographic Information (VGI) for evaluating pluvial flood models

Pluvial floods are increasingly threatening urban environments worldwide due to human-induced climate change. High-resolution, state-of-the-art pluvial flood models are urgently needed to inform climate change adaptation and disaster risk reduction measures but are generally not empirically tested because of the rarity of local high-intensity precipitation events and the lack of monitoring capabilities. Volunteered Geographic Information (VGI) collected by professionals, non-professionals and citizens and made available on the internet can be used to monitor the dynamic extent of a pluvial flood during and after an extreme rain event but is sometimes considered to be unreliable. In this paper, we explore the general utility of VGI to evaluate the performance of pluvial flood models and gain new insights to improve these models. As background for our research, we use the capital city of Budapest, which recently suffered three heavy rainfall events in just five years (2015, 2017 and 2020). For each pluvial flood event, we collected photographic evidence from different online media sources and estimated the associated water depths at various locations in the city from the image context. These were compared with the results of a 2D pluvial flood model that has been shown to provide comparable results to other state-of-the-art inundation models and is easily transferred to other urban areas due to its reliance on open data sources. We introduce a general methodology for comparing VGI with model data by probing different spatial resolutions. Our findings highlight untapped potential and fundamental challenges in using VGI for model evaluation. It is proposed that VGI may become an essential tool and improve the confidence in model-based risk assessments for climate change adaptation and disaster risk reduction.


Highlights
• Pluvial floods are difficult to model, and most models cannot be empirically tested.
• Models combine human, atmospheric, hydrological, surface and sub-surface processes.
• VGI is often presumed to be unreliable, which bars its wider use.
• VGI for in-situ evaluation of a generic-type pluvial flood model is demonstrated.
• Utilising VGI can improve confidence in flood models and highlight deficiencies.

Introduction
Climate change exacerbates weather and climate extremes in every region across the globe, including heavy precipitation events (IPCC, 2021; Hoeppe, 2016). Pluvial flooding is generally caused by intense or prolonged rainfall generating a run-off discharge and/or volume that exceeds the capacities of man-made and natural drainage systems (Rosenzweig et al., 2018). Such events are often characterised by rapid onset (flash floods) and small spatial and short temporal (sub-daily) scales. As a result, pluvial floods are generally much harder to predict and localise than river or coastal flooding. Although pluvial floods can occur in both urban and rural areas, pluvial flooding is often associated with urban environments where its impacts are typically the most pronounced (Guerreiro et al., 2017; Rözer et al., 2016).
Pluvial flood models are designed to represent rainfall-runoff and inundation processes in settlement areas and serve as important tools for disaster risk management and climate change adaptation. Pluvial flood models for urban applications principally combine dynamic elements of the atmosphere (precipitation, temperature), hydrosphere (surface and sub-surface processes, topography, pervious surfaces), anthroposphere (e.g. urban systems, structure and infrastructure, including blue-green infrastructure, impervious surfaces) and, in some cases and to a lesser extent, even the biosphere (e.g. interception of rainfall by vegetation). Detailed models provide the distribution, extent and intensity of inundation in terms of water depth, duration, surface flow velocity and dynamics during urban floods. With this information, pluvial flood models are critical for understanding, assessing and reliably predicting pluvial flood conditions and their impact with or without adaptation and may also provide the basis for early warning and emergency response.
Pluvial flood models can generally be divided by type. Most commonly, models use either rapid flood spreading algorithms (e.g. Samela et al., 2020) or one-dimensional (1D) or two-dimensional (2D) representations of surface inundation processes based on shallow water equations (Bulti and Abebe, 2020). Topographic detail and distance between different features determine the model resolution and output details (Fewtrell et al., 2008). The interaction of surface water flooding with sewer systems may range from simplified approaches, e.g. based on volume accounting of sewer systems, to fully coupled two-dimensional dynamic sub-surface models (Guo et al., 2021). Likewise, the effects of blue-green infrastructure may be accounted for in different ways.
In-situ observations related to urban pluvial floods, for instance, water level observations, are rarely available or, at best, very scarce (Francipane et al., 2021; See, 2019). This is partly due to the short duration and local nature of intense rainfall and the inherently rare occurrence of extreme events in practice. As a result, the number and quality of observations from real-life pluvial flood events that can be used to validate state-of-the-art urban flood models are also generally limited. Indeed, at many locations, which are, in principle, highly exposed to pluvial floods, no previous records exist at all.
Several direct methods for estimating the flood extent and depths from observational evidence have recently been pursued. They include the use of high-water marks left by the flood, imagery from, e.g. unmanned aerial vehicles (UAVs) (Loli et al., 2022; Giordan et al., 2018) and estimates based on remote sensing information like high-resolution Synthetic Aperture Radar (SAR) images and lidar (Giustarini et al., 2013; Taubenböck et al., 2011). While these techniques are likely to see increased use, currently, such data are still scarce, associated with significant uncertainties (Li et al., 2019) and often insufficient (Brill et al., 2021; Mignot et al., 2019).
Several authors have suggested that crowdsourced information (e.g. citizen science) and Volunteered Geographic Information (VGI) could be used to support and improve disaster risk management (Poser and Dransch, 2010), facilitate rapid flood depth mapping (Fohringer et al., 2015), and support the validation of inundation areas and models (Francipane et al., 2021; McDougall and Temple-Watts, 2012; Rollason et al., 2018). Studies by Assumpção et al. (2018) and See (2019) find that the amount of data collected in different VGI case studies is not extensive but still seems to provide an effective form of pluvial flood model validation at urban scales. For example, in a study by Yu et al. (2016), localised flood incidents at the street or house level were reported by locals and collated through a web-based emergency incident reporting portal operated by the government and accessible to the general public. Kutija et al. (2014) used a web page inviting the general public to upload their flood photographs, pin them on the map and optionally write a comment. This information was further supplemented by questionnaires sent to all the residents in the affected areas by Newcastle City Council, asking them to describe the observed flood in and around their properties. Re et al. (2019) also used VGI data collected by the affected community to validate a pluvial flood model, including a compendium of information about various storms from 2014 and onwards and a collection of photographs taken during flood events. Imagery posted on the internet and social media, e.g. press photographs and imagery posted by citizens, also provides an extensive and relatively unexplored source of information. Wiegmann et al. (2021) have reviewed the strengths and weaknesses of using social media data as a source of VGI information for developing and validating urban flood models. In addition to the general weaknesses of VGI data in terms of precision (correctness and reliability, e.g. geolocation, timing) and completeness (e.g. spatial sampling, features), they suggest that an additional risk in using social media as a source of flood information is that it is not inherently reliable. Hence, Wiegmann et al. (2021) suggest that using social media requires a trade-off between precision and completeness since no optimal solution for its analysis is currently available.
This research paper explores the utility of VGI in general, and from online sources in particular, to evaluate the performance of state-of-the-art pluvial flood models for risk assessment and risk management, for example in the context of climate change adaptation. For such applications, homogeneous data is generally needed for consistency across larger urban areas, or even regions, and model transferability is often also required (Hattermann et al., 2018; Guerreiro et al., 2017). This may impose constraints on model complexity and resolution, which may be further influenced by (limited) data availability. Hence, a fully coupled, two-dimensional dynamic surface plus sub-surface model is typically too costly to run at extremely high resolution for even moderately sized areas but may be used at district level to assess the functionality of specific adaptation measures; i.e., it is necessary to make a trade-off between resolution and model complexity (Guo et al., 2021; Qi et al., 2021). Meanwhile, digital terrain/elevation models at a resolution of 2 m or below are by no means generally available everywhere. Often such products can only be obtained through commercial vendors at significant cost or are only available with special permissions or for limited areas. In the current study, we explore the use of VGI as a means of evaluating pluvial flood models based on middle resolution terrain/elevation models (5-30 m), as such models are freely available and are used extensively for both scientific and real-life applications, including in urban environments. Several authors have systematically investigated the effect of terrain resolution on the quality of urban flood models (e.g., Jiang et al., 2022; Muthusamy et al., 2021; Xu et al., 2021; Fewtrell et al., 2011). In general, they show that high resolution terrain/elevation models are better suited for resolving urban elements than coarser models, which can introduce large uncertainties and key deficiencies and lead to erroneous flood maps. They also clearly demonstrate the current importance of middle resolution models, e.g. 30 m, considering the constraints outlined above. The aim of this paper is to provide scientific insights into the usefulness of VGI in this regard and the problems we face, and to address the fact that most pluvial flood models are hardly ever validated at all (Guerreiro et al., 2017).
Using Budapest as our laboratory, we investigate the inherent challenges related to precision and completeness when comparing water levels derived from VGI to modelled water levels. From 2015 to 2020, Budapest suffered substantial pluvial flooding no less than three times, with the largest flood on record taking place in 2015. For each of these events, VGI material, including photographs and videos from various online sources, was identified, processed and ultimately analysed against a well-regarded and common type of pluvial flood model for predicting inundations in urban areas, similar to the one used by Kaspersen et al. (2017) and in the Future Danube Model multi-hazard and risk model suite (Hattermann et al., 2018). This paper is organised as follows. Section 2 (Methods and materials) outlines the methods and data used, including the pre-processing of VGI material from online sources, and introduces a new method for comparing flood depths derived from VGI with flood levels from pluvial flood models. Section 3 (Results) presents our main findings, whereas Section 4 (Discussion and conclusions) discusses the lessons learned from the study and how this can help pave the way for improved use of VGI data for pluvial flood model validation generally.

Methods and materials
Fig. 1 outlines the overall methodology used in this research. For each of the three recent flood events in Budapest, which took place on 17 August 2015, 23 May 2017 and 14 June 2020, pluvial flood model simulations were carried out using a tailored 2D hydrodynamic flood model for Budapest based on MIKE FLOOD software (Section 2.2). All of these simulations were forced by idealised rainfall (Chicago Design Storm; Keifer and Chu, 1957) corresponding to the observed severity of the flood events (Section 2.3). For one of the events (23 May 2017), observed rainfall series from 48 local rain gauges were available and kindly provided by the Budapest Sewage Works (Tibor Rácz, private communication). From the three pluvial flood events, 150 images and videos were identified and retrieved from online sources. Of these, 67 images were processed for water depth estimation (Section 2.4). Finally, we compared these empirically determined water depths to the modelled water depths (Section 2.5 and Results).

Data sources
Observed daily rainfall for Budapest was extracted from the ECA&D repository (Klein Tank et al., 2002). These data were supplemented by a dataset comprised of heterogeneous sub-daily observations of precipitation derived from rain gauges operated by the Budapest Sewage Works (Tibor Rácz, private communication). The digital elevation model (DEM) used by the pluvial flood model was the European Digital Elevation Model (EU-DEM), version 1.1, at 25 m horizontal resolution, provided by the Copernicus Land Monitoring Service; urban land cover was taken from the CORINE Land Cover inventory. The urban land cover of Budapest was further characterised using SENTINEL-2 remotely sensed imagery with a resolution of 10 m, courtesy of the European Space Agency (ESA), downloaded from the Copernicus Open Access Hub. Reference values for the soil water infiltration (pervious areas) and different soil types were derived from the US Department of Agriculture (USDA) (USDA, 2016). Finally, photos and videos (i.e. VGI material) from the three pluvial flood events were retrieved from various online platforms such as news outlets and public image repositories based on internet searches.

Pluvial flood model
MIKE FLOOD (MIKE powered by DHI, n.d.) is a collection of state-of-the-art flood modelling engines that are commonly used for research and commercial applications globally. The embedded MIKE 21 module computes two-dimensional overland flows in response to a heavy precipitation event based on terrain data, e.g. the EU-DEM. The model setup is similar to the one used in Kaspersen et al. (2017) and in the Future Danube multi-hazard and risk modelling suite (Hattermann et al., 2018). The main inputs are the terrain description (for routing the surface water) and the time series of precipitation and infiltration rates (from pervious surfaces). For the surface roughness we used the default MIKE 21 value corresponding to a Manning number of 32 m^(1/3)/s.
Infiltration rates at grid cell level were calculated based on parameters from USDA (2016) corresponding to the dominant soil texture(s) in Budapest (mainly sand, loamy sand or sandy loam), combined with estimates of the slope derived from the terrain data (e.g. for sand, 2.7 cm/h for slopes of 0-4 % and 0.7 cm/h for slopes above 16 %; for more information, see USDA, 2016). An explicit representation of subsurface flows and the urban drainage system was not included. Instead, we used a conceptual representation of the urban drainage system based on precipitation intensities (Chow et al., 1988; Henonin et al., 2013). For each time step, we reduced the precipitation input at the grid cell level relative to the fraction of impervious surfaces over the entire modelling domain to simulate the effect of an urban drainage system designed for coping with an intense precipitation event with a return period of 2 years. The resulting run-off from all impervious surfaces caused by the modified rainfall input (if any) is subsequently routed between grid cells to account for further losses, reflecting the surplus infiltration capacity of downstream grid cells (pervious surfaces). The methodology has some obvious limitations. Since we only modify the incoming precipitation input at the grid point level, we stop accounting for the effect of the urban drainage system when the rain stops. This could potentially lead to local overestimations of the water depth after the rain, as surface water trapped in depressions is not drained away in the model. We screened our results with the help of local experts from the Budapest Sewage Works (Tibor Rácz, private communication) and did not find any significant errors of this type, which can be attributed to topography: the western side of Budapest (Buda) slopes towards the Danube, whereas the eastern side (Pest) spreads out on a flat and mostly featureless sand plain. Likewise, the methodology disregards the exact location of, e.g. manholes and other detailed characteristics of the existing urban drainage system. Here, the corresponding errors in the flood extent derived from the model can, in practice, often be assumed to be localised, and their significance decreases for increasingly extreme precipitation events where the pipe capacity is exceeded. In order to ensure an accurate representation of the urban surface, impervious surface fractions for each grid cell were inferred from remotely sensed imagery obtained by SENTINEL-2 using the NDVI (Normalized Difference Vegetation Index) method described by Kaspersen et al. (2015). The mapping of pervious and impervious surfaces in Budapest using the NDVI was confirmed by visual inspection and by comparison to CORINE Land Cover. Finally, we modified the DEM to account for the fact that, because of flood barriers, surface water cannot drain naturally into the Danube but only through the urban drainage system. Historically, this leads to flooded areas in Budapest alongside the barriers during cloudburst events.
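The conceptual treatment of infiltration and drainage described above can be illustrated with a minimal sketch. It is one possible reading of the text, not the MIKE FLOOD implementation: the linear interpolation of infiltration rates between 4 % and 16 % slope, the per-cell weighting by impervious fraction, and the `drain_capacity` parameter (standing in for a 2-year design intensity) are all assumptions made for illustration.

```python
def infiltration_rate_cm_per_h(slope_percent, flat_rate=2.7, steep_rate=0.7):
    """Infiltration rate for sand as a function of slope.

    The end values (2.7 cm/h at 0-4 %, 0.7 cm/h above 16 %) are quoted in the
    text; the linear interpolation in between is an assumption of this sketch.
    """
    if slope_percent <= 4.0:
        return flat_rate
    if slope_percent >= 16.0:
        return steep_rate
    weight = (slope_percent - 4.0) / (16.0 - 4.0)
    return flat_rate + weight * (steep_rate - flat_rate)


def net_rainfall_mm_per_h(rain, impervious_fraction, slope_percent, drain_capacity):
    """Rain (mm/h) left on the surface of one grid cell in one time step.

    drain_capacity is a hypothetical drainage capacity in mm/h, e.g. the
    intensity of a 2-year design event. It only acts while it is raining.
    """
    infiltration = 10.0 * infiltration_rate_cm_per_h(slope_percent)  # cm/h -> mm/h
    on_impervious = max(0.0, rain - drain_capacity)   # sewers remove part of the rain
    on_pervious = max(0.0, rain - infiltration)       # soil absorbs part of the rain
    return (impervious_fraction * on_impervious
            + (1.0 - impervious_fraction) * on_pervious)
```

Because the reduction is applied to the incoming rain only, the function returns zero once the rain stops, which mirrors the limitation noted above: water already ponded in depressions is never drained.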

Experiments
From 2015 to 2020, Budapest suffered various degrees of pluvial flooding due to sudden heavy rainfall (cloudbursts) on three occasions. Table 1 provides an overview of the three events, which here constitute our laboratory.
As indicated in Table 1, the 2015 event was record-breaking and stands as the most intense rainfall event on record in Budapest (Fig. 2(a)). Comparatively, the cloudbursts in 2017 (Fig. 2(b)) and 2020 represent unusual situations occurring, on average, every 2-10 years. Note that the estimated return periods are calculated from accumulated daily rainfall. Since cloudbursts generally occur within a few hours, the estimated return periods shown in Table 1 (based on daily accumulations) are therefore likely to overestimate the actual frequency of such events. In all three cases, parts of Budapest were flooded by stormwater.
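The return periods in Table 1 stem from a peak-over-threshold analysis with a generalised Pareto distribution (see the Table 1 caption). As a minimal sketch of how a daily rainfall total maps to a return period under the standard POT formulas, the functions below may help. The analysis in this study was carried out with the R package "extRemes"; the scale, shape and exceedance-rate values in the usage lines are hypothetical placeholders, not the fitted values. Only the 30 mm/day threshold is taken from the text.

```python
import math


def return_period_years(x, threshold, scale, shape, exceedances_per_year):
    """Return period (years) of a daily total x under a GPD peak-over-threshold model."""
    excess = x - threshold
    if excess <= 0.0:
        raise ValueError("x must exceed the threshold")
    if abs(shape) < 1e-12:
        survival = math.exp(-excess / scale)          # exponential limit (shape -> 0)
    else:
        base = 1.0 + shape * excess / scale
        if base <= 0.0:
            return math.inf                           # beyond the upper bound (shape < 0)
        survival = base ** (-1.0 / shape)
    return 1.0 / (exceedances_per_year * survival)


def return_level_mm(period_years, threshold, scale, shape, exceedances_per_year):
    """Daily total expected to be exceeded once every period_years on average."""
    m = exceedances_per_year * period_years
    if abs(shape) < 1e-12:
        return threshold + scale * math.log(m)
    return threshold + (scale / shape) * (m ** shape - 1.0)


# Hypothetical parameters for illustration only (threshold from the text).
level_10yr = return_level_mm(10.0, threshold=30.0, scale=12.0, shape=0.1,
                             exceedances_per_year=1.5)
period = return_period_years(level_10yr, 30.0, 12.0, 0.1, 1.5)
```

The two functions are inverses of each other, so `period` recovers the 10-year input up to rounding.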
The abovementioned extreme precipitation events were simulated (see Section 2.2) using the Chicago Design Storm method (Keifer and Chu, 1957) in combination with intensity-duration-frequency (IDF) curves (Balbastre-Soldevila et al., 2019; Rosbjerg and Madsen, 2019; Koutsoyiannis et al., 1998) provided by the Hungarian Meteorological Service and based on local measurements. A Chicago Design Storm is a synthetic storm with a T-year precipitation intensity for all possible durations of uniform rainfall events (Fig. 2(c)). Based on this assumption, we constructed five-hour (300 min) time series corresponding to temporally disaggregated (sub-daily) 2-year, 7.5-year, and 500-year events estimated from the daily rainfall data.
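To make the construction of such a synthetic storm concrete, the following sketch generates a Chicago-type hyetograph. It assumes an IDF curve of the form i(t_d) = a / (t_d + b)^c, which is one common parametrisation and not necessarily the one used by the Hungarian Meteorological Service. The parameters `a`, `b`, `c` and the peak position `peak_ratio` are hypothetical.

```python
def chicago_hyetograph(a, b, c, duration_min=300.0, peak_ratio=0.4, dt_min=5.0):
    """Return (time_min, intensity) pairs sampled at the midpoint of each time step.

    Assumes the IDF curve i_avg(td) = a / (td + b) ** c with td in minutes and
    intensity in the units of a (e.g. mm/h). peak_ratio (0 < r < 1) is the
    fraction of the storm duration that elapses before the peak.
    """
    def instantaneous(td):
        # Derivative of the cumulative depth td * i_avg(td) with respect to td.
        return a * ((1.0 - c) * td + b) / (td + b) ** (1.0 + c)

    peak_time = peak_ratio * duration_min
    steps = int(round(duration_min / dt_min))
    series = []
    for k in range(steps):
        t = (k + 0.5) * dt_min
        if t <= peak_time:
            td = (peak_time - t) / peak_ratio          # rising limb
        else:
            td = (t - peak_time) / (1.0 - peak_ratio)  # falling limb
        series.append((t, instantaneous(td)))
    return series


# Hypothetical IDF parameters; a is in mm/h, so depth = intensity * dt / 60.
storm = chicago_hyetograph(a=1500.0, b=10.0, c=0.8, dt_min=1.0)
total_depth_mm = sum(i for _, i in storm) * 1.0 / 60.0
```

By construction, the depth accumulated over any window centred appropriately on the peak matches the IDF curve for that duration, so the total over 300 min approximately equals the IDF depth for a 300 min event.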

Volunteered Geographic Information
VGI material (photos and videos) from the abovementioned pluvial flood events was retrieved from various online platforms such as news outlets and image repositories. While VGI material from, e.g. webcams could have been used to estimate inundation duration and flow velocities, we focus here on water depth estimation. No metadata such as geolocation or timestamp was consistently available alongside the multimedia files. Since the images we retrieved were mostly post-processed and reduced in size, time stamps were available for less than half the files and could usually only be derived from contextual information. However, street names or names of prominent nearby locations could be extracted from the articles and captions of several photos and videos. Furthermore, street names, shop names or landmarks visible in the VGI material could be identified. This information was used to localise the data in a Geographic Information System (GIS). The scenes were then visually compared to Google Street View imagery to verify the exact location of their recording, as shown in the example in Fig. 3. Here, a salient house entrance on the right side of the road and blue signs from a music instrument store are used to verify the photo's location.

Table 1
Overview of events. Daily rainfall data for Budapest was extracted from ECA&D (Klein Tank et al., 2002). The estimated return periods corresponding to daily rainfall levels are inferred from an extreme value analysis of daily observations for Budapest from 1901 to 2020 using the R package "extRemes" by Gilleland and Katz (2016). A generalised Pareto distribution with a 30 mm/day threshold was assumed for a peak-over-threshold (POT) analysis.

The water level above terrain was manually estimated by comparing the water table visible in the recordings to objects of known size in the photo or video, e.g. a car, curbside, trash can, house entrance, etc. Fig. 4 illustrates the method used for maximum water depth estimation by visual comparison to reference objects and presents some of the challenges of this process. Estimating the water depth at the bike wheel in the foreground results in a 25 cm difference in water depth when compared to an estimation at the car wheel. Visual estimation of water levels will inevitably introduce inaccuracies depending on the quality of the photo, the perspective and the availability of reference objects in the scene. Likewise, local topographic differences like depressions of the terrain under the water table can introduce uncertainties into the estimation. Ideally, it is always recommended to carry out a detailed analysis of the baseline terrain morphology at the highest possible resolution, preferably down to <1 m, to account explicitly for these uncertainties. Conversely, the representation of the terrain morphology offered by the EU-DEM is too coarse to be used for correcting VGI estimates directly, and the associated uncertainties must be addressed in a different way (see Section 2.5).
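The reference-object estimation described above was done manually in this study. Purely as an illustration of how several reference objects in one scene translate into a depth range, a small helper could look as follows. The object heights and submerged fractions in the usage lines are hypothetical and are not read from Fig. 4.

```python
def depth_range_from_references(references):
    """Combine depth estimates from several reference objects in one scene.

    references: list of (object_height_m, submerged_fraction) tuples, where
    submerged_fraction is the visually judged share of the object under water.
    Returns (lowest, highest, midpoint) depth in metres.
    """
    if not references:
        raise ValueError("at least one reference object is required")
    depths = [height * fraction for height, fraction in references]
    low, high = min(depths), max(depths)
    return low, high, 0.5 * (low + high)


# Hypothetical scene: a bicycle wheel (0.70 m) half submerged and
# a car wheel (0.60 m) a quarter submerged.
low, high, mid = depth_range_from_references([(0.70, 0.5), (0.60, 0.25)])
```

Reporting the spread between `low` and `high` alongside the chosen value is one simple way of carrying the reading uncertainty into the later comparison with modelled depths.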
As mentioned above, accurate timestamps for the recordings were not available. Fig. 2 illustrates the temporal evolution of the 2015 and 2017 rainfall events, which in both cases peaked between 16:30 and 18:30. Most of the images used for water depth estimation on 17 August 2015 can be placed in the later stages or just after the peak of the rainfall. This is evident from the heavy rainfall and rain gear visible in the photos and in some instances by the waning daylight (sunset this time of year is at 8 pm in Budapest). On 23 May 2017, images can be placed, in particular, around 17:30, when the rainfall intensity peaked. The 2020 rainfall event reached Budapest around 13:00 and lasted until 15:30 (data not shown), and the VGI material covers roughly the same period. While there is reasonable agreement between the timing of the images and the observed maximum rainfall intensities, this does not guarantee that our VGI materials were obtained at the peak of the water depth. Since this depends on the terrain, the rainfall distribution and other factors, it is likely that the water depth could have peaked a little later; this seems to be the case in 2015. Conversely, one could argue that observers would be particularly enticed to make recordings (photos and videos) close to the time when the flood is at its highest. Either way, it is clear that in some cases, a temporal bias might lead to an underestimation of the water levels compared to modelled water levels that inherently represent the maximum depth during the events (Section 2.5). The entire process of comparing photos, verifying locations and estimating water levels took approximately 48 h of manual labour in total (or about 16 h of manual labour per event).
Estimated water depths from a total of 67 photos and videos were successfully located in Budapest across the three flood events (Table 2). The majority of the VGI material has been geo-located in the northern part of the city centre on the Buda and the Pest side of the river Danube (Fig. 5). The remaining 83 photos and videos were unsuitable for water level estimation because of missing localisation, missing reference objects in the scene or unclear local topography.

Comparing modelled and observed water depths
For validation of the pluvial flood model, water depth information at 67 VGI data locations was available. A comparison between the modelled and observed maximum water depths (reconstructed from the VGI material) was performed for each of the three flood events, with varying numbers of data points for each event (Table 2). In both instances (model and VGI estimates) the maximum value for the water depth is uncertain. In the case of modelled values, the maximum is roughly defined by the (narrow) peak of the assumed precipitation distribution (Chicago Design Storm, Section 2.3) and based on the aggregated rainfall observed during the events in 2015, 2017 and 2020. The model does not replicate the actual observed distributions one-to-one but is designed to generate a maximum water depth (Fig. 2). There is an additional contribution to the uncertainty from, e.g. the representation of the drainage system. This reflects one of the fundamental challenges faced by pluvial flood modellers. Since intense rainfall events are relatively rare and unalike, while extensive local rain gauge networks are scarce, detailed spatio-temporal information is generally not available and cannot be used to reliably characterise rainfall events. Instead, synthetic rainfall distributions are used. In the case of VGI values, they are assumed to approximately represent the maximum water depth, but since time information was mostly inaccessible this can only be asserted to some degree (Section 2.4). Combined, we need to account for the fact that there could be a mismatch between modelled and observed water depths.
Rasters of the modelled maximum water depths were for that purpose overlaid with the corresponding VGI point data. Around every VGI point, four circular buffers with radii of 25, 50, 100 and 150 m were drawn (see Fig. 6). Within these buffers, the best matching water depth values for each VGI point were extracted and compared against the VGI-based water depth estimation (Fig. 8). We adopted this approach to account explicitly for the fact that any flood model's effective resolution is generally lower than its nominal resolution when factoring in uncertainties related to, e.g. the input data, modelled scenarios and the model itself. In practice, this approach also accounts for the abovementioned mismatch between the flood depth represented by a specific photo and the modelled flood depth, caused by varying locations, variations in time between photo and peak extent and the reading of flood depth from the photo. More often than not, these divergences are not taken into account when comparing flood maps to in-situ observations at grid cell or near-grid cell levels and can lead to erroneous conclusions about model performance. In this study, the nominal model resolution is defined by the DEM (25 m horizontal resolution), as is often the case. For each event and buffer size, the number of VGI points with modelled maximum water depth was counted, and correlations were calculated between the VGI and the modelled water depth values (Table 3).
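The buffer comparison described above can be sketched in a few lines. The sketch assumes that "best matching" means the wet grid cell within the buffer whose modelled depth is closest to the VGI estimate, and that correlations are Pearson correlations; both are interpretations rather than details confirmed in the text. The grid layout, function names and the minimum of three pairs for a correlation are likewise illustrative choices.

```python
import math


def best_match_in_buffer(depth_grid, cell_size, x, y, observed, radius):
    """Modelled depth within `radius` of (x, y) that is closest to `observed`.

    depth_grid[row][col] holds the maximum water depth (m); cell centres sit at
    ((col + 0.5) * cell_size, (row + 0.5) * cell_size). Returns None when no
    wet cell centre falls inside the buffer.
    """
    best = None
    for row, values in enumerate(depth_grid):
        for col, depth in enumerate(values):
            cx, cy = (col + 0.5) * cell_size, (row + 0.5) * cell_size
            if depth <= 0.0 or math.hypot(cx - x, cy - y) > radius:
                continue
            if best is None or abs(depth - observed) < abs(best - observed):
                best = depth
    return best


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / math.sqrt(sxx * syy)


def compare(depth_grid, cell_size, vgi_points, radii=(25.0, 50.0, 100.0, 150.0)):
    """Per buffer radius: (number of VGI points with modelled water, correlation)."""
    results = {}
    for radius in radii:
        pairs = []
        for x, y, observed in vgi_points:
            modelled = best_match_in_buffer(depth_grid, cell_size, x, y, observed, radius)
            if modelled is not None:
                pairs.append((observed, modelled))
        r = None
        if len(pairs) >= 3:  # too few pairs give a meaningless correlation
            r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        results[radius] = (len(pairs), r)
    return results
```

Note that selecting the best match inside a buffer favours agreement by construction, so correlations can be expected to rise with the radius; the informative signal is the radius at which they level off.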
A detailed comparison between modelled and VGI reconstructed water levels immediately revealed a few locations where our pluvial flood model simulations fail to reproduce the observed flooding. For example, during the 2015 event, the photographic material indicated a water depth of approx. 90 cm under a bridge in the city centre, resulting in submerged cars. A similar result near the same bridge is found for the 2020 event. However, since the local depression under the bridge is not represented in the (relatively coarse) DEM used by our pluvial flood model, this is not captured in the flood simulations. The issue of over- and underpasses not being properly represented in pluvial flood models is well known and may have serious implications for the ability of such simulations to realistically depict the flow of the water during a flood, even outside the immediate vicinity of such features. One way to alleviate such errors is to use very high-resolution and/or hydrologically corrected DEMs. In this study, neither of these options was available. A special DEM at 5 m resolution was built for us by the Department of Geodesy, Budapest, since an existing surface model including hydrological corrections was unavailable. A detailed analysis of the generated DEM unfortunately revealed severe errors, which barred its use. We did not pursue commercially available alternatives, since this would have made it very difficult to compare our results with those found in other studies. Finally, video footage from the 2017 flood event indicates considerable local differences in water levels within a small area that is not resolved even at the nominal model resolution. While we extracted the maximum observed water depth indicated on the footage (approx. 35 cm), this illustrates some of the uncertainties associated with the VGI reconstructed water levels.

Results
The highest number of VGI data points that were available for comparison with our modelled data is found for the pluvial flood event in 2015 (28 data points; Table 2), followed by the 2020 event (22 data points) and the 2017 event (17 data points). For the 2017 event, where in-situ data from a large number of rain gauges was available (similar data was unavailable for the other two events), Fig. 7 depicts the spatial relationship between the simulated flooding (orange colours) and the distribution of the observed precipitation intensities. The blue shading indicates a reconstruction of the precipitation pattern (by interpolation) corresponding to the accumulated precipitation recorded by a network of rain gauges (black circles, the size and scaling correspond to the accumulated precipitation intensities). A cluster of VGI data locations (green circles) is located at the centre of the flood event. During all three events, photos and videos were predominantly available from the most affected areas, i.e. north of the Budapest city centre. For the two smaller events in 2017 and 2020, the locations of VGI data points are somewhat equally distributed on both sides of the Danube, whereas for the 2015 event, observations are located mainly on the eastern (Pest), mostly flat part of Budapest.
Fig. 8 summarises the results of comparing the maximum water depths derived from the pluvial flood model and VGI, respectively, assuming different buffer sizes (see Section 2.5). The radii of the buffers (from 25 m to 150 m) are shown with different colours and symbols, whereas the coloured lines indicate the best linear fit to the data. Correspondingly, Table 3 presents the results of a linear correlation analysis using different buffer sizes. Overall, the best agreement and most significant correlations (>0.6) are obtained with the larger buffer sizes (100 m and 150 m). Meanwhile, most mismatches, such as the flooding below a bridge that was not adequately captured by the model/DEM (the 90 cm water depth point in Fig. 8, left), and the weakest correlations (0.1-0.4) are found for the smallest buffer sizes (25 m and 50 m), where model and observations are compared at grid cell or near-grid cell level. In general, the results for the 100 m and 150 m buffers are quite similar to each other, as are those for the 25 m and 50 m buffers. This suggests that the "effective" horizontal resolution of the pluvial flood model, taking all the associated uncertainties mentioned above into account, is likely to be around 100 m as opposed to the "nominal" resolution provided by the underlying DEM (25 m). Analogous results (effective < nominal resolution) would have been found using a higher resolution DEM able to resolve the urban elements (1-2 m), as the concept of an effective resolution combines different sources of uncertainties, although it is likely that the effective resolution would be improved.
There is also better agreement between the pluvial flood model and observations for the more significant pluvial flood events (2015/2017). This is arguably expected given the properties of the pluvial flood model used in this study and is likely to be model-dependent (Section 2.2). Hence, the relative effects of adopting a simple and conceptual urban drainage system model rather than a heterogeneous, well-calibrated and fully coupled two-dimensional dynamic sewer model are generally most significant for smaller rain events. For very extreme events such as in 2015, by contrast, the surface water routing generally plays a relatively more important role. Similarly, the vertical precision of the DEM is relatively more important in the case of low water depths, where, e.g., errors in the DEM are also more clearly felt.

Discussion and conclusions
In this study, we demonstrate the use of VGI as a means of in-situ evaluation of a pluvial flood model. While advances have been made in terms of using, e.g., UAVs, direct local measurements of water flows and water depths recorded during pluvial flood events are still very scarce. This scarcity is due to the rareness of such extreme flood events, their localised occurrence even within a city, and the fact that measurements are typically embedded in a dedicated experiment or obtained by opportunity rather than collected in systematic records. As a result, it is inherently challenging to assess the skill of pluvial flood models used for local disaster risk management and climate adaptation against actual observations. Visual evidence in the form of VGI, such as pictures and video footage from newspapers, social media, etc., could in principle help alleviate these challenges, as a complement to or in the absence of other data sources, and improve confidence in pluvial flood models. Such material is in many places available for recent flood events, as exemplified for the city of Budapest, which has seen three instances of severe pluvial flooding since 2015. However, VGI data may be associated with significant uncertainties regarding their precision and completeness.
In the current investigation, the localisation of the VGI data points was manually determined through visual comparison with Google Street View imagery. Out of 150 VGI photographs and videos collected during the three different pluvial flood events in Budapest, fewer than half (67) could be reasonably localised and used for water depth estimation. The main reason for not including all available VGI data points was missing or uncertain information regarding the quality or geolocation of the images, or the lack of nearby reference objects for water depth estimation. Regarding the latter, the availability of reliable and unambiguous reference objects adds an additional layer of uncertainty. Estimates of the water depth based on, e.g., the assumed size of a specific type of wheel or the height of a person's knee are inherently associated with uncertainty, and results may be further skewed by the perspective used in photographic evidence unless corrected. Machine learning approaches for image segmentation and water level estimation, as described in Moy de Vitry (2019), could alleviate issues of uncertainty introduced by human estimations. Further issues may also arise from the (lack of) time stamping of the VGI data points and from whether an estimated VGI water depth represents the global maximum at a specific location, which is the quantity often retrieved from pluvial flood model simulations and represented in flood maps. That said, in many cases, the uncertainties associated with visual water depth estimation are likely to be comparable with or even exceeded by the (effective) vertical, horizontal and temporal resolution of even high-resolution pluvial flood simulations.
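To make the uncertainty of reference-object estimates concrete, the following sketch propagates simple interval bounds through the relation depth = object size × submerged fraction. This is an illustrative assumption and not the estimation procedure used in this study; the function name `depth_range_from_reference`, the wheel diameter range and the submerged fractions are all hypothetical.

```python
def depth_range_from_reference(size_range_m, submerged_fraction_range):
    """Return (low, high) water depth bounds in metres.

    Assumes depth = object size * submerged fraction, with both inputs
    given as (low, high) intervals. Perspective effects are ignored.
    """
    size_lo, size_hi = size_range_m
    frac_lo, frac_hi = submerged_fraction_range
    if not (0 < size_lo <= size_hi and 0 <= frac_lo <= frac_hi <= 1):
        raise ValueError("invalid interval bounds")
    # Both factors are non-negative, so the product is monotonic in each.
    return size_lo * frac_lo, size_hi * frac_hi


# Hypothetical example: a car wheel assumed to be 0.60-0.70 m in diameter
# and judged to be between one-third and one-half submerged.
low, high = depth_range_from_reference((0.60, 0.70), (1 / 3, 1 / 2))
```

Under these assumed inputs, the plausible depth range spans more than a decimetre, which is of the same order as the discrepancy between reference objects noted in Fig. 4.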
As shown above (Fig. 5), VGI data points represent a "data collection of opportunity". Nevertheless, as demonstrated by the three rainfall events in Budapest, the locations of these observations are arguably not entirely random. In fact, since VGI material is more likely than not to be centred on public and central locations, which represent high exposure or inherent vulnerability, one might argue that, given enough samples, such data promise fair coverage of the most relevant sites. This coverage issue is, to some extent, illustrated in Fig. 7, where most of the VGI data points collected from the 2017 event are located within a limited area extending on both sides of the Danube. A single point also covers the two additional high-impact areas, where the second- and third-highest precipitation sums of 60 mm and 59 mm were observed. Conversely, no images were found near the fourth-highest precipitation sum observed in the northwestern part of the city, where flooding should have occurred according to the pluvial flood model simulations. This part of the city is elevated, so it is possible that rain falling on the slopes drained into the central area next to the Danube and caused flooding there. The point is that the VGI data we collected do not provide any insight into this, and that improved sources of VGI data, a larger sample or supplementary information would be needed to resolve it. One could, for example, use webcams in cities to estimate flow paths, velocity and duration. Likewise, Leitão and Peña-Haro (2022), Hao et al. (2022) and Moy de Vitry (2019) propose machine learning approaches to estimate flow velocity and water depth from VGI and surveillance videos during pluvial flood events, providing additional information for model validation.
Fig. 8 compares the VGI observations to our model simulations. The figure shows that the modelled maximum water depths seem to agree well with the observations for buffer sizes of at least 100 m (see Section 2.5).
We here introduce a simple methodology for comparing modelled (gridded) and observed (point) spatial data that is generally applicable beyond the current study. Comparing gridded model data to station data representing complex dynamical processes at very detailed scales is notoriously tricky due to model and observation uncertainties and biases (e.g. whether the available VGI and model estimates accurately represent the maximum water depth), and especially so for extreme convective precipitation (e.g. Larsen et al., 2016; Rasmussen et al., 2012). As a result, simulated flood patterns may be displaced or skewed compared to observations, and thus comparing an observation only to the nearest model grid point may lead to the wrong conclusion. Considering a cloud (buffer) of grid cells surrounding our VGI data points, with radii ranging from 25 m (essentially a single grid cell) to 150 m, allows us not only to account for this spatial uncertainty but also, in some sense, to quantify the confidence level of the pluvial flood model evaluation. Moreover, we factor in not only the model uncertainty but also the abovementioned uncertainties related to the VGI data, which may be difficult to quantify in their own right. To this end, we estimate an effective model resolution, which will nearly always be coarser than the nominal model grid cell resolution, even for highly resolved surface models. Since economic damage cost assessments often rely heavily on flood mapping to determine the exposure of socio-economic assets (Kaspersen and Halsnaes, 2017; Merz et al., 2013; Merz et al., 2010), this suggests that flood maps should generally be resampled to avoid propagating potential biases down the line.
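The buffer comparison can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implementation used in this study: the grid is a plain nested list with its origin at (0, 0), a cell belongs to the buffer if its centre lies within the radius, and "best matching" is taken to mean the modelled depth closest to the VGI estimate. The function name `best_match_in_buffer` and the toy grid are hypothetical.

```python
from math import hypot


def best_match_in_buffer(depth_grid, cell_size, x, y, radius, observed):
    """Return the modelled depth within `radius` of (x, y) closest to `observed`.

    depth_grid[row][col] holds the modelled maximum water depth (m); the
    centre of that cell is assumed to lie at
    ((col + 0.5) * cell_size, (row + 0.5) * cell_size).
    """
    candidates = []
    for row, values in enumerate(depth_grid):
        for col, depth in enumerate(values):
            centre_x = (col + 0.5) * cell_size
            centre_y = (row + 0.5) * cell_size
            if hypot(centre_x - x, centre_y - y) <= radius:
                candidates.append(depth)
    if not candidates:
        raise ValueError("no grid cell centre falls inside the buffer")
    return min(candidates, key=lambda depth: abs(depth - observed))


# Toy 5 x 5 grid with 25 m cells: the simulated flood (0.45 m) is displaced
# two cells east of the VGI location, where 0.5 m was estimated.
grid = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.45],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
vgi_x, vgi_y, vgi_depth = 62.5, 62.5, 0.5

match_25 = best_match_in_buffer(grid, 25.0, vgi_x, vgi_y, 25.0, vgi_depth)
match_50 = best_match_in_buffer(grid, 25.0, vgi_x, vgi_y, 50.0, vgi_depth)
```

In this toy case, the 25 m buffer misses the displaced flood cell while the 50 m buffer captures it, illustrating why a nearest-cell comparison alone may lead to the wrong conclusion.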
Finally, the comparison with VGI data underlines the difficulties in modelling small-scale topographic features in an urban setting, such as representing water depths in underpasses and under bridges. These features are usually not adequately represented in the moderately resolved DEMs typically used for risk assessment and management, despite their critical importance for flood modelling (Lindsay and Dhun, 2015), and require DEM editing (Houston et al., 2011). As highlighted above, VGI data were instrumental in revealing several limitations in the pluvial flood model, which failed to reproduce the two most significant water depth estimates for the 2015 and 2020 events since the associated underpass is not included in the DEM. This is perhaps the most significant utility of VGI as a means of evaluating pluvial flood models. While the exclusion of these points does not significantly change the results of our current analysis (Fig. 8, Table 3), there is no doubt that this sort of information is beneficial for local authorities and will aid in improving pluvial flood models, e.g. by informing morphological terrain analyses and furthering the development of improved and more reliable DEMs and pluvial flood models in general.
While VGI data hold significant potential as a means of evaluating and improving pluvial flood models, the process of locating relevant visual material from online platforms (some of which is protected by intellectual property rights or GDPR) is very time-consuming and will greatly benefit from further advances in computer vision and automatic flood detection from photographs. To this end, Barz et al. (2021) recently proposed an automatic filtering approach based on machine learning techniques to help find Twitter images that would be relevant for one of the following information objectives: assessing the flooded area(s), the inundation depth(s), and the degree of water pollution. Instead of relying purely on textual information, the filter directly analyses the image contents. In our study, comparing photos, verifying locations and estimating water levels took approximately 48 h of manual labour in total (or about 16 h per event), whereas Zamir and Shah (2010) have proposed a more automated approach.
In conclusion, our findings have highlighted an untapped potential but also important challenges in using VGI to evaluate pluvial flood models. Following recent developments in data mining techniques based on artificial intelligence and machine learning, there is no doubt that the availability of VGI data from media and social media platforms, through crowdsourcing, etc., will grow exponentially. As exemplified in this paper, this will offer new and unique opportunities and help improve the confidence in model-based risk assessments for climate change adaptation and disaster risk reduction.

Fig. 2. Examples of rainfall intensities observed during the peak of the 2015 (a) and 2017 (b) cloudbursts, recorded by rain gauges operated by the Budapest Sewage Works (Tibor Rácz, private communication). (a) depicts data from the "RAKP" station, while (b) shows data from the "ZSEM" (lighter blue) and "ZSIG" (darker blue) stations. For comparison, (c) shows an idealised Chicago Design Storm corresponding to an aggregated 107 mm of rain over 5 h. As indicated by the "*", a few minutes of rainfall intensities > 150 mm/h are not shown.

Fig. 4. Estimation of water depth using reference objects, with an example of the challenge of local topographic heterogeneity. Estimating the water depth at the bike wheel in the foreground results in an approx. 25 cm difference in water depth compared to an estimation at the car wheel. Perhaps the biker is in a lower part of the road?

Fig. 5. Map of VGI locations in Budapest with district borders.

Fig. 6. Example of buffers around VGI locations for the comparison of modelled and inferred water depths.

Fig. 7. Distribution of observed precipitation over Budapest (numbers, scaled circles and blue shaded areas) and modelled pluvial flood extent (orange colours) during the 2017 event. Green circles indicate the location of the VGI data points.

Fig. 8. Comparison of modelled maximum water depth values with water depths estimated from VGI material. Four buffer sizes around the VGI location are drawn for each of the three events, and the best-matching value from the modelled data is extracted (see Section 2.5). The lines indicate a linear fit to the data. Colours and shapes correspond to the four buffer sizes.

Table 2
Total number of analysed VGI data sets and those used for water depth estimation.

Table 3
Correlation coefficient of best-matching values from the modelled water depths and VGI estimates for different buffer radii.