Processing of nationwide topographic data for ensuring consistent river network representation

Increasing river floods and infrastructure development in many parts of the world have created an urgent need for accurate high-resolution flood hazard mapping for more efficient flood risk management. Mapping accuracy hinges on the quality of the underlying Digital Terrain Model (DTM) and other spatial datasets. This article presents a processing strategy to ensure consistent adaption of countrywide spatial datasets to the requirements of hydraulic modelling. The suggested methods are automatized to a large extent and include (i) automatic fitting of river axis positions to the DTM, (ii) detection of culverts and obstacles in the river channel (iii) Smooth elimination of obstacles by interpolation along the river axes (iv) geometric detection of water-land borders and the top edge of embankments for (v) integration of the submerged river bed geometry into the DTM. The processing chain is applied to a river network (33880 km) and a DTM from Airborne Laser Scanning (ALS) with 1 m spatial resolution covering the entire territory of Austria (∼84000 km2). Thus, countrywide consistency of data and methods is achieved along with high local relevance. Semi-automatic validation and extensive manual checks demonstrate that processing significantly improves the DTM with respect to topographic and hydraulic consistency. However, some open issues of automatic processing remain, e.g. in case of long underground river


Background and Motivation
The last years and decades have seen numerous extreme river flooding events in Europe which have caused damage to persons and infrastructure. Changes of the frequency and magnitude of floods have been observed on a continental scale (Blöschl et al., 2015) and globally (Douben, 2006;Najibi and Devineni, 2018;Ceola et al., 2014). Among other factors, climate change has been identified as an important driver (Bertola et al., 2021;Blöschl et al., 2019;Alfieri et al., 2017). These changes require authorities to extend and adapt flood protection measures in their respective administrative domains. Citizens and other stakeholders such as insurance companies also have strong interest in accurate flood risk assessments. Accordingly, there is a need for detailed and up-to-date flood hazard maps based on hydraulic simulations. While in the past computational requirements have limited the use of high resolution data to small areas, this is no longer the case (Horváth et al., 2016;Buttinger-Kreuzhuber et al., 2021). Using region-or country-wide data at a resolution of meters or even below has become feasible.
Hydraulic simulations require spatial data as inputs, including Digital Terrain Models (DTMs) for specifying the geometry of the flow domain and vector data of the river network for specifying boundary conditions (e.g. inflow hydrographs). These datasets in their original forms tend to be inconsistent due to different acquisition techniques and temporal decorrelation. Additionally, DTMs are often created for other purposes and thus do not initially fulfill all requirements of hydraulic modelling. However, DTM quality and resolution has massive impacts on further processing results (Chaplot, 2005;Lin et al., 2010;Sørensen and Seibert, 2007;Xu et al., 2016).
The aim of this paper is to develop a procedure that prepares and augments extensive spatial datasets to make them suitable inputs for flood modelling. Given potentially very large datasets, the most crucial question is the possible degree of automatization while still reliably adapting to different landscape forms and river sizes.

State of the art and contributions
Due to the importance of flood risk mapping, there is extensive literature on the acquisition and preparation of dedicated spatial data. Recently, more and more studies use spatial data with high resolutions of around a meter or less for flood modelling. These are often based on Airborne Laser Scanning (ALS) data (Muhadi et al., 2020;Mandlburger et al., 2011;Woodrow et al., 2016), but also on other techniques such as photogrammetry from Unmanned Aerial Vehicles (Hashemi-Beni et al., 2018). Because of the rather costly data acquisition per area and the computational requirements, many of these studies cover small areas, e. g. individual river reaches or catchments up to several 100 km 2 . Thus, extensive manual editing as part of the workflow is not of major concern, since the effort remains low.
When it comes to larger areas such as whole provinces or countries, there are hybrid approaches locally refining an extensive, low-resolution height model in settled areas where a higher level of detail is needed (Leitão and de Sousa, 2018). Several studies compare and evaluate the implications of using DTMs or, more generally, Digital Elevation Models (DEMs) with different spatial resolutions (Mohanty et al., 2020;Muthusamy et al., 2021;Saksena and Merwade, 2015). Generally, investigations on a regional or continental scale usually rely on DEMs with comparably coarse spatial resolution (Hutchinson and Dowling, 1991;Alfieri et al., 2014;Fleischmann et al., 2019;Jarihani et al., 2015;Li et al., 2020), e.g. from SRTM (Yan et al., 2019;Lehner et al., 2008;Geoscience-Australia, 2015) or from TanDEM-X (Mason et al., 2016). The position of the drainage networks and river/valley lines is thereby computed with flow-routing algorithms based on the respective Elevation Model (Tribe, 1992;Rieger, 1993;Soille et al., 2003;Hou et al., 2011;Poggio and Soille, 2012). The procedure is very sensitive to pits in the DEM, so these have to be removed beforehand (Soille et al., 2003;Tianqi et al., 2003;Grimaldi et al., 2007;Poggio and Soille, 2012). Some existing software packages such as ANUDEM (Hutchinson, 2011) cover the respective processing chains.
In case of using higher resolution DTMs, additional issues have to be taken into account most notably surface roughness (Lindsay et al., 2019) and infrastructure or other features obstructing possible underpasses thereby creating erroneous pitches (Barber and Shortridge, 2005). The latter is often solved with depression-breaching which avoids negative influence of an obstacle on flow routing without geometrically removing it from the DTM (Rieger, 1998;Soille et al., 2003;Lindsay and Dhun, 2015). The specific task of identifying and eliminating bridges obstructing the river course was treated in Sithole and Vosselman, 2006. Apart from that, also the detection and description of river embankments from DEMs is covered in the literature (e.g. Krüger and Meinel, 2008;Köthe and Bock, 2009;Casas et al., 2012;Sofia et al., 2014). Steinfeld et al., 2013 provide a comparison with image-based detection approaches. The focus of these mostly local studies is on the detection of levees and other anthropogenic features clearly elevated over the floodplains.
The main contribution of this study is the combination of a high level of detail with large spatial coverage. Our aim is to design, execute, and evaluate methods to process heterogeneous 1 m grid width terrain elevation data and heterogeneous stream line data to provide a compound homogeneous dataset ready for hydraulic simulation for an entire country (Austria). The methodology intends to include large streams as well as small, less regulated rivers given by a pre-existing vector river network which is preserved in terms of topology throughout all geometric corrections and adaptions.

Data and context
The developed workflow was applied to datasets covering the whole territory of Austria (∼ 84.000 km 2 ) with a river network of 33880 km as part of the project HORA 3.0. The goal of this project was to create country-wide high-resolution flood hazard maps for different return periods based on instationary 2D hydraulic simulations (Buttinger-Kreuzhuber et al., 2021;Waser et al., 2011;Visdom, 2021).
Accordingly, the project area corresponds to the whole territory of Austria along with the immediate border regions (c.f. Fig. 1). Except for a couple of rivers along the northern border, it mainly contains catchment areas of two rivers, the Rhein (Vorarlberg) and the Danube (rest of Austria) with its tributaries Mur and Drau south of the main Alpine ridge. Elevations vary between slightly over 100 m in the Pannonian Basin close to the eastern border and nearly 3800 m in the Central Eastern Alps.
For the complete area, a DTM and a Digital Surface Model (DSM) with a spatial resolution of 1 m was available. The underlying point clouds for these models were acquired in various Airborne Laser Scanning (ALS) campaigns between the years 2003 and 2013 by the Austrian federal states. Therefore, inconsistencies are to be expected due to both temporal decorrelation and different terrain modelling workflows.
The most important vector dataset is the river network (Line SHP file) as used by the Austrian administrative authorities. Similarly to the raster models, the nationwide dataset has been put together from separate datasets acquired by the federal states. Due to different acquisition techniques, level of detail and accuracy of the river axes vary between federal states but often also depend on river size. The total length of all river axes amounts to approximately 40.000 km. However, only those with a total catchment area of at least 10 km 2 are considered in this work, altogether 2735 river lines. Other important country-wide vector datasets used in the project are land-use and buildings from the Digital Cadastral Map as well as lakes and power plants. Additionally, local datasets such as measured river cross sections or other bathymetric data served as inputs to be integrated into the DTM (c.f. 3.5).

Requirements
Besides the need for consistent and accurate spatial data, there are also some specific formal requirements. These come from the model and the framework for hydraulic modelling in order to ensure compatibility and computationally efficient processing.
The hydraulic model used in flood hazard mapping is based on the shallow water equations (SWEs, c.f. Buttinger-Kreuzhuber et al., 2019). One popular numerical discretization of the SWEs is the finite volume method, where the variables water depth and discharge are represented by their averages on the computational grid. Efficient and robust numerical schemes together with massive parallelization, e.g. on GPUs, enable the processing of large areas at high spatial resolution (Horváth et al., 2016). However, the accuracy of the hydraulic model hinges on the accuracy of the input data, including the hydrological flood discharges and the DTM. As the simulated inundated areas tend to be very sensitive to the DTM, a precise and reliable DTM is of paramount importance.
In this study, the DTM is provided in 2.5D, so it can be expressed as a function f : Z = f(X, Y) assigning exactly one height value Z to every position (coordinates X, Y). Thus we assume that the domain of the function f is simply connected. If this is not the case, gap filling has to be applied first. It is furthermore required to be consistent throughout the simulation domain. Consistency includes the removal of artifacts from measurement and interpolation errors. In fact, not only errors in the DTM acquisition can lead to problematic results of the simulations, but also actually existing objects such as bridges, which can potentially act as incorrect barriers since the 2.5D representation is not able to map 3D underpasses correctly. Upstream of the barrier, the water levels are overestimated while inundated areas are underestimated downstream. These kind of objects have to be removed from the DTM, ensuring that the water flow along the river is not erroneously restricted. However, there are exceptions for culverts under settled areas, under other water bodies and retention dams. Since, in these cases, cutting the DTM would not lead to meaningful results, a separate dataset holding the respective river segments was created instead.
Additionally, many common DTM acquisition techniques are not capable of providing bathymetric data. For example, topographic Airborne Laser Scanning (ALS) of water bodies either leads to void areas or to data representing the water surface. This is because inundated parts of the river bed can not be measured due to the absorption of infrared light in the water. In order to fully represent the river bed topography, bathymetric data from other sources were integrated which resulted in a hydrologically enforced DTM (DTM-H).
Another crucial input for the hydraulic simulation are hydraulic boundary conditions (BCs). At the origin of a river and wherever a river enters or leaves the simulation domain, an inflow or outflow BC has to be specified. The locations and cross-sectional widths of the in-and outflows were derived from the river center lines and the river banks. Alignment of these vector datasets with the river course in the DTM ensures a correct application of the BCs. The inflow/outflow hydrographs themselves were estimated by hydraulic modelling.

Methods
A common property of nearly all spatial datasets is their heterogeneity (c.f. Section 2). Processing thus has to ensure compatibility between existing datasets before deriving new ones from them.
The raster DTM has a central role in the workflow as it covers the entire domain. In order to make it processable, the DTM is organized in tiles of 10 km by 10 km. The most important input vector dataset is the river network representing the axis of each river as one line feature.
Since most further working steps are based on a combination of DTM and river network, discrepancies between these two datasets have to be corrected. This is realized by adapting river axes to the geometric properties of the DTM (Section 3.1). Subsequently, a proper DTM representation of the river courses is ensured by finding and eliminating obstacles (Sections 3.2 and 3.3). Assuming an unobstructed river bed representation, river embankments were detected based on the geometric cross section properties (Section 3.4). These enable the application of boundary conditions and the integration of river bed geometry into the DTM to get a DTM-H (Section 3.5). Fig. 2 provides an overview of the methodology which is explicitly described in the following subsections. Processing and analysis were carried out primarily using the software packages Opals , Python, Matlab and QGIS.

Positional correction of the river network
The original DTM and river network datasets show clear geometric discrepancies, meaning that the river axes as indicated by the Shape file only partly overlap with the river bed according to the DTM. Harmonization of these two datasets has highest priority since all following working steps (3.2-3.5) rely on it.
The reasons for these discrepancies are mainly temporal decorrelation and too coarse digitization of river lines, the latter especially for minor rivers. The assumption that the river network is the dominating source of discrepancies is confirmed by the fact that deviations often amount to several metersthis is significantly larger than the accuracy of ALS data which normally does not exceed a few decimeters (Kraus, 2007;Pfeifer and Briese, 2014). Accordingly, it was decided to correct the river axes while keeping the DTM unchanged.
Another key argument for always correcting river axes is that changing a 2D line geometry depends on far less assumptions than changing the whole river bed, river banks and their flood plain in the DTM. Since approximate river axis positions were already available, a sequential, cross section based approach was developed for the spatial correction. The node structure of the river network was densified to ensure a maximum node-to-node distance is not exceeded. For each node, a cross section normal to the original river axis was derived (Fig. 3,4). The height profile along these cross sections was sampled from the DTM and smoothed. This function is called h(d), where d is the signed distance orthogonal to the river axis.
The actual correction starts from the first point of each river axis, i.e. the source in case of natural rivers and proceeds in downstream direction. For each cross section, six weight functions are formulated in order to account for several geometric criteria. The river axis position is then corrected to where the maximum of the product over all weight functions appears. In the following, the individual weight functions are described in detail. They are based on geometric and hydraulic reasoning as well as on test analyses over limited areas.
• Cross section form: The center of the river bed is expected to be a local height minimum of h(d). The weight function therefore is essentially a ratio of the second derivative h ′′ (d) and the first derivative h ′ (d) where β 1 has the purpose of avoiding division by zero and shifting the balance between the influence of first and second derivative. This function is maximized in case of a clearly positive h ′′ (d) and with h ′ (d) close to zero which are both given at local height minima.
• Height-dependency: At least in a local neighbourhood, lower points are more likely to be located in the river bed and therefore receive a higher weight. The related weight function was designed with an exponential expression which enables adjusting the influence of this criterion by changing the base β 2 (with β 2 > 1). (2) All points above the mean height thus get a weight between 0 and 1 while the points below have clearly higher weights.
• Initial approximation: In most cases, the original river axis is not more than several meters off the "real" river bed according to the DTM, making it a viable coarse approximation. Therefore, a Gaussian function centered in the middle of the cross section is added as another weight function: Compared to the other criteria, w 3 is less discriminative. It mainly plays a decisive role if other weight functions are similarly high for large parts of the cross section (e.g. wide, flat river beds).
• River axis prediction: Another approximation for the river axis position is found by estimating a cubic function through the prior four already corrected positions. The intersection d i of that extrapolated river course with the current cross section is used as the center point of another Gaussian function (c.f. Eq. 4).
The value of σ 4 was chosen slightly smaller than σ 3 in order to act as a counterbalance to w 3 (Eq. 3) for bad original river axes. • Water flow properties: Water tends to flow along direct paths without ascents. Thus further weight functions are introduced to prefer a short horizontal path from the previous cross section (w 5 ) and a cheap path in the sense of having no significant ascent on the way (w 6 ). Both weight functions are again formulated in a similar way as w 2 (Eq. 2). The exponent of w 5 depends on the horizontal 2Ddistances (s(d)) from the previous river axis point. A difference from the mean value is chosen in order not to make the criterion too discriminative. All distances above the mean value result in weights between 1 and β 5 .
For w 6 , all positive height differences along the horizontal path are accumulated (δh + cum (d)). Due to DTM accuracy and resolution, δh + cum (d) tends to be slightly above 0 in practice. A threshold δh max is introduced to assign weight 0 to all points exceeding a tolerable cumulative ascent. The threshold is determined depending on DTM properties as well as on the distance between cross sections. It makes Fig. 3. River axis with densified nodes and the orthogonal cross sections.
w 6 a very discriminative criterion since it usually excludes all points outside the river bed, even if they are lower in elevation.
Moreover, an option was added to skip a cross section in case it contains no single point that can be reached without major height gains on the path (δh + cum (d) > δh max ∀ d). That way, detours around erroneous obstacles in the river bed (c.f. Section 3.2) are avoided.
Finally, all weight functions are combined by multiplying them.
The multiplication ensures that exclusion (w k = 0) of a certain cross section part due to one single criterion cannot be overruled by others. If the resulting function has a maximum larger than 0, the corresponding position d max in the cross section is introduced as a new point of the corrected river axis and the same procedure is repeated for the next downstream cross section.

Detection of obstacles, bridges and culverts
After ensuring correspondence between the river network and the DTM, it is still not guaranteed to have monotonously falling height profiles along the river lines. Significant local height increments are caused by different kinds of obstacles in the DTM representation of the river. These obstacles can be existing objects such as bridges or other buildings. Many of them also originate from interpolation effects since the water surfaces are often insufficiently covered by ALS point clouds. Especially forested river banks tend to favour discontinuous representations of the water surface.
These obstacles are problem for simulations using 2.5 D DTMs which do not allow the modelling of full 3D information (e.g. bridge underpasses). Consequently, all objects above the river bed act like dams and therefore have to be detected and then removed or treated explicitly.
The detection is realized here using the height profile along the corrected river axes as sampled from the DTM. After smoothing, the ascending and descending hill of a potential obstacle is detected as a clear local maximum of the first derivative followed by a clear local minimum within a certain distance (c.f. Fig. 3.2 C). "Clear" is intended to imply that the absolute values of the extrema exceed a certain threshold. An additional criterion is that the height after the obstacle has to be lower than before.
However, elimination should not start at the steepest slope but at the point where the obstacle effectively starts changing the height profile. So after having detected an obstacle based on the first derivative, a subsequent fine detection is run in the surrounding section of the height profile using the same combination of first and second derivative as described in Eq. 1 (c.f. Fig. 3.2 D). The value for β 1 is chosen to be very small giving the first derivative a strong weight. This means local minima in the height profile are identified. These local minima are the points of the river axis where the height profile stops fulfilling the expectation of monotonicity. Additionally, a small margin is added Fig. 4. A practical example of obstacle detection. The light blue river axis crosses several bridges in the original DTM. The diagrams on the right refer to the river axis segment framed by A. Plot B shows the original (red) and smoothed (blue) height profile along the river axis. The triangles show the detected obstacle extents. These were determined using the two functions in C and D: C is the first derivative for the coarse detection with significant (above threshold) local maxima and minima marked with triangles. D is the function for fine detection. All significant maxima are marked with black circles. Those nearby the detected obstacles (C) are used as endpoints and therefore marked with a red asterisk. before and after the obstacle to ensure a proper channel representation when removing obstacles (c. f. 3.3).
Finally, a decision has to be made how to proceed with the detected obstacle. Although there are some objective criteria, the final decision was made manually by an operator using the DTM and optionally other datasets such as Orthophoto, DSM and the Cadastral map. The three possible outcomes are: • Cut out: If the obstacle represents a bridge over the river or an artifact in the DTM, it is in most cases cut out and replaced by an interpolation of the river channel using the approach described in Section 3.3. • Culverts: There are, however, some cases where cutting out would strongly deteriorate the simulations. For example, if a river passes underground through settled areas, it is clearly not meaningful to cut a channel into the DTM and assume the river to run on the surface. Another example is a river that runs through a small culvert under a retention dam or under another river. In all of these examples, no cutting out is done. Instead, the respective segments of the river axes are saved to a separate vector dataset and are modeled as culverts in the hydraulic simulations. • No further treatment: If the increment of the height profile represents an actual obstacle (e.g. rock) in the river bed, no cutting is necessary because the water may pass on the left and the right. These cases are rather rare as the axis correction (3.1) tends to avoid these objects.

Removal of obstacles in the river channel
The obstacle detection (3.2) results in a dataset indicating segments of the river axes where the DTM is no proper representation of the river channel due to some kinds of obstacles. In the following step, some of these are removed by (i) interpolating between the valid sections of the river channel, (ii) replacing the obstructed parts with the interpolated channel and (iii) ensuring a smooth transition between replaced and unchanged DTM.
The interpolation for one segment is realized based on N densely spaced cross sections (c i with i = 1, …, N) normal to the river axis. Each cross section has a height profile h orig i (d) as sampled from the DTM, d is the normal distance from the river axis. The first and last cross section, c 1 and c N , are before and after the obstacle, so their respective height profiles are assumed not to be affected by the obstacle.
For the interpolation, the cross sections are transformed to a profile coordinate system which is equivalent to unwrapping the river axis to a straight line (Mandlburger, 2000;Mandlburger et al., 2011).  Fig. 4. River axis segments running through identified obstacles are marked in red. The diagram shows an example cross section before the obstacle and the detected river embankment (vertical red lines). The blue/yellow color bar on the bottom illustrates the weighting between h inter i (d) and h orig i (d) as formulated in Eq. 9. Blue in the river bed indicates that only the heights interpolated from before and after the obstacle are used; yellow stands for only using cross sections from the original DTM, and in between there is a smooth transition. On the right side, the DTM after obstacle elimination is visualized in ground view and in 3D perspective vie.w (green frame).
δ ] + 0.5; and w r (d) = 1 2 ⋅cos[π⋅ d− dr δ ] + 0.5 The adapted height profiles h final i (d) are brought back into their original positions by inverting the transformation to a profile coordinate system. An equidistant sampling of the single height profiles leads to a point cloud which is then interpolated to get a local raster model with eliminated obstacles. The procedure is concluded by integrating these new raster models for all obstacles into the original DTM. Example results are visualized in Fig. 5.

Detection of river embankments
Up to this point, rivers were represented by their central axes. However, for boundary conditions or integration of bathymetric measurements, it is important to represent rivers with their full spatial extent. The two datasets derived here are the top edge of embankment and the water-land boundary. Fig. 6 illustrates these two datasets. Each one is saved as one vector line pair per river.

Top edge of embankment
The upper edge of embankment is defined as the terrain edge between the river channel and the surrounding topography. Typically, it shows a strong negative curvature due to the flattening above the river banks. This idealized form can, however, not always be expected.
Especially in alpine, V-shaped valleys, no clear delineation of river embankments is possible. In this case, the geometric criteria are replaced by a height threshold.
Due to the very heterogeneous form and size of rivers, the automatic detection is again realized based on cross sections through densified river axis nodes. Similar to the obstacle detection (Section 3.2), two-step approach is adopted where the embankments are coarsely detected in the first step before refining the search to identify their upper edge.
At this point it is assumed that the position of the river axis corresponds to the center of the cross section. Accordingly, the first derivation (h ′ symm (d)) of the height profile is calculated starting from the center which leads to the left and right river embankment both having clearly positive values. Furthermore, they are expected to be in the transition from a positive curvature in the river bed to a negative curvature towards the top edge.
Consequently, the function for coarse embankment detection searches for a high first and a small second derivative by maximizing the function where γ > 0 is a scalar to shift the emphasis between the two derivatives and avoid division by zero. It is of course not meaningful to search the whole cross section for maxima of this function, especially since Eq. 10 does not explicitly contain terrain height. The height above the river axis is instead introduced as a hard criterion for limiting the search area to certain parts of the cross section. This is done by introducing an adaptive height threshold on each side of the river axis separately: 1. All parts on one side of the cross section with heights in a certain range above the river axis {h s (d) | h axis +ε < h s (d) < h axis +δh} are selected. The tolerance ε should be in the range of the height variations expected on the water surface due to interpolation effects and the height range δh is slightly higher than the maximum expected embankment height (e.g. 15 m). 2. The river flood plain height h f on the current side of the cross section is approximately given by h f = median(h s (d)). In case of V-shaped valleys without a flatter foreland, h f will be around δh 2 . 3. On both sides of the river axis (d = 0), the lateral search area is limited by the first point (d = d lim ) coming close to flood plain height (h(d lim ) − h axis > (h f − h axis )⋅γ with γ depending on the dominating bank and flood plain topography, e.g. γ = 0.9).
Having found a steep part of the river embankment on both sides, a lateral margin of several meters outwards is used to limit the search area for fine detection of the top edge of embankment. The detection itself is executed by finding minima of a function with the same design as Eq. 1. However, in this case, a higher value for β 1 is chosen compared to the application in 3.1 since the top edge of embankment is not necessarily expected to have a first derivative near zero.
If the distance between cross sections is small enough, the top edge of embankment line is simply completed by connecting the detected points of the cross sections. Finally, this line is smoothed, e.g. in GIS.

Water-land border
Compared to the top edge of embankment, the delineation of the water-land border is not as clear since there is often no obvious geometric manifestation (e.g. flat gravel banks). Radiometrically, the border is theoretically very clear, but detection from Orthophotos would be limited to large rivers without vegetation on the banks which is a small portion of all rivers in Austria.
Therefore, it was decided to define the water-land border as the lower edge of embankment while applying slightly more restrictive criteria to the embankment detection. That way, even minor ascents are detected as banks, so their lower edge is very likely the water-land border.
With this assumption, the workflow for the water-land border is very similar to the previous one for the top edge of embankment (3.4.1): First, a coarse detection is run to find the river banks using Eq. 10. The second step is to refine the search, this time inwards, i.e. between banks and river axis. The lower edge of embankment is detected by finding the strongest curvature in the search area by maximizing h ′′ (d).

Integration of the river bed into the DTM
The DTM discussed above represents water by the water surface if at all. In order to make it suitable for hydraulic modelling, river bed information from other data sources has to be integrated.
A common river bed representation are measured height profiles normal to the river axis. In this case, the concept of transforming cross sections to a profile coordinate system (c.f. 3.3) was used to densify the measurements along the river axis via interpolation. Alternatively, mesh terrain data (such as SMS or the mesh containing the trapezoidal profiles) were available. These were converted to regular raster data with linear interpolation.
For river reaches without prior information about river bed geometry, a trapezoidal-shaped cross-sectional profile of the river was assumed. The width of the trapezoid was based on the distance between the water-land borders. Its depth was calculated with a fixed-point iteration from the hydrological discharges using the Manning formula (Buttinger-Kreuzhuber et al., 2021).
Depending on the reliability and the extent of the river bed representation, the water-land border or the top edge of embankment serve as boundaries between the different datasets meaning that the profile data replace the DTM between the river banks whereas the original DTM remains unchanged for the flood plain.

Results and discussion
The methods described in Section 3 were applied to datasets covering all of Austria as the basis of flood hazard mapping (c.f. Section 2). The automatic procedures were supplemented by manual validation and adaption at some points of the workflow to ensure highest reliability of the results. Table 1 summarizes the most important outputs as well as the respective degree of automatization.
Overall, the vector datasets required increased manual interaction compared to the raster datasets. This has to do with (i) their content partly being subject to interpretation and (ii) their importance as intermediate products. The latter is also a prerequisite for the high automatization of raster processing.
The river network has a key role in the workflow. All further products essentially depend on the correctness of the river network, i.e. the river lines being at least inside the water-land borders according to the DTM. This is largely ensured by the automatic correction procedure. However, the procedure fails in cases of poor approximations or if the river channel is not clearly observable in the DTM due to long culverts, poor interpolation of gaps over water bodies etc. Extensive manual validation and correction ensured that the assumption of a correct river network holds and thus all further working steps could be automatized more reliably. Manual checking for both, obstacle and embankment detection, was then limited to distinct automatically detected positions. Fig. 7 shows the results for two example scenes. The river axes are correctly positioned inside the river channels. However, they don't always take the most direct route but tend to show slight zigzag patterns. This has to do with the strong emphasis of the correction procedure on avoiding ascents in downstream direction. As a consequence, small detours are accepted in order to avoid undulating parts of the river channels in the DTM. The top edge of embankment and the water-land border confine the river banks very well. Nevertheless, the latter does not always exactly limit the water body. The Orthophoto in Fig. 7B indicates that there are also gravel banks included as long as these are flat enough to not clearly stand out against the water surface geometrically. Fig. 7A also shows two eliminated bridges in the DTM. Furthermore, the two most frequent use cases of culverts are illustrated, these are (A) cases where cutting would obviously deteriorate the hydraulic simulations and (B) overbuilt rivers through cities.

Table 1
Resulting country-wide datasets in chronology of the adopted processing chain. Apart from these local quality assessments, also more extensive validation of the spatial data was carried out, although there are some limitations concerning the availability of ground truth. The final flood hazard maps were validated with good overall results on a country scale against local flood hazard maps (Buttinger-Kreuzhuber et al., 2021). These comparisons also provide an implicit quality assessment of the spatial datasets. Nevertheless, the assessment is rather indirect since (i) the hydraulic simulations also use several other inputs and (ii) the local hazard maps also considered detailed information (e.g. mobile flood control facilities) not available at the national scale.
Separate validation of spatial datasets is possible with some as-sumptions. For instance, the exhaustively checked and corrected river network can be considered ground truth in terms of being situated in the river channel. Even though the DTM does not contain enough information to enable a unique river thalweg definition, the final river axes can be expected to be located in between the river embankments. In this respect, quality checking of the automatic correction procedure can be realized by intersecting the river axes with the final water-land borders after manual corrections. To this end, the water-land boundary line pairs were closed to "river polygons" so that each polygon represents the water body of one river. The original river axes (r orig ) and the river axes after the automatic correction (r corr ) were converted to point SHP files with each point representing one vertex of the line SHP files. The correctness of a river axis was then defined as the percentage of points within the respective river polygon. The correctness value was calculated both for the complete river network and the unobstructed parts alone. The latter excludes all river segments in the immediate vicinity of bridges and culverts. By design, the manually corrected river network has a correctness of 100% since it is the basis of the water-land border detection (c.f. Section 3.4.2). Results for the other two stages of the river network correction can be found in Table 2.
Even though the automatic river network correction improves position correctness substantially, there remain unsatisfactory results for about 2.3% of all unobstructed river segments. The most common reasons are: • Poor approximations: The correction procedure implicitly limits the search space for the correct river bed position via cross section widths (e.g. 100 m to both sides of the original river axis). Additionally, small deviations from the original axis are slightly preferred over larger ones (Eq. 3, Section 3.1). This leads to a potential failure of the correction in case of exceptionally poor approximations. • Unanticipated complexity: In some cases, meandering rivers are strongly simplified in the original dataset. Even though the axes are densified before correction, the procedure is sometimes not able to recover the full complexity, which is to some degree related to the next point. • DTM river channel representation: As mentioned in Section 2, ALS data acquisition usually results in measurement gaps for a large part of water surfaces. Closing these gaps leads to artifacts in the river channel, especially if the banks are vegetated. Due to these effects, the true river channel is probably not the "cheapest path". Instead, a corrected river line running through smooth terrain a few meters off the actual river channel may lead to clearly less ascending slopes and is therefore preferred by the automatic correction. • Error propagation: Long underground sections (e.g. culverts) hamper river channel detection from the DTM. This can also have an impact further downstream beyond the directly affected reach. If the river channel is bordered by ridges, the corrected axis is not able to immediately return to the channel without uphill slopes once it is outside (c.f. Eq. 7 in Section 3.1).
The detected obstacles in the river course can also be verified wherever they correspond to actual bridges. Therefore, a road and path layer from Open Street Map was intersected with all river axes resulting in approximately 52000 points for all of Austria. These crossings between rivers and traffic routes were then assessed as to whether potential obstructions of the river channel (e.g. bridges) could be detected and eliminated. Of the 52000 intersections, about 22000 were detected as obstacles or culverts. The remaining almost 30000 points are largely related to bridges that had already been eliminated from the original DTM or tunnels. In order to find meaningful quality measures, these points are manually assessed for four federal states: Vorarlberg and Salzburg have predominantly alpine character whereas Burgenland and Vienna are rather flat, the latter with mostly urban characteristics. In total, for 122 interSections (1.7%) the DTM still contains bridges or perceptible remains of bridges after elimination. More details can be found in Fig. 8.
The undetected bridges are mainly located in the upper river reaches. The detail view in the center of Fig. 8 represents one typical example: The channel of the small river is not adequately resolved in the DTM. Consequently, the crossing road only creates a plateau in the height profile but there is no ascent in downstream direction and the bridge is not detected. In all the four states, there was no remaining bridge over large rivers. There are, however, remarkable differences concerning the ratio between bridges already eliminated in the original DTM (blue) and the bridges detected and eliminated in our workflow (green). This clearly shows the heterogeneity of the DTM due to different processing chains in the individual states.
Validation of the other vector datasets is limited to thorough visual examination (e.g. Fig. 7) since there is no extensive comparable ground truth at a similar spatial resolution. These datasets were created for a very specific application and thus do not always have a clear correspondence with the DTM. For example, river banks in steep alpine valleys often do not have a geometric pattern as expected for the top edge of embankment. However, if there is a clear top edge of embankment, detection is usually more reliable than for the water-land borders because the latter very much depend on a smooth and artifact-free representation of the river channel, especially the water surface. Concerning the raster products, spot checks were conducted to assess output quality. The obstacle elimination proved to work exceptionally well in terms of smoothly interpolating the obstructed river channel without leaving artifacts (c.f. Figs. 5 and 7A). Integration of river bed measurements mainly depends on the quality and compatibility (e.g. height system) of the respective bathymetric data.
Overall, the results demonstrate that processing of high resolution spatial data for hydraulic modelling is no longer restricted to small study areas. The developed methods are able to provide countrywide datasets based on a 1 m DTM with local relevance. At these spatial extents, a high degree of automatization is essential. However, similarly to prior studies (c.f. Section 1.2), we conclude that fully satisfactory results still require some level of checking and corrections by a human operator.

Conclusions
We present a methodology to adapt extensive high-resolution spatial datasets to the requirements of hydraulic modelling for flood hazard mapping. The aim of these methods is to ensure suitable and consistent representation of a river network and the surrounding area with a high degree of automatization. This includes the elimination of discrepancies between datasets (e.g. river network vs. DTM), an improved river channel description in the DTM (obstacle removal & bathymetry integration) as well as the creation of new datasets such as culverts and embankment lines. The workflow was designed in a way to make its parts as independent as independent of each other as possible, in order to facilitate the isolated execution of single steps and enhance planning flexibility. Furthermore, a focus is on high adaptability with respect to different river sizes and DTM properties which is achieved by purely geometric, cross section-based processing.
All methods were applied to data covering the whole territory of Austria with 33880 river km on a total area of about 84000 km 2 . The automatic correction increased the percentage of river axes matching the DTM river channel from about 85 to 97%. After automatic detection and elimination, more than 98% of all bridges do no longer affect the river bed representation. Nevertheless, a certain degree of manual interaction is necessary for achieving the full reliability due to inconsistent input data and artifacts in the DTM. Manual checks at an early stage of the processing chain, e.g. checking the complete corrected river network, substantially reduces the required manual efforts in later working steps. Derivation of the original DTM with particular emphasis on a sound mapping of river channels could also likely improve the Table 2 River network validation before and after automatic correction. r is the correctness index defined as the percentage of river axis vertices situated between the river banks. This value is calculated for the original (r orig ) and the automatically corrected (r corr ) river network. Furthermore, the complete river networks are compared to unobstructed sections, the latter excluding the vicinity of bridges and culverts. The final river network after manual interaction has a correctness of 100%. results of automatic procedures. Another promising possibility is processing directly based on point clouds. This would, however, require significantly more computational resources. Due to the rapid progress of surveying techniques (especially ALS) and computational possibilities, we also expect an increasing demand for locally relevant flood risk modelling on regional and country scales. Accordingly, scalable and automatable procedures to adapt spatial data will be needed. Our proposed methodology has been shown to work on a large, challenging dataset for a very specific application.
While the approach was not extensively tested for other datasets outside Austria, we still expect the method to adapt well to comparable data since it provided consistent quality over all federal states despite heterogeneous inputs. Most correction functions contain tuning parameters which can be calibrated in order to flexibly adapt to specific topographic circumstances. The applicability mainly depends on whether the underlying basic assumptions concerning geometry and input data hold, c.f. Section 3. The methodology could also be used for a DTM with different spatial resolution as long as it is high enough to resolve the river channels of interest.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Fig. 8. Validation of bridge and culvert detection for selected federal states (purple). The pie charts represent all intersections between traffic routes and river axes in the respective states. The blue portion indicates bridges already fully eliminated in the initial DTM thus requiring no further action. Green indicates intersections correctly detected as obstacles or culverts with the approach described in Section 3.2. For a small portion (yellow), at least parts of a bridge can still be identified in the DTM which could not be found by the automatic detection -the detail view in the center of the figure shows such a case. The river flowing from the top right to the bottom left is crossed by a gravel road which obstructs the river channel but does not create a height ascent in downstream direction. It is thus not detected as an obstacle by the auto.matic procedure.