Geospatial technology for flood hazard analysis in Comal Watershed, Central Java, Indonesia

River flooding severely disrupts communities: it can cause loss of life, damage to infrastructure and property, economic, social, and cultural losses, and environmental degradation. A flood hazard analysis of the downstream Comal Watershed in Pemalang Regency, Central Java, was designed to generate a flood hazard map that helps reduce the effects of flood disasters. The flood hazard was simulated for 5, 10, 25, and 50-year return periods using HEC-RAS and ArcGIS with HEC-GeoRAS. The input data were discharge, geometry, and roughness. The discharge was obtained from hydrological modelling using 22 years of daily rainfall data. The geometry data were derived from a topographic map and echo sounder measurements, while the roughness was derived from land use along the floodplain. Accuracy and validation were assessed by matching the flood modelling results with historical data from the Pemalang Disaster Management Agency and with the results of in-depth interviews with the community. The results showed that the flood hazard map of the downstream Comal River was feasible for disaster risk reduction purposes. The flood hazard levels generated for the 5, 10, 25, and 50-year return periods were dominated by high and very low levels. With every return period, the inundation width and the flood depth showed a rising trend for the very high, high, and very low levels.


1. Introduction
Floods are among the disaster events that harm society. Their negative impacts vary; under certain conditions, floods can cause casualties, injuries, damage to infrastructure and property, disruption of socio-economic activities, environmental damage, and even damage to cultural heritage [1]-[8]. Flooding is a geographical phenomenon whose magnitude, intensity, frequency, and impact increase across spatial and temporal scales [9]-[12]. On a spatial scale, floods tend to hit areas whose physical characteristics, for example topographic features, match those of areas flooded in the past and present. From a temporal perspective, the impact of flooding tends to increase because the number of elements at risk in flood-prone areas increases [13]. Therefore, flood characteristics need to be analyzed as a disaster mitigation effort to reduce flood risk [14]. A practical flood risk reduction framework requires flood hazard maps, estimates of losses in flood-prone areas, and proper flood disaster mitigation planning [15]-[17]. The flood hazard map is crucial because it provides data on the depth and extent of flood inundation, which can then be used as a basis for policymaking [5], [13]. Consequently, methods for compiling flood hazard maps have multiplied, using a variety of input data. The most advanced approach today is hydraulic modelling, both 1D and 2D, because it can represent the flood inundation profile for several characteristics or return periods [16]. In general, however, the results of hydraulic flood modelling do not address the spatial distribution of flood inundation. A geospatial analysis technique based on a Geographic Information System (GIS) is therefore needed to obtain comprehensive modelling results that analyze flood hydrographs while presenting the data spatially as flood hazard maps [18]. In flood modelling, two main components significantly affect the accuracy and validity of the modelling results.
The two components are hydrological and hydraulic data [19]. Hydrological data can be obtained from measurements or from numerical modelling results that use water table data and rainfall data meeting specific requirements [20]. Ideally, these data are held by the authorized agencies in every country. In practice, however, many countries, especially developing countries, lack good-quality discharge and rainfall data, which can hinder efforts to reduce flood risk. The same applies to the availability of hydraulic data suitable for mapping flood hazards. The hydraulic data consist of Manning roughness values (n), obtained from the interpretation of land use in floodplains and river bodies, and topographic data, which describe the physical condition of an area based on the configuration of the land surface elevation [21], [22]. The ideal elevation data are ground elevation data, commonly called a Digital Terrain Model (DTM). A high-accuracy DTM produces a more accurate flood inundation map because it can resolve flood inundation heights that differ by small intervals [23]. It can therefore be concluded that the better the quality of the topographic data, the better the quality of the flood hazard maps [21], [24]. The main obstacles to obtaining quality topographic data such as a DTM are time and cost. A DTM is obtained from field measurements or from processing satellite images and aerial photographs, which is costly [21]. Efforts are therefore needed to compensate for these weaknesses. The shortage of hydrological data can be overcome by modelling a synthetic unit hydrograph, which treats rainfall data as the central element in discharge formation [25]. The lack of topographic data can be covered by topographic maps, which are then tested for accuracy in the field. In this study, hydrological and hydraulic data are used as inputs for the HEC-RAS model.
HEC-RAS is a reasonably practical model for one-dimensional (1-D) river flow modelling [26]. It is used for modelling inundation in open channels or natural rivers [27]. Natural river flow in this study is modelled with the fundamental equations of continuity and conservation of momentum (the de Saint-Venant equations) [28]. HEC-RAS represents floods based on the input data used. Because the quality of the input data strongly influences the modelling results, the results need to be calibrated and validated. In this study, calibration was performed by matching the modelled flood-affected areas with historical records of flood events, while validation was carried out by interviewing residents in the areas that the model showed as flood-affected. Calibration and validation are carried out to produce accurate geospatial information for mapping flood hazards using minimum input data. The modelling output is then presented spatially using GIS. This study analyses and maps flood hazards using spatial technology and low-cost or even free datasets in the downstream Comal watershed, Central Java, Indonesia. The Comal watershed is one of the watersheds given super-priority for management in Indonesia [29], marked by

Rainfall analysis
Rainfall was one of the primary data sources for obtaining the design discharge at a specific return period in this study because discharge data with good accuracy were not available. Rainfall data were obtained from 11 rain gauge stations, each with 22 years of records (1992-2014). Regional rainfall was analyzed using the Thiessen polygon method, which is suitable for determining the average rainfall in areas with a limited number of rainfall stations and uneven rainfall events. The regional rainfall was then subjected to frequency analysis, a way to determine the design rainfall or discharge at a specific return period based on statistical properties in order to estimate the probability of future discharge. The statistical distributions considered were the Normal, Log-Normal, Log-Pearson III, and Gumbel distributions [31], [32], and their goodness of fit was tested with the Chi-square and Smirnov-Kolmogorov tests. An Intensity Duration Frequency (IDF) analysis was carried out to determine the design rainfall for a specific return period. In this study, the return periods used were 5, 10, 25, and 50 years. Rain intensity was analyzed with the Mononobe formula, assuming that the rainfall was distributed over 24 hours in each return period. The Mononobe formula for Indonesia used m = 2/3, based on the assumption that rain in Indonesia is distributed over 24 hours. The design rain (PT) at each return period was distributed using the Alternating Block Method (ABM). The design hyetograph generated by this method is the rain that occurs in n consecutive time intervals of duration Δt over a total time Td = nΔt. The distribution of design rain at each return period was then obtained by apportioning the amount of rain at each return period according to the percentage for each rainfall duration. The rainfall duration was obtained from the time of concentration (TC) calculated with the Kirpich formula.
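The Mononobe and alternating-block steps described above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the procedure used to produce the study's tables: the function names, the 1-hour block length, and the 100 mm design rainfall are hypothetical, and the blocks are placed with the largest depth in the centre and the remaining depths alternating to the right and left, which is one common ABM convention.

```python
def mononobe_intensity(r24_mm, duration_h, m=2.0 / 3.0):
    """Rainfall intensity (mm/h) for a given duration, from the 24-hour depth."""
    return (r24_mm / 24.0) * (24.0 / duration_h) ** m


def alternating_block_hyetograph(r24_mm, n_blocks, dt_h=1.0):
    """Return n_blocks rainfall depths (mm) with the largest block in the centre."""
    # Cumulative depth for durations dt, 2*dt, ..., n*dt.
    cumulative = [mononobe_intensity(r24_mm, k * dt_h) * k * dt_h
                  for k in range(1, n_blocks + 1)]
    # Depth added by each additional interval.
    increments = [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                                    for k in range(1, n_blocks)]
    ordered = sorted(increments, reverse=True)

    hyetograph = [0.0] * n_blocks
    centre = (n_blocks - 1) // 2
    hyetograph[centre] = ordered[0]
    left, right = centre - 1, centre + 1
    place_right = True
    for depth in ordered[1:]:
        # Alternate right and left; fall back to the side that still has room.
        if (place_right and right < n_blocks) or left < 0:
            hyetograph[right] = depth
            right += 1
        else:
            hyetograph[left] = depth
            left -= 1
        place_right = not place_right
    return hyetograph


# Hypothetical 24-hour design rainfall of 100 mm, three 1-hour blocks.
hyetograph = alternating_block_hyetograph(100.0, 3)
```

The block depths always sum to the Mononobe depth for the full storm duration, so the hyetograph only redistributes the design rainfall in time.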
The distributed design rain at each return period was then reduced by the Φ index to obtain the effective hourly rain depth (Peff). The Φ index value was the difference between the amount of rain and the flow, calculated with the Synthetic Unit Hydrograph (SUH) Gama I method.
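The reduction of hourly rainfall by the loss index can be sketched as follows. This is an illustration only: the index is treated as a constant loss rate in mm per hour, negative values are set to zero, and the rainfall depths and index value are hypothetical.

```python
def effective_rainfall(hourly_rain_mm, phi_mm_per_h):
    """Subtract a constant loss index from each hourly depth, floored at zero."""
    return [max(depth - phi_mm_per_h, 0.0) for depth in hourly_rain_mm]


# Hypothetical hourly design rainfall (mm) and loss index (mm/h).
p_eff = effective_rainfall([5.0, 30.0, 12.0], 10.0)
```

Hours in which the rainfall does not exceed the index contribute no effective rainfall and therefore no direct runoff.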

Discharge generation
The synthetic unit hydrograph (SUH) Gama I was calculated from watershed morphometric data [26], which consist of the watershed area (A), length of the main river (L), length of first-order rivers (L1), length of rivers of all orders (Li), number of first-order rivers (ST1), total number of rivers of all orders (Sti), watershed width at 0.75 L (Wu), watershed width at 0.25 L (WL), upstream watershed area (Au), and average river slope (S). The watershed morphometry was obtained from topographic maps (at a scale of 1:25,000) and from calculations on geospatial data. These morphometric data were used to construct the flow hydrograph formation parameters, consisting of the source factor (SF), source frequency (SN), width factor (WF), upstream watershed area (RUA), symmetry factor (SIM), drainage network density (D), and number of river confluences (JN). The hydrograph formation parameters were then used to calculate the components of the SUH Gama I, including peak time (Tr), peak discharge (Qp), base time (Tb), reservoir coefficient (K), base flow (Qb), and Phi index (Φ). Once these components were obtained, the hourly discharge was calculated starting at hour zero with zero discharge, rising to the peak discharge, and then receding until the base time was reached.
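The step from morphometric measurements to hydrograph formation parameters can be sketched as a set of ratios. The definitions below follow the commonly published form of the Gama I method and are an assumption here, since the text lists the parameters without giving their formulas; the function name and all input values are hypothetical. The empirical equations for Tr, Qp, Tb, K, Qb, and Φ are deliberately not reproduced.

```python
def gama1_parameters(area_km2, len_order1_km, len_all_km, n_order1, n_all,
                     width_upper_km, width_lower_km, upstream_area_km2):
    """Ratio-type Gama I parameters (assumed definitions, see lead-in)."""
    sf = len_order1_km / len_all_km        # source factor: L1 / Li
    sn = n_order1 / n_all                  # source frequency: ST1 / Sti
    wf = width_upper_km / width_lower_km   # width factor: Wu / WL
    rua = upstream_area_km2 / area_km2     # relative upstream area: Au / A
    sim = wf * rua                         # symmetry factor
    d = len_all_km / area_km2              # drainage density (km per km2)
    return {"SF": sf, "SN": sn, "WF": wf, "RUA": rua, "SIM": sim, "D": d}


# Hypothetical watershed, not the Comal watershed values.
params = gama1_parameters(area_km2=100.0, len_order1_km=60.0, len_all_km=120.0,
                          n_order1=30, n_all=50, width_upper_km=6.0,
                          width_lower_km=12.0, upstream_area_km2=40.0)
```

The number of river confluences (JN) is a direct count from the river network and needs no calculation.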
The design flood discharge was obtained from the frequency analysis of the rainfall data; the design rainfall was distributed into hourly rainfall, which became the effective rainfall distribution after the Φ index value was deducted. The effective rainfall was used as one of the parameters in calculating the design flood discharge with the SUH Gama I method at each return period. The design discharge at the 5, 10, 25, and 50-year return periods alone was not sufficient as input for HEC-RAS with the unsteady flow type. It was therefore necessary to prepare a flood hydrograph for each return period from the mean unit hydrograph, base flow (Qb), and hourly effective rainfall (Peff) at each return period.
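Combining a unit hydrograph, effective rainfall, and base flow into a flood hydrograph is conventionally done by discrete convolution. The sketch below assumes that approach, with unit hydrograph ordinates in m³/s per mm of effective rainfall on an hourly step; the function name and all numbers are hypothetical and do not reproduce the study's hydrographs.

```python
def design_flood_hydrograph(unit_hydrograph, effective_rain_mm, base_flow):
    """Convolve hourly effective rainfall with a unit hydrograph, plus base flow."""
    n_steps = len(unit_hydrograph) + len(effective_rain_mm) - 1
    discharge = [base_flow] * n_steps
    for i, rain in enumerate(effective_rain_mm):
        for j, ordinate in enumerate(unit_hydrograph):
            # Rain falling in hour i contributes to the flow in hour i + j.
            discharge[i + j] += rain * ordinate
    return discharge


# Hypothetical unit hydrograph (m3/s per mm), effective rain (mm), base flow (m3/s).
hydrograph = design_flood_hydrograph([0.0, 2.0, 1.0], [10.0, 5.0], 3.0)
```

The resulting hourly series is the kind of flow hydrograph that an unsteady flow simulation needs as its upstream boundary condition.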

Hydrodynamic flow model in HEC-RAS
Hydrodynamic modelling in HEC-RAS relies on at least two critical inputs that significantly affect the modelling results: geometry data, composed of topographic data and Manning roughness values, and discharge/flow data. Geometric data were obtained from field measurements using an echo sounder and from Digital Elevation Model (DEM) data, which were processed into a Triangulated Irregular Network (TIN) in ArcGIS with the HEC-GeoRAS extension [33], [34]. The river geometry data include the stream centerline, banks, flow paths, and XS cut lines [21], [35]. The geometry input strongly influences the accuracy of the model [13]. The more detailed the topographic data, the higher the accuracy of the resulting model [24], [36]-[38]. The Manning roughness values were entered for each river cross-section (XS cut line) and were assigned based on land use in the floodplain area [39].
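The influence of land-use-based roughness on conveyance can be illustrated with Manning's equation for uniform flow in SI units. This is a simplification relative to the unsteady solution computed by HEC-RAS, and the roughness values in the lookup table are illustrative placeholders, not the values assigned in this study; the function name and the cross-section dimensions are likewise hypothetical.

```python
# Illustrative roughness values only; not the values used in the study.
MANNING_N_BY_LAND_USE = {
    "river channel": 0.035,
    "paddy field": 0.040,
    "settlement": 0.100,
}


def manning_discharge(n, flow_area_m2, wetted_perimeter_m, slope):
    """Discharge (m3/s) from Manning's equation for uniform flow, SI units."""
    hydraulic_radius = flow_area_m2 / wetted_perimeter_m
    return (1.0 / n) * flow_area_m2 * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5


# Same hypothetical section and slope, two different roughness classes.
q_channel = manning_discharge(MANNING_N_BY_LAND_USE["river channel"], 50.0, 25.0, 0.0004)
q_settlement = manning_discharge(MANNING_N_BY_LAND_USE["settlement"], 50.0, 25.0, 0.0004)
```

For the same geometry and slope, a rougher surface conveys less water, so the choice of roughness per land-use class directly affects the simulated water levels.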

Correction and validation
The flood inundation distribution was corrected by subtracting the ground surface elevation from the floodwater level elevation to find the difference between the two [1], [39], [41], [42]. This analysis produces inundated areas, where the floodwater level elevation exceeds the land surface elevation, and non-inundated areas, where it does not. The flood distribution was validated by comparing the modelled flood inundation area with flood event data from the Pemalang Regency Disaster Management Agency (BPBD). It was also validated by interviewing 51 residents in the outermost locations affected by flooding [43]. The percentage difference between the interview and observation results and the model results was then calculated. If the difference between the modelling results and the interviews is less than 10%, that is, the level of agreement is at least 90%, the modelling results are considered accurate.
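The elevation difference described above can be sketched on a small grid. The example assumes that both surfaces are sampled on the same cells and that cells where the ground lies above the water surface are assigned zero depth; the function name and elevations are hypothetical.

```python
def inundation_depth(water_surface_m, ground_m):
    """Cell-by-cell flood depth: water surface minus ground, floored at zero."""
    return [[max(w - g, 0.0) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(water_surface_m, ground_m)]


# Hypothetical 2 x 2 grids of floodwater level and ground elevation (m).
water = [[3.0, 3.0], [3.0, 3.0]]
ground = [[1.0, 3.5], [2.5, 4.0]]
depth = inundation_depth(water, ground)
```

Cells with a positive depth form the inundated area; cells with zero depth are not flooded.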

The distribution of flood hazard modelling
The correction of the flood inundation distribution also produced flood depth data, which were used to determine the level of flood hazard. The flood hazard was classified into five hazard classes based on a modified version of the flood hazard classification in Head of BNPB Regulation Number 2 of 2012 [44], [45] (Table 1). The flood hazard classification was then mapped for each return period to determine the distribution of the flood hazard levels. The flood hazard level at each return period covers a different area depending on the analysis of the discharge data. The entire sequence of data collection and analysis leading to the modelling results is shown in detail in Figure 2. The use of primary and secondary data was adapted to the data needs of the modelling. Data limitations can be mitigated with other available data, although these cannot replace the missing data altogether.
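Assigning a hazard class to a flood depth can be sketched as a threshold lookup. Only the two upper classes are stated in the text (high at 1.5-2 meters and very high above 2 meters); the three lower break values in the sketch are placeholders chosen for illustration and should be replaced with the values in Table 1. The function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds (m) of the first four classes. The 1.5 and 2.0 breaks follow the
# text; 0.5 and 1.0 are ASSUMED placeholders, not taken from Table 1.
DEPTH_BREAKS_M = [0.5, 1.0, 1.5, 2.0]
HAZARD_LABELS = ["very low", "low", "medium", "high", "very high"]


def hazard_class(depth_m):
    """Map a flood depth (m) to one of five hazard classes."""
    if depth_m <= 0.0:
        return "not flooded"
    # bisect_left makes each upper bound inclusive, so 2.0 m is still "high".
    return HAZARD_LABELS[bisect_left(DEPTH_BREAKS_M, depth_m)]


classes = [hazard_class(d) for d in (0.0, 0.3, 1.7, 2.0, 2.5)]
```

Applying the function to every cell of a depth grid yields the hazard class map for one return period.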

Result and discussion
3.1. Design of flood hydrograph
Regional rainfall in this study was computed using the Thiessen polygon method because of the limited rainfall station data. The spatial distribution of rain stations inside and outside the Comal watershed was uneven, so it did not meet the requirements for deriving regional rainfall with the Isohyet method. The regional rainfall analysis showed that the average annual rainfall in the Comal watershed was 4,271.676 mm/year. The annual rainfall pattern at all stations in the Comal watershed was uniform because the rainfall at each station was above 1,500 mm/year. The rainfall data used in the frequency analysis were selected with the annual maximum series method. This method was chosen to obtain the highest flood discharge and thus to model the worst river overflow flood events, or high-intensity floods [31]. These results were used as a basis for determining spatial planning directions based on river flood risk; the worst high-intensity flood events were needed to develop appropriate spatial planning directions. In addition, the annual maximum series method was used because the rainfall record used as input was longer than ten years (Table 2). Statistical values of the maximum daily rainfall data were calculated, namely the average value (x), standard deviation (S), skewness coefficient (Cs), kurtosis coefficient (Ck), and coefficient of variation (Cv). The calculation showed that the statistical values of the maximum daily rainfall data of the Comal watershed did not follow the Normal, Gumbel, or Log-Normal distributions, so it was concluded that the best-fitting statistical distribution was Log-Pearson III. The Chi-square and Smirnov-Kolmogorov tests supported this conclusion. In the Chi-square test, χ², the sum of (Ef-Of)²/Ef, was 1.067. This value was compared with the critical χ² value obtained from the Chi-square table for 2 degrees of freedom, which was 5.991. The comparison showed that the χ² value (1.067) was smaller than the critical value. Furthermore, the Smirnov-Kolmogorov test gave Δmax = 0.0397 and Δcr = 0.27; because Δmax was smaller than Δcr, the best-fitting statistical distribution was Log-Pearson III. The return periods determined from the Log-Pearson III distribution were 5, 10, 25, and 50 years (Table 3). The choice of return periods also took into account the length of the input data record to obtain more accurate results.
The design rainfall (PT) was then distributed using the Alternating Block Method (ABM). The calculation showed that the rain in the Comal watershed was concentrated over three hours. This relatively short concentration time results in higher rainfall intensity, and the higher the rainfall intensity, the higher the peak flood discharge produced. The distributed design rainfall at each return period was then reduced by the Φ index to obtain the effective hourly rainfall depth (Peff) (Table 4). Together with the watershed morphometric data, the rainfall analysis was used as the basis for determining the flow rate using the SUH. The SUH used was Gama I, which has been tested in 30 watersheds on the island of Java [46], so it was assumed to be suitable for the Comal watershed. The morphometric data were used to compile the flow hydrograph formation parameters, which were then used to calculate the components of the SUH Gama I (Table 5). The SUH Gama I was used to estimate the hourly discharge. The calculation started at hour zero with zero discharge, rose to the peak discharge, and then receded until the base time was reached (Figure 4).
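The Chi-square comparison reported above can be reproduced in a few lines. The sketch assumes a 5% significance level, which is the level at which the tabulated critical value for 2 degrees of freedom equals 5.991; it uses the closed form available for exactly 2 degrees of freedom, so no statistics library is needed. The function names and the observed and expected counts in the example are hypothetical; only the 1.067 statistic comes from the study.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (Ef - Of)^2 / Ef over all classes."""
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))


def chi_square_critical_df2(alpha=0.05):
    """Critical value for 2 degrees of freedom, where the CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(alpha)


critical = chi_square_critical_df2()     # about 5.991
accepted = 1.067 < critical              # statistic reported in the study
example_stat = chi_square_statistic([4, 6], [5, 5])  # hypothetical class counts
```

A statistic below the critical value means the hypothesized distribution is not rejected at the chosen significance level.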
The design flood discharge was obtained from the frequency analysis of the rainfall data; the design rainfall was distributed into hourly rainfall, which became the effective rainfall distribution after the Φ index value was deducted. The effective rainfall was used as one of the parameters in calculating the design flood discharge with the SUH Gama I method at each return period. The design flood discharge was 507.93 m³/s at the 5-year return period, 673.53 m³/s at 10 years, 968.42 m³/s at 25 years, and 1813.50 m³/s at 50 years. The flood discharges calculated at each return period were used to compile the design flood hydrograph, which was obtained from the mean unit hydrograph, base flow (Qb), and effective hourly rainfall (Peff) at each return period. The mean unit hydrograph and base flow (Qb) were obtained from the SUH Gama I calculations.

Hydrodynamic modeling in HEC-RAS
The modeling run begins by creating a new project in the HEC-RAS application and building the river geometry input, which is a prerequisite for the modeling. The river geometry input was created in ArcGIS 10.1 with the HEC-GeoRAS extension. The geometry inputs comprised the river channel, channel banks, flow paths, and river cross-sections. The river channel was modeled using a Triangulated Irregular Network (TIN) converted from the contour data of the Indonesian topographic map at a scale of 1:25,000. The river channel layer represents the flow of the Comal River; its centerline was drawn in the middle of the river channel from upstream to downstream. The channel banks layer represents the cliff boundaries on the right and left sides of the river channel. Channel banks were made by drawing lines from upstream to downstream on both sides of the river channel, while flow paths were used to identify the hydraulics in the middle, right, and left sides of the river channel. These two geometric elements were based on the boundaries of the existing river channel in the 5-meter panchromatic SPOT image of the Comal watershed area. However, the modeling results with these data were less than optimal, especially in the downstream Comal watershed, because the level of detail was low, so field measurements were carried out to determine the river's depth with an echo sounder. The river cross-section is one of the essential geometric elements in HEC-RAS modeling. The cross-section lines were drawn from the left to the right of the river channel, from upstream to downstream, perpendicular to the river channel, and without intersecting each other. Each cross-section was identified as a river station (RS) with a serial number starting from downstream to upstream.
There were 77 cross-sections in this study, from RS 500 to RS 38999, with a distance between cross-sections of 500 meters (Figure 3). The type of flow used in this study was unsteady flow. Two boundary conditions must be defined for this type of flow: the upstream boundary condition (flow hydrograph) and the downstream boundary condition (stage hydrograph). The upstream boundary condition in this study was the discharge data resulting from the conversion of the rainfall data into flood discharge. The downstream boundary condition was the average tidal data for the Pekalongan-Pemalang coastal area, where the average tidal height was one meter. The modeling produced flood inundation scenarios for the 5, 10, 25, and 50-year return periods (Figure 3). These results indicated differences in the flood inundation area between return periods: the longer the return period, the wider the resulting flood inundation. After the model run was complete, the inundation area was corrected with DEM data. The data used to correct the flood inundation scenarios were the floodwater level elevation and the ground surface elevation [42]. The floodwater elevation data, converted using HEC-GeoRAS, were used as the basis for determining the inundation area. The correction with DEM data gave unsatisfactory results because the DEM data were not detailed enough, so a validation test was necessary, matching the results against the flood event history data from the Pemalang Regency Disaster Management Agency (BPBD) and interviews with residents. The validation tested the accuracy of the flood modeling results. Interviews were conducted with residents aged over 25 years, on the assumption that they had experienced several flood events.
The validation showed that the flooded area produced by the modeling was 90% accurate (46 out of 51 respondents confirmed that there had been a flood). The modeling results can therefore be declared valid and used as a basis for flood risk analysis.
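The agreement rate follows directly from the interview counts. The short check below uses the reported figures (46 confirmations out of 51 respondents) and the 90% acceptance threshold stated in the method; the function name is hypothetical.

```python
def agreement_rate(confirmed, interviewed):
    """Percentage of respondents whose account matches the modelled inundation."""
    return 100.0 * confirmed / interviewed


rate = agreement_rate(46, 51)   # roughly 90.2 percent
is_valid = rate >= 90.0         # acceptance threshold from the method section
```

With 51 respondents, 46 is the smallest number of confirmations that reaches the threshold, since 45 out of 51 is about 88.2%.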

The distribution of flood hazard
At the five-year return period, the total inundation area of the downstream Comal River overflow was 7741.08 hectares and was dominated by the medium, low, and very low hazard classes. The 10-year return period flood had a total area of 8170.05 hectares and was dominated by the low and very low hazard classes, although the extent of both classes had decreased. The high and medium hazard classes, although not dominant, increased significantly in area. The 25-year return period flood covered 9292.78 hectares and was dominated by the low and very low hazard classes, not much different from the flood at the 10-year return period. However, the very high and high hazard classes increased significantly. The 50-year return period flood covered 10477.10 hectares and was dominated by the very low hazard class, while the very high, high, medium, and low hazard classes covered almost equal areas. This indicated that the flood at the 50-year return period had the highest level of hazard. Based on the total flood inundation area of the downstream Comal Watershed at each return period, it can be concluded that the flood inundation area has an increasing trend (Figure 5). The classes whose area increased were the very high, high, medium, and very low hazard classes (Figure 5). These results showed that the flood hazard in this study increased with the length of the return period: the longer the return period, the wider the flood hazard area. Therefore, disaster preparedness, early warning, and mitigation must be implemented and improved as a form of flood disaster management. They can be realized in the form of spatial planning directions.
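The increasing trend can be quantified from the reported totals. The sketch below computes the percentage growth in inundated area between consecutive return periods; the areas are the values reported above, while the function and variable names are hypothetical.

```python
# Total inundated area (hectares) by return period (years), as reported.
INUNDATION_AREA_HA = {5: 7741.08, 10: 8170.05, 25: 9292.78, 50: 10477.10}


def area_growth_percent(area_by_period):
    """Percentage increase in area between consecutive return periods."""
    periods = sorted(area_by_period)
    return {
        (short, long): 100.0 * (area_by_period[long] - area_by_period[short])
        / area_by_period[short]
        for short, long in zip(periods, periods[1:])
    }


growth = area_growth_percent(INUNDATION_AREA_HA)
```

Every step is positive, at roughly 5.5%, 13.7%, and 12.7%, which is consistent with the rising trend described in the text.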

Figure 5. Increase in flood inundation width at each return period
Rainfall data were the primary input for generating discharge because no accurate discharge data were available for the Comal River. The flood hydrological model integrates three main components, namely precipitation, infiltration, and runoff [47], [48]. The quality and quantity of rainfall data are therefore crucial in modeling river floods. The rainfall data used in the frequency analysis were selected with the annual maximum series method. This method was chosen to obtain the highest flood discharge and thus to model the worst river overflow flood events, or high-intensity floods. In addition, the annual maximum series method was used because the rainfall record used as input was longer than 10 years. The rainfall data were processed statistically to obtain flow discharge scenarios at the 5, 10, 25, and 50-year return periods [8], [38], [49]. The calculated discharge was the hourly discharge simulated to occur in the Comal River. Hourly discharge data are one of the input requirements for HEC-RAS with an unsteady flow scheme. The HEC-RAS modeling results showed differences in the area affected by flooding, which increased with every return period (Figure 6). The longer the return period, the higher the intensity of the flooding. Another study with the same model showed a positive relationship between prolonged high rainfall intensity and high peak runoff and flow rates [48], [49]. Studies on flood hazard modeling have generally met the data requirements and prerequisites; however, they are not entirely perfect [38], [48], [50]. Therefore, this study validated the modeling results by interviewing communities in the flooded areas. The flood hazard in this study was divided into five hazard classes based on the flood depth, that is, the height of the flood surface. Validation was also used to compensate for the lack of topographic data.
This study used a topographic map at a scale of 1:25,000 with a contour interval of 12.5 meters. A more detailed analysis of the flood hazard showed that the hazard increased in the high (1.5-2 meters) and very high (>2 meters) hazard classes (Figure 5). The increase in the high and very high hazard classes indicated a significant increase in the intensity of the flood hazard, which makes the evacuation process more difficult. Meanwhile, the moderate, high, and very high flood hazard classes dominated the middle part of the downstream Comal watershed. This concentration occurs because the river meanders there and the area is very flat land used for agriculture by the community. A hazard analysis alone is not sufficient as a reference for reducing disaster risk in a specific area. Capacity and vulnerability data are also needed to obtain disaster risk data for a particular area, so that disaster risk reduction planning becomes more accurate [51], [52]. This study focused on the flood hazard because no previous research had analyzed the flood hazard in the Comal watershed, even though floods hit the Comal watershed almost every year. A study on regional vulnerability to flooding in the downstream Comal watershed was conducted by [53]. It showed that the downstream Comal watershed area was vulnerable to flood hazards in terms of social, economic, physical, and environmental vulnerability. The four vulnerability variables used in that study correspond to [44] and to the conditions of the downstream Comal watershed. According to [53], high and very high classes dominated social vulnerability in the downstream Comal watershed because the community lacked the capacity to deal with flood disasters. High vulnerability classes dominated economic vulnerability in the downstream Comal watershed because of the productive land found in the area.
In the downstream Comal watershed, fertile lands included gardens/plantations, shrimp ponds, fishponds, wetland agricultural land (paddy fields), and dryland agriculture (moor/fields), with the primary commodities being rice, long beans, cassava, jasmine, maize, bitter melon, and chilies. Physically, the downstream Comal watershed was dominated by high vulnerability classes due to regional development. The area is densely populated, so many economic, social, and educational facilities have been developed. Environmental vulnerability differed, being dominated by low vulnerability classes. This dominance arose from the environmental vulnerability parameters, namely mangrove forests, forests, and shrubs, which are found only in specific areas such as the estuary of the Comal River [54]. The vulnerability of the downstream Comal watershed area shows the importance of carrying out research on flood hazards. Limited discharge data are one of the main obstacles to flood hazard analysis [51]. Using daily rainfall data and modeling is therefore a logical choice for studying flood hazards. The combination of flood hazard data and regional vulnerability to flood hazards is needed for decision making and for determining flood disaster mitigation [48], [52], [55], [56].

Conclusion
Geospatial technology was applied in this study to produce a flood hazard map using minimum data. The data analysis showed an increasing trend of inundated area across the 5, 10, 25, and 50-year return periods, with areas of 7741.08, 8170.05, 9292.78, and 10477.10 hectares, respectively. Furthermore, the hazard classes whose area increased with each return period were very high, high, and very low. The accuracy and validation tests showed that the flood modeling was feasible as a basis for determining flood disaster mitigation. They also showed that an alternative approach could overcome the lack of data. The flood hazard analysis method developed here produced flood hazard maps that can then be used for disaster risk analysis. The limitations of discharge data can be overcome by using rainfall data to compile a SUH, while the lack of high-accuracy topographic data can be mitigated by conducting accuracy and validation tests. However, high-accuracy and up-to-date data are still needed in flood hazard analysis to obtain more accurate results.