Fractal Theory and the Estimation of Extreme Floods

Floods and droughts constitute extreme values of great consequence to society. A wide variety of statistical techniques have been applied to the evaluation of the flood hazard. A primary difficulty is the relatively short time span over which historical data are available, and quantitative estimates for paleofloods are generally suspect. It was in the context of floods that Hurst introduced the concept of the rescaled range. This was subsequently extended by Mandelbrot and his colleagues to concepts of fractional Gaussian noises and fractional Brownian walks. These studies introduced the controversial possibility that the extremes of floods and droughts could be fractal. An extensive study of flood gauge records at 1200 stations in the United States indicates a good correlation with fractal statistics. It is convenient to introduce the parameter F, which is the ratio of the 10 year flood to the 1 year flood; for fractal statistics F is also the ratio of the 100 year flood to the 10 year flood and the ratio of the 1000 year flood to the 100 year flood. It is found that the parameter F has strong regional variations associated with climate. The acceptance of power-law statistics rather than exponentially based statistics would lead to a far more conservative estimate of future flood hazards.


Introduction
The flow in a river can generally be considered a time series. The extreme values in the time series constitute floods. Floods present a severe natural hazard; in order to assess the hazard and to allocate resources for its mitigation it is necessary to make flood-frequency hazard assessments. The integral of the flow in a river is required for the design of reservoirs and to assess available water supplies during periods of drought.
One estimate of the severity of a flood is the peak discharge at a station, $\dot{V}$. The magnitude of the peak discharge is affected by a variety of circumstances including: (1) the amount of rainfall produced by the storm or storms in question, (2) the upstream drainage area, (3) the saturation of the soil in the drainage area, (4) the topography, soil type, and vegetation in the drainage area, and (5) whether snow melt is involved. In addition, dams, stream channelization, and other man-made modifications can affect the severity of floods.
In order to estimate the severity of future floods, historical records are used to provide flood-frequency estimates. Unfortunately, this record generally covers a relatively short time span and no general basis has been accepted for its extrapolation. Quantitative estimates of peak discharges associated with paleofloods are generally not sufficiently accurate to be of much value. A wide variety of geostatistical distributions have been applied to flood-frequency forecasts, often with quite divergent predictions. Examples of distributions used include power law (fractal), log normal, gamma, Gumbel, log Gumbel, Hazen, and log Pearson. Many discussions of this work appear in the literature [1][2][3][4][5][6][7].
An independent approach to reservoir storage was developed by Hurst [8,9]. Hurst spent his life studying the flow characteristics of the Nile and introduced the rescaled range (R/S) analysis. He found that the variations of the storage (the range) scaled with the time period considered as a power law. Mandelbrot and Wallis [10] introduced the concepts of fractional Gaussian noises and fractional Brownian walks and related these to R/S analysis; all are recognized as fractal distributions. They also introduced the Noah and Joseph effects. The Noah effect is the skewness of the distribution of flows in a river and the Joseph effect is the persistence of the flows. Although the concepts introduced by Hurst and by Mandelbrot and Wallis have been considered in a wide variety of applications [14], they have not influenced approaches to flood-frequency forecasting. This point will be a central feature of this paper along with a general discussion of the applicability of fractal statistics.

Analysis
In most cases the flow in a river is a continuous function of time; thus it is appropriate to treat the flow as a time series. It is straightforward to study the spectral characteristics of the time series by determining the coefficients of a Fourier expansion. For most river flows there will be a strong annual peak associated with seasonal variations in rainfall. However, it is of interest to examine the longer range trends in the data. If the Fourier coefficients have a power-law dependence on frequency over a significant range of frequencies a fractal dependence is obtained (with some constraints on the power).
If $\dot{V}(t)$ is the volumetric flow in a river as a function of time, the condition that the flow is fractal requires that [...] The fractal dimension D of a fractional Brownian walk is related to the Hausdorff measure H by [15]
$$D = 2 - H \tag{2}$$
and with $0 < H < 1$ we have $1 < D < 2$.
An extension of the self-similar analysis of rivers as a time series is to treat floods as a discrete fractal set. In order to avoid difficulties with annual variability we hypothesize that the peak annual discharge $\dot{V}_m$ in a time interval T is related to the interval by
$$\dot{V}_m = C\,T^{H} \tag{3}$$
with T an integer number of years and C a constant. Self-similar river flows imply a power-law scaling of peak annual discharges and recurrence intervals.
This scale invariant distribution can also be expressed in terms of the ratio F of the peak discharge over a 10 year interval to the peak discharge over a 1 year interval. With self-similarity the parameter F is then also the ratio of the 100 year peak discharge to the 10 year peak discharge. In terms of H and D we have
$$F = 10^{H} = 10^{2-D}. \tag{4}$$
The parameter F is a measure of the severity of great floods.
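An alternative way of writing Eq. (3) is
$$N = C'\,\dot{V}^{-\alpha} \tag{5}$$
where N is the number of floods per unit time with flows that exceed $\dot{V}$ and $C'$ is a constant. This relation will be used to analyse actual flood-frequency data. The quantities N in Eq. (5) and T in Eq. (3) are related by
$$N = \frac{1}{T}, \tag{6}$$
so that
$$H = \frac{1}{\alpha}, \tag{7}$$
$$D = 2 - \frac{1}{\alpha}. \tag{8}$$
Data will be used to obtain $\alpha$; F, H, and D will then be found from Eqs. (4), (7), and (8).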
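The conversion from the fitted exponent to the other fractal parameters can be sketched in a few lines of Python. This is only an illustration: the relations H = 1/α, D = 2 - H, and F = 10^H are assumed here because they reproduce the values quoted in the text (for example α = 1.1 gives H = 0.909, D = 1.09, F = 8.11), and the function name `fractal_flood_parameters` is hypothetical.

```python
def fractal_flood_parameters(alpha):
    """Return (H, D, F) for a power-law exponent alpha.

    Assumed relations (inferred from the values quoted in the text):
    H = 1 / alpha, D = 2 - H, F = 10 ** H.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    hausdorff = 1.0 / alpha            # Hausdorff measure H
    dimension = 2.0 - hausdorff        # fractal dimension D
    flood_ratio = 10.0 ** hausdorff    # ratio F of the 10 year to the 1 year flood
    return hausdorff, dimension, flood_ratio


if __name__ == "__main__":
    # The two exponents quoted for the benchmark examples in the text.
    for alpha in (2.3, 1.1):
        h, d, f = fractal_flood_parameters(alpha)
        print(f"alpha={alpha}: H={h:.3f}, D={d:.2f}, F={f:.2f}")
```

Because all three parameters follow from α alone, they carry the same information; F is simply the form that is easiest to read as a hazard measure.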
Before considering actual examples we will also introduce rescaled range (R/S) analysis. Hurst [8,9] proposed this empirical approach to the statistics of floods and droughts. The method is illustrated in Fig. 1. Consider a reservoir behind a dam that never overflows or empties; the flow into the reservoir is $\dot{V}(t)$ and the flow out of the reservoir is $\overline{\dot{V}}(T)$, defined by
$$\overline{\dot{V}}(T) = \frac{1}{T} \int_0^T \dot{V}(t)\,dt. \tag{9}$$
The volume of water in the reservoir $V(t)$ is given by
$$V(t) = V(0) + \int_0^t \dot{V}(t')\,dt' - t\,\overline{\dot{V}}(T) \tag{10}$$
and the range is defined by
$$R(T) = V_{\max}(T) - V_{\min}(T) \tag{11}$$
where $V_{\max}$ is the maximum volume and $V_{\min}$ the minimum volume stored during the interval T. The rescaled range is defined as R/S where S is the standard deviation of the flow during the period T
$$S(T) = \left[ \frac{1}{T} \int_0^T \left( \dot{V}(t) - \overline{\dot{V}}(T) \right)^2 dt \right]^{1/2}. \tag{12}$$
Hurst et al. [16] found that for many time series the rescaled range satisfies the empirical relation
$$\frac{R}{S} = \left( \frac{T}{2} \right)^{H_i} \tag{13}$$
where $H_i$ is known as the Hurst exponent. Examples included river discharges, rainfall, varves, temperatures, sunspot numbers, and tree rings. In many cases the value of the Hurst exponent is near 0.7.
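A discrete version of the rescaled range calculation can be sketched as follows. This is a minimal illustration rather than the procedure used for the results reported here: the integrals are replaced by sums over equally spaced flow values, the Hurst exponent is taken as the least-squares slope of log(R/S) against log(T/2) with a free intercept, and the function names and the choice of non-overlapping windows are assumptions made for the example.

```python
import math
import random


def rescaled_range(flows):
    """R/S for one window of inflows, with sums standing in for the integrals."""
    n = len(flows)
    mean_flow = sum(flows) / n                 # constant outflow (mean inflow)
    # Stored volume relative to V(0); the value at t = 0 is included.
    volume, volumes = 0.0, [0.0]
    for q in flows:
        volume += q - mean_flow
        volumes.append(volume)
    value_range = max(volumes) - min(volumes)  # range R(T)
    std = math.sqrt(sum((q - mean_flow) ** 2 for q in flows) / n)  # S(T)
    return value_range / std                   # fails if the window is constant


def hurst_exponent(flows, window_sizes):
    """Slope of log(R/S) against log(T/2), averaging R/S over each window size."""
    xs, ys = [], []
    for size in window_sizes:
        windows = [flows[i:i + size]
                   for i in range(0, len(flows) - size + 1, size)]
        mean_rs = sum(rescaled_range(w) for w in windows) / len(windows)
        xs.append(math.log(size / 2))
        ys.append(math.log(mean_rs))
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic Gaussian white noise, not river data.
    white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    print(hurst_exponent(white_noise, [16, 32, 64, 128, 256]))
```

For short windows the estimate obtained from white noise tends to come out somewhat above 0.5, so values from records of only a few decades should be read with that bias in mind.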

Fig. 1. Illustration of how rescaled range (R/S) analysis is carried out. The flow into a reservoir is $\dot{V}(t)$ and the flow out is $\overline{\dot{V}}(T)$. The maximum volume of water in the reservoir during the period T is $V_{\max}(T)$ and the minimum $V_{\min}(T)$; the difference is the range $R(T) = V_{\max}(T) - V_{\min}(T)$.
If a Gaussian white noise sequence of numbers is integrated or summed the result is a Brownian walk. An R/S analysis of the white noise sequence gives a Hurst exponent $H_i = 0.5$; thus the Hurst exponent is equal to the Hausdorff measure of the integrated signal, a Brownian walk with H = 0.5. Mandelbrot and Wallis [10] introduced the concept of fractional Gaussian noises and their integrals, fractional Brownian walks. They showed that the Hurst exponent $H_i$ of a fractional Gaussian noise is equal to the Hausdorff measure of the corresponding fractional Brownian walk.
If $0.5 < H_i < 1$ the original time series is said to be persistent; adjacent values are more strongly correlated than if they were random. The higher the value of $H_i$, the greater the persistence. If $0 < H_i < 0.5$ the original time series is said to be antipersistent; adjacent values are less correlated than if they were random.

Examples
We now turn to the analysis of flood-frequency records. As our first example, the 10 benchmark stations considered by Benson [2] will be studied. Benson [2] applied a variety of geostatistical distributions to the data from these stations; these will be compared with the fractal approach discussed above. The maximum annual floods for two stations are given in Fig. 2. Values for station 1-1805 on the Middle Branch of the Westfield River in Goss Heights, Massachusetts are given in Fig. 2a for the period 1911-1960 [17] and values for station 11-0980 on the Arroyo Seco near Pasadena, California are given in Fig. 2b for the period 1914-1965 [18]. In order to assess the applicability of fractal statistics the number of annual floods with a peak discharge greater than $\dot{V}$ (m³/s) is divided by the sampling period to give the mean number of floods per year N with a peak discharge greater than the specified value. Then $\log N(\dot{V})$ is plotted against $\log \dot{V}$. Results for station 1-1805 are given in Fig. 3a; the solid line is the least-squares fit of Eq. (5) to the data over the range $50 < \dot{V} < 200$ m³/s; large floods are omitted from the fit because of their small number. The solid line corresponds to $\alpha = 2.3$; from Eqs. (4), (7), and (8) [...]. Results for station 11-0980 are given in Fig. 3b; the solid line is the best fit of Eq. (5) to the data over the range $10 < \dot{V} < 100$ m³/s. The solid line corresponds to $\alpha = 1.1$; from Eqs. (4), (7), and (8) we have H = 0.909, F = 8.11, and D = 1.09. In both cases the fit to the scale-invariant (fractal) relation is quite good. The values of H and F in California are considerably larger than in Massachusetts. Large floods are relatively more probable in the arid climate than in the temperate climate.
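The counting and fitting procedure described above can be sketched in Python. The sketch makes several assumptions that are not taken from the text: each annual peak is paired with the number of peaks at or above it (rather than strictly above), ties are not treated specially, and the function names are hypothetical. The record used in the usage example is synthetic, not gauge data.

```python
import math


def exceedance_rates(annual_peaks):
    """Pair each annual peak with N, the number of floods per year at or above it."""
    years = len(annual_peaks)
    ranked = sorted(annual_peaks, reverse=True)
    # The k-th largest peak is equalled or exceeded k times in `years` years.
    return [(peak, (rank + 1) / years) for rank, peak in enumerate(ranked)]


def fit_alpha(pairs, peak_min, peak_max):
    """Least-squares slope of log10(N) against log10(peak); returns alpha = -slope."""
    points = [(math.log10(peak), math.log10(rate))
              for peak, rate in pairs if peak_min <= peak <= peak_max]
    if len(points) < 2:
        raise ValueError("need at least two points inside the fitting range")
    x_mean = sum(x for x, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic 40 year record that follows a power law exactly (not real data).
    synthetic = [50.0 * (k / 40) ** (-1 / 2.3) for k in range(1, 41)]
    alpha = fit_alpha(exceedance_rates(synthetic), 50.0, 200.0)
    print(f"alpha = {alpha:.3f}")
```

Restricting the fit to a discharge range, as in the text, keeps the few largest floods from dominating the slope, since each of them is represented by only one or two events.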
The values of H, D, and F are given for all ten benchmark stations in Table 1. The correlations with the fractal relation (5) in Fig. 3 are typical of the ten stations. The parameter F is a measure of the relative severity of flooding. The higher the value of F, the more likely that severe floods will occur. Our results show that there are clear regional trends in values of F. The values in the southwest, including Nevada (F = 4.13) and New Mexico (F = 4.27) as well as California (F = 8.11), are systematically high. The high values can be attributed to the arid conditions and the rare tropical (monsoonal) storms that cause severe flooding. Central Texas (F = 5.24) is also high and Georgia (F = 3.47) is intermediate. However, the results indicate that there is considerable variation of $\alpha$ (H, D, and F) but very little variation in $H_i$. A simple explanation is that the former is sensitive to the Noah effect while the latter is sensitive to the Joseph effect. The relative scaling of floods is sensitive to the skewness of the statistical distribution but is not sensitive to the persistence of flows or floods. An important conclusion is that R/S analysis is not relevant to flood-frequency hazard assessments.
Many statistical distributions have been applied to historical records of floods. Benson [2] has given six statistical correlations for each of his ten benchmark stations. His results for the 2-parameter gamma (Ga), Gumbel (Gu), log Gumbel (LGu), log normal (LN), Hazen (H), and log Pearson type III (LP) distributions are given in Fig. 5a for station 1-1805 and in Fig. 5b for station 11-0980. Also included in each figure is the self-similar (fractal) estimate (F). For large floods the fractal prediction (F) correlates best with the log Gumbel (LGu) while the other statistical techniques predict longer recurrence times for very serious floods. The fractal and log Gumbel are essentially power-law correlations whereas the others are essentially exponential.
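The practical difference between power-law and exponential extrapolation can be illustrated with a short Python sketch. It is not a reproduction of any of Benson's six distributions: the exponential case is a deliberately simple stand-in, a hypothetical exponential tail in which the peak discharge grows with the logarithm of the period, calibrated to agree with the fractal relation at 1 and 10 years. The values F = 8.11 and F = 2.04 are those quoted for the Arroyo Seco and Wenatchee stations; the function names are assumptions.

```python
import math


def power_law_flood(q1, f, period):
    """Fractal estimate: the T year flood is q1 * F ** log10(T) = q1 * T ** H."""
    return q1 * f ** math.log10(period)


def exponential_tail_flood(q1, f, period):
    """Hypothetical exponential tail, matched to q1 at T = 1 and q1 * F at T = 10.

    For an exceedance rate N = exp(-(q - q1) / b) and T = 1 / N the peak is
    q = q1 + b * ln(T), so it grows only logarithmically with the period.
    """
    b = q1 * (f - 1.0) / math.log(10.0)
    return q1 + b * math.log(period)


if __name__ == "__main__":
    # Discharges are expressed as multiples of the 1 year flood (q1 = 1).
    for f in (8.11, 2.04):
        for period in (10, 100, 1000):
            pl = power_law_flood(1.0, f, period)
            ex = exponential_tail_flood(1.0, f, period)
            print(f"F={f}, T={period}: power law {pl:.2f}, exponential {ex:.2f}")
```

Both estimates agree at 10 years by construction, and the power-law estimate is larger at every longer period, which is the sense in which fractal statistics give the more conservative hazard estimate.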
While the ten benchmark stations provide a basis for comparing statistical approaches, they hardly make a convincing case that fractal statistics are preferable to alternatives. A principal difficulty is the relatively short time span over which reliable records have been collected. In order to try to overcome this difficulty we have analysed a large number of records and superimposed the results. We have utilized a digitized 40 year data set for 1009 stations unaffected by flood control projects [19]. The distribution of the stations over the United States is given in Fig. 6a. We will separately consider the data from the 18 hydrologic districts; these are illustrated in Fig. 6b.
The largest floods in each of the 40 water years are ordered: the largest annual flood is assigned a period of 40 years, the 2nd largest annual flood a period of 20 years, the 3rd largest annual flood a period of 13.3 years, and so forth. The log of the peak discharge for each flood is plotted against the log of its assigned period and the best straight line, i.e., the fit of Eq. (3), is obtained. Two randomly selected examples are given in Fig. 7.
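The ranking and straight-line fit just described can be sketched as follows. The sketch assumes an ordinary least-squares fit of log10(peak) against log10(period) over all ranked floods, with the slope read as H and the intercept as log10(C); the function names are hypothetical and the record in the usage example is synthetic.

```python
import math


def assigned_periods(annual_peaks):
    """Return (period, peak) pairs: the k-th largest of n annual peaks gets n / k."""
    years = len(annual_peaks)
    ranked = sorted(annual_peaks, reverse=True)
    return [(years / (rank + 1), peak) for rank, peak in enumerate(ranked)]


def fit_fractal_relation(annual_peaks):
    """Fit log10(peak) = log10(C) + H * log10(T); return (H, C)."""
    pairs = assigned_periods(annual_peaks)
    xs = [math.log10(period) for period, _ in pairs]
    ys = [math.log10(peak) for _, peak in pairs]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    h = sxy / sxx
    c = 10.0 ** (y_mean - h * x_mean)
    return h, c


if __name__ == "__main__":
    # Synthetic 40 year record obeying peak = 100 * T ** 0.5 (not real data).
    synthetic = [100.0 * (40 / k) ** 0.5 for k in range(1, 41)]
    h, c = fit_fractal_relation(synthetic)
    print(f"H = {h:.3f}, C = {c:.1f}, F = {10 ** h:.2f}")
```

Dividing an observed peak by `c * period ** h` gives the ratio of measured to predicted flow, which is one way to judge how well a station follows the fractal relation.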
Results for station 1-860 on the Warner River in Davisville, NH, are given in Fig. 7a. The best fit straight line gives H = 0.68; from Eqs. (2), (4), and (7) we have F = 4.8, D = 1.32, and $\alpha$ = 1.46. Results for station 3-2305 on the Big Darby Creek in Darbyville, OH, are given in Fig. 7b. The best fit straight line gives H = 0.386; from Eqs. (2), (4), and (7) [...]
In order to determine the quality of the fit of the data to the fractal relation Eq. (3), the ratio of the measured peak flow to the value predicted by the fractal fit is given for periods of 1, 2, 5, 10, 20, and 40 years in Fig. 8. The 111 stations from hydrologic region 3 are given in Fig. 8a, the 57 stations from region 4 in Fig. 8b, the 10 stations from region 16 in Fig. 8c, and the 100 stations from region 17 in Fig. 8d. If all points were unity the fit would be perfect. The mean deviations from the fractal relation are only a few per cent. The deviations for larger values of the period are greater, as would be expected since the individual points are only a few floods. However, the mean values of the 40 year floods are close to the fractal extrapolation. This agreement provides support for the applicability of fractal statistics to the estimation of the flood hazard.
In Fig. 9a the 111 fractal fits for hydrologic region 3 are given; the fits for regions 4, 16, and 17 are given in Figs. 9b, 9c, and 9d. The peak flow at a period of 10 years was normalized by the drainage area upstream of the station. If peak flows were simply proportional to upstream drainage areas in a hydrologic district then all the plots should fall on a single band. In fact, there is more than an order of magnitude variation. This is not surprising but the details of the variations should be helpful in providing a better understanding of the flood hazard.
The regional variations in F are clearly illustrated in Table 2. The highest values of F are generally associated with the arid southwestern states in regions 12, 13, 15, and 18; the mean value of F for these regions is 5.03. The lowest mean value for F is in region 17, the Pacific Northwest, with F = 2.08. In some cases the standard deviations for F in a district are large. For district 18 (primarily California) the mean is 5.34 and the standard deviation is 2.4. In this case much of the deviation can be identified with the presence or absence of snow runoff. Those stations with large upstream snow packs have relatively small values for F compared with those stations with little or no upstream snow packs.

Conclusions
Historical flood-frequency records have been examined to determine whether fractal (power-law) statistics are applicable. Although the relatively short duration of historical records restricts the validity of conclusions, quite good agreement is obtained between fractal statistics and observations for 10 benchmark stations and for 1200 other stations in the United States. The basic question in terms of flood hazard assessment is whether extreme floods decay exponentially in time or as a power law. If the power-law behavior is applicable then the likelihood of severe floods is much higher and more conservative designs for dams and land use restrictions are indicated.
For fractal behavior the ratio of the 10 year to the 1 year flood F is also the ratio of the 100 year to the 10 year flood and the ratio of the 1000 year flood to the 100 year flood. We find large regional variations in values of F. In arid regions such as the southwestern United States the values of F are nearly three times the values in more temperate regions such as the northwestern and northeastern corners of the country. Smaller values of F are also found if upstream drainage areas have large snow packs. The relevance of R/S analysis to flood-frequency forecasting has also been addressed. For the ten benchmark stations we find the Hurst exponent to be $H_i = 0.7 \pm 0.03$. This value indicates moderate persistence for the floods but also shows that determinations of Hurst exponents are not useful for flood hazard assessments. The Hurst exponent does not correlate with the fractal flood parameter F. In the terms introduced by Mandelbrot and Wallis [10] the Hurst exponent is sensitive to the Joseph effect or persistence of events whereas the fractal flood parameter F is sensitive to the Noah effect or skewness of the statistical distributions of floods.
It certainly remains to be demonstrated that fractal flood-frequency statistics are generally valid. However, the success indicated in the results given here raises the interesting question whether the underlying physical processes are inherently fractal. Fractal statistics will be applicable to any scale invariant process. They are also applicable to dynamical systems that exhibit self-organized criticality [20]. One speculative conclusion is that the storms that generate floods are associated with the self-organized critical behavior of the atmosphere.
Fig. 3a. Number of floods per year with a peak discharge greater than $\dot{V}$. Station 1-1805 in Goss Heights, Massachusetts during the period 1911-1960.
Fig. 3b. Number of floods per year with a peak discharge greater than $\dot{V}$. Station 11-0980 near Pasadena, California during the period 1914-1965.

Fig. 6a. Distribution of the 1009 stations that have been analysed.

Fig. 7b. The peak daily discharge for the largest annual floods over 40 years as a function of the assigned period: Station 3-2305.
Fig. 8a. Ratio of the observed peak daily discharge to the value predicted by the fractal fit to the data as a function of the assigned period for the 111 stations in region 3.
Fig. 8d. Ratio of the observed peak daily discharge to the value predicted by the fractal fit to the data as a function of the assigned period for the 100 stations in region 17.
Fig. 9b. Fractal fits of the normalized flood frequency data for the 57 stations in region 4.

Table 2. Average values and standard deviations of the flood intensity factor F for the 18 hydrologic districts.

Table 1 (partial). Columns: station, river (state), H, D, F, $H_i$.
[...] .13 0.66
11-0980  Arroyo Seco (CA)  0.909  1.09  8.11  0.68
12-1570  Wenatchee (WA)    0.310  1.69  2.04  0.72
[...] and in Fig. 4b for station 11-0980 (Pasadena, CA). Good correlations are obtained with Eq. (13) taking $H_i$ = 0.67 for station 1-1805 and $H_i$ = 0.68 for station 11-0980. Values of $H_i$ for all ten stations are given in Table 1. The values are nearly constant with a range from 0.66 to 0.73, indicating moderate persistence. It is not surprising that the values of the Hausdorff measure H differ from the values of the Hurst exponent $H_i$ since the former refers to the statistics of the flood events and the latter to the statistics of the running sum.