Why is Probabilistic Seismic Hazard Analysis (PSHA) Still Used?

Even though it has never been validated by objective testing, Probabilistic Seismic Hazard Analysis (PSHA) has been widely used for almost 50 years by governments and industry in applications with lives and property hanging in the balance, such as deciding safety criteria for nuclear power plants, making official national hazard maps, developing building code requirements, and determining earthquake insurance rates. PSHA rests on assumptions now known to conflict with earthquake physics; many damaging earthquakes, including the 1988 Spitak, Armenia, event and the 2011 Tohoku, Japan, event, have occurred in regions rated as relatively low-risk by PSHA hazard maps. No extant method, including PSHA, produces reliable estimates of seismic hazard. Earthquake hazard mitigation should be recognized to be inherently political, involving a tradeoff between uncertain costs and uncertain risks. Earthquake scientists, engineers, and risk managers can make important contributions to the hard problem of allocating limited resources wisely, but government officials and stakeholders must take responsibility for the risks of accidents due to natural events that exceed the adopted safety criteria.

"...streamflow or precipitation records. The main weakness of the analysis is that it takes no account of the actual climatic, hydrological and other geophysical mechanisms that produced the observed extremes. Rather, it is based on arbitrary postulates of preconceived probabilistic mechanisms so that the results do not reflect what is likely to happen in nature, but what would happen if these postulates were correct. The crucial aspect, i.e. whether they actually are correct, is not addressed at all. The usual excuse for this is that there are not enough data, the physical processes are not understood well enough and the planners, engineers, managers, etc. cannot wait with their decisions until these problems are resolved; hence, to comply with the practitioners' requests for estimates of 100-year, 1,000-year, and (lately) even 1,000,000-year floods and droughts, these crude methods have to be used." "Without an analysis of the physical causes of recorded floods, and of the whole geophysical, biophysical and anthropogenic context which circumscribes the potential for flood formation, results of flood frequency analysis as [now practiced], rather than providing information useful for coping with the flood hazard, themselves represent an additional hazard that can contribute to damages caused by floods. This danger is very real since decisions made on the basis of wrong numbers presented as good estimates of flood probabilities will generally be worse than decisions made with an awareness of an impossibility to make a good estimate and with the aid of merely qualitative information on the general flooding potential." (Klemeš, 1989)

Introduction
"Earthquake prediction," which is generally defined as the issuance of a science-based alarm of an imminent damaging earthquake with enough accuracy and reliability to justify measures such as evacuations, has been the goal of unsuccessful empirical research for the past 130 years (Geller, 1997; Kagan, 1997; Main, 1999; Kagan, 2014). Thus earthquake prediction is not presently a realistic option for mitigating earthquake hazards.
Much effort is now devoted to making earthquake hazard maps, and it is important to evaluate their performance. The national hazard map of the former Soviet Union seriously understated the risk in several areas, most notably for the 1988 Spitak, Armenia, event (Muir-Wood, 1993). Similarly, the Japan national hazard map seriously understated the risk in the area struck by the 2011 Tohoku earthquake (Geller, 2011). Hazard maps do not agree well with subsequent seismicity for several other cases as well (but also see a critical comment by Frankel, 2013, and the reply by Stein et al., 2013). The failure of the hazard maps appears to be due to the use of models such as the elastic rebound model or characteristic earthquake model, which are contradicted by observations (Kagan et al., 2012; Geller et al., 2015).
Seismology's low predictive power notwithstanding, human society has been, and will continue to be, faced with seismic safety issues that arise when deciding whether to build, and how to design, structures including nuclear power plants, dams, expressways, railways, high rise buildings, schools, hospitals, etc. The considerations involved in designing an earthquake-resistant structure are technical, financial, legal, political, and moral. The stakeholders desire that structures be designed and constructed to be sufficiently earthquake-resistant, while also being as inexpensive as possible. The stakeholders will obviously also seek to minimize their financial and criminal legal exposure in the event of an earthquake-induced failure.
Someone (or some group) must decide whether to build a particular structure. If the decision is positive then someone (or some group) must specify the ground motion that the structure should be designed to withstand. No one would argue that such decisions should be made without any input from earthquake scientists: seismologists, geologists, and earthquake engineers. On the other hand, it is essential that such consultants properly explain the uncertainties of the information they are providing, so that society as a whole can make appropriate judgements regarding the various tradeoffs. Such explanations must make clear the existence not only of "known unknowns" (formal statistical uncertainties within the framework of a particular model) but also (in the phrase coined by Donald Rumsfeld, a former U.S. Secretary of Defense) of "unknown unknowns" (uncertainties due to the limitations of the models being used), the effects of which may be much larger than those of "known unknowns" but are extremely difficult to quantify. This is actually a familiar issue in applications of statistics to evaluating results of physics experiments to determine whether or not a hypothesized physical effect (e.g., "the fifth force") is real (e.g., Anderson, 1992).
Much recent interest has focused on testing PSHA estimates against actual seismicity (Kagan and Jackson, 2000; Schorlemmer et al., 2007). The Collaboratory for the Study of Earthquake Predictability is attempting to set standards while extending forecasting to shorter time intervals (Jordan, 2006; Lombardi and Marzocchi, 2010; Mak et al., 2014). However, the available test datasets are in many cases too small for such tests to have any real statistical power (e.g., Iervolino, 2013).

Cornell's Work
We begin our discussion of PSHA by reviewing Cornell's (1968) explanation of the motivations and assumptions of his work, quoting excerpts from his introduction (but omitting cited references).
(1) "Owing to the uncertainty in the number, sizes, and locations of future earthquakes it is appropriate that engineers express seismic risk, as design winds or floods are, in terms of return periods." (2) "[The] engineer should have available all the pertinent data and professional judgement of those trained in seismology and geology in a form most suitable for making [safety decisions] wisely. This information is far more usefully and completely transmitted through a plot of, say, Modified Mercalli intensity versus average return period than through such ill-defined single numbers as the 'probable maximum' or the 'maximum credible' intensity." (3) "The locations and activities of potential sources of tectonic earthquakes may be many and different in kind; they may not even be well known. In some regions, for example, it is not possible to correlate past activity with known geological structure. In such circumstances the seismologist understandably has been led to express his professional opinion in terms of one or two single numbers, seldom quantitatively defined. It is undoubtedly difficult, in this situation, for the seismologist to avoid engineering influences; the seismologist's estimates will probably be more conservative for more consequential projects. But these decisions are more appropriately those of the design engineer who has at hand more complete information (such as construction costs, system performance characteristics, etc.) upon which to determine the optimal balance of cost, performance, and risk." (4) "In this paper a method is developed to produce for the engineer the desired relationships between such ground-motion parameters as Modified Mercalli Intensity, peak-ground velocity, peak-ground acceleration, etc., and their average return period for his site. 
The minimum data needed are only the seismologist's best estimates of the average activity levels of the various potential sources of earthquakes (e.g., a particular fault's average annual number of earthquakes in excess of some minimum magnitude of interest, say 4). . . . The technique . . . provides the method for integrating the individual influences of potential earthquake sources, near and far, more active or less, into the probability distribution of maximum annual intensity (or peak-ground acceleration, etc.). The average return period follows directly."

Are the assumptions made by PSHA scientifically valid?
Cornell's (1968) method allows all of the information provided by geologists and seismologists to be aggregated and used as the input to a "black box" procedure for providing curves that give the relation between ground motion parameters and "return periods" at a particular site. Engineers can then use these curves as the basis for choosing the design parameters of structures without having to get involved in the details of the Earth-science related issues.
Cornell's method and its later extensions fill a need for a seemingly objective "due diligence" procedure, which neatly separates the roles of Earth scientists and engineers. Proponents of PSHA seem to regard its validity as firmly established, and to dismiss criticisms as unwarranted. For example, Musson (2012) made the following statement: "It might seem odd to be writing about confirmation of the validity of probabilistic seismic hazard assessment (PSHA) in 2011, given that the method has been successfully applied in countless studies worldwide over the last 40 years. However, the fact that papers still occasionally find their way into print attacking the method as mathematically invalid seems to indicate that there is still some requirement, if small, to demonstrate the soundness of the method." As quoted above, Musson (2012) alludes to many successful applications of PSHA in "countless studies worldwide over the last 40 years." Since this statement was neither sourced nor clarified it is not clear which PSHA studies were being alluded to nor what qualified them as "successes." One can only guess, but presumably this statement is based on classifying all PSHA studies that were accepted by the clients or regulators as "successes." However, the real success or failure of PSHA studies can only be evaluated by comparison to subsequent seismicity, and, as noted by Muir-Wood (1993) and Geller (2011), many large damaging earthquakes have occurred in areas categorized as low risk by PSHA-based hazard maps.
As noted by Musson (2012), most debate about PSHA has been focused on mathematical issues, which basically involve questions of internal consistency. In contrast, this paper is about physics and statistics, not mathematics. We consider the question of whether or not PSHA is scientifically valid, in view of what is now known about the earthquake source process. To our knowledge this question has not heretofore been systematically investigated. We discuss below what is now known about the physics and statistics of earthquake occurrence. We then show in section 4 that the assumptions made by PSHA are not consistent with this knowledge.

Earthquake phenomenology-what is known?
As noted in Ben-Menahem's (1995) review of the history of seismological research, earthquake kinematics has been emphasized and progress toward understanding the physics of earthquake occurrence has been slow; this is still true today. This lack of progress should probably be attributed to the difficulty of the problem rather than to a lack of appreciation of its importance. Studies by Kanamori and Heaton (2000) and Mulargia et al. (2004) may serve as starting points for further work.
Significant efforts have been made towards applying the statistical mechanics of critical phenomena to earthquake occurrence (cf. Rundle et al., 1995; Turcotte, 1997; Bak et al., 2002; Mulargia and Geller, 2003; Corral, 2003; Turcotte et al., 2009). The simple models (sandpiles, blocks and sliders, etc.) considered by such studies can provide some insights into earthquake occurrence. However, it is not clear whether these simple models are sufficiently realistic to assist in understanding actual earthquakes.
Seismic and geodetic data allow post hoc inference of the details of the mechanism of a given earthquake, and results from rock mechanics experiments have contributed to the physical interpretation of such results. However, these experimental results have not advanced efforts to predict future large earthquakes. For example, after the 1999 Izmit, Turkey, earthquake, Parsons et al. (2000) argued, on the basis of the Coulomb Failure Stress criterion, that "The Big One" in Istanbul was imminent, but no such event has occurred. Among the reasons for the failure of such predictions is that it is practically impossible to measure point by point in the crust a) the past stress history, b) the material properties, and c) the flow map of crustal fluids, which play a much more important triggering role than Coulomb stress itself (Mulargia and Bizzarri, 2014, 2015). More generally, see section 10.1 of Geller et al. (2015) for a brief discussion of some other limitations on the applicability of laboratory rock mechanics experiments to actual earthquakes in the crust.
Two basic empirical statistical laws governing earthquake occurrence are well known: the Gutenberg-Richter (G-R) Law, which gives the relation between magnitude and frequency of occurrence, and the Omori Law, which gives the decay of aftershock activity with time. We now briefly review and summarize these laws. Note that these empirical laws specify frequency distributions, not probability distributions. See Freedman (1997) and Stark (2013, 2016) for a discussion of this important distinction.

Gutenberg-Richter law and modern extensions
As stated by Gutenberg and Richter (1956), the G-R Law is

log_10 N = a - b M, (1)

where N is the number of earthquakes with magnitude greater than or equal to M, and a and b are constants. Most studies find b ≈ 1. If eq. (1) held scale-invariantly as M becomes arbitrarily large, then the total energy released by earthquakes would diverge, so some deviation from scale invariance of eq. (1) is required for large magnitudes. This could be either a hard cutoff (a maximum magnitude), or a soft cutoff (a "corner magnitude") above which the frequency-magnitude curve falls off, say, exponentially, from the G-R straight line. (Because of atomic size there must also be deviation from scale invariance as M becomes extremely small, but we will not discuss this further.)

Until the mid-1970s global earthquake catalogs used body wave magnitudes, m_b, and surface wave magnitudes, M_s (Geller and Kanamori, 1977). However, as seismology progressed it became generally recognized that earthquake magnitudes measured from 20 s surface wave amplitudes saturate at about M_s = 8.5 (Kanamori and Anderson, 1975; Geller, 1976), and that the scalar moment, M_0, given by

M_0 = μ S D̄, (2)

where μ is the rigidity of the material bounding the fault, S is the fault area, and D̄ is the average slip, should instead be used to quantify the size of earthquakes. In order to preserve continuity with the magnitude scale, Kanamori (1977) defined the moment magnitude, M_w, as follows:

M_w = (log_10 M_0 - 9.1)/1.5, (3)

where the scalar moment M_0 is in units of N m. If M_0 is in units of dyn cm then the constant in eq. (3) becomes 16.1 rather than 9.1. Using eq. (3) ensures that the magnitudes of extremely large earthquakes are properly quantified, while at the same time ensuring that smaller earthquakes have M_w values roughly equal to their M_s values.
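Eq. (3) and its inverse are simple to apply in practice. The following sketch illustrates the conversion (moment values in N m; the example moment is approximately that of the 2011 Tohoku earthquake):

```python
import math

def moment_to_mw(m0):
    """Moment magnitude from scalar moment M0 (in N m), per eq. (3)."""
    return (math.log10(m0) - 9.1) / 1.5

def mw_to_moment(mw):
    """Scalar moment in N m from moment magnitude (inverse of eq. (3))."""
    return 10 ** (1.5 * mw + 9.1)

# One unit of magnitude corresponds to a factor of 10^1.5 (about 31.6) in moment.
print(moment_to_mw(3.9e22))                   # ~9.0 (roughly the 2011 Tohoku event)
print(mw_to_moment(7.0) / mw_to_moment(6.0))  # ~31.6
```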
Kagan (2002) studied the moment distribution of earthquakes using data from the Harvard catalog (now the Columbia University GCMT catalog), which is a basically homogeneous catalog in which all moment values are determined using methods that have not changed significantly since its inception. He analyzed global shallow earthquake data from the period 1977-2000.

Kagan's and related work on moment distribution
In this subsection only we follow the notation of Kagan (2002). He denotes the scalar moment by M rather than by M_0 and the moment magnitude by m rather than by M_w. Kagan (2002) begins by rewriting the classical G-R law (eq. 1) to account for the fact that earthquake catalogs have a magnitude threshold m_t below which the catalog is incomplete. He wrote

log_10 N(m) = a_t - b (m - m_t), (4)

where N(m) is the number of earthquakes with magnitude greater than or equal to m and a_t is the (base 10) logarithm of the number of earthquakes with magnitude greater than or equal to m_t. Using eq. (3) we can define the slope parameter for the moment distribution, β, in relation to the slope parameter for the magnitude distribution, b, as follows:

β = (2/3) b. (5)

If we differentiate eq. (4) and use eq. (5) we obtain the following functional form for the derivative of the frequency distribution, φ(M):

φ(M) = β M_t^β M^(-1-β), M ≥ M_t, (6)

where M_t is the moment corresponding to the threshold magnitude m_t. While this may look like a probability distribution (a Pareto distribution), actually treating it as a probability distribution is a huge leap, with no sound physical basis (cf. Freedman, 1997; Stark, 2013, 2016).

Fig. 1 (modified from Kagan, 2002) shows the data from the Harvard catalog (now the GCMT catalog) for all shallow (depth ≤ 70 km) earthquakes with magnitude greater than about 5.8 from 1977-2000. The top panel shows all the data, and the lower panel shows a close-up of the lower right part of the upper panel. The classic G-R curve for b = 1 (i.e., for β = 2/3) fits the data well up to about magnitude 7.7 or so, and then the data fall increasingly below the G-R curve, as required by physics if the energy released by earthquakes is to remain finite (see Kanamori, 1977). Kagan fits three distributions to the data in Fig. 1: the tapered G-R, the gamma distribution, and the truncated Pareto.
A visual comparison shows that all three distributions do a reasonable job of fitting the data, but which, if any, is "best" has not, to our knowledge, been established by objective testing. The equations for these three distributions are given by Kagan (2002) and will not be repeated here.
The tapered G-R distribution and gamma distribution both have in common that they overlay G-R at lower magnitudes but fall off exponentially (i.e., below the G-R curve) at magnitudes above the "corner magnitude," which is a parameter in each of these distributions. The implication of this work for seismic hazard assessment is that rather than thinking in terms of a "hard" maximum magnitude we should be thinking in terms of a "soft" corner magnitude, with no "hard" upper limit on the magnitude. It is not clear whether this is sufficiently appreciated by workers on seismic hazard estimates.
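To make the contrast between a pure G-R tail and a "soft" corner concrete, the following sketch compares the survival functions of the pure Pareto (G-R) form and a tapered G-R form with an exponential taper above a corner moment (cf. Kagan, 2002); the threshold and corner magnitudes used here are illustrative, not fitted values:

```python
import math

def gr_survival(m, mt, beta):
    """Pure G-R (Pareto) survival: fraction of events with moment >= m."""
    return (mt / m) ** beta

def tapered_gr_survival(m, mt, beta, mc):
    """Tapered G-R: Pareto body with an exponential taper above the corner
    moment mc (cf. Kagan, 2002)."""
    return (mt / m) ** beta * math.exp((mt - m) / mc)

mt = 10 ** (1.5 * 5.8 + 9.1)  # threshold moment (m_t = 5.8, as for the GCMT data)
mc = 10 ** (1.5 * 8.0 + 9.1)  # corner moment (illustrative corner magnitude 8.0)
for mag in (6.5, 7.5, 8.5):
    m = 10 ** (1.5 * mag + 9.1)
    # Well below the corner the two forms are nearly identical; above it the
    # tapered form falls off sharply.
    print(mag, gr_survival(m, mt, 2 / 3), tapered_gr_survival(m, mt, 2 / 3, mc))
```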
Kagan and Jackson (2013) consider regional variations in corner magnitudes for different subduction zones. Although such differences may exist, the shortness of the available record makes it difficult to estimate them. This also applies to attempts to find regional differences in corner magnitude for intraplate regions, as the data are even sparser. Another important question is whether b-values vary spatially or temporally. As this topic is outside the main area of this paper a detailed discussion is omitted. However, as noted by Kamer and Hiemer (2015), claims of regional or temporal variations in b-values have a high likelihood of being artifacts.
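The point about apparent b-value variations can be illustrated numerically. The sketch below (an illustration under assumed conditions: a true b of exactly 1 everywhere, magnitudes drawn from the exponential distribution implied by the G-R law, and the standard Aki (1965) maximum-likelihood estimator) shows how widely b estimates scatter when each is based on only 50 events:

```python
import math
import random

def aki_b_estimate(mags, m_min):
    """Maximum-likelihood b-value (Aki, 1965): b = log10(e) / (mean(m) - m_min)."""
    return math.log10(math.e) / (sum(mags) / len(mags) - m_min)

random.seed(0)
m_min, true_b = 2.5, 1.0

def sample(n):
    # Under G-R, magnitudes above m_min are exponential with rate b * ln(10).
    return [m_min + random.expovariate(true_b * math.log(10)) for _ in range(n)]

# 1000 regions, each with 50 events and identical true b = 1.
estimates = [aki_b_estimate(sample(50), m_min) for _ in range(1000)]
# Sampling scatter alone produces a wide range of apparent "regional" b-values.
print(min(estimates), max(estimates))
```

Any claimed regional or temporal b-value difference smaller than this sampling scatter is indistinguishable from an artifact, consistent with the caution of Kamer and Hiemer (2015).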

Omori law and earthquake clustering
As mentioned above, the two fundamental empirical laws of seismology are the Gutenberg-Richter Law and Omori's Law (Omori, 1895;Utsu et al., 1995). Recent work has generalized Omori's Law in important ways, as we now discuss. These new results have significant implications for earthquake hazard estimation.
Our discussion is based on the Italian seismic catalog of Gasperini et al. (2013), which is the result of a careful equalization of bulletin records (without any forced compliance to the Gutenberg-Richter or Omori laws), correcting for the individual responses of each sub-net, for the type of instruments, operational periods, methods of analysis used, etc. (Lolli et al., 2014).
The catalog starts on January 1, 1981, is complete for magnitudes above 2.5 and consists of 20547 events, including 19 M5.5+ destructive earthquakes. It has an unconstrained b value of 0.99 and fits the G-R law well (see Fig. 2). Fig. 3 shows the instantaneous rate of event occurrence versus time for different magnitudes, illustrating through its vertical "pinnacles" one main feature of earthquake occurrence, namely, clustering in time. As shown by Fig. 4, for M > 2.5 events, each cluster has a rapid increase in the rate of event occurrence at the beginning of a cluster (and thus a sharp decrease in inter-event time), and a subsequent smooth decrease in the rate of occurrence (and thus a smooth increase in the inter-event time). Fig. 5 shows a close-up of the rightmost portion of Fig. 4.
The behavior seen in Figs. 4 and 5 is well represented by the Omori power law

λ(t) = k/(c + t)^d, (7)

with d ∼ 1, where λ(t) is the frequency of event occurrence at time t after the initial event, and k and c are constants (see Utsu et al., 1995). Clustering in time (Kagan and Jackson, 1991) is accompanied by clustering in space, as is apparent in Fig. 6. Note that a clear simultaneous time-space picture is readily obtainable due to the geographic shape of Italy, which is essentially a north-south segment with earthquakes distributed along its structural backbone, the Apennines.
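A minimal numerical sketch of the Omori decay (the constants k, c, and d below are illustrative, not values fitted to the Italian catalog):

```python
def omori_rate(t, k=100.0, c=0.1, d=1.0):
    """Omori power-law rate of event occurrence at time t (e.g., days) after
    the initial event; k, c, d are empirical constants (illustrative values)."""
    return k / (c + t) ** d

# With d ~ 1 the rate decays roughly as 1/t: a tenfold longer elapsed time
# gives roughly a tenfold lower rate.
print(omori_rate(1.0))    # rate one day after the initial event
print(omori_rate(10.0))   # ten days after
print(omori_rate(100.0))  # a hundred days after
```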
Another major phenomenological feature accompanying clustering is scale invariance. A fundamental result of Bak et al. (2002) and Corral (2003) is that an identical picture exists (Fig. 7) for the rescaled recurrence times of all the events independent of magnitude. This is shown for the distribution density function D(τ) of the inter-event times τ, renormalized to the occurrence rate R over a time interval in which it is approximately constant:

D(τ) = R f(Rτ). (9)

(In discussing scale invariance here we neglect what happens above the corner magnitude.) This same scale-invariant picture was found to occur in many regions, irrespective of the spatial extent, magnitude threshold, and the seismic catalog from which the data were taken (Corral, 2003, 2004). Our analysis of the Italian catalog differs from Corral (2003, 2004, 2005a) in that we deal with the whole catalog, without any selection of stationary periods, i.e., without using eq. (8), to avoid any arbitrary choice. This leads to a little blurring of the picture (compare Fig. 7 to similar figures in Corral, 2003 and 2005a), but nevertheless retains the same general features. The mathematical environment in which this can be framed is the renormalization group (ibid.). In quantitative terms, the scaling law holds in general, due to the self-similarity in space and time of earthquake occurrence. The functional form of f is close to a generalized gamma distribution,

f(θ) = [δ / (a^γ Γ(γ/δ))] θ^(γ-1) exp(-(θ/a)^δ), (10)

where θ = Rτ is a dimensionless recurrence time, γ ∼ 0.7, δ ∼ 1, a ∼ 1.4, and Γ is the gamma function. Thus f decays as a power law for θ < 1 and exponentially for θ > 1. The crucial physical implication is not just the specific functional form of eq. (10), which we call the Bak-Corral equation, but the universality it implies through eq. (9), which appears to apply irrespective of event size and spatial domain. This constitutes a basic property of earthquakes, which appear always to occur in clusters with an identical scale-invariant behavior.
Rescaled by rate of occurrence R, this scale-invariant pattern occurs identically both in "obvious" clusters as well as in periods where no obvious clustering is apparent (i.e., in periods which Corral calls "stationary"). This can be viewed as a generalization of Omori's law, but its meaning is more fundamental: while Omori's law just governs the decay of the occurrence rate following a large earthquake, i.e. "the aftershocks of a mainshock," the Bak-Corral equation governs the occurrence of all earthquakes, independent of their size and of the region considered.
The behavior discussed above is phenomenological and does not imply any specific physical model. In practical terms, it implies: 1) positive clustering at short time scales, with events tending to occur closer to each other in time and space, but 2) negative clustering at long time scales, with sparse events tending to be even sparser. This leads to the counterintuitive consequence that the expected waiting time until the next earthquake becomes longer the longer one has already been waiting for it. This is at odds both with elastic rebound, which would imply shorter waiting times as the elapsed time becomes longer, and with Poisson models, for which the waiting time does not depend on the elapsed time (Davis et al., 1989; Corral, 2003; Mulargia, 2013).
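The counterintuitive waiting-time property can be checked numerically. The sketch below assumes, for illustration only, a recurrence-time density of the generalized gamma form with δ = 1 (i.e., an ordinary gamma density) and the parameter values γ ∼ 0.7, a ∼ 1.4 quoted above; it estimates the expected additional wait given the time already elapsed:

```python
import math

# Dimensionless recurrence-time density: gamma with shape gamma_ < 1
# (the generalized gamma with delta = 1; parameters are illustrative).
gamma_, a = 0.7, 1.4

def density(theta):
    return theta ** (gamma_ - 1) * math.exp(-theta / a) / (a ** gamma_ * math.gamma(gamma_))

def expected_residual(t, upper=60.0, n=60000):
    """Estimate E[theta - t | theta > t] by trapezoidal integration."""
    h = (upper - t) / n
    num = den = 0.0
    for i in range(n + 1):
        theta = t + i * h
        w = 0.5 if i in (0, n) else 1.0
        d = density(theta)
        num += w * (theta - t) * d * h
        den += w * d * h
    return num / den

# The longer one has already waited, the longer the expected further wait:
print(expected_residual(0.1), expected_residual(1.0), expected_residual(5.0))
```

Because the shape parameter is below 1 the hazard rate decreases with elapsed time, so the expected residual wait grows, in contrast to the constant residual wait of a Poisson model.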
Earthquakes do exhibit "memory" in several respects and there exist a number of correlations in seismicity (Corral, 2005a, 2006). The effect of memory is obvious in Omori's law and in the time evolution of any cluster, as illustrated by Figs. 4 and 6. Note that this also occurs identically outside major clusters, in the time intervals called "stationary" by Corral (2003, 2005a), due to the fact that clustering is always present. The existence of memory may appear surprising when considering large areas, since one might think that the superposition of the seismicity of many different areas could be viewed as the sum of many independent processes, so that the seismicity of a broad region should converge to a Poisson process (Cox and Lewis, 1966). However, as discussed above, this does not occur in practice for Italy (Fig. 7) or for the whole world (Corral, 2004).
We can summarize earthquake memory effects as follows. 1) Pre-earthquake conditions are all alike due to the lack of correlation of the magnitude with previous seismicity, confirmed also by paleoearthquake data for single faults (Weldon et al., 2004;Cisternas et al., 2005).
2) The larger an earthquake, the shorter the time until the next one, due to the negative correlation between magnitudes and the occurrence time of the next earthquake.
3) The shorter the time from the last earthquake, the shorter the time until the next one. 4) The longer the time since the last earthquake, the longer the expected time until the next one (Davis et al., 1989;Corral, 2005b).
No physical or statistical model developed to date matches this phenomenology. In particular, the ETAS model (Ogata, 1988, 1999), which combines the Gutenberg-Richter law, the Omori law, and an aftershock production rule, is not scale-invariant.

PSHA: assumptions vs. reality
As discussed in section 2, in the almost 50 years since Cornell (1968) was published, PSHA has become a standard tool for hazard analysis, despite receiving some criticism. Current procedures for implementing PSHA are discussed in detail by Hanks et al. (2009) and Kammerer and Ake (2012). PSHA is based on the classic Cornell (1968) approach. The seismic hazard at a given site is estimated (Stirling, 2014) based on a) the location of the site with respect to known or assumed earthquake sources, b) the assumed recurrence behavior of these earthquake sources, and c) the computed ground motion for the earthquakes at the given site.
In this paper we make the case that PSHA is fundamentally flawed and should therefore be abandoned; the results generated by PSHA should not be used by society as the basis for decisions on seismic hazard mitigation. Our main concern is not with the numerical inputs used by various PSHA practitioners, but with the methodology itself.
As discussed below, our case rests on three main arguments.
(1) PSHA makes assumptions that contradict what is known about seismicity. (2) PSHA fundamentally misuses the concept of "probability." (3) In practice PSHA does not work; as noted above, many recent destructive earthquakes have occurred in regions that PSHA identified as low risk.
We hope that researchers who disagree will respond by attempting to present physics-based data and analyses that refute the above three arguments. If this cannot be done then, like other once widely accepted theories or methods that did not agree with observations or experiments, PSHA should be scrapped. We emphasize that the arena for the debate on PSHA should be that of physics, rather than mathematics, taking fully into account what is now known about the phenomenology of earthquake occurrence, as discussed above in section 3.

PSHA cookbook
In more detail, the "PSHA cookbook" (e.g., NEHRP, 1997; USGS-WGCEP, 1999; Kammerer and Ake, 2012) comprises the following basic steps:
1) identify seismically active regions, structures, and faults using earthquake catalogs, geologic evidence, and geodetic strains (derived from GPS data);
2) for each fault or region, estimate the average event rate, using both instrumental and historical seismicity data, and geodetic strains;
3) for each source, using a physical model, infer the average future occurrence rate λ = N/Δt (where N is the number of events in the time interval Δt), commonly expressed in terms of the recurrence time τ = 1/λ;
4) attenuate the spectrum of ground motion from each source to the given site through an empirical function derived from the ground motion prediction equation (GMPE);
5) forecast the ground motion, parameterized as an engineering variable such as the peak ground acceleration (PGA), in terms of the probability of exceedance p or its reciprocal, the expected return period, T = 1/p, of a given level of shaking; for example, a 10 percent probability of exceedance in 50 years is equivalent to a return period of 475 years;
6) sum the exceedance probabilities of the different sources to account for the fact that a given site is potentially subject to shaking due to a variety of earthquake sources;
7) use the forecast values as design parameters for the building or other structure being designed.
Additional steps can also be introduced. Beyond specific choices about extrapolating the smaller magnitude event rates to the "maximum credible earthquake" (Wang, 2008), the most common additions are as follows (NEHRP, 1997; Kammerer and Ake, 2012):
8) assume that the earthquakes will repeat identically not only on the same faults, but with the same size and mechanism, and produce a spectrum identical to past occurrences, and use this to forecast the expected ground motion at each frequency (the response spectrum) in terms of exceedance probabilities;
9) estimate the local amplification at each site from the local shear wave response, which is primarily determined using the shear wave velocity profile versus depth in the uppermost 30 meters (Vs30).

We now consider whether the above cookbook can be expected to provide reliable and accurate hazard analyses, given the available data and present knowledge of the physics of earthquake occurrence.
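The equivalence in step 5 between an exceedance probability over a design window and a return period follows from the memoryless (Poisson) occurrence model that PSHA conventionally assumes; a minimal sketch:

```python
def annual_exceedance_probability(p_window, years):
    """Annual exceedance probability implied by a probability of exceedance
    p_window over a window of `years` years, assuming independence from year
    to year (the memoryless model PSHA conventionally adopts)."""
    return 1.0 - (1.0 - p_window) ** (1.0 / years)

def return_period(p_annual):
    """Expected return period T = 1/p for an annual exceedance probability p."""
    return 1.0 / p_annual

# The standard design convention: 10% probability of exceedance in 50 years.
p = annual_exceedance_probability(0.10, 50)
print(round(return_period(p)))  # 475
```

Note that the familiar "475-year" figure is purely a consequence of the assumed memoryless model, which, as discussed in section 3, conflicts with the observed clustering of earthquakes.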

Insufficiency of seismicity data
PSHA estimates the seismicity rates of a given region based on data on past earthquakes. Ideally, this requires a complete catalog of all significant earthquakes over an extensive time period. Unfortunately, however, catalogs based on instrumental recordings are available only since about 1900, and are only reliable after about 1960. Historical data extending back about 2000 years are available for some countries (e.g., Italy, China, Japan), but even though the dates and approximate epicenters are probably reliable, it is difficult to obtain reliable magnitudes and focal mechanisms, and the catalogs are incomplete in unknown ways. Even the Italian macroseismic catalog, which for historical and geographic reasons is of high quality (the area of Italy is small and it has been a site of civilization for over two millennia), rarely reports more than one significant occurrence in any given seismic region (Boschi et al., 2000). In principle, palaeoseismology can extend the time span of seismic catalogs to thousands of years, allowing the inclusion of several large earthquakes on the same tectonic structure (Sieh et al., 1989). However, exposed faults are rare, so such direct estimates are generally unavailable (Mulargia and Geller, 2003).
An indirect estimate can be achieved by extrapolating the G-R law from lower magnitudes, making use of the main phenomenological feature of earthquakes: scale invariance. Thus the behavior of large earthquakes, for which data are insufficient, can be inferred from small earthquakes, for which data are abundant. However, in addition to the general risks and inaccuracies of any extrapolation, this procedure is also subject to great uncertainty depending on whether a maximum event size (Wang, 2008) or a corner magnitude (Kagan, 2002) is assumed.
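To make the mechanics of such an extrapolation concrete, the following minimal Python sketch evaluates the Gutenberg-Richter relation log10 N = a - bM. The a- and b-values here are purely illustrative, not taken from any real catalog; the point is that extrapolating three magnitude units multiplies any error in the fitted b-value by a factor of 10^(3·δb) in the predicted rate.

```python
def gr_annual_rate(m, a=4.5, b=1.0):
    """Annual rate of events with magnitude >= m under the
    Gutenberg-Richter relation log10 N = a - b*M.
    The a and b values are illustrative assumptions only."""
    return 10 ** (a - b * m)

# Rate of M >= 5 events under these illustrative parameters:
rate_m5 = gr_annual_rate(5.0)   # 10^(4.5 - 5.0), about 0.32 per year

# Extrapolated rate of M >= 8 events -- three orders of magnitude lower:
rate_m8 = gr_annual_rate(8.0)   # 10^(4.5 - 8.0), roughly one per 3200 years

# A modest error in b (1.0 vs 0.9) changes the extrapolated M >= 8
# rate by a factor of 10^(0.1 * 8) / 10^(0.1 * 4.5), i.e. about 2.2x:
rate_m8_alt = gr_annual_rate(8.0, b=0.9)
```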
The lack of a sufficiently long time series can also lead to errors, since "the largest historic magnitude" is often taken as an upper limit, which is inappropriate (Mulargia, 2013). The 2011 Tohoku earthquake is a good example: a magnitude 9.0 earthquake occurred where a maximum magnitude of 8.0 had been wrongly specified by official Japanese government forecasts (Geller, 2011; Kagan and Jackson, 2013).
If, hypothetically, seismic catalogs 10^5 years long were available, seismicity might appear to be nearly time independent on that time scale, because the time scale of geological processes, on the order of ∼10^6 years, is much longer than that of inter-earthquake intervals in seismically active regions, 10^2∼10^4 years.
Unfortunately, only very short seismic catalogs are available, so it is unlikely that the available seismicity data are time independent. Indeed, Liu et al. (2011) found considerable variability in the spatial pattern of seismicity in North China over the past 2000 years.
In conclusion, even if there were no methodological problems afflicting PSHA, which, as we discuss below, is not the case, the insufficiency of the available seismicity data means that it is highly questionable that PSHA can provide reliable and accurate hazard analyses.

PSHA's dubious basic assumption
The word denoted by the first letter of the acronym PSHA is "probabilistic." PSHA proponents appear to take for granted that earthquakes truly occur randomly in time and space, so it is reasonable and natural to estimate the spatiotemporal probability distribution of events, and to use that estimate as the foundation for risk assessment (see, e.g., Musson, 2012). Values of such probabilities are key inputs to PSHA, which then produces numbers such as "the annual probability of a peak ground acceleration of 0.2 g" at a particular site.
PSHA assumes that "the probability of an earthquake at a particular site," or the probability of "how often, both in terms of inter-event time and magnitude-frequency, earthquakes occur," exists. There are data on occurrence dates, locations, and magnitudes (subject to various errors and catalog incompleteness uncertainties) of past earthquakes. Let us call these "frequency" data. The information PSHA requires is the "probability" of future earthquakes. PSHA simply conflates the observed empirical frequencies with probabilities.
But frequencies are not probabilities, nor are they necessarily estimates of probabilities. If seismicity were the result of a random process, its empirical frequencies could be used to estimate the underlying probabilities. However, unless we have exogenous knowledge that the empirically observed frequency of past seismicity is due to something random, defining a quantity called the probability of future seismicity is just an arbitrary assumption about the mechanism that generates seismicity. There are several conceivable justifications for modeling seismicity as random. For instance, there might be an underlying theoretical basis in physics, as there is in quantum mechanics, but that does not appear to be the case for earthquake physics. Or modeling seismicity as random might lead to a compact and statistically adequate description of the data, but in fact seismicity is empirically inconsistent with the random seismicity models that have been proposed (Luen, 2010; Stark, 2013). Or modeling seismicity as random might lead to useful predictions or hazard analyses, but, as we have discussed above, it does not. For further discussion of the fundamental issues involved here see also Stark (2016).
"In a die-based experiment, the statement 'the probability of rolling a two on this die is 0.166667' (i.e., this die is fair) could be tested by making a suitable number of rolls and observing directly the number of twos that are rolled as a fraction of the total number. In seismology, a similar solution could be applied to answering the question 'What is the probability that tomorrow there will be an earthquake larger than 6 Mw somewhere in the world?' It would be sufficient to collate the data for the past 1,000 days and observe on how many days an earthquake above 6 Mw was recorded. To answer the question 'What is the annual probability of 0.2 g PGA at my site?' is more intractable, as the data are insufficient, and there may actually be no past observations of the target condition.
"The test of a statement about earthquake ground motion would be to make observations over a long enough number of years (say, 100,000) and count the number of times an acceleration over 0.2 g is recorded. Or even better (but even more impractical), observe the same year 100,000 times in independent parallel universes. This would give a very classical frequentist estimate of the probability, based on simple counting.
"While one cannot in actuality venture into multiple parallel universes, one can very easily do the next best thing, which is to simulate what would happen if one could. One can think of a PSHA study as consisting of two parts: a model (represented by a PSHA input file) and a process (represented by a seismic hazard program). The model is essentially a conceptualization of the seismic process expressed in numerical form, describing 1) where earthquakes occur, 2) how often they occur, both in terms of inter-event time and magnitude-frequency, and 3) what effects they have. With these three elements, one describes everything that determines the statistical properties of the seismic effects that will occur at a given site in the future. This is, therefore, all that one needs to simulate that future. One does not know precisely what will happen in the future, one only knows the aggregate properties of the seismicity and the ground motion propagation. Therefore simulations need to be stochastic. For each simulation, the probability density functions for earthquake occurrence in the model are randomly sampled to produce one possible outcome compatible with the model."

It appears to us that Musson's argument is circular: it commits the well known fallacy of assuming that the real world is governed by probability in the same way that games of chance are (e.g., Taleb, 2007), and it rests on conflating probability with frequency and on conflating estimates of parameters with the true values of the parameters. Musson's argument implicitly assumes that earthquakes are generated by a stationary random process that is known except for a few parameters. He then asserts that by observing seismicity for a long enough time period we could estimate the values of those parameters well. Once we know the parameters well, we can simulate additional (synthetic) seismicity, which we can then use to find other probabilities numerically.
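To see concretely what the simulation scheme described in the quotation involves, here is a toy version in Python: a Poisson event count per year, magnitudes drawn from a truncated Gutenberg-Richter distribution, and a made-up one-line attenuation rule. Every parameter value and functional form below is an assumption for illustration only, not a real PSHA model; note that the question of whether seismicity actually behaves like such a stochastic process is exactly what the simulation cannot settle.

```python
import math
import random

def draw_poisson(rng, lam):
    """Poisson-distributed count via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def draw_gr_magnitude(rng, b=1.0, m_min=5.0, m_max=8.0):
    """Magnitude from a truncated Gutenberg-Richter (exponential)
    distribution, by inverse-CDF sampling. Parameters are illustrative."""
    beta = b * math.log(10)
    c = 1.0 - math.exp(-beta * (m_max - m_min))
    return m_min - math.log(1.0 - rng.random() * c) / beta

def simulate_exceedance(n_years=100000, annual_rate=0.2,
                        pga_threshold=0.2, seed=1):
    """Fraction of simulated years in which a site PGA threshold is
    exceeded, under a toy attenuation rule pga = 10**(m - 7) g at a
    fixed distance. Everything here is an assumed, illustrative model."""
    rng = random.Random(seed)
    hit_years = 0
    for _ in range(n_years):
        n = draw_poisson(rng, annual_rate)
        pgas = (10 ** (draw_gr_magnitude(rng) - 7.0) for _ in range(n))
        if any(p > pga_threshold for p in pgas):
            hit_years += 1
    return hit_years / n_years
```

The output is a "frequentist" exceedance estimate of just the kind Musson describes, but its meaning depends entirely on the stationary stochastic model fed in; the simulation can never validate that model.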
But where is the evidence that games of chance are a good model for earthquakes-i.e., that earthquakes are generated by a stationary random process? None is presented by Musson. The models used in the common formulations of PSHA are the characteristic earthquake model and the Poisson process model. We now discuss the validity of these models.

The characteristic earthquake model
The characteristic earthquake model views large earthquakes in a particular region as the result of a cycle in which elastic strain steadily accumulates and is then released by the "characteristic earthquake" for that region when some limit is reached. Each repetition of the cycle is posited to result in a repetition of the "same" characteristic earthquake. This model is based on classic ideas proposed by Gilbert (1884) and Reid (1911), and was given widespread currency by Schwartz and Coppersmith (1984). In a recent paper (Geller et al., 2015), the present authors reviewed the characteristic earthquake model and discussed its failure to agree with observed data. Needless to say, it is a basic principle of physics that any model or theory, however intuitively appealing it may be, and regardless of whether or not it has a majority of supporters in some community, must be discarded if it does not agree with observed data, and we argued that this is clearly the case for the characteristic earthquake model. Kagan et al. (2012) pointed out that even though it has been refuted by a series of studies, the characteristic earthquake model, together with related concepts such as seismic gaps, and the seismic cycle, continues to be widely used. How can this situation persist? The following comment by Parsons et al. (2012) provides a clue: "The controversy between the characteristic and Gutenberg-Richter [sic] models was partly developed through discussions by Nishenko and Sykes (1993), Jackson and Kagan (1993), Kagan and Jackson (1995) and Rong et al. (2003). This debate did not have a clear conclusion and is ongoing. However, the recent great M=9.1, 2004 Sumatra and M=9.0, 2011 Tohoku earthquakes do not seem to fit the definitions of characteristic earthquakes that were expected in their respective regions. 
The approximate interval between great earthquakes in the Nankai zone is about 150-600 yr, meaning that empirical resolution of this issue could require many hundreds of years." We can see three factors at work here. First, it is up to editors and referees whether or not to allow authors to continue to invoke refuted theories and models simply by asserting that the "debate did not have a clear conclusion and is ongoing." In order to be allowed to make such an assertion the authors should have been required either to cite a specific recent study that supports the model in question and refutes the criticisms thereof, or to make such a case themselves. We see no evidence of either in Parsons et al. (2012).
A second key point is the framing of the issue. Parsons et al. (2012) have framed the issue as "are great earthquakes in the Nankai zone characteristic or not?" It seems to us, however, that this framing is an unacceptable departure from the fundamental approach of physics, which seeks universal laws. In the absence of a specific and well documented reason why the physics of great earthquakes in the Nankai zone should differ from that of great earthquakes at other subduction zones throughout the world, Occam's Razor, the principle of choosing the simplest possible hypothesis, tells us we should test the hypothesis "all great subduction earthquakes are characteristic," rather than testing each individual subduction zone as a separate case. In fact work by the UCLA group (e.g., Rong et al., 2003) has done just that, repeatedly obtaining the result that the characteristic earthquake model (which can also be called the seismic gap model) is not successful beyond chance. While it might take hundreds of years to obtain a meaningful result at one particular site, the UCLA group has repeatedly shown that ten years is sufficient to obtain a meaningful result if data from all of the world's subduction zones are used.
Finally, a third point is the hypothetical question of what would happen if, as suggested by Parsons et al. (2012), we were to wait hundreds of years or longer and obtain data on four or five great earthquakes in the Nankai zone. Even if these great earthquakes appeared to be periodic (or somewhat periodic), as was pointed out by Savage (1992), that could either be because the characteristic earthquake model was correct, at least for the Nankai zone, or it could be accidental. Thus a test of the type proposed by Parsons et al. (2012) would not be conclusive.
The subject of this paper is the appropriateness of using PSHA in applications where lives and the safety of major infrastructure hang in the balance. Such applications should be based only on well validated methods with a sound physical basis. The spectrum of recent views on the characteristic earthquake model lies between complete refutation (Kagan et al., 2012; Geller et al., 2015) and "ongoing debate" (Parsons et al., 2012); no recent work in peer-reviewed journals makes a strong data-based case in support of the characteristic earthquake model. Thus use of the characteristic earthquake model in real-world hazard analysis applications should be eschewed.

Exceedance probability, return time, and Poisson process model
In the Cornell formulation, PSHA estimates are generally based on the return time, τ, and the exceedance probability, p, both also used in hydrology (Gupta, 1989). Note that the use of an exceedance probability implicitly assumes that earthquakes occur at random: some (stationary) stochastic mechanism generates seismicity, so the parameter p exists in the world and can be estimated from observed occurrence data. The simplest stochastic model proposed for earthquakes is a Poisson process, for which waiting times between events are independent, identically distributed exponential (memoryless) random variables. The memoryless property of the exponential distribution contradicts the phenomenology (earthquake clustering) discussed above in section 3.3. For Poisson processes, the theoretical "recurrence time" is simply the reciprocal of the theoretical rate of events per unit time. However, inter-arrival times for a Poisson process are highly variable: the "recurrence time" would be useless for prediction, even if it were known perfectly.
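Under the Poisson assumption, τ and p are linked by p = 1 - exp(-t/τ) for an exposure time t. The short sketch below evaluates this relation; it reproduces, for example, the familiar building-code convention that "10% probability of exceedance in 50 years" corresponds to a return period of about 475 years. The formula itself is standard; whether the Poisson assumption behind it holds for real seismicity is, as argued in the text, another matter.

```python
import math

def exceedance_prob(return_period_years, exposure_years):
    """P(at least one exceedance in t years) for a Poisson process
    with mean recurrence time tau: p = 1 - exp(-t / tau)."""
    return 1.0 - math.exp(-exposure_years / return_period_years)

def return_period(p, exposure_years):
    """Inverse relation: tau = -t / ln(1 - p)."""
    return -exposure_years / math.log(1.0 - p)

# The common "10% in 50 years" design convention:
tau = return_period(0.10, 50.0)    # about 474.6 years
p = exceedance_prob(475.0, 50.0)   # about 0.0999
```

Note also that for the exponential distribution the standard deviation of the waiting time equals its mean, which is one way of quantifying the statement in the text that the "recurrence time" alone is useless for prediction.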
Because actual seismicity has more clustering than a Poisson process can account for, it is common to estimate the rate in Poisson models of seismicity after first "declustering" the catalogs to try to make them more compatible with the Poisson assumption. Declustering methods attempt to remove foreshocks and aftershocks, leaving only mainshocks. However, the definition of mainshock is effectively circular and algorithm-dependent: "mainshocks" are those events that the declustering algorithm does not delete. Also, in many cases aftershocks can be destructive, so removing them from a hazard assessment seems inappropriate.
Declustering algorithms may be divided into two main classes (Luen and Stark, 2012): mainshock window methods and linked-window methods. Mainshock window methods remove all earthquakes in a space-time window around every "mainshock," but the definition of "mainshock" is subjective and varies from author to author. A prototype of this approach is the algorithm of Gardner and Knopoff (1974). Linked-window methods (e.g., Reasenberg, 1985) allow a posteriori identification of mainshocks, aftershocks, independent events and, for some specific algorithms, also foreshocks. However, the choice of parameters is subjective, as is the choice among the various options. In conclusion, declustering discards the principal phenomenological feature of earthquake occurrence (clustering) and replaces it by a subjective entity (mainshocks). As such, it appears to be illogical and thus unacceptable. Furthermore, current declustering approaches are faulty even from a strictly statistical viewpoint, since they are unable to produce truly Poissonian residual catalogs (Luen and Stark, 2012).
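A toy window-declustering sketch in the spirit of Gardner and Knopoff (1974) makes the circularity plain: the "mainshocks" are simply the events the algorithm happens not to delete, and the answer changes with every choice of window size. Real implementations use magnitude-dependent windows; the fixed window sizes here are illustrative assumptions only.

```python
import math

def decluster_window(events, window_days=100.0, window_km=50.0):
    """Toy mainshock-window declustering: scan events in decreasing
    magnitude order and delete every smaller event falling inside a
    fixed space-time window around each retained "mainshock".
    `events` is a list of (time_days, x_km, y_km, magnitude) tuples.
    Window sizes are arbitrary, illustrative choices."""
    by_magnitude = sorted(events, key=lambda e: -e[3])
    kept = []
    for t, x, y, m in by_magnitude:
        dependent = any(
            abs(t - t0) <= window_days
            and math.hypot(x - x0, y - y0) <= window_km
            for (t0, x0, y0, _) in kept
        )
        if not dependent:
            kept.append((t, x, y, m))
    # "Mainshocks" are, by construction, whatever was not deleted.
    return sorted(kept)
```

Changing `window_days` or `window_km` changes which events survive, and hence the estimated "mainshock" rate fed into the Poisson model, illustrating the parameter subjectivity noted above.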

Logic trees: Does the sum of ignorance lead to knowledge?
Some applications of PSHA seek to combine all possible options into a "logic tree," more or less subjectively assigning a probability to each branch, which results in a wide variety of scenarios; these are presumed to provide a thorough basis for decision makers. This approach acknowledges our ignorance, but treats it by trying to reach an expert consensus. Such techniques have their origin in the Delphi Method (Dalkey and Helmer, 1963), which was used to assimilate expert opinions on possible outcomes of a nuclear war. In practice, a large (but not too large) group of experts, typically 7 to 12 in number, is convened and brainstorming is conducted to search for a consensus. What the numbers obtained from a logic tree mean is less than clear. Failures of engineered systems with a small number of well-understood components whose failure modes are known might sometimes be fruitfully modeled as a tree with known nodes and branches and estimable probabilities at each node, at least as a thought experiment. But there is no evidence that this is meaningful or useful for phenomena as complex and poorly understood as earthquakes.
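Mechanically, the logic-tree output is just a weighted average over branches. The sketch below, with entirely hypothetical branch weights and hazard values, shows why the result inherits the arbitrariness of the weights: changing the subjective weights changes the "answer" with no new physical information.

```python
def logic_tree_mean(branches):
    """Weighted mean hazard over logic-tree branches, each given as a
    (weight, hazard_estimate) pair. Weights must sum to 1. The weights
    are subjective expert judgments, so the output is only as
    well-founded as they are."""
    total_w = sum(w for w, _ in branches)
    if abs(total_w - 1.0) > 1e-9:
        raise ValueError("branch weights must sum to 1")
    return sum(w * h for w, h in branches)

# Three hypothetical branches (say, alternative b-values or ground
# motion models) with subjective weights; hazard in units of g:
mean_pga = logic_tree_mean([(0.5, 0.15), (0.3, 0.25), (0.2, 0.40)])
# Re-weighting the same branches gives a different "answer":
mean_pga_alt = logic_tree_mean([(0.2, 0.15), (0.3, 0.25), (0.5, 0.40)])
```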
The use of "expert opinion" in PSHA studies implicitly acknowledges the unscientific nature of such endeavors. For example, expert opinion is not needed to make calculations involving Newton's law of gravitation; barring computational blunders, a high school student and a Nobel prize winning physicist will calculate the same forces from the same input data.
As described in section 2.1, the essence of Cornell's (1968) approach is to separate the tasks of the seismologist and engineer. The job of the former is to provide "best estimates of the average activity levels of the various potential sources of earthquakes," and the latter does the rest. This division of labor has the effect of shielding the engineer from responsibility if the "best estimates" fail to match subsequent earthquake activity. The engineer is also shielded from the need to study and understand the current uncertainties of the seismological state of the art.

Discussion and Conclusions
Let us now return to the basic issue discussed in the introduction. Society will continue to consider building large-scale structures, including major industrial facilities such as nuclear power plants, in earthquake-prone regions. At the end of the day, someone or some group must decide either to approve or reject the proposed construction. Society's needs being what they are, it seems inevitable that at least some, and perhaps most, of these applications will be approved. In the event of approval, someone or some group must decide on quantitative criteria for the strength and duration of ground motion (and tsunami, etc.) that must be designed for.
The tradeoffs in this process are obvious. Safety is desired, but so is lower cost. There is great implicit, and often direct, pressure on geoscience consultants, architects, and engineers to reduce costs by lowering the design criteria. On the other hand, in the event of an earthquake-caused failure of the structure or of critical equipment (e.g., a nuclear reactor) inside the structure, all parties involved in this process have strong incentives to minimize their financial and criminal liability. PSHA, as based on Cornell (1968), is an ideal solution to this conundrum. Geoscience consultants feed information into the PSHA machine, the crank is turned, and a seemingly objective number pops out. Basing decisions on this number effectively absolves engineers, architects, project developers, and owners from responsibility and liability, since the PSHA methodology is deeply entrenched as a standard element of "due diligence." However, the fly in the ointment is that, as discussed above, the numbers that pop out of the PSHA machine have no theoretical or empirical justification, and thus do not quantify risk reliably or accurately.
The situation in our field is thus the same as that in hydrology, as discussed by Klemeš (1989) in the quotation at the beginning of this paper. Society would be far better off if all participants in the earthquake design process honestly acknowledged the grave shortcomings of PSHA and scrapped it, rather than continuing to provide numbers for use in life-and-death earthquake safety decisions based on an as yet unvalidated theory which, as shown above, disagrees with the physics and statistics of earthquake occurrence.
What should we be doing instead? The answer does not lie in making minor (or even major) adjustments to PSHA as now practiced and relaunching it, as the problems go far deeper. Instead, all participants-including project developers and owners, risk managers, engineers, and architects-must "take ownership" of the inherent uncertainties of earthquake hazards, rather than simply tasking everything to geoscience and engineering consultants. In recent years administrative procedures in the European Union have adopted what is called the "precautionary principle" (Foster et al., 2000). The implementation of this principle in individual cases is still evolving, but it may provide some guidance here.
In the roughly 50 years since the work of Cornell (1968) PSHA has grown into a minor industry, and we recognize that there may be strong resistance to our call for its abandonment. However, the facts must be squarely faced: PSHA is based on flawed physical models, and its hazard forecasts do not agree with subsequent seismicity. That being the case, a new way forward must be sought, starting again from square one.

Some thoughts on the way forward
Having pointed out the grave flaws of PSHA-based due diligence, it behooves the authors to make some suggestions about where we might go from here. One major conclusion of this paper is that responsibility for seismic safety measures must be shared among all parties, rather than just being tasked to geoscience and engineering consultants. The obvious place to start is by looking at risks, rather than just hazards. In many countries (e.g., Greece, Turkey, Armenia, Italy), a majority of the building stock has little or no earthquake resistance. Even in Japan and California, many older buildings can be readily identified as lacking sufficient strength. A reasonable first step might be to focus on the vulnerability of individual buildings and systems. This might include studies of the buildings' and systems' capacity to resist various earthquakes, including earthquakes beyond current code requirements.
A second step is an open acknowledgment that "absolute safety" is impossible to achieve. Just as in automotive and air traffic, there will always be accidents with casualties; the only realistic goal is to reduce their occurrence and effects as much as possible, within the limits of society's ability and willingness to pay (Meehan, 1984;Geller et al., 2013). If risk is acknowledged, then open discussion of how to make structures "fail soft" will become possible.
A third step concerns how to most economically make progress and how to minimize litigation. This requires a clear sharing of responsibilities and costs among all of the stakeholders. The measures required (new laws and government regulations) are outside the authors' professional expertise, but we suggest that a first step should be for geoscience and engineering professionals to communicate the full extent of the uncertainties, including an unqualified repudiation of PSHA, to lawmakers and cognizant government officials.
Fourth, correct information on seismic risk must be supplied to the public in understandable form, so that each individual can take appropriate measures to prepare for future earthquakes.
Finally, everyone involved in seismic safety issues should acknowledge the shortcomings of PSHA, and its use as an unquestioningly relied-on black box must cease.

[Figure caption: Events from the New Italian Seismic Catalog (Gasperini et al., 2013), plotted in latitude vs. time because Italy has a (tilted) N-S alignment.]
[Figure 7: The rescaled density of the rescaled interevent times (see text) for all event magnitude classes of the New Italian Seismic Catalog (Gasperini et al., 2013); the best-fitting gamma density is also shown.]