Modeling Non-Homogenous Poisson Process and Estimating the Intensity Function for Earthquake Occurrences in Iraq using Simulation

The Non-homogeneous Poisson Process, with time dependent intensity functions, is commonly used to model the scenarios of counting the number of events that appear to take place in a given time interval. The identification of the process relies on the functional form for the intensity function, which can be difficult to determine. In this paper, a Non-Homogenous Poisson Process model is proposed to predict the intensity function for the number of earthquake occurrences in Iraq; the constructed model allows anticipating the number of earthquakes occur at any time interval with a specific time length. Then, to estimate the model parameters, the data obtained from the annual reports of the Iraqi Meteorological Organization and Seismology (IMOS) from January, 1 st 2018 to May, 1 st 2023 is used. Moreover, a simulation study is conducted and a new algorithm is introduced to show the performance and the applicability of the proposed model.


Introduction
Poisson process models have been widely used to describe the scenarios of counting the number of events that occur at a certain rate in which the intensity function is formulated as a timedependent variable.The Model with a constant rate of occurrences, which is known as Homogenous Poisson Process (HPP), does not allow for any changes in the intensity function of occurrence over a time interval.While a Non-Homogeneous Poisson Process (NHPP) is a more complex model; it is used in cases when events occur differently within time and the intensity function is not constant anymore.In addition, for initial state, the probability of no event occur is one and the probability of n event occur is zero.
Destructive Earthquake Occurrences (EO) are one of the biggest natural disasters that take place in various countries in the universe, apart from their devastating impact on people's lives, earthquakes can have serious effect on economic collapse, and the cultural heritage of urban regions.Thus, modeling the EO is always a major research object in seismic probabilistic prediction and various modeling approaches have been developed from different viewpoints, for example: Stochastic Time-Predictable Model for EO (Anagnos, and Kiremidjian, 1984); a Nonparametric hazard model to characterize the spatio-temporal occurrence of large earthquakes (Faenza, et. al, 2003); A statistical analysis and comparison of earthquake and tsunami in Japan and Indonesia ( Parwanto and Oyama, 2014).furthermore, to model the casualty rate of earthquakes and related covariates, linear regression, beta regression, and semi-parametric additive regression are used, (Turkan and Ozel, 2014); testing probabilistic earthquake forecasting and seismic hazard models (Marzocchi and Jordan, 2018;Masoud, et. al, 2022); Generalized Linear Model for estimating the probability of EO (Noora, 2019) and many others.In addition, many stochastic approaches have been used as powerful tools for modelling EO, e.g.Statistical models for earthquake occurrences and residual analysis for point processes (Ogata, 1988) ; A new approach, which is obtained by mixing Poisson distribution with quasi-xgamma distribution, to model the counts of earthquakes (Altun, at el., 2021).Despite that, scientists have never predicted a major earthquake but can only determine the probability that a significant earthquake will occur in a specific area within a certain time interval.Therefore, to estimate the number of EO that may occur in future to predict the consequences of such destructive events, proposing statistical models to describe the counts of EO is very essential; it helps to take measures and actions based on the results obtained from implementing the proposed models.
Although earthquakes in Iraq, which are classified as events with medium hazard, are likely to happen from time to time and their occurrences may cause significant destructions, but there is always a shortage of data and lack of research studies concerning the assessment of the number of the earthquakes that may occur in future.Many authors investigated the seismicity, which is the occurrence or frequency of earthquakes in different regions of Iraq (e.g.Al-Abbasi and Fahmi, 1985;Alheety, 2016;Khagoory and Al-Rahim, 2023;Kagan, 2010;Francizek, 2018) and many others; unfortunately, none of these studies concern modeling the number of EO or exploring methods for estimating parameters of the intensity function.The main objective of this work, with no attention paid to the geophysical causes, is to construct a NHPP based approach to model the EO and then to predict the number of earthquakes that will happen in future as the earthquake occurrences in Iraq and the border regions from one month to the next are independent of each other and the numbers are not the same; thus the behavior is characterized by a NHPP.
Before introducing this model, basic concepts and the main properties for Poisson Process and NHPP are described in addition to the main assumptions (Ross, 2007).

Poisson Process
In order to introduce a stochastic process known as Poisson Process and then to derive the model formula, the following are introduced: 1.1.1Counting Process A stochastic process {(),  ≥ 0}, () indicates the total number of occurrences that take place by time t, is a counting process if the following are satisfied: 1. () ≥ 0; 2. () is integer valued; 3.If  < , ℎ () ≤ (), ,  > 0 4. For  < , () − () represents the total number of occurrences that take place in (, ]. The Poisson process is considered as the most important types of counting process but the weakness for this process is its assumption that events are just as likely to occur in all intervals of equal size.A NHPP is a generalization, which reduces this assumption, and represents a nonstationary process for the number of events that occur by time t with intensity function denoted by ().This model allows predicting the number of event occurrences at any time interval with a specific time length.These processes are defined below: Definition 1 A counting process {(),  ≥ 0}, is called a Poisson Process with rate of occurrence  > 0 if the following are satisfied: 1. (0)= 0; no occurrence for the initial state; 2. {(),  ≥ 0} has independent increments, or the number of events that appear to occur in the disjoint time intervals are independent; 3. {[( + ℎ) − ()] = 1} = ℎ + (ℎ); means that probability that one event will occur in a small interval depends on the length of the time interval and does not depend on the number of events that will occur outside the time interval (on the past history of the process).
From the above assumptions, the following theorem is derived: Theorem 1: If {(),  ≥ 0} is a Poisson process with rate of occurrence λ>0, then for all  ,  > 0, [( + ) − ()] is a Poisson random variable with mean λt.That is, the number of events in any interval of length t is a Poisson random variable with mean λt ( Ross, 2007).

Arrival and Interarrival Times:
For a Poisson process () with the rate of occurrence , if  1 is the time of the first event that appear to happen and   is the time between the (n − 1) and the nth event, then {   ,  = 1,2, … } is called the sequence of interarrival times and   ,  = 1,2, … ; these are independent and identically distributed exponential random variables having mean 1/λ.

Non-Homogenous Poisson Process:
In this section the NHPP, which is a generalization of the Poisson Process obtained by allowing the occurrence rate or intensity function at time t to be a function of t, is introduced: it is defined as follows: Definition 2 The counting process {(),  ≥ 0} is defined as NHPP having intensity function (),  ≥ 0 if the following are satisfied: In addition, the expected value for the number of occurrences through time t, which is known as the mean cumulative function denoted by (), is defined as () = ∫ ()  (Ross, 2007).

The intensity function
Let  () denote the number of events occur in the interval [0 , ], and that   () = , ( event occur in an interval (0, ]).The intensity function, which is a deterministic function for the number of occurrences that occur during an interval of length t and denoted by (), is the probability of occurrences in a small interval divided by the length of the interval; it is defined as: Thus, () can be interpreted as the rate of change in the expected number of occurrences; there will be many events occur over intervals on which () is large, and fewer occurrences over intervals on which () is small.On the basis of the above assumptions and definitions, for a NHPP, the probability that n events occur through the time interval (0, ] is given by the following: The probabilistic behavior of the NHPP is completely defined by the mean-value functions or the cumulative intensity function, which is interpreted as the expected number of events by time t, is defined as: Hence, the probability of exactly n events occurring in an interval (,  + ] is given by the following (Cinlar, 1975): Therefore, based on the above assumptions, a NHPP model can be defined such that it provides concepts that allow to build a probabilistic model concerning incidents or event occurrences over time.When the NHPP model is constructed, it is essential to estimate both the unknown parameters and the functional form of the intensity function; this function is usually determined based on the initial description of the data under investigation.
In this paper, for modeling the data under study, a NHPP, which is often suggested as an appropriate model when a system whose rate varies over time, is proposed.This model is characterized by an intensity function to describe the process changes over time.To estimate this function, the data under study is used: the data represents the number of earthquake occurrences happened in Iraq and its boarder from January, 1 st 2018 to May, 1 st 2023 .The following is the statistical analysis for the data under study with the model description for the data.

Statistical Analysis for the Data Source under Study
The following data, which is shown by Table 1, provides information relevant to the number of earthquake occurrences happened in Iraq and its border from January, 1 st 2018 to May, 1 st 2023; the data is reported by the Iraqi Meteorological Organization and Seismology (IMOS).
Table 1: The number of earthquakes in Iraq and its border from January, 1 st 2018 to April, 1 st 2023.
Table 1 shows the number of occurrences for the years 2018 to May, 2023; it is clear that earthquake occurrence in Iraq, which is classified with medium hazards, are likely to take place irregularly as the geographical location of Iraq is the reason that only the eastern and the northeast parts of the country are directly affected by the seismic activity of the Tauros-Zagros structural zone; therefore, the number of events occurring is limited and the event is not considered with high risk.Furthermore, the data show that the total number of occurrences for 2022 is higher compared to the other periods and in 2020 the least number of occurrences has been recorded.In addition, the following figure, Figure 1, illustrates the frequency distribution for the data set under study.
Figure 1: The frequency distribution for the number of earthquakes occurred in Iraq.
Moreover, based on the intensity function, the data set is explained by the following figure, Figure 2.
Figure 2: The number of earthquakes with the intensity function.
Therefore, based on the background of the problem described, this study applied a NHPP to count the number of earthquakes occurring in Iraq and then predict their probability.The aim is to propose a model with the smallest set of variables which provides an adequate description of the data.In this paper, for modeling the dataset under study, a NHPP is proposed.It is described below.

The Proposed NHPP Model
In this section, to study the number of earthquakes that occur in Iraq and its boarders over a time interval say (0,T], a NHPP is proposed.As the intensity function for this process plays a crucial role in characterizing the probabilistic behavior of the NHPP, the main concern is the identification of the functional form of the function, which may take various forms, and then the estimation of its parameters; the parameters are usually estimated based on the initial description of the data under investigation. If it is assumed that a counting process {(),  ≥ 0}, which counts the total number of events that have occurred up to time T, takes values on {0,1,2,…}, each value represents the number of earthquakes occurrences in a time interval (0,T].Due to the nature of these occurrences, preassumption that it is a NHPP with some parameter, () > 0 seems to be justified.
Therefore, a NHPP is proposed to model the number of earthquakes that occur over time interval in Iraq and its boarders; it is characterized by a deterministic intensity function () describing how the occurrence rate changes up to time  , where the time is measured on a daily scale.Hence, ((( + ) − ()) , which defines the expected value of increment for the process, can be determined from equation ( 7).These rules can be implemented empirically if the intensity function () > 0 is known.To define this function, the data from the annual reports of the IMOS from January, 1 st 2018 to May, 1 st 2023 is used.The statistical analysis of the data, which is described by Table 1, shows that the intensity function () can be approximated by the simplest deterministic mathematical function known as the simple linear regression defined as: Hence, the constructed model allows predicting the number of event occurrences at any time interval with a time length t when the parameter values are estimated.Numerous estimation methods have been used to predict parameters; the least squares method is a standard approach, which is used in this study, which is described below.

Parameter Estimation for the Model:
Consider a NHPP with intensity function (), which is assumed to be nonnegative, defined by equation ( 9) over a fixed interval (0, ], 0 <  1 <  2 ,• • • , <  −1 <   ≤  .The objective is to estimate parameters for the function λ(t) using the least squares.The concept of the least squares, which is a method for estimating the parameters of a statistical model, fitted to sample data by minimizing an appropriate sum of squared estimation errors.The empirical intensity function is approximated by the Simple Linear Regression Model described by equation ( 9) that satisfied the following: Then, the above equation is minimized to estimate each of  ,  (Singh and Masuku, 2013).
the least squares estimate of the intercept  and  of the true regression line is given by: Hence, the predicted intensity function is defined as Then, by using the annual data reported by the IMOS in Iraq, which is explained by table 1, then by applying equation ( 11), the estimates for the two parameters are:  ̂= 0.59,   ̂ = 0.00006 Thus, by using the data set given by Table 1 applied to equation ( 12), the following model is resulted: The following figure, Figure 3, shows the fitted model and the data representation for cumulative intensity function:

Simulation
In this section, to assess the model fitting, a simulation study is conducted.The cumulative number of occurrences () for the NHPP is generated by considering equally spaced intervals over the time range (0, ]; the process of generating random variables is repeated for 30 times.Then, the same procedure is applied to different forms of intensity function (); each with different values of   .Finally, based on the results obtained from implementing the simulation procedure each of    are estimated using equation ( 11), which is an estimate procedure for estimating parameters for regression model that describes the behavior of the intensity function.The proposed algorithm is different than the rest of algorithms written for simulating a NHPP (Lewis and Shedler, 1979); equally spaces intervals are considered so there is no need to simulate time intervals but it generates the number of occurrences that occur in each interval.For this algorithm, Matlab code is used and then to estimate (), SPSS (Statistical Package for Social Sciences) is implemented.The algorithm describing the main steps for the simulating a NHPP is introduced below.
4.1 Generating the number of occurrence for the NHPP Assume that N(t) is the cumulative number of occurrence for a NHPP with intensity function described by () =  +  for  < , then to simulate N(t) the following algorithm is defined:

Simulation results and discussion
The following figures, Figure 4, 5, 6, show the implementation for the algorithm for simulating the number of occurrences for a NHPP with intensity function () =  +  , for different values of T and , .Thus, all these figures show that the estimated intensity functions () , which are resulted from simulation process for different values of  and , can be approximated by a simple linear regression model.
The following

Modeling the number of occurrences based on real data and predictions
Hence, the cumulative number for EO in Iraq and its boarder is determined using equation ( 12) and then the linear intensity function for EO is obtained as: ̂() =  ̂+  ̂ = 0.59 + 0.00006 When equation ( 3) is applied to the data set, the (()) by time t, which is defined as (), is () = ∫ ( α ̂ +  ̂)  0 = α ̂ +  ̂2 , with α ̂= 0.59;  ̂= 6 * 10 −5 Hence, () = 0.59 + 0.00006 2 (14) Therefore, modeling the number of earthquakes that occur in Iraq can be considered as a NHPP with parameters () = 0.59 + 0.00006 2 for  > 0; and from equation (2): ) which implies that for the data under study this becomes: Then, from equation ( 5), the number of earthquakes at a time interval with the length of the interval , can be predicted as follows: Pr {((( + ) − ()) = } = ( ( + ) − ())   −(((+)−()) /!For example, the expected number of earthquakes that will occur from 1 to 7 November 2023, as in equation ( 7), can be anticipated as follows: For the data set under study, the time interval [1915,1945) was reported as the last interval for the data up to April 2023, then the time interval for 1 to 7 December 2023 will be [2160,2190).
Then, the length of the interval is 6 at time  = 2160.Thus, by implementing the proposed model defined by a NHPP with () to the data under study, the expected number of earthquakes that will occur in any time interval can be predicted.The above application is an example that in the first week of December 2023, it is expected that about 4 earthquakes will occur in Iraq and its border.

Conclusion
The NHPP is widely used to model the counts of event occurrences.The major feature of Non-Homogeneous Poisson Process (NHPP) is that the occurrence of event is allowed to change with time.Thus, the occurrence rate of events is a function of time λ(t).In this paper a NHPP model has been used to model the number of earthquakes occur in Iraq by defining the functional form for λ(t) and then estimating its parameters.To estimate model parameters, the data obtained from the annual reports of the Iraqi Meteorological Organization and Seismology (IMOS) from January, 1 st 2018 to April, 1 st 2023 is used.Moreover, a simulation study is performed to show the performance and the applicability of the proposed model.Finally, theoretical results are applied to predict the expected number of earthquakes that may occur in Iraq and its boarder.

Figure 3 :
Figure 3: The observed and expected fit for the interval time and cumulative Intensity.

Table 2 :
table, Table 2, displays the results for generating () for the time interval: 0 <  < 150, for different functional forms for (), (with different values for  and  ); from the regression model, based on simulated data, each of    are estimated.Finally, the mean square error (MSE) for the observed and estimates model is obtained.Parameter estimation based on simulated data with Mean Square Error