Gradient angle-based analysis for spatiotemporal point processes

Spatiotemporal point processes (STPPs) are important in modeling randomly appeared events developed in space and time. Statistical methods of STPPs have been widely used in applications. In all of these methods, evaluations and inferences of intensity functions are the primary issues. The present article proposes a new method, which attempts to evaluate angles of gradient vectors of intensity functions rather than the intensity functions themselves. According to the nature of many natural and human phenomena, the evaluation of angle patterns of the gradient vectors is more important than the evaluation of their magnitude patterns because changes of angle patterns often indicate global changes of these phenomena. This issue is investigated by simulation studies, where significant variations of gradient angle patterns are identified only when modes of intensity functions are changed. To study these phenomena, the article proposes an analysis method for gradient angles of the first-order intensity function of STPPs. The proposed method is used to analyze aftershock earthquake activities caused by great mainshock earthquakes occurred in Japan 2011 and Indian Ocean 2004, respectively, where a significant global change in the second case is identified. AMS 2000 subject classifications: Primary 62M30; secondary 62G05, 62G20.


Introduction
The goal of the research is to develop a gradient-based approach to spatiotemporal point processes (STPPs) which can be used to describe the global trend of point occurrences. To develop the approach, it is important to have a statistical way which can characterize the tendency of gradient vectors of the intensity functions. It will point out that the usage of directions of gradient vectors is more important than the usage of their magnitudes. Based on this point, we propose a new statistical approach to gradient angle patterns of the first-order intensity function of STPPs. An important assumption is the angle invariant condition, Gradient angle for STPP 4425 which means that the angle (or the direction) of gradient vectors of the firstorder intensity function in the whole study area is time invariant. The angle invariant condition is useful in modeling many natural or human phenomena. For instance, the frequently used ETAS (epidemic type aftershock sequences) model for aftershock earthquakes [26,40,27] assumes that the occurrences of aftershock earthquakes are symmetric about their mainshock earthquake center, which satisfies the angle invariant condition. A similar issue has also been pointed out in infectious disease studies since the spread of an infectious disease is roughly symmetric about the original occurrence center [24]. In these examples, the direction of the tendency is more important than the magnitude since the change of directions represents a global change while the change of magnitudes only represents a local change. This issue has been justified in our simulation studies. We find that gradient angle patterns have significant changes only when the mode of the intensity functions has been changed. A mode change is often caused by the change of the intensity functions in the entire study region. Therefore, the statistical inference on angle patterns of gradient vectors of intensity functions is more important than the inference on the whole gradient vectors themselves. This motivates us to study gradient angle-based analysis methods in our research.
An STPP is a collection of random locations of points appeared in their spatiotemporal domain. Specific applications in the development of statistical approaches to spatial point processes (SPPs) or STPPs have appeared in many disciplines, including forestry [17], forest wildfires [28,31,38], earthquakes [19], and spatial epidemiology [11]. Many concepts and methods for SPPs have been previously proposed. Examples include the K-function [29], the L-function [9], the pair correlation function [34], stationarity tests [15,39], nonstationary analyses [1,4], likelihood and composite likelihood analyses [8,32,36], first-order analyses [17,37], and second-order analyses [16]. However, none of them have studied gradient-based properties of spatial or spatiotemporal point patterns. The present article is the first one considering such an issue.
Although we only focus on STPPs, we believe that gradient-based methods are also important in the analysis SPPs. Gradients belong to the first-order properties. They describe changes of the first-order intensity function locally. However, gradient angles can also provide the global trend. According to insights from our simulations for STTP to be displayed in Section 5, we conclude that nonparametric estimation of angle patterns is more reliable than that of magnitude patterns in terms of the bandwidth selection. Understanding of angle patterns is more critical than understanding of magnitude or entire intensity patterns. We investigate possible impacts of our methods on invasive species problems under the framework of SPPs. In ecology, the invasive species problem describes the spread of a certain plant or animal that is not native to a specific region, which is believed to cause damage to the local environment or human society [2]. If locations of invasive species are provided by a spatial point pattern, then its magnitude pattern only provides the local trend of the invasion but the angle pattern can also provide the source center. Thus, the analysis of angle pattern is more important than the analysis of magnitude pattern.
We have encountered many new challenges in the development of methods for gradient patterns of STPPs. Our proposed method is different from those in the literature in a variety of ways. First, in most articles of STPPs in the literature, the interest is to investigate the mean and covariance structures, which are often described by the first-order intensity and pair correlation functions. However in our method, the interest is to investigate the gradient patterns of STPPs, which can be further specified by the magnitudes and angles of gradient vectors. Second, unlike the scenarios considered in the classical methods, the direction of a gradient vector may not be available in a stationary process since the gradient of the first-order intensity function is zero everywhere. Thus, our method is developed for a nonstationary process only, implying that it is not recommended analyzing stationary spatiotemporal point patterns. Third, for nonstationary spatiotemporal point patterns, the change of angle patterns of gradient vectors of intensity functions is more interesting than the only change of magnitude patterns since the previous one represents a global change while the latter one represents a local change. Our simulations support the conclusion that slight local changes of intensity functions do not affect the gradient angle patterns significantly. Therefore, we can use our method to study whether there is a global change. It is possible for us to conclude the existence of a global change if gradient angle patterns are significantly changed. Since the approach relies on the derivation of angles of gradient vectors, we propose a kernel-based estimator to compute them. We provide asymptotic properties of our estimator. We carry out simulations to study its performance. Our method is applied to study the gradient patterns of aftershock earthquakes in the 2011 Japan Great Tohoku Earthquake data and the 2004 Indian Ocean Great Sumatra-Andaman Earthquake data. We conclude that the angle patterns are almost invariant in the Great Tohoku Earthquake data but not in Great Sumatra-Andaman Earthquake data.
The statistical method proposed in this article is motivated from previous gradient-based techniques which have been extensively used in image processing and pattern recognition. Overall, estimation of the gradient pattern of an image is a fundamental task in any field of object recognition, where the earliest work appeared about forty years ago [12]. After that, gradient-based methods have been extensively used in pattern recognition [25]. Basic gradient-based criteria, which are detection, localization, and uniqueness of recognition, have been proposed [5]. It is well acknowledged that images can be easily visualized if a gradient-based method is used. For example, images can be segmented into visually sensible regions by finding watershed regions in a gradient magnitude image [23]. Multiscale analysis of intensity extrema in the gradient magnitude image can be used to define a hierarchy on the corresponding watershed regions [13]. In addition to pattern recognition, gradient estimation has been investigated in nonparametric statistics [6], where the main interest is estimation of partial derivatives (i.e., gradients) of a smoothness function but not the smoothness function itself. Although gradient-based approaches have been used in other fields, this kind of approaches has never been considered in the literature of spatial or spatiotemporal point processes.
The rest of the article is organized as follows. In Section 2, we propose a few important concepts and properties of the first-order intensity function of STPPs based on its gradient angles. In Section 3, we propose nonparametric kernel estimation of angles of gradient vectors of the first-order intensity function. In Section 4, we provide the asymptotic properties of our estimator, which includes the consistency, the asymptotic normality, and the bandwidth selection. In Section 5, we carry out a simulation study to evaluate the properties of our estimator. In Section 6, we apply our method to study the angle patterns of the Great Tohoku Earthquake data and the Great Sumatra-Andaman Earthquake data. In Section 7, we provide a discussion. The proofs of our asymptotic properties are displayed in the Appendix.

Gradient angles of the first-order intensity function
Let N be an STPP on D×T , where D ∈ B(R 2 ) is the spatial domain, T ∈ B(R) is the temporal domain, and B(A) represents the collection of Borel subsets of A. Let N (A×B) be the number of points in A×B, where A ∈ B(D) and B ∈ B(T ).
. Throughout this article, we assume that D is a bounded twodimensional manifold with |∂D| = 0 and T is a bounded interval, where ∂ is the boundary operation and |A| means the Lebesgue measure of A in R 2 . For convenience, we assume T = [0, T ] with T ∈ R + and write s = (x, y) and s i = (x i , y i ) for any s, s i ∈ D.
Let the kth-order intensity function of N be denoted by λ k ((s 1 , t 1 ), · · · , (s k , t k )), k ∈ N + . Write the first-order intensity function of N as λ(s, t) = λ 1 (s, t). For any A, A 1 , A 2 ∈ B(D) and B, B 1 , be the gradient of the first-order intensity function projected to the spatial domain. If ∇ s λ(s, t) > 0, then there exists a unique function θ(s, t) ∈ [−π, π) such that The function θ(s, t) represents the spatial angle patterns of λ(s, t), which can be used to describe the directions of the gradient vectors of λ(s, t) in the spatial domain.
It is clear that θ(s, t) is not well-defined if ∇ s λ(s, t) = 0, which is attained if (s, t) is a spatial stationary point of λ(s, t) (i.e. point (s, t) satisfying ∇ s λ(s, t) = 0). We say spatial stationary points of λ(s, t) are isolated if there exists a δ > 0 such that s 1 − s 2 > δ for any distinct spatial stationary points (s 1 , t) and (s 2 , t) of λ(s, t). If spatial stationary points of λ(s, t) are isolated, then for a given interior point s 0 ∈ D we can define θ(s, t) for any s = s 0 in a sufficiently small neighborhood of s 0 , indicating that gradient angles of an STPP are almost surely available. We do not expect that our method can be used if λ(s, t) is only a function of t in the entire spatial domain. As ∇ s λ(s, t) = 0 everywhere, we cannot define θ(s, t) at any s ∈ D. Therefore, our method is developed for a spatially nonstationary STPP. Letλ(s) andλ (s) be the first-order intensity functions of two SPPs. Letθ(s) andθ (s) be the angle functions of ∇λ(s) and ∇λ (s), respectively. We say the two functionsθ andθ are equal ifθ(s) =θ (s) at any interior point s ∈ D when none of ∇λ(s) and ∇λ (s) is zero. We define that the angle functions of ∇λ(s) and ∇λ (s) are equal at s if one of ∇λ(s) and ∇λ (s) is zero. Then, we provide the following definition. If stationary points ofλ(s) and ∇λ (s) are isolated, then they do not affect the concept of similarity given by Definition 2. This provides the following theorem. Theorem 1 provides a way to study the angle invariant property of the firstorder intensity function of an STPP. It is useful if an STPP only has one or a few stationary points. An example is the case when λ(s, t) is derived from an ETAS model (to be discussed in Example 3 below). Since λ(s, t) only has one stationary point, which is its mode, the angle functions can be studied at any points other than the mode. It is expected to see similar patterns of the firstorder intensity function if the STPP is compared among different time periods. We provide a few examples to illustrate the issues discussed in Definition 2 and Theorem 1.
Example 1 (First-order separable model). If the STPP N has a separable firstorder intensity function such that λ(s, t) = λ s (s)λ t (t) for any (s, t) ∈ D × T , then ∇ s λ(s, t) = λ t (t)∇λ s (s). Therefore, θ(s, t) does not depend on t implying that N is angle invariant.
Example 2 (Condensed model). Assume the first-order intensity function of N can be written as λ( where f is a bivariate function and F is a positive bivariate function. Then, where F 1 is the first component of the partial derivative vector of F . Therefore, θ(s, t) does not depend on t implying that N is angle invariant. A few interesting cases can be derived. If f (x, y) = x 2 + y 2 and F (u, v) is decreasing in u, then an isotropic model is derived. If f (x, y) = ax 2 + 2bxy + cy 2 with b 2 − ac < 0 and a, c > 0, then an elliptical model is derived. Both models are angle invariant.
Example 3 (ETAS model). The ETAS model is a spatiotemporal marked point process model representing the activities of earthquakes of a certain magnitude and larger in a region during a period of time [26,27,40]. The model includes background activities and aftershock clusters. Events in each aftershock cluster are independently produced by the mainshock earthquake. The size of the aftershock cluster is related to the magnitude of the mainshock earthquake. The ETAS model is composed of the background rates and the intensity functions derived from all of the mainshock earthquakes and their aftershock clusters. If there is only one extremely large mainshock earthquake with magnitude M * at space-time location (s * , t * ), then the first-order intensity function of the locations of aftershock earthquakes can be approximately expressed as It is clear that λ(s, t) given by (2.4) is angle invariant. Model (2.4) is useful in the study of aftershock earthquake activities produced by an extremely large earthquake. The reason is because the impact of other mainshock earthquakes and the impact of background earthquakes can be roughly ignored within a few months after its occurrence. The ETAS model will be used with more details in Section 6.

Estimation
We propose a nonparametric kernel method to estimate θ(s) for an angle invariant STPP. Suppose that observations of N are collected in D × T and denoted by (s 1 , t 1 ), · · · , (s n , t n ), where n = N (D × T ) is the total number of events. To estimate λ(s, t), we can use the classical Berman-Diggle estimator [4,10] aŝ According to Theorem 1, it is not necessary to consider a full spatiotemporal kernel function in estimation of θ(s). Here, we recommend considering estimation of the first-order intensity function of N T , which is denoted by λ T (s). We use a purely spatial full symmetric kernel in where C h (s) = D K h (s−u)du is the edge correction. Using notation of Stochastic integrals, (3.2) can be equivalently expressed aŝ In the following, we always use u to represent the integration variable for space and v to represent the integration variable for time. We treat (3.3) as an estimator of the first-order intensity function of N T , which will be used to derive an estimator of θ(s). To get the estimator, we consider the gradient ofλ h,T (s), denoted by ∇λ h,T (s), which is composed of The expected vector and the covariance matrix of ∇λ h,T (s) are important in the evaluation of the properties ofθ h (s), where the method of Stochastic Integrals in Campell's theorem is useful. Taking the expected values of (3.4), we have where u = x if j = 1 and u = y if j = 2. Taking the variances and covariance of (3.4), we have Equation (3.8) can be used to compute the whole variance-covariance matrix of ∇λ h,T (s) since it represents a formula of the variances if The functionθ h (s) given by (3.6) is treated as an estimator of θ(s), which is well-defined if ∇λ h,T (s) > 0. It is clear thatθ h (s) depends on the bandwidth h and the kernel function K(s), which does not rely on T . However, it does not mean that the temporal structure of the spatiotemporal point pattern is not important inθ h (s). According to (3.8), the temporal structure affects the properties of ∇λ h,T (s) and therefore it also affects the properties ofθ h (s). Based on our asymptotic results to be displayed in Section 4, we find that the temporal dependence structure has significant impacts on the bias, the variance, and the optimal bandwidth selection inθ h (s).

Asympotics
We provide the asymptotic properties ofθ h (s) under T → ∞ with a bounded connected D in this section. The asymptotic properties include the consistency, the asymptotic normality, and the optimal strategy for the selection of h in (3.6). The derivation is based on the properties ofθ h (s), where the asymptotic properties of ∇λ h,T (s) are important. We use P → to represent convergence in probability and D → to represent convergence in distribution. To derive the asymptotic properties, the most important regularity condition is the strong mixing condition, which was first proposed by [30] and then followed by many other authors [18,20]. The strong mixing condition has later been used for asymptotics in spatial point processes [21]. Since D is bounded, we modify the definition of the strong mixing condition such that it can be interpreted by where 0 < v < T . The coefficient of strong mixing condition introduced by [30] then can be modified as To derive the asymptotic properties, we need the following regularity conditions.
(C1) N is simple and angle invariant with uniformly bounded first-order, secondorder, third-order, and fourth-order intensity functions. (C2) The function λ s (s) = lim T →∞ T −1 T 0 λ(s, t)dt positively exists and is third-order continuously differentiable at every interior point s ∈ D.
(C3) There exists a u > 0 such that (C5) There exists a β > 2 and γ = 2/β such that lim sup a∈R The kernel function K is full symmetric and satisfies lim u→∞ u 2 K(u) = 0.
Condition (C1) concerns the properties of the first-order and second-order intensity functions, which can be easily understood. To derive the asymptotics ofθ h (s), we need the functional central limit theorem theory for spatial point process. It can be established by Corollary 1 of [18]. The condition that λ 4 exists and is bounded upto k = 4 is proposed to apply the corresponding conclusion. If Condition (C2) also holds, then based on Theorem 1 there exists a function λ (s, t) such that for any interior point s ∈ D there is Condition (C3) states that N is weakly dependent, which is strengthened by Condition (C4). The connection between (C3) and (C4) is given in Lemma 1 below. Condition (C5) is the usual mixing condition for functional central limit theorems with a weakly dependent structure [18]. For the kernel function, if Condition (C6) holds, then the term related to ∂C h (s)/∂x and ∂C h (s)/∂y in (3.7) and (3.8) can be ignored in asymptotic studies.
is uniformly bounded.
Lemma 1 states that the right-hand side of the equation given by Condition (C4) is bounded, implying that ϕ(u, u ) is well-defined if the limit exists. Let where z = x if j = 1 and z = y if j = 2, and Using (C2) and (C3), there are and If s is an interior point of D, then for any fixed h there is as T → ∞. and (4.9) Then for any fixed h, there is as T → ∞. and It is important to provide the best strategy for the selection of the bandwidth h used in the estimator. Here we consider the usual mean integrated squared error (MISE) ofθ h (s) criterion which is defined as (4.14) The best strategy for the selection of h is derived by minimizing MISE(θ h (s)). as T → ∞.

Simulation
We carried out simulation studies to evaluate the advantage of our gradient angle-based analysis method comparing with traditional intensity-based analysis methods. We wanted to address three basic issues. In the first, we wanted to study the sensitivity ofθ h (s) to the bandwidth h. As the choice of the bandwidth is critical in estimating intensity functions, it is important to study the impact of the values of h on estimates of gradient angles. This was evaluated within individual realizations. In the second, we wanted to study the reliability ofθ h (s) in the entire study region. We wanted to know whether the pattern ofθ h (s) was reliable to the change of intensity functions. We designed a mode change to represent the global change of intensity functions. This was evaluated between individual realizations. In the third, we evaluated the precision ofθ h (s). We comparedθ h (s) with its true values. This was conducted via a large number of simulation repetitions.
We evaluated the performance of ∇λ h,τ (s) displayed by (3.6) via simulation studies. We simulated realizations from Poisson STPPs and Poisson cluster STPPs on R 2 × [0, T ] with T = 1, 2, 5, 10. We considered these processes due to their popularity in modeling geological and ecological data. In both processes, we chose λ(s, t) = κφ[(s − s 0 )/σ 0 (t)]/σ 2 0 (t) with κ = 1000T , where φ(s) was the standard bivariate normal density, σ 0 (t) = 0.5 + 0.2(t/T ) was a time variant scale parameter, and s 0 was either fixed or varied to represent the gradient center of ∇ s λ(s, t) or the spatial mode of λ(s, t) equivalently. If s 0 was fixed, then λ(s, t) depended on time but angles of its gradient vectors projected to its spatial domain did not, implying that λ(s, t) was angle invariant; otherwise, λ(s, t) was not angle invariant. To study the impact of varied s 0 on our approach, we divided the time domain into two periods. We fixed s 0 at distinct locations in each of the two periods. This reflected the case when a sudden event occurred at the beginning of the second period (or the end of the first period) which significantly changed the global pattern of the point occurrences. It was similar to the issue to be discussed in Great Sumatra-Andama earthquake data set in Section 6. We chose κ = 1000, 2000, 5000, 10000 such that the total expected number of points were equal to 1000, 2000, 5000, and 10000, respectively.
In order to generate a Poisson STPP, we first generated the total number of points n from a Poisson random variable with expected values equal to κ. We then generated temporal locations t 1 , · · · , t n uniformly on [0, T ]. For every t i , we generated the spatial locations s i independently from φ[(s − s 0 )/σ 0 (t i )]/σ 2 0 (t i ). A Poisson STPP was then derived. In order to generate a Poisson cluster STPP, we first generated the total number of parent points n 0 from a Poisson random variable with expected values equal to κ/k. We then generated parent temporal locations t 1 , · · · , t n0 uniformly on [0, 1] as well as their spatial locations s 1 , · · · , s n0 independently from φ[(s − s 0 )/σ 0 (t i )]/σ 2 0 (t i ). After parent points were derived, we independently generated offspring points around parent points, where each parent point had the P oisson(k) number of offsprings and the spatiotemporal locations of offspring points to their parent points were determined by a Gaussian distribution with the standard deviation equal to σ. We chose k = 4 and σ = 0.01 in our simulation.
At the beginning of our simulations, we evaluated the performance of the classical estimator of intensity functions based on simulated realizations. We ignored the time of occurrences and attmpted to estimate λ T (s) byλ h,T (s) using (3.2) for each individual realization. We chose K h (s) as the density of the independent bivariate normal distribution with both means equal to 0 and both variances equal to h 2 . We used various options of h to study the impact  (Figure 1). For distinct realizations, values ofλ h,T (s) significantly changed even if values of h were not changed (not shown). Although their values varied greatly, the circular shapes in the contour plot ofλ h,T (s) were not significantly affected by h. The entire pattern was extremely reliable among distinct realizations. Based on the findings, we guessed that the gradient angles of intensity functions might not be sensitive to the bandwidth h although it could significantly affect estimates of intensity functions. This motivated us to study the performance of estimated gradient angles rather than estimated values of intensity functions.

of the bandwidth onλ h,T (s). For a given h, we studied variations ofλ h,T (s) by comparing the values between different realizations. We found that for each individual realization values ofλ h,T (s) highly depended on the choices of h
The performance ofθ h (s) given by (3.6) were evaluated in two situations. In the first situation, we assumed s 0 was always fixed at (0, 0), implying that λ(s, t) was angle invariant in the entire time domain. In the second situation, we partitioned the time domain into two periods. The first period was formed by time within [0, T/2], where s 0 was fixed at (0, 0). The second period was formed by time within (T/2, T ], where s 0 was fixed at (0.5, 0). This situation reflected the case when a sudden event occurred at t = 0.5T , which moved the gradient center of ∇ s λ(s, t) or the spatial mode of λ(s, t) from (0, 0) to (0.5, 0). We used all of the point occurrences in the computation ofθ h (s) when s 0 was always fixed at (0, 0) ( Figure 2). The values ofθ h (s) contained the length and the direction information of the gradient vectors. It showed that the lengths of the gradient vector were significantly affected by the choices of h but the angles were not. This was consistent with our previous findings obtained from Figure  1. To study the reliability ofθ h (s), we compared the gradient angle patterns among different realizations in our simulations. We did not find any significant changes visually.  We studied the performance ofθ h (s) according to the two time periods when s 0 varied (Figure 3). In the first time period, we computed the values ofθ h (s) using points with t i ≤ 0.5T . In the second time period, we computed the values ofθ h (s) using points with t i > 0.5T . We had identified significant changes in the comparison of gradient angle patterns between the two periods. The gradient center was almost fixed at (0, 0) in the first period or at (0.5, 0) in the second period, indicating that a global change of point occurrences was identified in the data. Gradient angles were not sensitive to the choices of h in both periods, which were also reliable in the comparison of patterns between distinct realizations. Based on our findings in the two situations, we further studied the performance ofθ h (s) if points were partitioned into more than two time periods (not shown). When s 0 was fixed at (0, 0), as the case considered in the first situation, we found that the angle patterns were almost identical among all of these periods. If s 0 varied from (0, 0) to (0.5, 0), as the case considered in the second period, the angle patterns had significant changes as time approached to 0.5T . After that, no significant changes were founded. Therefore, we conclude that the usage of gradient angle patterns can be more efficient in the study of the change of global patterns of spatiotemporal point data.
Although we have only displayed the results for Poisson STPPs, we also studied similar issues for Poisson cluster STPPs. We did not found any significant differences in our studies for Poisson cluster STPPs. Therefore, we decided not to show the results. In addition, we used a data driven approach to study the robustness of our findings. We carried out the cross-validation approach for individual realizations. We did not find any significant changes of the patterns ofθ h (s) in the approach. Based on our studies, we conclude that the findings reflected by Figures 2 and 3 should be general in all types of STPPs.
In the end, we evaluated the accuracy ofθ h (s) when s 0 was fixed at (0, 0), indicating that the issue was considered only for an angle invariant λ(s, t) in the entire temporal domain. We computed the angle difference between the true and the estimated angles at each point in the study area using where θ(s) always pointed to the spatial origin based on our simulation patterns. After values of d θ,h (s) were derived, we computed the mean integrated square error (MISE) of angle differences by integrating the squares of their values over the whole study area. We then compared the root MISE values for selected bandwidth h based on 10,000 simulation replications (Table 1). We found that the MISEs of the Poisson cluster STPPs were a little bit larger than the MISEs of the Poisson SPPs. The MISE decreased as κ increased. The change of MISEs was not large in all of the cases that we had studied.
In summary, we found that angles of gradient vectors of the first-order intensity function were not sensitive to the choice of the bandwidth. Gradient angle patterns were extremely stable among different realizations. We did not find any siginificant changes if the mode of the first-order intensity function did not vary. Estimates of gradient angles were more precise than estimates of intensity functions. Therefore, it is important to pay more attention to the change of angle patterns in practice.

Applications
We expected that our method would have wide applications in natural hazard studies. Earthquakes are considered as the most important natural hazard events. They often result in numerous deaths and damages. This motivated us to apply our method to earthquake data. Many sources of earthquake data have been established and available via internet and can be downloaded for free. Examples include the websites of the United States National Geophysical (USGS) data center, the Northern California Earthquake Data Center (NCEDC), the GeoCommunity data center, and many others. The database contains dates, depths, longitude, latitude, and magnitude of earthquakes at the regional or global level from thousands of years ago to recent years.
An important issue in the analysis of earthquake data is the problem of earthquake clusters caused by aftershocks. The presence of earthquake clusters often makes it hard to understand earthquake activities. A number of statistical methods have been proposed for earthquake clusters. One of the most important model is the epidemic-type aftershock sequences (ETAS) model [26,27,40]. The ETAS model, which is defined by a conditional intensity function only affected by ancestors but not offsprings, has been widely used for earthquakes [3,7,35]. The ancestors may be understood as mainshock earthquakes, which are described by their spatiotemporal locations and corresponding magnitudes. If an extremely large mainshock earthquake occurs, then within a short time period the ETAS model is primarily dominated by its aftershock patterns, which implies the impact of the other mainshock earthquakes can be ignored. Let the magnitude and the spatiotemporal location of the extremely large mainshock earthquake be denoted by M * and (s * , t * ) = (x * , y * , t * ), respectively. Then, the conditional intensity function [27,40] can be approximately expressed as where μ(x, y) represents the background earthquakes, j(M ) satisfies ∞ 0 j(m)dm = 1, M represents the magnitude, and (s, t) = (x, y, t) represents the spatiotemporal location of aftershock earthquakes. By integrating out the magnitude term given by j(M ), if the point pattern is only considered within a short time period after the extremely large mainshock earthquake, then μ(x, y) can be ignored, which implies that the intensity function of the point pattern can be approximately expressed by (2.4) in Example 3 of Section 2, which is angle invariant.
To investigate whether a spatiotemporal point pattern of aftershocks of extremely large earthquake satisfies Model (2.4), we collected the historical earthquake data from the NCEDC website. The website contained global earthquake activities since 1900. Since our interests were earthquake patterns in this century, we collected earthquake data with magnitude greater than or equal to 4.0 from 2000 to 2014 from the website. The data contained 197 major earthquakes (≥ 7.0) and 16 great earthquakes (≥ 8.0). Since Model (2.4) was approximately derived for extremely large earthquakes based on the ETAS, we focused on great earthquakes in the data. We had two interesting and important cases. The first case was the Great Tohoku Earthquake which occurred in March 11, 2011 with magnitude 9.1 at 38.30 latitude North and 142.37 longitude East. The Great Tohoku earthquake was the most powerful earthquake ever recorded to have hit Japan [33], which caused over twenty thousand life lost. Importantly, the Great Tohoku Earthquake also produced a severe tsunami which caused serious nuclear accidents in Fukushima Nuclear Power Plants affecting hundreds of thousands of residents in a few thousands square kilometers area. Besides the Great Tohoku Earthquake, the second important case was the Great Sumatra-Andama Earthquake which occurred in December 26, 2004 with magnitude 9.0 at 3.30 latitude North and 95.98 longitude East. The Great Sumatra-Andama earthquake was the most powerful earthquake in the area of Indian Ocean near the coast of Indonesia in the recorded history [22]. The earthquake caused over two hundred thousand life lost in over fourteen countries. Most of those were killed by the tsunami triggered by the earthquake.
To investigate the earthquake pattern caused by the Great Tohoku Earthquake, we focused on aftershock earthquakes that occurred within an area from 30 latitude north to 45 latitude north and from 130 longitude east to 150 longitude east from March 11, 2001 to September 10, 2001, which contained a sixmonth time period of aftershock earthquake occurrences of the Great Tohoku Earthquake. The dataset contained 4543 aftershock earthquakes with magnitude greater than or equal to 4.0. There were 656 earthquakes with magnitude between five and six, 72 earthquakes between six and seven, and three earthquakes between seven and eight. The largest aftershock earthquake with magnitude 7.9 and the second largest aftershock earthquake with magnitude 7.7 occurred in the same day of the mainshock earthquake. After the first day, the largest aftershock earthquake with magnitude 7.1 occurred about 27 days later. Therefore, there were no extremely large (i.e. great) aftershock earthquakes in the Great Tohoku Earthquake dataset.
We partitioned the sixth-month time period into six 30-day or 31-day time intervals. Within each time interval, we estimated the gradient pattern of aftershock occurrences using ∇λ h,τ (s) given by (3.5), where the normal kernel with bandwidth equal to 100km was used. The bandwidth value was obtained by a cross-validation procedure with a number of candidate values. We also considered other bandwidth values and found that the patterns were similar. We compared the patterns of ∇λ h,τ (s)/n (Figure 4), where n was the number of aftershock earthquakes in the time period. We had n equal to 3087 the first, 563 in the second, 308 in the third, 217 in the fourth, 195 in the fifth, and 173 in the sixth time periods, respectively. We had several interesting findings in the comparison between these patterns: (a) the gradient centers of all of the six patterns were almost fixed at 37.5 latitude north and 142.5 longitude east,  March 11, 2011. which was about 90km away from the epicenter; and (b) the angles of all of gradient vectors considered in the study area almost kept constants but their lengths varied significantly. Based on our findings, we could roughly conclude that the patterns displayed by Figure 4 were consistent with the assumptions of the ETAS model.
To investigate the earthquake pattern caused by Great Sumatra-Andama earthquake, we focused on aftershock earthquakes that occurred within an area from 10 latitude south to 15 latitude north and from 90 longitude east to 105 longitude east from December 26, 2004 to June 26, 2005, which contained a sixmonth time period of aftershock earthquake occurrences of the Great Sumatra-Andama earthquake. The dataset contained 4381 aftershock earthquakes with magnitude greater than or equal to 4.0. There were 687 earthquakes with magnitude between five and six, 43 earthquakes with magnitude between six and seven, 3 earthquakes with magnitude between seven and eight, and one extremely large aftershock earthquake with magnitude 8.6 at 2.09 latitude north and 97.11 longitude east on March 28, 2005. We also partitioned the sixth-month time period into six 30-day or 31-day time intervals. Within each time interval, we also computed ∇λ h,τ (s) using (3.5), where the normal kernel with bandwidth equal to 100km were used. This was also obtained by a cross-validation procedure with a number of candidate values. We also considered other bandwidth values and found that the patterns were similar. Similarly, we compared the patterns of ∇λ h,τ (s)/n ( Figure 5). We had n equal to 1288 in the first, 964 in the second, 201 in the third, 1348 in the fourth, 343 in the first, and 237 in the sixth time periods, respectively. Based on Figure 5, we had: (a) the gradient center of the  December 26, 2004 andMarch 28, 2005, respectively. first three months was almost fixed at latitude 7.0 north and longitude 93.5 east, which was about 500km northwest of the mainshock earthquake epicenter; (b) the largest aftershock earthquake with magnitude 8.6 occurred three month later at the opposite direction of the first three month gradient center from the location of the mainshock earthquake, which was at about 184km southeast of the mainshock epicenter; and (c) the gradient center gradually moved back to the mainshock epicenter after the largest aftershock earthquake occurred and the two centers were almost identical a few months later. It was clear that the gradient patterns displayed by Figure 5 was inconsistent with the assumption of the ETAS model. The ETAS model assumes aftershock earthquakes occur symmetrically in space about the epicenter of the mainshock earthquake, but this was not true in the pattern displayed by Figure 5.
In summary, we concluded that the angle patterns of gradient vectors of the first-order intensity function was almost time invariant in the 2011 Great Tohoku Earthquake data but not in the 2004 Great Sumatra-Andaman Earthquake data. The possible reason was the existence of extremely large aftershock earthquakes a few months later, which appeared in the 2004 Great Sumatra-Andaman Earthquake data but not in the 2011 Great Tohoku Earthquake data. The extremely large aftershock earthquake reflected the unbalanced release of energy which significantly affected the earthquake aftershock pattern. The ETAS model would be appropriate in modeling the aftershock earthquake patterns if there were not extremely large aftershock earthquakes. Otherwise, other models should be considered. If the gradient center of aftershock earthquakes was far away from the epicenter, then it was possible that there existed an extremely large aftershock earthquake in the opposite direction of the gradient center. This is an important future research topic.

Discussion
We have proposed a gradient angle-based analysis method for spatiotemporal point processes (STPPs). The method is developed based on a gradient-based assumption, called the angle invariant assumption, for the first-order intensity function of STPPs. If the angle invariant assumption holds, then the angle of gradient vectors of the first-order intensity in the spatial domain does not vary in time but the length may be time varied. This concept is important in the study of many human and natural phenomena when clusters of point patterns are present. Because angle patterns of gradient vectors within clusters are more reliable than their length patterns as time varies, it is possible for us to capture the main feature of STPPs by studying patterns of gradient angles of its firstorder intensity function. The method based on gradient angles suggests that one should pay more attention to the angles rather than the lengths of gradient vectors for dynamics of spatiotemporal point patterns. If gradient angle patterns have significant changes, then a global change may appear.
Since only the first-order property is assumed, the choice of the second-order property is flexible. Both the first-order and the second-order intensity functions are important in applications. It is interesting to combine our method with other methods for analysis of the second-order properties. For instance, if the second-order intensity is also modeled by the second-order intensity-reweighted stationary or isotropic process [1], then both the first-order and the secondorder intensity functions are specified. The joint use of our gradient angle-based method and other statistical methods for second-order properties may provide efficient and effective approaches to understand the dynamics of spatiotemporal point patterns.
We expect that our gradient angle-based method will have wide applications in natural hazard and infectious disease studies. Many natural hazard phenomena can be described by spatiotemporal point processes with marks. The angle pattern of gradient vectors describes the tendency of changes of occurrences, which may depend on the magnitude (i.e. marks) of events. The usage of angle patterns is likely to provide solutions to visualize the linkage between different time periods of the occurrences of natural hazard events. For infectious diseases, the angle pattern is even more important since it can be easily interpreted as the way of the spread of a disease. We suspect that the way of infection is changed if a significant change of the gradient angle pattern has been found.
There are a few possible extensions to our approach. First, the gradient anglebased analysis method only needs that the angle of gradient vector is invariant in time. It does not provide any further methods to model the angle patterns. To make the model more precise, one can use a parametric method to model the angle pattern such that a parametric approach is derived. Second, although we have provided a kernel-based nonparametric approach to estimate the angle pattern, we have not considered other approaches such as the smoothing spline approach [14]. Therefore, many nonparametric approaches may be considered. Third, explanatory variables are usually important in statistics but the problem has not been considered in our approach. It is of interest to consider a way to incorporate explanatory variables in our model and this should be an important problem in future research.  Using the standard method of simple functions for Lebesgue Integral, we can show the left-hand side of (4.7) is normally distributed. Using (4.5) and (4.6), we derive the conclusion of the lemma.
Proof of Theorem 3: For any interior point s ∈ D, there are lim h→0 C h (s) = 1 and lim h→0 ∇C h (s) = 0. Using the integral by part in (4.5)