A cyclically adjusted spatio-temporal kernel density estimation method for predictive crime hotspot analysis

ABSTRACT This paper presents a new method for predictive crime hotspot analysis that further improves the kernel density estimation (KDE) method and the spatio-temporal kernel density estimation (STKDE) method by accounting for temporal crime cycles and is therefore termed the ‘cyclically adjusted STKDE (cSTKDE) method’. The case study on robbery incidents in Baton Rouge, Louisiana, shows a temporal cycle with a 6-month period of statistical significance from January 2010 to May 2018. This identified period is incorporated into the temporal kernel function of the new cSTKDE method. For validation, the Forecast Accuracy Index (FAI) and Forecast Precision Index (FPI) are used to evaluate the performance across 52 weeks in 2013. For 11 consecutive weeks since the beginning of 2013, the cSTKDE method outperforms the STKDE by 89% lower average abs(1-FAI) and 17% higher average FPI, and outperforms the KDE by 90% lower average abs(1-FAI) and 8% higher average FPI. Overall, the scenario with the best predictive accuracy by the cSTKDE is recommended over the traditional KDE or STKDE method as most feasible and effective in implementation of hotspot policing in practice.


Introduction
Predictive policing is the application of spatial analysis techniques to identify likely places and times, as well as offenders and victims, for police intervention and crime prevention (Perry et al. 2013). The fact that 50% of crimes are found at three or four percent of the micro crime places in a city (Sherman et al. 1989; Weisburd et al. 2004) has led scholars and policy practitioners to focus much of their interest on what are called crime hotspots (Sherman & Weisburd, 1995; Weisburd & Braga, 2006). Crimes tend to cluster both spatially and temporally (Eck and Weisburd 1995; Ratcliffe 2010; Hu et al. 2018). That is, crimes are more likely to occur close to the times and locations at which they have occurred in the past. This study focuses on predicting crime hotspots in both time and place, i.e. spatio-temporal hotspots.
Many theories of criminal behaviour support the observation of crime hotspots. For example, the rational choice theory (Cornish and Clarke, 1986) states that a person considering committing a crime goes through the process of evaluating perceived risks, gains, needs, apprehension possibilities, punishment possibilities, and specific factors regarding the situation and target. The routine activity theory (Cohen and Felson 1979) argues that crime occurs in a specific environment when three elements converge: a motivated offender, a suitable target, and the absence of an authority figure or guardian to prevent a crime from occurring. That is to say, geographic and temporal characteristics impact the location and time at which those patterns appear. By identifying many of these patterns and factors through spatial and temporal analytics, we can curtail potential criminals' decisions to commit crimes with tactical interventions. This forms the theoretical foundation for predictive policing.
Among the approaches to forecasting places and times with an elevated risk of crime, Perry et al. (2013) highlighted two recent predictive analysis methods: advanced hotspot identification and spatiotemporal analysis. Both have great potential for use in law enforcement. Crime hotspots are spatial clusters of crime with an above-average number of incidents, or places where people have a higher-than-expected risk of victimization (Chainey and Ratcliffe 2005). More recent advancements incorporate the temporal element into predictive crime hotspot analysis (Hu et al. 2018). The repeat and near repeat victimization literature provides a way to understand the importance of the temporal dynamics of crime (Polvi et al. 1991; Farrell & Pease, 1993; Laycock, 2001). It is equally important to consider the temporal component of crime incidents and effectively identify time periods with raised crime risks.
Among the spatio-temporal models for crime hotspot prediction, Maciejewski et al. (2010) employed a standard epidemiological algorithm for time-series analysis to provide hints about when the outbreaks may occur in the process of detecting hotspots. Lukasczyk et al. (2015) proposed a novel visualization technique by applying the topological notion of Reeb graphs within the kernel density estimation (KDE) method to identify hotspots. Other researchers focused on improving the structure of KDE to account for the temporal dimension. Bowers, Johnson, and Pease (2004) proposed a prospective hotspot mapping method using spatial and temporal weighting functions. Brunsdon, Corcoran, and Higgs (2007) developed a spatiotemporal kernel density estimation (STKDE) method by introducing the temporal component to the traditional two-dimensional KDE. Nakaya and Yano (2010) mapped crime events in a three-dimensional space-time cube environment based on STKDE and scan statistics. Delmelle et al. (2014) used the same method to analyse disease patterns. Hu et al. (2018) restructured the STKDE using the generalized product kernels developed by Li and Racine (2007) and applied their model to study predictive crime hotspot mapping. Moreover, they incorporated the likelihood cross-validation to detect appropriate bandwidths (both spatial and temporal). They also developed a statistical significance test based on a null distribution of uniformly distributed random samples.
One notable omission in most spatio-temporal predictive crime hotspot studies is the periodicity of crime. Studies of the periodicity of different types of crime date back to Quetelet (1842). Some recent works examined the relationship between several crime types (e.g. assaults and property crimes) and times of the year and searched for peaks and correlations with external factors. For example, Breetzke and Cohn (2012) investigated the seasonality of assault incidents across urban neighbourhoods and attributed the assault seasonality to the neighbourhood deprivation factor. Malik et al. (2014) incorporated seasonality and periodicity properties of crime to predict spatiotemporal patterns of future crime incidents. Linning (2015) explored the relationship between property crimes and weather variables. Notably, some previous studies used spectral analysis to identify the periodicity of crime. Rhodes et al. (2007) used Fourier transformations to model seasonal arrest patterns. Biermann et al. (2009) used Fourier analysis to explore associations between violent crimes and the phases of the moon but failed to establish any links. Breetzke (2016) employed Fourier analysis to identify periodic peaks of violent and property crimes. Cohn and Breetzke (2017) used Fourier analysis to identify periodic moments in time at which the risk of being a victim of violence and property crimes in the city of Tshwane was heightened. Venturini and Baralis (2016) used the Lomb-Scargle periodogram, also known as least-squares spectral analysis, to investigate the periodicity of different types of crimes in San Francisco.
The existing STKDE method assumes a monotonic crime trend along the temporal dimension, which fails to account for the periodicity of crime. To fill this research gap, this study thus develops a predictive crime hotspot model by adapting the STKDE method to account for the commonly observed cyclical trend in crime occurrence, termed 'cyclically adjusted STKDE or cSTKDE method'. The proposed cSTKDE method has two major components. The first component includes a data-driven module to evaluate and measure the periodic pattern in crime, while the second component measures the cyclically adjusted density at locations and detects crime spatio-temporal hotspots. To illustrate this method, it is applied to the robbery incident data collected between January 2010 and May 2018 in Baton Rouge, Louisiana.

The cSTKDE model
The KDE method can be thought of as a measurement of the weighted average of surrounding points within a search bandwidth, where the weight declines according to a kernel function of distance. The (spatial) kernel density at point (x, y) is estimated by Equation (1), where K_s is a spatial bivariate kernel function of distance d, i = 1, 2, ..., n indexes the observed points (x_i, y_i) within a defined bandwidth (d_i ≤ bw_s), and (x, y) represents the location where the density f is to be estimated. The STKDE method developed by Brunsdon, Corcoran, and Higgs (2007) is essentially the product of the spatial bivariate kernel in Equation (1) and a univariate kernel along the temporal dimension, and it estimates the spatio-temporal kernel density of an event. It is formulated in Equation (2), where (x_i, y_i, t_i) represents an event at location (x_i, y_i) and time t_i, (x, y, t) represents any location in the space-time domain at which the density f is to be estimated, K_s is the same spatial bivariate kernel function defined in Equation (1), K_t is the univariate temporal kernel function, and the spatial and temporal kernels are evaluated within their corresponding bandwidths (d_s ≤ bw_s; d_t ≤ bw_t). Both K_s and K_t decline with distance from the location and time, respectively, being estimated.
The focus of our cyclically adjusted STKDE or cSTKDE model is on refining the temporal kernel function K_t by incorporating a temporal cycle. In essence, when predicting the hotspot at a time point, one needs to best capture a historical crime trend that not only declines monotonically over time but also fluctuates periodically, for example, due to seasonality. Capturing this additional layer of complexity in the temporal trend of crime is expected to further improve the model's performance in predictive hotspot mapping, as illustrated in three scenarios in Figure 1.
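For reference, the two estimators described above are commonly written in the product-kernel form below. This is the textbook form under the symbol definitions given in the text; the normalizing constants and the scaling of the kernel arguments are assumptions here and may differ from those in Equations (1) and (2).

```latex
% Textbook product-kernel forms. The factors 1/(n bw_s^2) and 1/(n bw_s^2 bw_t)
% and the scaled kernel arguments are assumptions, not taken from the paper.
\begin{align*}
f(x, y) &= \frac{1}{n\, bw_s^{2}} \sum_{i:\; d_i \le bw_s} K_s\!\left(\frac{d_i}{bw_s}\right),
  \qquad d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}, \\
f(x, y, t) &= \frac{1}{n\, bw_s^{2}\, bw_t}
  \sum_{i:\; d_{s,i} \le bw_s,\; d_{t,i} \le bw_t}
  K_s\!\left(\frac{d_{s,i}}{bw_s}\right) K_t\!\left(\frac{d_{t,i}}{bw_t}\right),
  \qquad d_{t,i} = \lvert t - t_i \rvert .
\end{align*}
```

Because hotspots are later identified from the relative distribution of density values across grid cells, a constant factor in front of the sum would not be expected to change which cells are flagged.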
Figure 1(a) illustrates the temporal kernel function when only the cyclical trend is accounted for and the temporal decay effect is not. Since time point d_1 is closer to the origin (reference time point) than time point d_2, the traditional STKDE method would assign d_1 a higher weight than d_2 with respect to the origin. However, as d_2 is closer to an integer multiple of one period than d_1, d_2 actually receives a higher weight than d_1. This pattern is captured in the temporal weighting function K_t in Equation (3a), where T stands for the crime period, Δt stands for the temporal distance from the reference time point, and the cosine function is used to ensure that the value falls between 0 and 1. The second and third scenarios consider both the cyclical effect and the temporal decay effect, but in two different ways, depending on the strength of the temporal decay. Specifically, the scenario in Figure 1(b) assigns d_2 a higher weight than d_1, while the scenario in Figure 1(c) assigns d_1 a higher weight than d_2. To capture the effects in Figures 1(b) and 1(c), Equation (3a) is revised as Equation (3b), where β ∈ [0, 1] is the decay coefficient and the remaining symbols are the same as previously defined. The exponent int(Δt/T) means that the temporal weight decreases gradually when the temporal distance increases beyond one period.
When Δt = nT (i.e. an integer multiple of T with n > 1), the exponent of β is int(Δt/T) − 1, and the temporal weighting function becomes Equation (3c). In summary, three temporal weighting functions K_t are developed in the proposed cSTKDE model. One should use the K_t in Equation (3a) when the temporal distance between two time instances is less than or equal to one crime period, in Equation (3b) when the temporal distance is greater than one crime period and not equal to an integer multiple of the crime period, and in Equation (3c) when the temporal distance is greater than one crime period and equal to an integer multiple of the crime period. This is in contrast to the monotonically decreasing weighting function K_t in the traditional STKDE model in Equation (2).
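The three-case rule above can be sketched in a few lines of Python. The piecewise structure (no decay within one period, exponent int(Δt/T) beyond it, exponent int(Δt/T) − 1 at exact multiples) follows the text. The specific cosine term 0.5 · (1 + cos(2πΔt/T)) is an assumption chosen only because it lies in [0, 1] and peaks at integer multiples of T; the function name and signature are hypothetical.

```python
import math


def cyclic_temporal_weight(dt: float, period: float, beta: float) -> float:
    """Illustrative cyclically adjusted temporal weight K_t (assumed form).

    dt     -- temporal distance from the reference time point (>= 0)
    period -- crime period T, in the same units as dt
    beta   -- temporal decay coefficient
    """
    if dt < 0 or period <= 0:
        raise ValueError("dt must be non-negative and period must be positive")

    # Assumed cyclical term: rescaled cosine in [0, 1], equal to 1 at dt = 0, T, 2T, ...
    cyc = 0.5 * (1.0 + math.cos(2.0 * math.pi * dt / period))

    # Case (3a): within one period, cyclical effect only.
    if dt <= period:
        return cyc

    ratio = dt / period
    # Case (3c): exact integer multiple of the period, exponent int(dt/T) - 1.
    if math.isclose(ratio, round(ratio), abs_tol=1e-9):
        return beta ** (round(ratio) - 1) * cyc

    # Case (3b): beyond one period, exponent int(dt/T).
    return beta ** int(ratio) * cyc
```

Under this assumed form with T = 6 months, an event 5.5 months back receives a higher weight than one 4 months back, which matches the ordering described for Figure 1(a).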
To measure the spatio-temporal density at a given location using our cSTKDE model, one can simply replace K_t in Equation (2) with the three newly defined temporal weighting functions in Equations (3a-3c). Most standard kernels are non-negative, symmetric and have an integral of 1 (Yin and Wilson 2020). According to Equations (3a-3c), the K_t in the cSTKDE is non-negative but not symmetric, and its integral is greater than 1. Since K_t is used to forecast future crime based on historic crime in this study, the temporal distance is always positive, and hence the asymmetry has no effect. As for the integral of K_t being greater than 1, it results in larger absolute values of the density estimates, but in a homogeneous way across the area. Thus, it does not affect the statistical distribution of the density estimates in an area, from which hotspots are identified.

Predictive performance evaluation metrics
There are many metrics to evaluate the predictive performance of derived crime hotspots. The hit rate, for example, is a popular one. It is calculated as the ratio of crimes captured by the identified hotspot (n_te) to all crimes in the testing data (N_te) (Bowers, Johnson, and Pease 2004; Hart and Zandbergen 2014). However, the hit rate is significantly affected by the area size of hotspots. A more meaningful metric is the predictive accuracy index or PAI (Chainey, Tompson, and Uhlig 2008), which is the hit rate divided by the percentage of hotspot area (a) out of the study area size (A). In essence, PAI measures the success of forecasting points in the most efficiently sized area and hence is more analogous to precision than accuracy. Therefore, the PAI is used in this study to describe the predictive precision and renamed as the Forecast Precision Index (FPI), formulated as FPI = (n_te / N_te) / (a / A). A higher value of FPI represents a better model performance.
In addition, another index, the forecast accuracy index (FAI), is designed to assess the predictive accuracy for evaluating the effectiveness of hotspots in describing the predicted cluster's size and shape (Swain 2012). It measures how closely (or differently) the predictive hotspot's density in the testing data resembles that in the training data, i.e. the effectiveness of the description. The FAI is formulated as FAI = (n_te / N_te) / (n_tr / N_tr), where n_tr is the incident count of training data in hotspots, and N_tr is the total number of incidents in the training data. A smaller absolute value of (1-FAI) indicates a better model performance. There usually exists a tradeoff between the predictive accuracy and precision (Swain 2012).
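The two indices can be computed directly from counts and areas, as in the minimal sketch below. It reads the FPI as the hit rate divided by the hotspot's share of the study area, and the FAI as the share of testing incidents inside the hotspot divided by the share of training incidents inside it. The function and argument names are hypothetical, and the counts in the usage lines are made up for illustration.

```python
def forecast_precision_index(n_te_hot: int, n_te_total: int,
                             hotspot_area: float, study_area: float) -> float:
    """FPI: hit rate in the testing data divided by the hotspot area share."""
    hit_rate = n_te_hot / n_te_total
    return hit_rate / (hotspot_area / study_area)


def forecast_accuracy_index(n_te_hot: int, n_te_total: int,
                            n_tr_hot: int, n_tr_total: int) -> float:
    """FAI: testing share inside the hotspot divided by the training share."""
    return (n_te_hot / n_te_total) / (n_tr_hot / n_tr_total)


# Illustrative (made-up) numbers: 30 of 100 testing incidents fall in hotspots
# that cover 5% of the study area and contain 60 of 200 training incidents.
fpi = forecast_precision_index(30, 100, hotspot_area=5.0, study_area=100.0)
fai = forecast_accuracy_index(30, 100, 60, 200)
fai_error = abs(1.0 - fai)  # smaller is better
```

Keeping abs(1 − FAI) as a separate quantity mirrors how the comparison across methods is reported, where a value near zero means the hotspot captures similar shares of training and testing incidents.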

Data and study area
The study area is the city of Baton Rouge [...] 100 represents a perfect match, and lower scores represent decreasing match accuracy. In this study, the threshold score is set as 90. Table 1 summarizes the total number of crime events in the study area from January 2010 to May 2018. The numbers were usually the highest for aggravated assault, followed by robbery, illegal use of weapons, sexual assault, and homicide. However, in some years (e.g. 2017 and Jan-May 2018), illegal use of weapons outnumbered robbery.

Crime period detection
The first step in the analysis is to detect the crime period present in the data using the periodogram method in spectral analysis. One advantage of this method is its capacity to deal with missing samples or uneven sampling steps, and therefore it has broad applications (Venturini and Baralis, 2016). The periodogram is also referred to as least-squares spectral analysis, as it fits sinusoidal functions to the data by least squares. The original daily robbery data are aggregated to a monthly resolution, as shown in Figure 2.
The periodogram method requires that the input data be a stationary time series. Therefore, the monthly robbery incident data need to be detrended. This is accomplished by estimating a linear function of robbery counts (y) over time (t, in months): y = −0.575t + 94.382 (R² = 0.608). After deducting the estimated count from the observed monthly count, the resulting residuals are analysed by the periodogram method. Fisher's test shows that robbery has a statistically significant period of 6 months at the 95% confidence level. Adding the cyclical trend improves the model's R² over the regression with the linear trend alone (from 0.608 to 0.646).
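The detrend-then-periodogram step can be sketched as follows. This is a simplified stand-in rather than the authors' procedure: it uses an ordinary FFT periodogram on evenly spaced monthly counts instead of a least-squares (Lomb-Scargle) periodogram, and it reports Fisher's g statistic (largest ordinate over the sum of ordinates) without computing its p-value. The series is synthetic, generated from the linear trend reported above plus an assumed 6-month cosine and noise, and 96 months are used so that the 6-month period falls exactly on a Fourier frequency.

```python
import numpy as np


def dominant_period(counts: np.ndarray) -> tuple[float, float]:
    """Return (dominant period in samples, Fisher's g) after linear detrending."""
    n = counts.size
    t = np.arange(n, dtype=float)

    # Remove the linear trend so the residual series is closer to stationary.
    slope, intercept = np.polyfit(t, counts, deg=1)
    residuals = counts - (slope * t + intercept)

    # Plain FFT periodogram (assumes evenly spaced samples with no gaps).
    power = np.abs(np.fft.rfft(residuals)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0)

    # Drop the zero frequency, and the Nyquist frequency when n is even.
    last = -1 if n % 2 == 0 else None
    power, freqs = power[1:last], freqs[1:last]

    k = int(np.argmax(power))
    g_stat = float(power[k] / power.sum())
    return float(1.0 / freqs[k]), g_stat


# Synthetic monthly counts: reported linear trend + assumed 6-month cycle + noise.
rng = np.random.default_rng(0)
months = np.arange(96)
synthetic = (94.382 - 0.575 * months
             + 10.0 * np.cos(2.0 * np.pi * months / 6.0)
             + rng.normal(0.0, 3.0, months.size))
period, g_stat = dominant_period(synthetic)
```

For data with missing months or uneven spacing, a least-squares periodogram would be the more appropriate tool, which is the advantage noted in the text.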

Predictive crime hotspot by the cSTKDE method
This study uses the longer period of data (2010-2018) to identify the temporal trend, and then a subset of the robbery incidents from the middle of that period (2012 to 2014) to implement the cSTKDE method. One major data preparation task is splitting the data into training and testing datasets. Specifically, the predictive hotspots in each week of 2013 are evaluated, with the first week of 2013 as time point 0 and the last week as time point 52, for a total of 53 time points. For each time point, the robbery incidents within the temporal bandwidth are set as the training dataset, and the 3-month data after that time point are set as the testing dataset. The time point moves forward 1 week at a time, and the training and testing dataset periods roll forward as shown in Figure 3.
Table 1. Number of crime events in the study area, January 2010 to May 2018.
Crime type | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | Jan-May 2018
Homicide | 153 | 114 | 126 | 77 | 99 | 108 | 128 | 220 | 72
Sexual assault | 213 | 183 | 163 | 170 | 243 | 307 | 341 | 275 | 108
Robbery | 1052 | 969 | 1047 | 877 | 674 | 652 | 647 | 470 | 167
Aggravated assault | 1083 | 1219 | 1125 | 904 | 868 | 846 | 939 | 894 | 365
Illegal use of weapon | 466 | 740 | 654 | 348 | 382 | 444 | 571 | 627 | [...]
The five parameters in the cSTKDE model are summarized in Table 2: spatial bandwidth, temporal bandwidth, spatial kernel function, cyclical period, and temporal decay coefficient. To balance the workload against the need to examine the impact of each parameter, three values are selected and examined for each of the first four parameters. Given that the key innovation of the cSTKDE method is the refined temporal kernel, a greater number (5) of values are chosen for the last parameter β. More values in finer increments could be explored in future studies.
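The rolling split described above can be sketched as a list of date windows, one per weekly time point. Several details are assumptions made only for illustration: time point 0 is taken as 1 January 2013, months are approximated in days (365 for a 12-month training bandwidth, 91 for the 3-month testing window), and windows are treated as half-open intervals [start, end). The function name is hypothetical.

```python
from datetime import date, timedelta


def rolling_windows(first_week_start: date, n_points: int = 53,
                    train_days: int = 365, test_days: int = 91) -> list[dict]:
    """Build training/testing date windows that roll forward one week at a time."""
    windows = []
    for i in range(n_points):
        t = first_week_start + timedelta(weeks=i)  # time point i
        windows.append({
            "time_point": i,
            # Training data immediately precede the time point: [t - bw_t, t).
            "train": (t - timedelta(days=train_days), t),
            # Testing data follow the time point: [t, t + 3 months).
            "test": (t, t + timedelta(days=test_days)),
        })
    return windows


windows = rolling_windows(date(2013, 1, 1))
```

With these assumptions, the earliest training window starts in early 2012 and the latest testing window ends in early 2014, which is consistent with using the 2012 to 2014 subset of incidents.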
For all kernel-based methods, bandwidth values are among the most important parameters (Hu, Miller, and Li 2014). The literature offers several guidelines for selecting the optimal bandwidth values, such as the rule-of-thumb method (Silverman 1986), the cross-validation method (Brunsdon 1995), and the distance-based method (Fotheringham, Brunsdon, and Charlton 2000). Alternatively, some studies recommend deriving the bandwidth from the number of observed incidents in the study area (Bailey and Gatrell 1995) or from the shorter side of the study area's minimum bounding rectangle (Chainey 2011), while other studies suggest exploring different bandwidth selections (Chainey and Ratcliffe 2005).
Here, three different values (700 m, 900 m, and 1100 m) are selected as the spatial bandwidth bw_s by considering the study area size and average activity space. The upper bound of 1100 m slightly exceeds residents' average range of neighbourhood activities, relevant to their perceived safety levels of various neighbourhoods. A lower bound of 700 m is used to ensure enough crime incidents within their buffer zones for KDE. As for the spatial kernel K_s, the Epanechnikov, Quartic, and Triweight kernel functions that are commonly used in crime and transportation KDE analyses (Yin and Wilson 2020; Chainey, Reid, and Stuart 2002; Chainey and Ratcliffe 2005; Hu et al. 2018) are examined.
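The three spatial kernels share the form c · (1 − u²)^p for scaled distance u = d / bw_s ≤ 1, with p = 1, 2, 3 for Epanechnikov, Quartic, and Triweight. The sketch below uses the constant c = (p + 1) / π, which makes each kernel integrate to 1 over the unit disk. That bivariate normalization is an assumption, since the constants used in the study are not stated in this section, and the names are hypothetical.

```python
import math

# Exponent p in (1 - u^2)^p for each kernel family.
SPATIAL_KERNELS = {"epanechnikov": 1, "quartic": 2, "triweight": 3}


def spatial_weight(distance_m: float, bw_s_m: float,
                   kernel: str = "epanechnikov") -> float:
    """Radial kernel weight for a point at distance_m, given bandwidth bw_s_m."""
    if distance_m < 0 or bw_s_m <= 0:
        raise ValueError("distance must be non-negative and bandwidth positive")
    u = distance_m / bw_s_m
    if u > 1.0:
        return 0.0  # outside the search bandwidth
    p = SPATIAL_KERNELS[kernel]
    # (p + 1) / pi normalizes the kernel over the unit disk (assumed convention).
    return (p + 1) / math.pi * (1.0 - u * u) ** p


# Weight of an incident 450 m from a cell centre under each kernel, bw_s = 900 m.
weights = {name: spatial_weight(450.0, 900.0, name) for name in SPATIAL_KERNELS}
```

A higher exponent concentrates more of the weight near the cell centre, so for the same bandwidth the Triweight kernel smooths less than the Epanechnikov kernel.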
Given the 6-month period in the robbery data, both bw_t and T are set as 3 months, 6 months, and 12 months, corresponding to three scenarios of temporal distance less than, equal to, and greater than one crime period. Note that the time period covered by the training dataset always immediately precedes the beginning week of the testing dataset. For the temporal decay coefficient β, a total of five values within the range (0, 1) are explored: 1/3, 1/2, 2/3, 3/4, and 4/5. Its lower bound 1/3 (with the maximum decay) is chosen to ensure that recent temporal cycles (e.g. the first and second cycles) still have detectable influences on kernel densities when compared to those with no temporal cycles at all. The upper bound 4/5 is selected so that the minimum temporal decay still has discernible effects on kernel densities when compared to temporal cycles without any decay at all.
Table 2. Parameters in the cSTKDE model.
Parameter | Values
1. Spatial bandwidth (bw_s) | 700 m, 900 m, 1100 m
2. Temporal bandwidth (bw_t) | 3 months, 6 months, 12 months
3. Spatial kernel function (K_s) | Epanechnikov, Quartic, Triweight
4. Crime cyclical period (T) | 3 months, 6 months, 12 months
5. Temporal decay coefficient (β) | 1/3, 1/2, 2/3, 3/4, 4/5
Figure 4 summarizes the process of implementing the cSTKDE method for time i ∈ [0, 52]. For each grid cell (100 m × 100 m) in the study area, we calculate the spatial and temporal distances to each incident record in the training dataset, and select the qualified training data by d_s ≤ bw_s and d_t ≤ bw_t. For a grid cell surrounded by n qualified points in the training data, we sum the weights of the n incident records to obtain the estimated density value for that grid cell. The process is repeated to obtain density estimates for all grid cells in the study area. Finally, crime hotspots are identified by a thematic threshold of density values greater than 1.96 standard deviations (i.e. 95% confidence level). Next, we evaluate the predictive performance of the identified hotspots by calculating the FAI and FPI indices based on the testing dataset.
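The per-cell procedure can be sketched with NumPy as below. The kernels are passed in as plain callables so that any spatial kernel and any temporal weighting function can be plugged in. Two points are assumptions rather than details from the text: the density is an unnormalized sum of K_s · K_t over the qualified incidents, and the hotspot rule is read as "density greater than the mean plus 1.96 standard deviations over all grid cells". All names are hypothetical.

```python
import numpy as np


def stkde_density(cells_xy, events_xyt, t_ref, bw_s, bw_t, k_s, k_t):
    """Sum K_s * K_t over qualified training incidents for each grid cell.

    cells_xy   -- (m, 2) grid-cell centre coordinates
    events_xyt -- (n, 3) training incidents as (x, y, time)
    t_ref      -- reference time point of the prediction
    k_s        -- callable on the scaled spatial distance d_s / bw_s
    k_t        -- callable on the temporal distance d_t
    """
    cells = np.asarray(cells_xy, dtype=float)
    events = np.asarray(events_xyt, dtype=float)
    d_t = t_ref - events[:, 2]  # how far back in time each incident lies
    density = np.zeros(len(cells))
    for j, (cx, cy) in enumerate(cells):
        d_s = np.hypot(events[:, 0] - cx, events[:, 1] - cy)
        # Keep incidents inside both the spatial and temporal bandwidths.
        ok = (d_s <= bw_s) & (d_t >= 0) & (d_t <= bw_t)
        density[j] = sum(k_s(ds / bw_s) * k_t(dt)
                         for ds, dt in zip(d_s[ok], d_t[ok]))
    return density


def hotspot_mask(density, z: float = 1.96):
    """Flag cells whose density exceeds mean + z * std (assumed threshold rule)."""
    density = np.asarray(density, dtype=float)
    return density > density.mean() + z * density.std()
```

The explicit loop keeps the correspondence with the described steps visible. For a real study area with 100 m cells, a spatial index or vectorized distance computation would likely be needed for acceptable run times.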

Sensitivity analysis of cSTKDE
A sensitivity analysis of the cSTKDE method is conducted to understand the impact of a given parameter on model results. The mean values of FAI and FPI over the 53 weekly time points in 2013 are obtained for each unique combination of parameter settings. Table 3 summarizes how the performance indices change when each parameter in the cSTKDE method changes. Table 3 shows that for both FAI and FPI, a larger temporal bandwidth leads to better results. When it comes to spatial bandwidth or kernel function, there is a trade-off between the two indices. A better FAI is supported by a larger spatial bandwidth and a kernel with stronger smoothing. However, a better FPI corresponds to a smaller spatial bandwidth and a kernel with weaker smoothing. Indeed, a smaller bw_s tends to generate more dispersed hotspots of smaller (or better focused) area, and thus helps improve the predictive precision but loses some ground on effectiveness.
In summary, the scenario of cSTKDE with bw_t = 12 months, bw_s = 1100 m, K_s = Epanechnikov, T = 6 months, and β = 1/3, 1/2, 2/3, or 3/4 has the best mean FAI = 0.960, and the scenario of cSTKDE with bw_t = 12 months, bw_s = 700 m, K_s = Triweight, T = 12 months, and β = 1 has the best mean FPI = 5.956. Both scenarios have the largest temporal bandwidth of 12 months. The best FAI corresponds to the largest spatial bandwidth of 1100 m and a crime period of 6 months, while the best FPI corresponds to the smallest spatial bandwidth and a crime period of 12 months.
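The sensitivity analysis amounts to evaluating every combination of the five parameter settings and averaging each index over the weekly time points. The sketch below enumerates the combinations, assuming a full factorial design over the values listed for the five parameters (3 × 3 × 3 × 3 × 5 settings). The dictionary keys and function names are hypothetical.

```python
from itertools import product

# Parameter values as listed for the cSTKDE model.
PARAMETER_GRID = {
    "bw_s_m": (700, 900, 1100),
    "bw_t_months": (3, 6, 12),
    "k_s": ("epanechnikov", "quartic", "triweight"),
    "period_months": (3, 6, 12),
    "beta": (1 / 3, 1 / 2, 2 / 3, 3 / 4, 4 / 5),
}


def parameter_scenarios(grid: dict) -> list[dict]:
    """Enumerate every combination of parameter values (full factorial)."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


def mean_index(weekly_values: list[float]) -> float:
    """Average an index (FAI or FPI) over the weekly time points."""
    return sum(weekly_values) / len(weekly_values)


scenarios = parameter_scenarios(PARAMETER_GRID)
```

Each scenario would then be run for all weekly time points, and the resulting mean FAI and mean FPI compared across scenarios.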
There is a trade-off between the two indices. Prediction accuracy measures how well the size and shape of the predictive hotspot are described. Prediction precision emphasizes how efficiently the crime in the testing dataset is captured by the predicted hotspot. It is desirable to strike a balance between accuracy and precision in hotspot policing, since high accuracy of the predicted hotspot could lead to more effective crime prevention actions and high precision of the predicted hotspot could lead to a more efficient use of the police force and other resources.
Table 3. Performance of cSTKDE method in different parameters.
Parameter | Mean value of FAI | Mean value of FPI
Temporal bandwidth (bw_t, months) | 3 < 6 < 12 | 3 < 6 < 12
Spatial bandwidth (bw_s, m) | 700 < 900 < 1100 | 1100 < 900 < 700
Spatial kernel function (K_s) | Triweight < Quartic < Epanechnikov | Epanechnikov < Quartic < Triweight
Crime cyclical period (T, months) | 12 & 3 months < 6 months when bw_t = 6; little variability or inconsistent trend when bw_t = 3 or 12 | 12 & 3 months < 6 months when bw_t = 3 or 6; little variability or inconsistent trend when bw_t = 12
Temporal decay coefficient (β) | Very little variability or inconsistent trend | Very little variability or inconsistent trend

Assessing the performance of cSTKDE in comparison to KDE and STKDE methods
On the practice front, a law enforcement agency may adopt a strategy based on the best-case scenario of a method and change the strategy as often as weekly. Such a strategy is enabled especially by the cSTKDE and STKDE methods, which account for the temporal variability of crime patterns. This section examines the weekly predictive hotspot patterns based on the best FAI and FPI values by the three methods. Figure 5(a) shows how the value of abs(1-FAI) changes weekly in 2013. The three coloured lines stand for abs(1-FAI) values by the cSTKDE, STKDE and KDE methods, and their values range from 0 to 0.35. A smaller value means a better FAI that is closer to one. Two temporal ranges with more than five consecutive weeks are spotted in which the cSTKDE method enjoys the best prediction accuracy, i.e. the smallest abs(1-FAI) among the three methods: weeks 0-11 and weeks 38-52, for a total of 26 weeks (week 0 is the starting point of the study year so it is not counted towards the total number of weeks). Similarly, Figure 5(b) shows how the value of FPI varies weekly in 2013. Two temporal ranges with more than five consecutive weeks stand out in which the cSTKDE method possesses the best prediction precision, i.e. the highest FPI: weeks 0-17 and weeks 30-35, for a total of 23 weeks.
The above observations show that the cSTKDE enjoys an edge over both STKDE and KDE in a significant number of weeks but not the entire 2013. In practice, there is a need to adjust the method as well as its associated parameters in order to predict the hotspot most effectively and efficiently.
Based on the measures outlined in Figures 5(a)-(b), we select individual weeks and take a closer look at them to further illustrate the differences in predictive hotspot maps across the three methods.
Recall that a better FAI value (nearer to 1) indicates that the predictive hotspot better replicates the relative size of spatial clusters in reported crimes. We first discuss how our new method improves the effective description of the size and shape of hotspot areas. In Figure 5(a), the largest gap of abs(1-FAI) between the cSTKDE and STKDE is −0.1510, at the 44th week, when both methods have their best FAIs. Figures 6(a)-(b) show the predictive hotspot maps of the 44th week in 2013 by the cSTKDE and STKDE methods, respectively. The predictive hotspots by the two methods are largely consistent with each other, but minor differences can be spotted: the hotspots by the cSTKDE (on the left) are more consolidated than those by the STKDE (on the right). Note that the two major hotspots to the east of I-110 are more extensive in Figure 6(a) and are dispersed into much smaller hotspots in Figure 6(b). In addition, a minor hotspot in the southwest corner is visible in Figure 6(a) but missing in Figure 6(b), and two minor hotspots along US Hwy 190 (one to its southwest and another to its northeast) in Figure 6(b) are absent in Figure 6(a). Similarly, the largest gap of abs(1-FAI) between the cSTKDE and the KDE is −0.1340, in the first week, when the two methods have their best FAIs. Figures 7(a)-(b) show the predictive hotspot maps of the first week in 2013 by the cSTKDE and KDE methods, respectively. This time, the two major hotspots to the east of I-110 in Figure 7(a) (on the left) merge into one large hotspot in Figure 7(b) (on the right). In addition, two minor hotspots in the east (one along US Hwy 190 and another along I-12) in Figure 7(b) are missing from Figure 7(a).
Here we shift the focus to examining how the three methods differ in their best predictive hotspot maps when measured in FPI. Based on Figure 5(b), which shows the weekly highest FPIs across the three methods in 2013, the largest difference in FPI between the cSTKDE and STKDE methods is 1.6910 in the sixth week, and the resulting predictive hotspots are shown in Figure A1 in the Appendix. Similarly, the largest difference in FPI between the cSTKDE and KDE methods is 0.9383, also in the sixth week, and the results are shown in Figure A2 in the Appendix. Since the hotspots on these maps are highly scattered, it is hard to visually identify much difference between them.
When we view these maps together (Figures 6, 7, A1, A2), there are two major observations: (1) The hotspots on the maps selected by the best FPI (Figures A1 & A2 in the Appendix) are far more scattered than those on the maps selected by the best FAI (Figures 6 & 7). In other words, the predictive hotspot maps for achieving better FAIs tend to be more consolidated into a small number of clusters, whereas the maps for improving FPIs tend to be more scattered into small, numerous, and localized pockets. The two assessment indices represent the competing objectives of hotspot policing in effectiveness and efficiency, and the tradeoff between them.
(2) The hotspots identified by the three methods are largely consistent when the best scenarios are compared in terms of either index. This indicates the spatial stability of hotspots over time, as revealed by many studies. By accounting for the temporal variability, STKDE and cSTKDE introduce minor but notable refinements.
The first observation suggests that the hotspots with the best FPI might be too numerous and dispersed to be meaningful for offering practical guidance on police patrol tactics. The hotspots with the best FAI are more feasible for hotspot policing. The second observation merits a more in-depth investigation, as the focus of this paper is on the value of the refined temporal kernel function. We now examine how the hotspots predicted by each method change across the weeks. Owing to limited space, we only cover the first 9 weeks in biweekly increments.
Figure 8 shows how the hotspots of the three methods change from week 1 to week 9 when considering the best FAI. The cSTKDE method shows more pronounced changes between weeks than the other two methods. The major hotspots to the east of I-110 predicted by the cSTKDE method are two separate ones in week 1, then become one consolidated hotspot in week 3, and then become two separate hotspots again from week 3 to week 9. In contrast, the hotspots predicted by the STKDE method at the same location change little in size or shape from week 1 to week 9. The KDE method does not account for temporal variability and does not predict any changes in hotspots. One minor hotspot along US Hwy 190 (in the northeast) does not appear in the first-week hotspot map predicted by cSTKDE but appears from week 3 to week 9. However, for the KDE or STKDE method, that minor hotspot appears consistently from week 1 to week 9. According to the distribution of the testing data, the number of robberies at that minor hotspot location started increasing in week 3 and continued the trend until week 9. Therefore, the cSTKDE method captures this temporal variation better than the other two methods.
Table 4 shows the area percentage of the identified hotspots. When considering both the best FPI and FAI, cSTKDE has a relatively lower hotspot area percentage than KDE and STKDE except for the ninth week in FPI, although this difference is overall less than 2%. In terms of the best FAI, the ability of cSTKDE to describe the hotspots' size and shape and to capture their temporal variation gives it some advantages over KDE and STKDE.

Implications in public policy and police practice
Crime analysis has a long tradition of identifying crime hotspots. It can be traced back to the old days of 'pushpin maps' as a common practice in police departments in the United States, where analysts physically placed pushpins on large street maps to indicate crime hazards (FBI, 1944). The geographic concentration of crime in very limited areas is striking, and naturally suggests deploying law enforcement resources to those hotspot areas, which leads the way in place-based policing strategy (Weisburd 2005). Related data support and analysis on hotspots has powered the progress of increasing usage of computerized mapping and GIS in crime studies and police practice (Harries 1999). The success of hotspot policing relies on the quality of identified hotspots, which guide the law enforcement on where and when to allocate the police force.
(1) The balancing act between effectiveness and precision. As Swain (2012) pointed out, there usually exists a trade-off between predictive accuracy (FAI) and precision (FPI), corresponding to the sometimes competing goals in hotspot policing, namely effectiveness versus efficiency. Findings from this study suggest that the cSTKDE may not always possess an advantage in efficiency (the portion of crimes captured by the hotspot relative to the portion of its area size out of the study area size) or effectiveness (the portion of crimes in the hotspot in the testing data relative to the portion in the training data) over STKDE or KDE. The two indices are often not consistent with each other, and thus it is unrealistic to expect one method to win on both all the time.
(2) The practical matter in valuing one measure over the other. If police resources are less of a concern, sufficient force could be deployed to cover all predicted hotspot areas. In this case, a minor difference in the hotspot area size prescribed by the three methods matters little to police. Precision is outweighed by accuracy, and one may rely on the predictive hotspot method with the best FAI to guide the policing plan. When resources are very limited, it could be more desirable to use minimal force to achieve maximal impact, and therefore a more plausible strategy is to seek guidance from the method with the best FPI. In this case study, we recommend adopting the best-FAI policing strategy, most often given by the cSTKDE method, over the best-FPI one when such a decision incurs a larger treatment area but the expanded area is reasonable and can be covered by existing resources. This recommendation is based on this study of robbery in Baton Rouge in 2013, and further work on different crimes in different areas at different times is needed.
(3) Accounting for the spatial pattern of hotspots.
Based on these results, a hotspot map further driven by an optimal (smallest) FPI is prone to generate a large number of scattered hotspots that are far apart from each other. On the other hand, a hotspot map pursuing a more favourable FAI is more likely to yield a small number of consolidated hotspots. It is logistically more challenging to deploy police forces in many groups to attend to numerous hotspots, especially when these places are far away from each other. The total area of these hotspots may be smaller and more precise, but the effort to cover them may not be less. Effective implementation on the ground needs to account for the spatial pattern of the identified hotspots in terms of size, number, shape, and position relative to each other.
(4) Complexity in the dynamics between criminals and police. The interaction between criminals and police takes place in space and evolves over time, and the spatiotemporal dynamics are complex. For example, rational choice theory (Clarke et al. 1993) suggests that potential criminals may be deterred by an increased police presence in hotspot areas, which decreases crime activity there in the short term and moves hotspots elsewhere. On the other hand, routine activity theory (Cohen and Felson 1979) implies that crimes tend to be repeated in the same locations around the routine activity space of criminals, which helps explain the persistence of high crime areas over time. Police and criminals change their next moves in response to each other's actions. Such complex dynamics are best captured by modern agent-based crime models (e.g. Zhu and Wang 2021). If police remain focused on the same hotspots over a long period of time on the assumption that criminal behaviour is constant, the strategy becomes less effective as time goes on. In other words, the policing strategy needs to be adjusted spatiotemporally, guided by a model whose crime pattern predictions are accurate in both time and space.
Among the three methods, the cSTKDE method not only accounts for the spatial variation trend but also attempts to capture the complex temporal cyclical trend accurately.
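The two measures contrasted in point (1) can be made concrete with a short sketch. It follows the verbal definitions given there, reading efficiency as FPI and effectiveness as FAI; the formal definitions used in the validation may differ in detail. The function names and all counts below are hypothetical and are not taken from the Baton Rouge data.

```python
def forecast_precision_index(n_hot_test, n_test, area_hot, area_total):
    """Efficiency: share of test crimes inside the hotspots divided by
    the share of the study area that the hotspots occupy."""
    return (n_hot_test / n_test) / (area_hot / area_total)


def forecast_accuracy_index(n_hot_test, n_test, n_hot_train, n_train):
    """Effectiveness: share of test crimes inside the hotspots divided by
    the share of training crimes inside the same hotspots."""
    return (n_hot_test / n_test) / (n_hot_train / n_train)


# Hypothetical example: hotspots cover 5 of 100 area units and contain
# 12 of 40 test-week robberies and 90 of 300 training robberies.
fpi = forecast_precision_index(12, 40, 5.0, 100.0)
fai = forecast_accuracy_index(12, 40, 90, 300)
fai_error = abs(1 - fai)  # the abs(1-FAI) form reported in the abstract
```

A hotspot set can score well on one measure and poorly on the other: shrinking the hotspot area raises FPI as long as the captured share of crimes falls more slowly, while FAI depends only on how closely the test share matches the training share.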
In summary, the cSTKDE method has the best overall performance in the effectiveness of hotspot prediction among the three methods. Its apparent deficiency in precision relative to the KDE is of little practical consequence, since the predictive hotspot areas identified by the three methods all fall within a narrow range of about 5% of the study area. The cSTKDE also generates more favourable results for other practical matters in implementing hotspot policing, such as fewer and more consolidated hotspots. Overall, the cSTKDE is recommended and is closely in line with the modern notion of 'precision policing' (Ashraf 2020) enabled by data analytics.

Concluding remarks
This study develops a new method, cSTKDE, that considers the cyclic trend in the temporal dimension of crime patterns in order to further improve predictive crime hotspot analysis. Specifically, we first use a spectral analysis method, the periodogram, to identify potential periods in the robbery incident data collected in Baton Rouge, Louisiana, and the results demonstrate that robbery has a statistically significant cyclical trend. The existing STKDE method, which assumes a monotonic temporal trend, is then refined into the cSTKDE method to capture both the monotonic and cyclical trends along the temporal dimension. The cSTKDE method is implemented for predicting robbery hotspots in the study area with a large number of parameter settings and generates a rich set of results. The results are assessed using the performance measures FAI and FPI to identify the best one among the KDE, STKDE, and cSTKDE methods. Major findings from the study can be summarized as follows. First, robbery has a statistically significant period of 6 months in Baton Rouge. Second, regression models that integrate the cyclical trend with the linear trend fit the robbery data better than the model that includes the linear trend alone. This suggests the need for incorporating the cyclical trend in the STKDE modelling approach. Finally, overall, the cSTKDE method is more appropriate and meaningful and offers more practical crime patterns than the existing STKDE and KDE methods.
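The first two findings can be illustrated with a minimal sketch on synthetic data. It assumes a monthly count series with a built-in 6-month cycle, a raw FFT periodogram, and ordinary least squares; the series, its parameters, and the helper `r_squared` are hypothetical, and the significance test applied in the study is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly counts (hypothetical): linear trend + 6-month cycle + noise.
n_months = 96
t = np.arange(n_months)
counts = 60.0 - 0.1 * t + 8.0 * np.cos(2 * np.pi * t / 6.0) + rng.normal(0, 2.0, n_months)

# Remove the linear trend so that it does not dominate the low frequencies.
slope, intercept = np.polyfit(t, counts, 1)
detrended = counts - (slope * t + intercept)

# Raw periodogram: squared magnitude of the FFT at each Fourier frequency.
freqs = np.fft.rfftfreq(n_months, d=1.0)      # cycles per month
power = np.abs(np.fft.rfft(detrended)) ** 2 / n_months
peak = np.argmax(power[1:]) + 1               # skip the zero frequency
period = 1.0 / freqs[peak]                    # dominant period in months


def r_squared(X, y):
    """Coefficient of determination of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


# Linear-only model versus linear trend plus a sinusoid at the detected period.
ones = np.ones(n_months)
X_linear = np.column_stack([ones, t])
X_cyclic = np.column_stack([ones, t,
                            np.cos(2 * np.pi * t / period),
                            np.sin(2 * np.pi * t / period)])
r2_linear = r_squared(X_linear, counts)
r2_cyclic = r_squared(X_cyclic, counts)
```

Using a cosine and sine pair at the detected period lets the regression estimate the amplitude and phase of the cycle without a nonlinear fit.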
This research has some limitations to be addressed in future work. First, one may aggregate the crime data at a finer temporal resolution, such as daily, and use the periodogram method to detect a more granular and accurate period. Second, we can experiment with more parameter settings, including a temporal weighting (kernel) function that differs from the spatial weighting (kernel) function, which would further expand the number of scenarios for identifying the best combinations. Third, for a better understanding of the temporal dynamics in crime patterns, future work can test a series of values for a phase parameter controlling at which phase of the crime cycle the temporal decay starts, such as (0, π/4, π/2, 3π/4, π, 3π/2). It could also be beneficial to consider additional nested cyclic patterns, which can point to criminogenic factors working at different temporal scales (e.g. a seasonal pattern due to climate vs. a monthly pattern driven by paycheck cycles). Fourth, a customized automated tool is needed to make the parameter choices in finer increments and thus generate the best model with parameter values of higher precision. Successfully addressing all of the above issues conceivably relies on data of high quality, in large volume, and over a long period of time. This may lead us to a study area in a larger city with reliable and consistent crime data over a longer time. Finally, the results of this study may vary by crime type and by time period.
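The phase experiment suggested above could be set up along the following lines. The weight function is an assumption made purely for illustration (a linear decay multiplied by a raised cosine) and is not the temporal kernel defined in this paper; `cyclic_temporal_weight`, the 52-week bandwidth, and the 26-week period are likewise hypothetical.

```python
import numpy as np


def cyclic_temporal_weight(lag, bandwidth, period, phase=0.0):
    """Hypothetical temporal weight for an incident `lag` time units in the past.

    Linear decay to zero at `bandwidth`, modulated by a raised cosine of the
    given `period`; `phase` shifts where in the cycle the decay starts.
    """
    lag = np.asarray(lag, dtype=float)
    decay = np.clip(1.0 - lag / bandwidth, 0.0, None)
    cycle = 0.5 * (1.0 + np.cos(2 * np.pi * lag / period + phase))
    return decay * cycle


# Candidate phase values listed in the text, in increasing order.
phases = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4, np.pi, 3 * np.pi / 2]
lags = np.arange(0, 52)  # weeks before the forecast week (assumed unit)

# One weight curve per candidate phase; each could feed a separate scenario
# that is then scored with FAI and FPI.
weight_curves = [cyclic_temporal_weight(lags, bandwidth=52, period=26, phase=p)
                 for p in phases]
```

With phase 0 the most recent incidents receive the largest weight, whereas with phase π they receive almost none, which shows why the choice of phase can change which past weeks drive the forecast.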