Dataset and figures on time-series analysis of child restraint policy impact in Chile

The main objective of this data article is to present the data set that depicts the impact of child restraint legislation in Chile and its regions. The study population consisted of all car crash records involving children aged 0–3 that were provided by the national police from 2002 to 2014. Autoregressive Integrated Moving Average (ARIMA) and Poisson models were used to present the association between the dependent and independent variables of interest. Analysis of the data helps to determine the degree and statistical significance of the relationship between the child restraint legislation policies enacted in 2005 and 2007 and child occupant fatalities and injuries. The data are related to “Impact of child restraint policies on child occupant fatalities and injuries in Chile and its regions: An interrupted time-series study” (Nazif-Munoz et al., 2018).


Subject area: Road safety
More specific subject area: Child restraint legislation policies
Type of data: Table and figures
How data were acquired: Police officers collect and process all traffic incidents in Chile
Data format: Analyzed
Experimental factors: Extensive national database of traffic fatalities, injuries, and crashes in Chile
Experimental features: The impact of child restraint legislation on child injuries and fatalities
Data source location: Chile
Data accessibility: Data come from the National Commission of Road Safety, https://www.conaset.cl/programa/observatorio-datos-estadistica/
Related research article: Nazif-Munoz, J.I., Nandi, A., Ruiz-Casares, M., 2018. Impact of child restraint policies on child occupant fatalities and injuries in Chile and its regions: An interrupted time-series study. Accident Analysis and Prevention, 120, pp. 38-45 [1].

Value of the data
The data can be used as a platform for further investigation by other researchers interested in time series analyses.
The data provided here can be adopted when modeling the short- and long-term impact of road safety policies, including linear and non-linear forms, as was done for Chile [1].
These data can be used to explore how to select the best-fitting Auto Regressive Integrated Moving Average (ARIMA) models when assessing public policies.

Data
The data comprise police records on the influence of child restraint legislation on reducing child traffic injuries and/or fatalities for the period 2002-2014. Table 1 shows the descriptive statistics of the study variables. Tables 2-15 include ARIMA models considering linear, quadratic and logarithmic equations when assessing the impact of child restraint legislation for children per vehicle fleet. Tables 16-25 include Poisson models considering linear, quadratic and logarithmic equations when assessing the impact of child restraint legislation for children per vehicle fleet. Tables 26-27 describe the results with children per child population. Tables 28-29 describe the results in which traffic crashes and injuries are introduced as controls when assessing the ARIMA models. All tables contain a series of dummy variables, one for each month of the year, which were used to control for seasonality patterns. Seasonality may affect the variation in traffic fatalities and injuries, since more collisions can be observed in certain months of the year than in others because of higher traffic volume. Figs. 1-8 depict fitted and raw data of child injuries and fatalities for Chile, including several regions.
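The monthly dummy variables and the linear, quadratic and logarithmic forms mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the source does not state how the policy variables were coded, so the month index `t`, the policy start index `start`, the choice of January as the reference month, and the use of log(1 + elapsed months) are all assumptions made for illustration.

```python
import math


def month_dummies(month):
    """Return 11 indicators for months 2..12; January is the (assumed) reference."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return [1 if month == m else 0 for m in range(2, 13)]


def policy_term(t, start, form="linear"):
    """Months elapsed since a policy took effect, under a chosen functional form.

    t and start are month indices (hypothetical coding); the term is 0
    before the policy starts.
    """
    elapsed = max(0, t - start + 1)
    if form == "linear":
        return float(elapsed)
    if form == "quadratic":
        return float(elapsed ** 2)
    if form == "logarithmic":
        # log(1 + elapsed) keeps the term at 0 before the policy (assumption).
        return math.log1p(elapsed)
    raise ValueError(f"unknown form: {form}")


def design_row(t, month, start, form="linear"):
    """One regression row: constant, policy term, then 11 month dummies."""
    return [1.0, policy_term(t, start, form)] + month_dummies(month)
```

Each row has 13 entries (constant, one policy term, 11 seasonal dummies); an actual model would add further controls, such as the lagged dependent variable.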

Experimental design, materials and methods
Since each time series has a unique structure, ARIMA models are developed using a three-stage iterative process: (1) identification, (2) estimation and (3) diagnostic checking [2]. Identification involves examining both the autocorrelations and the partial autocorrelations to establish, first, whether the series is stochastic or deterministic and, second, the autoregressive and moving average parameters [3,4].
We explain the ARIMA model selection process for two dependent variables, one for Chile as a whole and one for Chile's Metropolitan Region: traffic fatalities at the national level and traffic injuries in the Metropolitan Region.
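As an illustration of the identification stage, the sketch below computes sample autocorrelations and partial autocorrelations using only the Python standard library. It is a generic implementation (the partial autocorrelations use the Durbin-Levinson recursion), not the software used for the tables and figures; the function names are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelations (lags 0..max_lag) via Durbin-Levinson."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi
    return result


def white_noise_band(n):
    """Approximate 95% band (+/- 1.96 / sqrt(n)) for white-noise correlations."""
    return 1.96 / math.sqrt(n)
```

Values of `acf` or `pacf` that fall outside `white_noise_band(len(series))` are the spikes one would inspect when choosing the AR and MA orders.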

Traffic fatalities at the national level
To identify the order of differencing in the ARIMA model for traffic fatalities at the national level, we first test the stationarity of the time series of traffic fatalities in Chile. Two tests can be applied to identify the realization of a stationary process: Dickey-Fuller [4,5] and Phillips-Perron [6]. The results of these tests, displayed in Table 30, suggest that the time series of this variable follows a stationary process, and therefore the series does not need to be differenced. To confirm this, we analyze the autocorrelation of residuals for 40 lags for this variable.
As we can observe, the autocorrelations of the residuals take both negative and positive values, which confirms a stationary process in the series.
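The stationarity check described above can be sketched with the simplest form of the Dickey-Fuller regression, in which the first difference of the series is regressed on a constant and the lagged level. This is an illustrative standard-library sketch rather than the procedure used for Table 30: it omits augmentation lags and a trend term, and the 5% critical value of about -2.88 is an approximate tabulated value for a regression with a constant and a sample of roughly this size (156 months is assumed here from the 2002-2014 period).

```python
import math

# Approximate 5% Dickey-Fuller critical value (constant, no trend); assumption.
DF_CRITICAL_5PCT = -2.88


def dickey_fuller_t(series):
    """t-statistic on gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    y_lag = series[:-1]
    dy = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_y) for x, d in zip(y_lag, dy))
    gamma = sxy / sxx
    alpha = mean_y - gamma * mean_x
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(y_lag, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


def looks_stationary(series):
    """Reject the unit-root null when the statistic is below the critical value."""
    return dickey_fuller_t(series) < DF_CRITICAL_5PCT
```

A statistic more negative than the critical value rejects the unit-root hypothesis, which is the situation in which no differencing is needed (d = 0).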
To identify the AR and/or MA terms of the ARIMA model for traffic fatalities at the national level, we inspect the distribution of the residuals depicted in Fig. 9. This figure suggests the presence of an AR model rather than an MA one, since the distribution of the residuals after the first two lags is different from 0 [3]. Nevertheless, to confirm what we observed in Fig. 9, we compare four models: one without AR and MA terms (ARIMA (0,0,0)), two with different AR terms (ARIMA (1,0,0) and ARIMA (2,0,0)), and one with an MA term (ARIMA (0,0,1)). For this we analyze the partial autocorrelation of residuals in traffic fatalities at the national level. To identify an MA term one should observe a decaying process with negative values, whereas for AR terms one should observe spikes at different lags, which determine the number of terms. In Fig. 10 we observe a positive partial autocorrelation, suggesting an AR model rather than an MA one. This is confirmed in Figs. 11 and 12, in which partial autocorrelations of residuals are not present in the first lag. In Fig. 13 we observe the same pattern when we model an ARIMA (0,0,1). This leads us to test for white noise in each of these four models. To do so, we provide the values of the Portmanteau (Q) test [7] in Table 31. A p-value lower than 0.05 indicates autocorrelation, in which case the confidence intervals of the estimators of interest are biased. Table 31 confirms the presence of a model with an AR term, since the p-values for the ARIMA (1,0,0) and ARIMA (2,0,0) models are higher than 0.05. To determine which model fits the data better, we compare the AIC and BIC values of each model. Table 32 shows that the ARIMA (1,0,0) model fits the data better, and therefore we analyze traffic fatality variation with this model.
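The white-noise check can be illustrated as follows. The sketch assumes the Ljung-Box form of the Portmanteau (Q) statistic and takes the degrees of freedom equal to the number of lags; both are assumptions, since the source only names the test. The chi-square tail probability uses a closed form that is valid for even degrees of freedom only (for example, the 40 lags mentioned above).

```python
import math


def ljung_box_q(residuals, lags):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..lags} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    series_sum = 1.0
    for k in range(1, df // 2):
        term *= half / k
        series_sum += term
    return math.exp(-half) * series_sum


def white_noise_p_value(residuals, lags):
    """p-value of the Q test; values below 0.05 indicate autocorrelation."""
    return chi2_sf_even(ljung_box_q(residuals, lags), lags)
```

Under these assumptions, a model whose residuals give `white_noise_p_value(...) > 0.05` would be retained as a candidate, mirroring the reading of Table 31.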

Traffic injuries for the Metropolitan Region
To identify whether the distribution of traffic injuries over time in the Metropolitan Region follows a stationary process, we apply two tests, Dickey-Fuller and Phillips-Perron. Table 33 displays the results of each test. Both tests suggest that the time series of traffic injuries in the Metropolitan Region follows a stationary process, and therefore the series does not need to be differenced (Tables 34 and 35).
We observe in Fig. 14 that the autocorrelation of the residuals is positive in the first two lags, and that the subsequent autocorrelations of the residuals do not decay consistently but rather show a fluctuating pattern. This suggests that the series may not need to be differenced. Similarly to the series of traffic fatalities at the national level (Fig. 9), Fig. 14 also suggests the presence of an AR model rather than an MA one, since the distribution of the residuals after the first two lags is different from 0 [3]. Nevertheless, to confirm what we observed in Fig. 14, we compare four models: one without AR and MA terms (ARIMA (0,0,0)), two with different AR terms (ARIMA (1,0,0) and ARIMA (2,0,0)), and one with an MA term (ARIMA (0,0,1)). For this we analyze the partial autocorrelation of residuals in traffic injuries for the Metropolitan Region. To identify an MA term, one should observe a decaying process with negative values, whereas for AR terms one should observe spikes at different lags, which determine the number of terms (Figs. 15, 16 and 18). Out of these four figures, only Fig. 17 displays partial autocorrelations of residuals that are not significant for the first 12 lags. This confirms the presence of an AR model.

[Table fragment; column headings and first row label not recoverable. Each row gives the estimate followed by the confidence interval bounds in parentheses, for three models.]
(row label not recoverable): −0.00 −0.00 0.00
Constant: −13.15*** (−13.81, −12.50); −13.05*** (−14.06, −12.05); −13.15*** (−13.82, −12.48)
Lag 1 of dependent variable: −0.00 (−0.08, 0.07); −0.00 (−0.08, 0.07); −0.00 (−0.08, 0.07)
Lag 2 of dependent variable: 0.05 (−0.03, 0.14); 0.05 (−0.03, 0.14); 0.05 (−0.03, 0.14)
All models contain 11 dummy variables to control for monthly variations and traffic fatalities of children population (aged 4-7) per vehicle fleet. ** 5% significance level; *** 1% significance level; CI, confidence interval; AIC, Akaike information criterion; BIC, Bayesian information criterion.

[Table fragment; column headings and first row label not recoverable. Same layout as above.]
(row label not recoverable): −0.00 −0.00 0.00
Constant: −13.19*** (−14.08, −12.30); −13.32*** (−14.86, −11.77); −13.29*** (−14.20, −12.38)
Lag 2 of dependent variable: −0.13 (−0.32, 0.05); −0.13 (−0.32, 0.05); −0.14 (−0.32, 0.04)
To test for white noise in these models, the following table provides the values of the Portmanteau (Q) tests. A p-value lower than 0.05 indicates autocorrelation, in which case the confidence intervals of the estimators of interest are biased.
According to these values, only the ARIMA (2,0,0) model fits the data for these series, since the test associated with the presence of white noise is not significant at p < 0.05. To confirm this result, we report the AIC and BIC values of the ARIMA (1,0,0) and ARIMA (2,0,0) models, since these two had the highest p-values. Following Burnham and Anderson [8], we observe that the ARIMA (2,0,0) model fits the data better than the ARIMA (1,0,0) model, since both its AIC and BIC values are lower than those of the ARIMA (1,0,0) model. The ARIMA (2,0,0) model is therefore chosen for comparison with an alternative ARIMA model in which an MA part is identified.
In sum, the ARIMA (2,0,0) model is the most appropriate to assess traffic injury variation in the Metropolitan Region.
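The AIC and BIC comparison used in both selection processes can be sketched with the standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The log-likelihoods and parameter counts below are hypothetical placeholders, not values from the tables, and n = 156 assumes monthly data for 2002-2014.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_by(criterion_values):
    """Return the model name with the lowest criterion value."""
    return min(criterion_values, key=criterion_values.get)


# Hypothetical (log-likelihood, number of parameters) pairs for illustration.
N_OBS = 156
candidates = {
    "ARIMA(1,0,0)": (-520.0, 14),
    "ARIMA(2,0,0)": (-512.0, 15),
}
aic_values = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_values = {name: bic(ll, k, N_OBS) for name, (ll, k) in candidates.items()}
```

With these placeholder numbers, `best_by(aic_values)` and `best_by(bic_values)` both select the second model; with real estimates the two criteria can disagree, because BIC penalizes additional parameters more heavily when ln(n) exceeds 2.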