Applying the colocation quotient index to crash severity analyses

https://doi.org/10.1016/j.aap.2019.105368

Highlights

  • The colocation quotient was used to measure the spatial associations among crashes of various severities that occurred in College Station, Texas.

  • Crashes tended to share the injury level of neighboring crashes, a tendency that was most significant for fatal crashes.

  • The colocation quotient matrix tended to be symmetrical in non-injury crashes versus injury crashes (minor injury, major injury, and fatal).

Abstract

Examining the spatial relationships among crashes of various severity levels is essential for gaining a better understanding of the severity distribution and potential contributing factors to collisions. However, relatively few scholars have focused on analyzing this type of data. Therefore, in this study, we utilized a new index, the colocation quotient, to measure the spatial associations among crashes of various severities that occurred in College Station, Texas. This new method has been widely used to define the colocation pattern of categorized data in various fields, but it has not yet been applied to crash severity data. According to our findings, (1) crashes tended to share the injury level of neighboring crashes, a tendency that was most significant for fatal crashes and second most significant for non-injury crashes; (2) the colocation quotient matrix tended to be symmetrical in non-injury crashes versus injury crashes (minor injury, major injury, and fatal); and (3) DWIs (driving while intoxicated) and hit-and-runs did not show a strong pattern. These colocation quotient results could be helpful for predicting crash severity and for providing traffic engineers with more effective traffic safety measures.

Introduction

In order to improve traffic safety, we must define the spatial relationships among crashes of varying severity levels, which could lead to a better understanding of the severity distribution and potential contributing factors to crashes. However, most scholars who have studied this distribution have focused on predicting the number of crashes, or hotspot identification, rather than on severity analysis (Savolainen et al., 2011; Lord and Mannering, 2010). Furthermore, the majority of studies related to analyzing crash severity have employed advanced statistical models, such as the logit, the tobit, and ordered probit models (Kockelman and Kweon, 2002), multivariate Poisson–lognormal models (Park and Lord, 2007; Ye and Lord, 2014), Bayesian hierarchical analysis (Huang et al., 2008), and the gradient boosting data mining model (Zheng et al., 2018) to show the mathematical relationships among various severity levels and risk factors. However, relatively few scholars have examined the spatial dependence between crash severity levels (Chiou and Fu, 2013, 2015; Castro et al., 2013; Anarkooli et al., 2017; Liu et al., 2019; Zeng et al., 2019). Chiou and Fu (2013), for example, have used aggregated crash data and macro models for this purpose. In other words, in the few existing studies, researchers have aggregated crash severity point data into polygon data and then conducted further analyses of it instead of examining the “real” spatial relationships among crash points of varying severity levels (such as distance) and how they affect distribution in space. It must be noted that aggregating crash severity data for spatial analysis may affect the results depending on the extent of the study area (such as a county, zip code, or a specific intersection or segment of highway). Hence, we decided to conduct a study on disaggregated crash severity data in order to better understand severity patterns at the level of a highway or street network.

The new method used in this study, the colocation quotient (CLQ), can capture patterns by using original point data (Leslie and Kronenfeld, 2011). The method discussed here, also called co-occurrence, refers to the types of spatial associations (clustering, dispersion, or random tendencies) between two or more categories of a population. For example, if the CLQ of an “A” injury level crash to a “B” injury level crash (CLQ_{A→B}) is higher than one, an A-level crash is more likely to have a B-level crash as its neighbor than would be expected if severity levels were randomly distributed. In addition, researchers use the CLQ to avoid the Modifiable Areal Unit Problem (MAUP), also known as the aggregation bias problem (Leslie and Kronenfeld, 2011; Wang et al., 2017), which can have a significant impact on statistical hypothesis testing results (e.g., turning a positive effect into a negative one) when various study scales or areas are used (Xu et al., 2014). Because the CLQ is a point-based measurement of spatial phenomena, the researcher does not need to aggregate point data by districts. Furthermore, this method is more suitable for analyzing categorical data, such as crash severity (Leslie and Kronenfeld, 2011), than numerical data. This is mainly because crash severity data are grouped into five severity levels and recorded as categorical data: fatal (K), incapacitating injury (A), non-incapacitating injury (B), possible injury (C), and property damage only (PDO or O).

Compared to the existing colocation index methodology, the CLQ provides more flexibility because it uses distance ranking and allows for an asymmetrical matrix (Leslie and Kronenfeld, 2011). Because distance rank is considered, there is no need for the researcher to obtain an accurate network distance, which can be particularly difficult to measure on a street network in a rural area or in a developing country without a detailed road map. In addition, because the CLQ matrix is asymmetrical, it can reveal correlations in various directions. For example, CLQ_{C→I} (e.g., how crashes are often clustered around an intersection) may not be equal to CLQ_{I→C} (e.g., how intersections cluster around crashes). Thus, this quotient can more accurately represent spatial patterns in the real world.
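The directional behavior described above can be illustrated with a small sketch. It is not the authors' code: the one-dimensional positions, the labels "C" (crash) and "I" (intersection), and the helper names `nearest_label` and `clq` are hypothetical, and the quotient is computed as the observed share of B among A's nearest neighbors divided by the expected share, following the general definition in Leslie and Kronenfeld (2011).

```python
def nearest_label(i, xs, labels):
    """Label of the point closest to point i on a line (hypothetical 1-D layout)."""
    others = (k for k in range(len(xs)) if k != i)
    j = min(others, key=lambda k: abs(xs[k] - xs[i]))
    return labels[j]


def clq(a, b, xs, labels):
    """Observed share of b among a's nearest neighbors over the expected share."""
    n = len(xs)
    n_a = labels.count(a)
    # A point cannot be its own neighbor, so drop one when a and b coincide.
    n_b = labels.count(b) - (1 if a == b else 0)
    c_ab = sum(
        1
        for i in range(n)
        if labels[i] == a and nearest_label(i, xs, labels) == b
    )
    return (c_ab / n_a) / (n_b / (n - 1))


# Two intersections and four crashes; three crashes sit in a tight cluster.
xs = [0.0, 10.0, 1.0, 1.2, 1.3, 9.5]
labels = ["I", "I", "C", "C", "C", "C"]

print(clq("I", "C", xs, labels))  # both intersections have a crash as nearest neighbor
print(clq("C", "I", xs, labels))  # only one of four crashes has an intersection as nearest neighbor
```

In this toy layout the I→C quotient is about 1.25 while the C→I quotient is about 0.625, so the two directions disagree even though the same points are involved.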

The CLQ works differently than other traditional indices that are commonly used to quantify spatial correlations, such as the bivariate Moran’s I, the normalized cross-variogram, Ripley’s K, join count statistics, and the cross-K function (Cliff and Ord, 1981; Vallejos, 2008; Cromley et al., 2014). While all of these measures are somewhat helpful for crash severity analyses, each has significant limitations that make it unsuitable as an index of the relationships between crash severity levels (Leslie and Kronenfeld, 2011). For example, some methods, such as the Pearson correlation, bivariate ordinary least squares, and the cross-variogram, are suited to continuous rather than nominal variables, whereas nominal variables are the primary concern in our study. Also, Moran’s I is better suited to analyzing the autocorrelation of one variable at a time. Join count statistics and Moran’s I are area-based methods, which are applied in a polygon framework rather than to point data. As for the measure most similar to our proposed CLQ, the problems of the cross-K function include (1) using a metric distance instead of a topological distance to define neighbors, (2) measuring the spatial association of two populations instead of one single population, and (3) the inability to control for population clusters (Leslie and Kronenfeld, 2011; Cromley et al., 2014). Table 1 shows more details about the comparison between different methods. Only the CLQ can show the bidirectional relationship among different crash severity data by using an asymmetric matrix, while the cross-K function and join count statistics can only show a unidirectional relationship. Furthermore, the CLQ can avoid the false positives caused by clustering of the overall population. Recently, Hu et al. (2018) applied both the global and local CLQ to define crash hotspots involving pedestrians and cyclists. Due to this pioneering work, researchers are able to determine how the colocation quotient can be used to define the relationships among crashes of different severities. Therefore, the purpose of this study was to utilize this method to define the colocation patterns of crashes of various severity levels on a street network.

Section snippets

Background

This section presents background information on previous analyses of the severity levels of crashes in general, highway safety, and studies that have used the colocation index.

Methodology

The colocation quotient can be used to measure spatial associations among two or more categories of observations (type A points and type B points). The basic goal of this study was to compare the observed percentage of level B crashes among level A’s nearest neighbors (the numerator in Eq. 1) to the expected percentage (the denominator in Eq. 1). The equation for the colocation quotient is defined as follows:

$$CLQ_{A \to B} = \frac{C_{A \to B} / N_A}{N'_B / (N - 1)} \qquad (1)$$

where,

CLQ_{A→B} is the ratio of the observed to the expected
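The observed-to-expected ratio in Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure used in the study: it uses only each point's single nearest neighbor, straight-line (Euclidean) distance, and resolves ties by array order; it also assumes that N'_B equals N_B − 1 when A and B are the same category and N_B otherwise, as in Leslie and Kronenfeld (2011). The function name `global_clq`, the coordinates, and the labels are hypothetical.

```python
import numpy as np


def global_clq(coords, labels):
    """Return {(a, b): CLQ_{a->b}} from each point's single nearest neighbor."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)

    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Assumption: ties go to the lowest index instead of being shared.
    nn_label = labels[dist.argmin(axis=1)]

    cats = sorted(set(labels.tolist()))
    counts = {c: int((labels == c).sum()) for c in cats}

    clq = {}
    for a in cats:
        for b in cats:
            c_ab = int(((labels == a) & (nn_label == b)).sum())
            # N'_B: exclude the point itself when a and b are the same category.
            n_b = counts[b] - 1 if a == b else counts[b]
            observed = c_ab / counts[a]
            expected = n_b / (n - 1)
            clq[(a, b)] = observed / expected if expected > 0 else float("nan")
    return clq


# Hypothetical layout: two fatal crashes close together, three non-injury
# crashes close together, and the two groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (10, 2)]
severity = ["FAT", "FAT", "NON", "NON", "NON"]
for pair, value in global_clq(points, severity).items():
    print(pair, round(value, 3))
```

In this layout every crash has a same-severity nearest neighbor, so the diagonal entries exceed one and the off-diagonal entries are zero. A fuller implementation would typically use network or rank-based neighborhoods and share tied neighbors fractionally.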

Dataset and descriptive statistics

Our study area was College Station, a mid-sized college town located in Central Texas with a population of approximately 100,000 during the period of data collection between January 2005 and September 2010; the population has since increased. The crash data were provided by the College Station Police Department (CSPD). There were 14,710 crash reports containing data such as the location, date, time, and severity of the crash. The road network in College Station is relatively simple: there are

Results and discussion

As described above, we used the crash dataset for College Station from 2005 to 2010 in this study. Crash severity is categorized into eight groups. Four are the main levels of injury: non-injury (NON), minor injury (MIN), major injury (MAJ), and fatal (FAT). Police officers also record four types of crashes related to violations of traffic laws, namely hit-and-runs (HIT/HIT INJ) and driving while intoxicated (DWI MIN INJ/DWI MAJ INJ), for a total of eight crash severity types. Table 3 shows

Summary and conclusions

This paper has documented the application of the colocation quotient, which can be used to estimate the spatial relationships among crash severity levels using disaggregated data. This method can show relationships that cannot be observed visually or through other correlation techniques, such as the bivariate Moran’s I, the normalized cross-variogram, Ripley’s K, and the cross-K function. For this reason, we applied the CLQ to a dataset collected in College Station, Texas between 2005 and 2010.

CRediT authorship contribution statement

Pei-Fen Kuo: Conceptualization, Methodology, Writing - original draft. Dominique Lord: Methodology, Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research is partially supported by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant no. MOST 106-2410-H-006-116.
