The relationship between aggressive driving and driver performance: A systematic review with meta-analysis

Traffic crashes remain a leading cause of accidental human death, and aggressive driving is a significant contributing factor. To review driver performance during aggressive driving, this systematic review screened 2412 pieces of relevant literature and selected and synthesized 31 reports containing 34 primary studies that investigated drivers' control performance among the general driver population in four-wheeled passenger vehicles and were published with full text in English. These 34 studies involved 1731 participants in total. Across the 34 studies, measures relating to vehicle speed (e.g., mean speed, n = 22), lateral control (e.g., lane deviation, n = 17) and driving errors (e.g., violation of traffic rules, n = 12) were the most frequently reported as differing significantly between aggressive driving and driving in the control group. The meta-analysis indicates that, compared with the control group, aggressive driving is associated with 1) a significantly higher mean speed, with an increase of 5.32 km/h (95% confidence interval, [3.27, 7.37] km/h), based on 8 studies with 639 participants in total; and 2) 2.51 more driving errors (95% confidence interval, [1.32, 3.71]), based on 5 studies with 136 participants in total. These findings can support the identification and quantification of aggressive driving behaviour, which could form the basis of an in-vehicle aggressive driving monitoring system.


Definition of aggressive driving
Aggressive driving (AD) has long been known as a significant risk to road safety and driving experience (Perry, 1968;Doob & Gross, 1968). The mechanism of such aggressive behaviour can be explained by the frustration-aggression model (Berkowitz 1989;Shinar, 1998;Dollard et al., 1939). The theory attempted to attribute human aggression to the frustration they encountered (Alonso et al., 2019). For example, frustrations like high congestion on the road could lead to aggressive behaviours like frequent lane changing while leaving a small headway for the following drivers. However, there is no widely accepted definition of AD (Zhang et al., 2017;Suhr, 2016;Tasca, 2000;Dula & Geller, 2003). Shinar offered a point of view that AD can be classified into instrumental or hostile AD where instrumental AD is a behaviour that drivers adopt to overcome frustrating obstacles while hostile AD is a way to vent anger (Shinar, 1998;Baron and Byrne, 1994). The goal of instrumental aggression was not to harm the victim but to obtain other goals proactively (Anderson & Bushman, 2002;Berkowitz 1993, Geen 2001). Hostile aggression, or affective, impulsive, or reactive aggression, has a distinctive difference when compared with instrumental aggression as its immediate intent is to cause harm (Anderson & Bushman, 2002;Bushman & Anderson, 2001). However, this dichotomy did not have a clear cut-off line as both instrumental and hostile AD can appear at the same time. The American Automobile Association (2009) defines AD as "any unsafe driving behaviour, performed deliberately and with ill intention or disregard for safety". The National Highway Traffic Safety Administration (NHTSA) defines aggressive driving as "an individual commits a combination of moving traffic offences so as to endanger other persons or property" (McCartt et al., 2001, p.1). 
Some researchers focus more on the behaviour and result: a pattern of unsafe driving behaviour that puts the driver and others at risk (Harris et al., 2014;Houston & Harris, 2003).
In this study, aggressive driving is defined as "any driving behavior that intentionally (whether fueled by anger or frustration or as a calculated means to an end) endangers others psychologically, physically, or both" (Ellison-Potter et al., 2001). The justification is that intention, though hard to measure, is what distinguishes aggressive driving from mistakes and lapses. Some mistakes are due to a lack of experience or other factors rather than the driver's willingness. For example, if a driver exceeds the posted speed limit and changes lanes frequently because they are late for a meeting, such behaviour can be considered aggressive driving. However, if a novice driver misses the speed limit sign and drives faster than the posted speed limit, such behaviour should be excluded from AD, as the novice driver lacks the intention to endanger other road users or to obtain other goals proactively.

Significance of research
Road traffic injury is one of the leading causes of death and killed around 1.28 million people in 2019 (World Health Organization, 2020). In the European Union, it is estimated that 22,700 fatalities were reported in 2019, and more than 1.2 million people were injured due to road traffic injuries (European Commission, 2021). To improve road safety, which can be considered as the desirable interaction between the road user, vehicle, and road infrastructure, there is a need to better understand driver behaviours, via their driving performance (Perello-March et al., 2022). According to the European Road Assessment Programme, human factors are a contributing element to the occurrence of 90-95% of road crashes; road and environment is a contributing factor to 28-35% of crashes, and vehicle issues to 8-10% of crashes (note: factors interacted with each other, which caused a total percentage higher than 100%; European Road Assessment Programme, 2015; Goniewicz et al., 2016). Several other studies have also indicated that human factors make up a larger portion of the causation of road crashes (Micheale, 2017;Zhang et al., 2013). To improve driving safety, the understanding of human drivers at a deeper level is essential.
Whether measured as an absolute quantity or a relative proportion, aggressive driving (AD) is harming the driving environment. 78% of U.S. drivers reported having engaged in at least one aggressive driving behaviour in the past year (American Automobile Association Foundation for Traffic Safety, 2016). In China, it was estimated that aggressive driving behaviours, such as speeding, running red lights, and weaving, accounted for approximately 95% of all traffic deaths in 2011 (Traffic Administration Bureau of China State Security Ministry, 2011). Besides its significant contribution to crash risk, AD also has a negative influence on energy consumption and emissions of air pollutants (Berry, 2010; Adamidis et al., 2020) and is associated with relatively low traffic flow stability (Rong et al., 2011). Aggressive behaviours are also a cause of other drivers' feelings of irritation and can induce aggressive driving behaviour in others (Björklund, 2008). In addition to the significant negative impact on road users' health and safety, some aggressive driving behaviours, such as rapid acceleration and rapid deceleration, would also worsen fuel consumption (Faria et al., 2019), battery life (Darcovich et al., 2017; Liu et al., 2021) and battery performance (Sagaria et al., 2021).
In addition to the concern for driving safety in traditional traffic, as the era of autonomous vehicles approaches, road traffic will be a mix of both autonomous and non-autonomous vehicles (Woodman et al., 2019; Robinson et al., 2021). The challenge of creating a planning algorithm for mixed traffic while considering aggressive driving behaviour has not been fully investigated yet (Kala & Warwick, 2013). More specifically, the unexpected aggressive cut-in behaviour of a human driver may affect not only driving comfort but also driving safety. It is therefore timely to focus research on understanding and managing AD to enhance safety and comfort for future transportation. To conclude, AD is a threat to both current and future transportation comfort, safety, and efficiency. This systematic review will help with quantifying AD behaviour, as there has been little work to synthesize this to date. At the same time, this review helps with identifying and evaluating the behavioural measurements and induction methods adopted in AD studies. Finally, the geographical location of the AD research will be reviewed so that the conclusions can be drawn within accurate boundaries and the areas that require further research can be highlighted.

Research gap in aggressive driving
Research on the causal factors of AD has been conducted on different topics. Several factors have been identified and associated with aggressive driving. Considering the driver as the research object, the internal factors include both trait anger and driving (situational) anger (Bogdan et al., 2016), sensation seeking, impulsiveness, boredom proneness (Dahlen et al., 2005), narcissism (Edwards et al., 2013) and dispositional aggressiveness (Krahé, 2005); the external factors include congestion (Emo et al., 2016;Li et al., 2020a,b), time pressure (Fitzpatrick et al., 2017), lead driver status (Stephens & Groeger, 2014), and presence of police (Stanojević et al., 2018).
However, regarding how to measure the quantitative outcome of AD, there has been little justification for the choice of driving performance measures (DPMs). In other words, why particular DPMs were chosen to quantify the behavioural difference remains somewhat arbitrary (Ābele et al., 2020; Fitzpatrick et al., 2017; Zhang et al., 2016). In the Methodology sections of these recent AD studies, the researchers did not sufficiently justify their selection of DPMs. Even where the selection is intuitive and obvious, the reason why other similar DPMs were not chosen remains unclear. In addition, although there is no universal definition of driving performance, for the sake of comparability it is reasonable to review which measures have been used, as a reference for further research (Papantoniou et al., 2017). Hence, there is a need to further investigate the application of DPMs in previous AD research, especially which DPMs have been used and shown to be effective in distinguishing normal driving from aggressive driving.
Besides, the effect size of AD measures has not yet been systematically quantified. Under the hypothesis that there is a significant behavioural difference, the magnitude of this difference still needs further study. For example, some observational studies (Sarkar et al., 2000) and survey-based studies (Vanlaar et al., 2008; Stephens & Fitzharris, 2017) claim that AD leads to higher speeds, even exceeding the speed limit. However, exactly how much faster drivers travel during AD than in the naturalistic condition is not yet known. The relatively small sample sizes in individual primary studies may limit statistical power. With a meta-analysis, the findings of the individual studies can be combined to improve precision.
Finally, when research needs to induce situational anger in simulated driving, the induction method varies. Some studies simulate the obstruction caused by slow-leading vehicles (Li et al., 2020a,b), while others would require the participants to recall or imagine a certain irritating situation to induce a specific emotion (Zhang et al., 2016). To the best of the knowledge of the authors, the application of such induction methods in aggressive driving was rarely reviewed. Hence, there is a need to 1) study what kind of induction methods were used; 2) assess the effect of different induction methods and provide a reference for further research.

Research questions
Following the context in Section 1.3, this systematic review will focus on three questions in AD research:
1. What are the common driving performance measures employed in aggressive driving studies?
What measurements have been adopted in previous aggressive driving research? Which measure(s) have been proven effective in distinguishing neutral from aggressive driving? In existing studies, the rationale for choosing specific DPMs is poorly explained or absent.
2. What are the behavioural differences between aggressive driving and normal driving?
It is not reasonable to define a bad or dangerous driver with only one DPM (Su et al., 2020). However, the effect size of AD can still be quantified to provide a more intuitive and straightforward baseline for further comparison. An effect size with a significant difference could be considered a quantitative reference for defining AD. For different research purposes, which DPMs should be chosen when considering sensitivity and effectiveness? (The p-value indicates whether a difference is statistically significant, and the effect size represents the magnitude of the difference.)
3. In the current driving context, which induction method would produce a valid subjective anger difference before-and-after the induction?
To study behaviour in an aggressive state, the induction method is vital to the success of the experiment. The "induction method" here refers to the method adopted to induce situational subjective anger in driver-in-the-loop research. Is there a significant effect on subjective anger before and after the induction?
To conclude, as AD research updates rapidly, it is timely to systematically review the performance of AD to integrate the evidence across separate research studies, confirm current practices to imitate AD, and identify and inform areas for future AD research (Munn et al., 2018).

Method
In this study, a systematic literature review is conducted to review the driving performance of AD. A meta-analysis is adopted to quantitatively analyse the driving performance measures.

Search strategy
The relevant literature indexed in Scopus, IEEE Xplore digital library, PsycInfo and TRID (Transportation Research Integrated Database) was sourced and analysed. After the studies were selected from the databases, the reference list and the citation list of these studies were screened as backward and forward searches. The backward and forward searches were conducted in Scopus except for the theses, as Scopus did not index such types of documents. The backward and forward searches of the selected thesis were conducted in ProQuest. Based on the 3 research questions and scoping searches, the inclusion and exclusion criteria were determined as follows.

Inclusion criteria
Prospective research was required to meet the inclusion criteria in Table 1.

Exclusion criteria
To further focus on the research question, the exclusion criteria were set in Table 2.

Screening process
The research publications indexed in Scopus, IEEE, PsycInfo and TRID were searched by matching the following key term subsets against their title, abstract and keywords:
(1) (aggressive OR aggression OR anger OR angry OR rage)
(2) AND (driver OR driving)
(3) AND (behavior OR behaviour OR safety OR performance OR task OR risk OR crash OR collision)
(4) AND (simulator OR simulation OR simulat*)
The asterisk (*) is a wildcard character for the derivatives of "simulator" such as simulating and simulated. By searching the databases and conducting forward and backward searches, 2412 results were found. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, Page et al., 2021), ZS screened the preliminary results as shown in Fig. 1, and ME, RW and JS examined the final screening result (the researchers are identified by their initials). Disagreements regarding inclusion and exclusion were resolved through discussion or by referring to a third investigator's opinion.
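The search string described above can be assembled programmatically, which makes it easier to reuse across databases. The following is a minimal sketch only: `TERM_SUBSETS` and `build_query` are hypothetical names introduced here, and each database's own field syntax (title, abstract, keywords) is not modelled.

```python
# Hypothetical helper (not part of the original review): joins the four
# key-term subsets with OR inside each subset and AND between subsets.
TERM_SUBSETS = [
    ["aggressive", "aggression", "anger", "angry", "rage"],
    ["driver", "driving"],
    ["behavior", "behaviour", "safety", "performance",
     "task", "risk", "crash", "collision"],
    ["simulator", "simulation", "simulat*"],
]


def build_query(subsets):
    """Return a boolean search string: (a OR b) AND (c OR d) ..."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in subsets]
    return " AND ".join(groups)


query = build_query(TERM_SUBSETS)
print(query)
```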
During the screening process, the authors used some specific features to help with the classification. First, as the review focused on simulator research, terms like "naturalistic driving study (NDS)" could be used to exclude experiments conducted in the real world. Second, although a large body of research focuses on modelling aggressive behaviour and may contain information on aggressive driving, such studies rarely compare neutral and aggressive states. Hence, these studies were unlikely to help with further analysis. Finally, the authors found that the adjective "aggressive" is used not only in human-related studies but also in the semiconductor industry and animal research. Hence, if the title or abstract contained irrelevant terms like "Field Programmable Gate Arrays (FPGAs)", "chips" or "animal", the study could be readily excluded.

Data extraction
The DPM data extracted from the 31 selected reports were organized in the data extraction table. Only the numerical data reported in the text or tables of the articles were extracted. The authors of this review attempted to contact the authors who reported relevant DPMs as means without standard deviations. Unfortunately, no authors responded and no data were obtained directly from the study investigators. Referring to the Cochrane handbook for systematic reviews, the following information was extracted from each study: (1) basic information: title, author, year of publication, country, journal; (2) study methods: study design, induction method, the profession of the participant, number of participants, gender ratio, age, driving experience, driving scenario; (3) driving performance measures: mean speed, the standard deviation of speed, number of overall driving errors, etc.; (4) miscellaneous: research objectives, key finding, miscellaneous comments. All data were converted to the International System of Units (SI) and its derived units so that the comparison is consistent.

Table 1 Inclusion criteria for publication screening.
Order Criteria
2 The vehicle is a four-wheeled passenger vehicle
3 Considering research ethics and comparability, only simulator studies were analysed
4 Population: age over 18 years old, no consumption of illegal drugs, no restriction on gender, non-professional driver
5 Intervention: subjects in the experimental group should be induced to anger or aggressive driving during the experimental test
6 Comparison: the study had to have at least 1 aggressive-control (within- or between-subject) experimental trial comparing the driver's behaviour
7 Outcome: the study had to report at least 1 driving performance measure of both the aggressive and control groups

Table 2 Exclusion criteria for publication screening.
Order Criteria
1 Letters to the editor, editorials, focus, perspectives, commentary, and reviews. Only empirical studies were included
2 Research involving tricycles or two-wheelers
3 Studies where the driving performance data were not adequately reported (e.g., there is no mean ± standard deviation)
4 Research using only self-reported measures
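The unit harmonisation step in the data-extraction procedure can be illustrated for speed measures. The sketch below assumes a primary study reported mean speed and its standard deviation in miles per hour and that both are rescaled to km/h, the unit used in the meta-analysis; the function name and the input values are hypothetical.

```python
MPH_TO_KMH = 1.609344  # exact by definition: 1 mile = 1.609344 km


def mph_to_kmh(mean_mph, sd_mph):
    """Rescale a mean and its standard deviation from mph to km/h.

    A change of units is a pure scaling, so the mean and the standard
    deviation are multiplied by the same factor.
    """
    return mean_mph * MPH_TO_KMH, sd_mph * MPH_TO_KMH


# Hypothetical study values: 30 mph mean speed with a 5 mph SD.
mean_kmh, sd_kmh = mph_to_kmh(30.0, 5.0)
print(round(mean_kmh, 2), round(sd_kmh, 2))
```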

Quality assessment
Referring to various quality assessment documents, Kmet et al. (2004) provide two tools for appraising quantitative and qualitative research, respectively. This research adopts the tool for evaluating quantitative studies because its applicability matches the type of selected papers and it has been adopted in several systematic reviews (e.g., Castellucci et al., 2020; Lindsay, 2017). This tool has 14 criteria for assessing methodological quality, and each item can be assigned 2 points (fully met), 1 point (partially met) or 0 points (not met). Items that are not applicable to a specific study are marked as "n/a" and excluded from the calculation of the summary score. Here, we used this tool to calculate a summary score for each paper by summing the scores obtained across relevant items and dividing by the total possible score (i.e., 28 - (number of "n/a" × 2)). The summary score ranges from 0 to 1.0, and a higher score represents higher overall methodological quality.
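The summary score described above can be sketched as follows. This is a minimal illustration, assuming each paper's 14 item ratings are stored as 0, 1, 2 or `None` for "n/a"; the function name and the example ratings are hypothetical and do not correspond to any study in this review.

```python
def kmet_summary_score(item_scores):
    """Summary score = total obtained / (28 - 2 * number of "n/a" items).

    item_scores: 14 entries, each 0 (not met), 1 (partially met),
    2 (fully met), or None when the item is not applicable.
    """
    if len(item_scores) != 14:
        raise ValueError("the quantitative checklist has 14 items")
    applicable = [s for s in item_scores if s is not None]
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("each applicable item must be scored 0, 1 or 2")
    if not applicable:
        raise ValueError("at least one item must be applicable")
    n_na = len(item_scores) - len(applicable)
    total_possible = 28 - 2 * n_na
    return sum(applicable) / total_possible


# Hypothetical ratings for one paper, with three "n/a" items.
ratings = [2, 2, 1, 2, None, None, None, 2, 1, 2, 2, 1, 2, 2]
score = kmet_summary_score(ratings)
print(round(score, 2))
```

With three "n/a" items the denominator is 22 rather than 28, so a paper is not penalised for criteria that do not apply to its design.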

Data synthesis
Considering the variety of research designs in the selected studies, the analysis of the results is a combination of narrative synthesis and meta-analysis. The meta-analysis was conducted by using Review Manager Version 5.4 (The Cochrane Collaboration, London, UK.). If there is no special statement, the significance level is set as p < 0.05.

Narrative synthesis
For those studies that cannot perform a meta-analysis due to experimental design, a narrative synthesis will be provided which summarizes the relative trend of each driving performance. More specifically, the frequency with which these DPMs are used is counted and the common induction methods are reviewed. Finally, the geographical area where the research is conducted is also analysed.

Meta-analysis
The meta-analysis only included randomised controlled trials. The trials were required to include both male and female participants so that generalisation could be ensured. For this research, the DPMs to be analysed (mean speed and the number of overall errors in the control group and the aggressive group) were considered continuous data. Since there are substantial differences in the driving scenarios and induction methods, a random-effects model was used to synthesize different yet related intervention effects (Borenstein et al., 2010). The input data were collected from the selected studies and analysed by the DerSimonian and Laird method (random-effects model, DerSimonian & Laird, 1986; DerSimonian & Kacker, 2007) in Review Manager (Deeks and Higgins, 2010; Version 5.4, The Cochrane Collaboration, London, UK). This method uses the observed data (in this study, the number of participants and the mean and standard deviation of the DPMs in each study) to estimate the overall population effect and its confidence interval. To avoid negative consequences caused by the selection of the model, the results of both the random-effects model and the fixed-effect model are calculated and reported in Section 3.4. The statistical heterogeneity between studies was examined by visual inspection of the results and by an inconsistency check based on I² (Higgins et al., 2003). If I² is larger than 50%, the heterogeneity between the selected trials needs to be further investigated. To present a more intuitive result, the mean difference was selected as the effect size.
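The pooling procedure described above can be sketched in a few lines. This is an illustrative implementation of the standard DerSimonian and Laird moment estimator for mean differences, not the code used by Review Manager; the function names and the three study records are hypothetical, and a normal-approximation 95% confidence interval (z = 1.96) is assumed.

```python
import math


def mean_difference(n_a, mean_a, sd_a, n_c, mean_c, sd_c):
    """Mean difference (aggressive minus control) and its variance."""
    md = mean_a - mean_c
    var = sd_a ** 2 / n_a + sd_c ** 2 / n_c
    return md, var


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2,
        "ci_low": pooled - 1.96 * se, "ci_high": pooled + 1.96 * se,
    }


# Hypothetical studies: (n, mean, SD) for aggressive then control, in km/h.
studies = [
    (20, 62.0, 8.0, 20, 57.0, 7.5),
    (35, 70.5, 10.0, 35, 66.0, 9.0),
    (15, 55.0, 6.0, 15, 48.0, 6.5),
]
pairs = [mean_difference(*s) for s in studies]
result = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
print(result)
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same pooled estimate.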

Sensitivity analysis
To investigate the robustness of the results, especially the influence of small-study effects and of the effect model (random or fixed), we perform a sensitivity analysis. The influence of an individual study is assessed by removing one study at a time and observing the change in the confidence interval. The influence of the effect model is assessed by observing the difference in results between the random-effects model and the fixed-effect model. In addition, the influence of using different summary statistics (mean difference, MD, and standardized mean difference, SMD) is assessed by observing the change in the heterogeneity indicator I² and the p-value for the total effect size. The result of the sensitivity analysis is reported at the end of the result of each meta-analysis. A forest plot was created using MATLAB (version 2020b, The MathWorks, Inc., Natick, Massachusetts, USA).
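The leave-one-out and effect-model checks described above can be sketched as follows. This is an illustration under stated assumptions: `pool` and `leave_one_out` are hypothetical helper names, the random-effects variant uses the DerSimonian and Laird estimate of the between-study variance, the 95% interval uses a normal approximation, and the four effect sizes and variances are invented for demonstration.

```python
import math


def pool(effects, variances, model="random"):
    """Inverse-variance pooled estimate with a 95% CI.

    model="fixed" uses within-study variances only; model="random" adds
    the DerSimonian-Laird between-study variance tau^2 to each study.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    tau2 = 0.0
    if model == "random" and len(effects) > 1:
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
        c = sw - sum(wi * wi for wi in w) / sw
        if c > 0:
            tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w = [1.0 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return est, est - 1.96 * se, est + 1.96 * se


def leave_one_out(effects, variances, model="random"):
    """Re-pool the data k times, omitting one study each time."""
    results = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(pool(rest_e, rest_v, model))
    return results


# Hypothetical mean differences (km/h) and their variances.
effects = [5.0, 4.5, 7.0, 3.0]
variances = [6.0, 4.6, 5.2, 8.0]

loo = leave_one_out(effects, variances)
# The finding is treated as robust here if every interval excludes zero.
robust = all(low > 0 for _, low, _ in loo)
fixed_ci = pool(effects, variances, model="fixed")
random_ci = pool(effects, variances, model="random")
print(loo, robust, fixed_ci, random_ci)
```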

Publication bias assessment
Publication bias (also referred to as non-reporting bias or selective reporting bias) is a bias caused by the underrepresentation of measures that were not statistically significant. Publication bias could negatively affect the validity and generalization of conclusions (Lin & Chu, 2018). To mitigate publication bias at the source, several measures were undertaken. First, we chose a varied set of databases that contain not only journal papers and conference papers but also grey literature to conduct our primary search (Dalton et al., 2016). Scopus includes 77.8 million records from journals, books and book series, conference proceedings, and trade publications (Elsevier, 2020). IEEE Xplore digital library covers over 5.7 million records from journals, conference papers, technical standards, and books (IEEE, 2022). PsycInfo provides over 5 million pieces of scholarly literature in the psychological, social, behavioral, and health sciences (American Psychological Association, 2022). The TRID Database contains more than 1.3 million records of references to books, technical reports, conference proceedings, and journal articles in the field of transportation research (Transportation Research Board, 2022). To further examine the potential publication bias, a funnel plot was used to visualise and assess the potential bias (Sterne et al., 2005; Light et al., 1984). If the distribution of individual studies is symmetric about the line representing the overall effect, the publication bias is considered low. If the funnel plot is asymmetrical, publication bias should be suspected. The funnel plot was created using MATLAB (version 2020b, The MathWorks, Inc., Natick, Massachusetts, USA).
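The visual symmetry check described above can be complemented by a simple count of where studies fall in the funnel. The sketch below is only a descriptive aid, not a formal test of asymmetry and not the MATLAB procedure used in this review; `funnel_summary` is a hypothetical name, the pseudo 95% limits are assumed to be the pooled effect ± 1.96 × standard error, and the data are invented.

```python
def funnel_summary(effects, std_errors, pooled, z=1.96):
    """Count studies left/right of the pooled effect and outside the
    pseudo 95% limits (pooled +/- z * SE at each study's SE)."""
    left = sum(1 for y in effects if y < pooled)
    right = sum(1 for y in effects if y > pooled)
    outside = sum(
        1 for y, se in zip(effects, std_errors) if abs(y - pooled) > z * se
    )
    return {"left": left, "right": right, "outside_limits": outside}


# Hypothetical effect sizes (km/h) and standard errors.
effects = [2.0, 3.5, 5.0, 6.5, 8.0]
std_errors = [2.5, 1.5, 1.0, 1.5, 2.5]
summary = funnel_summary(effects, std_errors, pooled=5.0)
print(summary)
```

A markedly unequal left/right split, or several studies outside the limits, would be a prompt to inspect the funnel plot more closely rather than evidence of bias in itself.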

Result
We present the results from the analysis of the 31 selected studies (34 experiments) in the form of a narrative synthesis and meta-analysis.

Selected studies
By applying the inclusion and exclusion criteria, 31 relevant studies (34 experiments) were selected, and the corresponding data were extracted. The study characteristics are summarized in Table 3.
As a rigorous meta-analysis cannot mix heterogeneous experimental designs (randomized controlled trials and non-randomized controlled trials), the conclusions for all the selected studies are summarized in narrative form. The synthesis results are organized according to the research questions.

What are the common DPMs employed in the aggressive driving study?
Firstly, amongst the driving performance measures that significantly differed between control and AD groups, the measures related to vehicle speed (e.g., mean speed and standard deviation of speed, n = 22), lateral control (e.g. the number of lane deviation and standard deviation of lateral position (SDLP), n = 17) and driving errors (e.g. violation of traffic rules and the number of collisions, n = 12) were reported most frequently with significant difference between aggressive driving and driving in the control group.
Other frequently reported DPMs were time-related measures (e.g., response time, time to collision, and time headway; n = 9), risk-taking behaviour (e.g., yellow light crossing, left-turn gap acceptance; n = 7) and measures related to vehicle control (e.g., steering wheel angle, brake position, and gas pedal position; n = 8).

Table 3 (study characteristics; table body not recoverable from the source). Note: * In Stephens (2008), 5 experiments were reported and 4 experiments matched the selection criteria. EX = Experiment.

In the current driving context, which induction method would produce a better result?
The method used to induce an aggressive state is an important part of a successful simulator-based study, as the main purpose is to investigate the difference between an aggressive state and a neutral state. In the selected studies, the methods used to induce an aggressive state vary. Having participants imagine themselves in an aggressive state (e.g., recalling a previous aggressive experience, or watching video clips) was the most popular method (n = 20). Impediment and time pressure were used in 11 studies. Music (n = 3) and the road status of the vehicle (e.g., ambulance, learner plate; n = 2) were also used to attempt to invoke aggressive behaviour, as were the honking of surrounding vehicles (n = 1) and aggressive text presented on roadside banners and billboards (n = 1). Although all the studies reported a significant difference in DPMs, not all the induction methods induced subjective anger. Fitzpatrick et al. (2017) found that, in a between-subject design, different levels of time pressure did not induce a significant difference in subjective aggressiveness. For the rest of the induction methods, subjective anger was reported to increase significantly after the induction.

Quality assessment
All 34 selected studies scored above the relatively liberal cut-point of 55% (Kmet et al., 2004). The final score and the SJR (SCImago Journal Rank) quartile are listed in Table 4. Based on the assessment result, all the selected studies are included for further analysis.

Meta-analysis
The two DPMs, mean speed and the number of overall errors, were analysed as they were the two most popular behaviour measures in AD research. For mean speed and number of overall errors, 8 studies and 5 studies were included based on their research design (RCTs) and participants' gender distribution (have both genders), respectively. For lateral control measures, as they were reported with different units (e.g., the number of lane departures, the standard deviation of lateral position and steering wheel angle), synthesising them with meta-analysis is neither feasible nor meaningful.

Mean speed
The mean speed (km/h) under the different states is selected as the comparator here, and the analysis result is presented in Table 5 and as a forest plot in Fig. 2. The mean differences in mean speed across these 8 RCTs overlap well (Fig. 2) and I² is below 50%. Hence, the selected studies are considered homogeneous. The overall sample size is 639. The overall result also confirms that aggressive drivers drive significantly faster than normal drivers, with a mean difference of 5.32 km/h (95% confidence interval [3.27, 7.37] km/h).
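As a consistency check on the heterogeneity statistic, I² can be recomputed from Cochran's Q (reported as Chi² = 9.36 with df = 7 alongside the forest plot in Fig. 2). The worked equation below uses the standard definition from Higgins et al. (2003); the symbols Q and df are notation introduced here for illustration.

```latex
I^2 = \max\!\left(0,\ \frac{Q - df}{Q}\right) \times 100\%
    = \frac{9.36 - 7}{9.36} \times 100\%
    \approx 25\%
```

This agrees with the I² = 25% reported with Fig. 2 and lies below the 50% threshold set in the method.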
To ensure the robustness of the meta-analysis, a sensitivity analysis for the mean speed is performed. When removing the result of one study at a time, changing the effect model to the fixed-effect model, or changing the summary statistic to the standardized mean difference, the heterogeneity is always lower than 40%, and the p-value of the test for overall effect is always < 0.05. This result indicates that the heterogeneity is at most moderate and might not be important. Hence, the result of this meta-analysis can be considered robust.
To evaluate the publication bias of this meta-analysis, the funnel plot of the selected studies is shown in Fig. 3. Each point in the funnel plot represents an individual study in this meta-analysis. The vertical axis represents the standard error and the horizontal axis represents the effect size in km/h. For a meta-analysis without publication bias, the studies should be distributed symmetrically around the line of overall effect and within the 95% confidence contour. Based on the symmetry of the study distribution and the 95% confidence region, the publication bias of this meta-analysis can be considered low.

Number of overall errors
As the other most common DPM in AD studies, the number of overall errors made in driving is reported in 5 studies, as presented in Table 6. The overall errors here refer to all mistakes the driver made while driving, including traffic violations (e.g., collisions, violation of stop signs, exceeding speed limits and lane departures). In Fig. 4, a positive value on the x-axis represents how many more errors occur in aggressive driving than in the control group, and a negative value indicates that fewer driving errors occurred in aggressive driving than in the control group. As shown in the forest plot (Fig. 4), the mean differences in overall errors across these 5 RCTs overlap well and I² is below 30%. Hence, the selected studies are considered homogeneous. The overall sample size is 136. The overall mean also confirms that drivers in an aggressive state tend to make 2.51 more errors than drivers in a natural state (95% confidence interval = [1.32, 3.71]).
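The reported interval and test statistic for the number of overall errors can be cross-checked against each other. The worked equation below assumes a normal approximation (z = 1.96 for a 95% interval), which is an assumption made here for illustration; SE denotes the standard error of the pooled mean difference.

```latex
\begin{aligned}
SE &\approx \frac{3.71 - 1.32}{2 \times 1.96} = \frac{2.39}{3.92} \approx 0.6097,\\
Z  &\approx \frac{2.51}{0.6097} \approx 4.12
\end{aligned}
```

The result matches the Z = 4.12 reported for the test for overall effect alongside the forest plot in Fig. 4.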
As in the sensitivity analysis for mean speed, when one study was removed at a time, when the effect model was changed to a fixed-effect model, or when the summary statistic was changed to the standardized mean difference, the heterogeneity was always lower than 50% and the p-value of the test for overall effect was far lower than 0.05. This result indicates that heterogeneity can be considered unimportant. Hence, the result of this meta-analysis can be considered robust.
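The heterogeneity percentages quoted for the two meta-analyses can be checked against the Chi² and df values reported with the forest plots (Fig. 2 and Fig. 4). The sketch below uses the standard relation I² = max(0, (Q − df)/Q) × 100%, where Q is the reported Chi² statistic; the function name `i_squared` is an assumption made for illustration.

```python
def i_squared(q, df):
    """I^2 (%) from Cochran's Q (reported as Chi^2) and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


print(round(i_squared(9.36, 7)))  # mean speed (Fig. 2): 25
print(round(i_squared(5.63, 4)))  # overall errors (Fig. 4): 29
```

Both values agree with the reported I² of 25% and 29%.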
A funnel plot of the selected studies was used to evaluate the publication bias of this meta-analysis (Fig. 5). Examining the symmetry of the study distribution and the 95% confidence region suggested a significant asymmetry in this funnel plot. As all the studies were distributed on the right side of the overall effect line, it is possible that studies with a lower number of overall driving errors in aggressive driving were underreported or not captured by the search. Besides publication bias, poor methodological quality, true heterogeneity, artefacts and chance could lead to the asymmetry of the funnel plot (Sterne et al., 2011). Hence, although significant results were achieved in the meta-analysis, the interpretation should be cautious.
Table note: In Stephens (2008), 5 experiments were reported and 4 experiments matched the selection criteria. EX = Experiment.
Fig. 2. Forest plot of the mean difference of mean speed (error bars represent the 95% confidence interval). A positive value on the x-axis represents how much faster aggressive driving behaviour is than the driving behaviour of the control group; a negative value indicates a mean speed slower than the control group. Overall mean difference: 5.32 km/h [3.27, 7.37]. Heterogeneity: Tau² = 2.08; Chi² = 9.36, df = 7 (P = 0.23); I² = 25%. Test for overall effect: Z = 5.08 (P < 0.00001).
Fig. 4. Forest plot of meta-analysis of the number of overall errors (error bars represent the 95% confidence interval). A positive value on the x-axis represents how many more errors occur in aggressive driving behaviour than in the driving behaviour of the control group; a negative value indicates a lower number of driving errors in aggressive driving behaviour than in the control group. Overall mean difference: 2.51 [1.32, 3.71]. Heterogeneity: Tau² = 0.54; Chi² = 5.63, df = 4 (P = 0.23); I² = 29%. Test for overall effect: Z = 4.12 (P < 0.0001).

Discussion
To quantify aggressive driving behaviour and update our understanding of the latest aggressive driving studies, this systematic review screened 2412 pieces of relevant literature and selected 31 relevant reports. The quality of the selected literature was examined, and the robustness of the meta-analysis result was checked by sensitivity analysis.

Main findings
Based on the result of narrative synthesis and meta-analysis, the three research questions set out in Section 1.4 can be answered as follows.
For research question 1, previous AD research chose DPMs somewhat subjectively: the justification for selecting specific measures was not explained in detail or was missing altogether, and the DPMs that can effectively distinguish AD from normal driving had not been reviewed and synthesized. This review re-examined the DPMs employed in recently published studies and identified the measures that could distinguish AD from the control group. In the latest empirical aggressive driving studies, especially those based on driving simulators, speed-related DPMs (e.g., mean speed, the standard deviation of speed) are reported the most. In this regard, the choice of speed was considered to be the major behavioural difference. These speed-related DPMs have been shown to be effective in distinguishing natural from aggressive drivers, as the difference is statistically significant.
For research question 2, based on the meta-analysis, aggressive drivers tend to drive 5.32 km/h faster and make 2.51 more errors than natural drivers. This finding is consistent with the individual studies and general perception. Based on this finding, the quantification of AD has two solid references: 1) the driver in AD would choose a slightly but significantly higher speed in the same scenario; 2) the driver in AD would make more mistakes. For the quantitative reference of AD, the lower limits of the confidence intervals, 3.27 km/h and 1.32 errors, could be considered as thresholds to distinguish AD from normal driving more accurately. The relationship between speed and the probability of crashes is complex when considering factors like time exposure and distance exposure (Pei et al., 2012). Although no consensus has been reached (Imprialou et al., 2016), some recent empirical studies suggest a positive relationship between speed and the risk of a crash (Aarts & Van Schagen, 2006; Wang et al., 2018). Hence, the higher speed of AD could be considered a risk factor. As for the number of errors, the relationship is more intuitive, as human errors (including violations) contribute to 93% of crashes (Khattak et al., 2021).
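As an illustration of how the two lower confidence limits could serve as reference thresholds, the sketch below compares a driver's observed mean speed and error count with a baseline for the same scenario. It is a hypothetical example, not a validated detector: the function name, the inputs, and the idea of comparing against a per-scenario baseline are assumptions; only the two threshold values come from the meta-analysis.

```python
# Lower limits of the 95% confidence intervals reported in the meta-analysis
SPEED_THRESHOLD_KMH = 3.27   # mean difference of mean speed
ERROR_THRESHOLD = 1.32       # mean difference of the number of overall errors


def exceeds_ad_reference(mean_speed_kmh, baseline_speed_kmh, errors, baseline_errors):
    """Flag whether the observed differences exceed the reference thresholds.

    The baseline values are assumed to describe normal driving in the
    same scenario (hypothetical inputs for this sketch).
    """
    speed_diff = mean_speed_kmh - baseline_speed_kmh
    error_diff = errors - baseline_errors
    return {
        "speed": speed_diff > SPEED_THRESHOLD_KMH,
        "errors": error_diff > ERROR_THRESHOLD,
    }


if __name__ == "__main__":
    # Hypothetical drive: 3.5 km/h faster and 2 more errors than the baseline
    print(exceeds_ad_reference(53.5, 50.0, 4, 2))
```

Because the thresholds come from simulator studies with small samples, any practical use would need calibration against real driving data.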
The reason why a driver would choose a higher speed and make more errors during aggressive driving could be explained by the frustration-aggression theory (Berkowitz, 1989). When encountering frustrating events, like traffic congestion and slow leading vehicles, some drivers may respond with aggressive behaviour. They may adopt a higher speed to eliminate or escape from the frustration. Another hypothesis for this phenomenon is that, when irritated by surrounding drivers, the driver would be distracted from the current driving task and divert cognitive resources to secondary tasks like expressing anger verbally. The secondary task may lead to worse driving performance (Blanco et al., 2006). To mitigate these negative influences, music (Fakhrhosseini et al., 2014) and speech-based agents (Jeon and Croschere, 2015) could be adopted.
For research question 3, before investigating AD, it is important to understand how to induce the driver into a situational aggressive state. However, few reviews address this specific topic. Hence, this review recorded the induction methods used in recent research and their effects. Imagining an aggressive situation and impediments from leading vehicles (including situations with time pressure) are common and effective ways to induce the driver into an aggressive state. Most subjective feedback from the participants indicates that the level of anger rises significantly and lasts throughout the experiment. Although one study (Fitzpatrick et al., 2017) reported no significant difference in the mean scores of the aggressiveness questionnaire before and after the induction, the rest of the induction methods could be considered effective. The potential reason for that result could be that it was a between-subject design, where the baseline for different drivers may vary.
To conclude, we highlight several key points. For AD research, speed-related measures (e.g., mean speed and standard deviation of speed) and time-related measures are the common DPMs adopted in the latest research. To induce an aggressive state, imagining an aggressive situation and impediments (including time pressure) are the two major methods adopted in simulator research. Drivers in an aggressive state drove significantly faster than drivers in a natural state, and the lower limit of the mean difference of mean speed was 3.27 km/h, which can be considered a quantitative cut-off point for distinguishing aggressive from normal driving. In previous research, "excessive speeding" was considered a feature of AD (Paleti et al., 2010). However, the actual difference between AD and normal driving in terms of "excessive speeding" has not been clearly defined. Based on the current finding, the difference of 3.27 km/h can be considered as the reference to separate AD and normal driving.

Limitations
This study has some limitations that need to be acknowledged. Firstly, all the selected studies are simulator-based, which may raise concerns about validity when applying the result (Wynne et al., 2019). To study human drivers' behaviour, naturalistic driving studies (NDS, Campbell, 2012), field operational tests (FOT, Benmimoun et al., 2013), observation studies (Walker et al., 2006), self-report studies (Deffenbacher et al., 1994), and driver interviews have been used. Each method has its own benefits and drawbacks. The debate regarding the validity of driving simulators is not a new topic (Törnros, 1998). In relevant studies, behaviour in a driving simulator has been compared with behaviour in on-the-road tests (Bella, 2008; Mayhew et al., 2011; Meuleners & Fraser, 2015; Wynne et al., 2019). The results suggest that the driving simulator can reflect the driver's behaviour, with at least relative validity. In addition, as a controlled environment, the driving simulator can ensure repeatability (Iwata et al., 2021; Classen & Brooks, 2014). Potential ethical concerns about participants' safety may also prevent such experiments from being conducted in public traffic. Secondly, compared with the large driver population, the sample size of each study, or even of all the included studies combined, is too small (less than one ten-thousandth). Thirdly, for the sake of methodological rigour, the non-RCTs were not included in the meta-analysis. However, if the overall risk of bias is assessed as moderate to low, synthesizing the findings from non-RCTs could provide further information. In addition, as mentioned in Section 4.1, the selection, calculation, and reporting of DPMs are somewhat subjective owing to the lack of justification for why specific DPMs were chosen. Hence, the conclusion of this study should be considered a summary of previous practice rather than the gold standard for choosing specific DPMs.
The definition and calculation of DPMs could refer to the SAE Standard J2944 (SAE, 2015). Regarding language bias, the current study only selected studies reported in English owing to the language capability of the authors, which may lead to selection bias. Regarding the publication bias of the total number of driving errors identified in Fig. 5, although no significant heterogeneity was found, the conclusion that aggressive driving involves more driving errors should be drawn with caution. Finally, as shown in Table 3, the included research was all from developed or developing areas, which limits the generalization of this study to the underrepresented less-developed areas. Cultural factors can play a significant role in aggressive driving (Sârbescu et al., 2014). Hence, when interpreting this result, it should be noted that the less-developed areas are undersampled and the generalization should be limited to the developing and developed areas.

Conclusion
This paper first discusses the knowledge gap in AD research, with a focus on driving performance measures adopted in AD research. This is followed by summarizing and synthesising the empirical findings in the selected studies. Based on this review, we have found that speed-related measures (e.g., mean speed and standard deviation of speed) are the most common measures adopted in AD research and that the mean speed in aggressive driving is significantly higher than the speed in the control group, which makes speed a useful measure in studying AD behaviour. On the other hand, other measures, like frequent acceleration/deceleration and impolite gestures, were not reported as frequently as speed-related measures. As AD is a complex phenomenon with multidimensional performance (e.g., vehicular parameters, body posture, and even physiological signal variation), these other measures require more investigation. Besides, to the best knowledge of the authors, this study provides the only meta-analysis quantitatively synthesizing AD behaviour research in the last decade. In addition, the induction methods adopted in recent studies were reviewed, and imagining an irritating scenario was the most frequently used effective method. However, several limitations were also identified in Section 4.2. To fully understand AD behaviours, more empirical studies are needed so that the characteristics of AD can be identified.
Continuing from this study, future research is needed to address several areas. Firstly, aggressive behaviour is still vaguely defined in terms of quantification. To provide a better understanding of AD, more quantitative research of this kind that investigates the other measures is needed. Besides the speed and errors mentioned in this study, acceleration and gap acceptance could be further examined. In addition, future studies should justify their choice of DPMs. As the driving behaviour of human drivers can be measured at different levels with different measures, a clear justification would help to reduce subjective bias and ensure reproducibility. Another important area is the application of induction methods, as the time pressure in a between-subject design did not induce a significant difference in subjective anger. This may imply that either the between-subject design or the time pressure may not reliably arouse the subject's emotions in experimental conditions. From the practical side, the value of 3.27 km/h could potentially be considered as a reference point for the detection of aggressive drivers, which also provides more information for policymakers and academia. The potential impact of this research on the industry includes the design of driver state monitoring systems.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.