A dataset of Chinese drivers’ driving behaviors and socio-cultural factors related to driving

Given the high fatality rate due to road traffic accidents in China, understanding the factors influencing aggressive driving behaviors among Chinese drivers is essential to alleviate the problem. This paper describes a dataset of 1039 Chinese drivers’ driving behaviors and the socio-cultural factors associated with those behaviors. The dataset was collected through an online survey and comprises five main categories: 1) driving information, 2) aggressive driving behaviors, 3) friend/peer influence, 4) family influence, and 5) socio-demographic information. The dataset is valuable for public health and transportation researchers exploring the factors that influence driving behaviors and public safety in China. The dataset's construct validity was checked using Bayesian Mindsponge Framework (BMF) analytics. Specifically, the analysis shows that safe driving behaviors are affected by information promoting safe driving that is actively and passively absorbed from friends/peers (friends/peers being role models and friends’/peers’ support, respectively). The result is consistent with the Mindsponge Theory's information-processing mechanism in human minds.

Type of data: Survey data
How the data were acquired: Initially, online interviews were conducted with 54 participants to identify their aggressive driving behaviors and understand the reasons behind those behaviors. All the interviews were conducted through WeChat from April 10 to 25, 2022. Then, convenience sampling was employed for survey collection. The survey was designed on the WeChat mini-app "Survey Star." The group owners distributed the link to the survey in four WeChat public groups from May 1 to 5, 2022. It is important to note that all group members were insured by the company owned by the group owners. These WeChat public groups were selected to ensure all the respondents own a vehicle. Each group has 500 members at its maximum, which culminated in a total of 2000 drivers. In total, 1039 drivers completed the online survey. A response rate of 51.95% was obtained.
Data format: Raw, Analyzed and/or Filtered
Description of data collection: The dataset consists of responses from 1039 Chinese respondents. Most respondents were males, accounting for 61.50%. The educational levels of the respondents were distributed relatively equally across the middle school and below (17.81%), high school or vocational college (29.45%), undergraduate or associate college (28.49%), master's degree (20.60%), and doctoral degree (3.66%). Most respondents had a monthly income of 3000 to 15000 RMB (69.78%). People in the age group of 25-40 occupied the highest percentage of the respondents. Around 90% of the respondents sometimes/often/always drove their vehicles for work. The largest share of respondents (39.27%) had held a driving license for 1-3 years. Meanwhile, 17.81% of the respondents had a driving license for less than one year, and 6.16% had a driving license for more than ten years.
Data source location: Country: China
Data accessibility: The dataset (saved and encoded in a CSV file) and its detailed description (saved in an XLSX file) are deposited together at: https://osf.io/stcj2/ (DOI: 10.17605/OSF.IO/STCJ2)

Value of the Data
• The dataset offers the resources to study the influences of socio-cultural elements on Chinese drivers' aggressive driving behaviors.
• Researchers in the public health and transportation fields can employ the current dataset to generate insightful results for reducing traffic accidents in China.
• Socio-cultural researchers can employ the dataset to explore the influences of socio-cultural factors on Chinese driving behaviors.
• Making the data open helps reduce the costs of conducting public health and transportation research on road traffic accidents, increases transparency and integrity, and supports reproducibility in the future.

Objective
According to the World Health Organization (WHO), road traffic accidents were the 8th leading cause of death in 2019, with 17.38 deaths per 100,000 population [1]. Besides the health costs, road traffic incidents also incurred a cumulative economic loss of over 352 billion yuan from 1996 to 2015 [2]. Risky and aggressive driving behavior is one of the reasons behind the high fatality of road injury in China [2]. Thus, studying the factors influencing drivers' aggressive driving behaviors is imperative. Many studies have been conducted to reveal the predictors of aggressive driving behaviors, such as drivers' socio-demographic attributes, personality, seasonal variations, the period of day (peak or off-peak), the type of highway, and characteristics of the built environment [3][4][5][6][7][8][9]. However, the knowledge of socio-cultural influences on these behaviors remains limited. The current dataset offers public health and transportation researchers the resources to discover the influences of socio-cultural factors on Chinese drivers' aggressive driving behaviors. Also, making the data open helps reduce the costs of conducting public health and transportation research on road traffic accidents, increases transparency and integrity, and supports reproducibility in the future [10, 11].

Data sample
The dataset consists of responses from 1039 Chinese respondents. Most respondents were males, accounting for 61.50%. The educational levels of the respondents were distributed relatively equally across the middle school and below (17.81%), high school or vocational college (29.45%), undergraduate or associate college (28.49%), master's degree (20.60%), and doctoral degree (3.66%). Most respondents had a monthly income of 3000 to 15000 RMB (69.78%). People in the age group of 25-40 occupied the highest percentage of the respondents, with 65.54%, whereas only 2.60% of the respondents belonged to the age group of 50+.
Around 90% of the respondents sometimes/often/always drove their vehicles for work. The largest share of respondents (39.27%) had held a driving license for 1-3 years. Meanwhile, 17.81% of the respondents had a driving license for less than one year, and 6.16% had a driving license for more than ten years.

Response coding
The current section presents how the responses of five major categories were coded according to the following order: 1) Driving information, 2) Aggressive driving behaviors, 3) Friend/peer influence, 4) Family influence, and 5) Socio-demographic information.
The responses are of two main types: categorical variables (including binary variables) and numerical variables. In the following subsections, we describe categorical variables using seven kinds of information corresponding to seven columns: "Variable," "Name," "Explanation," "Level," "Code," "Frequency," and "Proportion." For numerical variables, the last three columns are replaced with "Range," "Mean," and "Standard deviation."

Driving information
The dataset's first category comprises five variables (three categorical and two numerical variables). These variables were generated from questions about drivers' driving experience, skill, and confidence. They are coded as A1 to A5 (Table 1).

Aggressive driving behaviors
The second category focuses on the drivers' aggressive driving behaviors (see Table 2). The category consists of seven numerical variables, coded as B1 to B7. The questions behind these variables were adapted from the Aggressive Driving Behavior Scale and modified based on the earlier interviews with 54 Chinese drivers.

Friend/peer influence
The third category focuses on the influence of drivers' peers/friends on their driving behaviors (see Table 3). This category has nine variables, reflecting peers'/friends' support for safe driving (variables C1-C6), peers/friends being role models (variables C7 and C8), and drivers' care for their peers'/friends' safety when being in the car (variable C9).

Family influence
Family influence on drivers' driving behaviors can be studied using variables in the fourth category (see Table 4). This category has 12 variables, reflecting how family members influence the drivers' driving behaviors.

Socio-demographic information
The socio-demographic information of drivers is recorded by variables in the fifth category (see Table 5). It includes the drivers' self-identified gender (E1), educational level (E2), monthly income (E3), and age (E4).

Survey design and collection procedure
The survey was systematically designed with five major steps: (1) Pilot interviews and literature review, (2) questionnaire design, (3) survey collection, (4) dataset generation, and (5) data analysis.
Initially, online interviews were conducted with 54 participants to identify their aggressive driving behaviors and understand the reasons behind those behaviors. Specifically, the interviewees were asked how their peers/friends and parents influenced their driving behaviors. All the interviews were conducted through WeChat from April 10 to 25, 2022. These participants were members of a WeChat public group of drivers who had purchased insurance from the same insurance provider that established the group. Each interview lasted approximately 15-30 minutes. The interview results were later used to formulate questions in the questionnaire.
In addition, the Aggressive Driving Behavior Scale was employed to measure the drivers' self-rated aggressive driving behavior. However, based on the interview results, some of the questions in the scale were modified and localized to make them more appropriate to the Chinese context. In particular, the original Aggressive Driving Behavior Scale is an 11-item questionnaire with a 6-point Likert scale (1 - never, 2 - almost never, 3 - sometimes, 4 - fairly often, 5 - very often, 6 - always) [12]. The localized version used a 5-point Likert scale (from 1 - highly disagree to 5 - highly agree), condensed the 11 items to 7 items, and employed a positive tone to lower participants' resistance to questions that might sound offensive and face-losing in Chinese culture.
Then, convenience sampling was employed for survey collection. The survey was designed on the WeChat mini-app "Survey Star." The group owners distributed the link to the survey in four WeChat public groups from May 1 to 5, 2022. It is important to note that all group members were insured by the company owned by the group owners. These WeChat public groups were selected to ensure all the respondents own a vehicle. Each group has 500 members at its maximum, which culminated in a total of 2000 drivers. In total, 1039 drivers completed the online survey. A response rate of 51.95% was obtained. All participants gave informed consent before they started filling in the questionnaire. All the samples are saved and encoded in a CSV file for later analysis. The detailed description of the dataset is provided in an XLSX file, which is deposited together with the dataset at: https://osf.io/stcj2/
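The response rate reported above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function and variable names are ours, and it assumes that every group was at its 500-member maximum, so 2000 is an upper bound on the number of drivers reached and the resulting rate is, strictly speaking, a lower bound.

```python
def response_rate(completed: int, invited: int) -> float:
    """Return the response rate as a percentage of invited drivers."""
    if invited <= 0:
        raise ValueError("invited must be a positive number")
    return 100.0 * completed / invited


# Figures taken from the text; the names are illustrative only.
groups = 4
max_members_per_group = 500
invited = groups * max_members_per_group  # assumes every group is full
completed = 1039

rate = response_rate(completed, invited)
print(f"Response rate: {rate:.2f}%")
```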

Dataset Validation
We employed the Bayesian Mindsponge Framework (BMF) analytics to check the construct validity of the dataset [13, 14]. BMF analytics utilizes the Mindsponge Theory for theoretical reasoning and Bayesian inference for statistical analysis [15][16][17]. Specifically, we conducted the Bayesian analysis to test the hypothesized relations based on the information-processing mechanism of Mindsponge Theory. The Mindsponge Theory has effectively explained various complex socio-psychological phenomena [18, 19]. Thus, the dataset is deemed valid if the findings generated using the current dataset are consistent with the Mindsponge Theory.
In the information-processing mechanism of the Mindsponge Theory, information is considered the foundation on which physical reality is constructed, so social interactions can be viewed as information-exchange processes among minds (or information collection-cum-processors). The information-processing mechanism follows set-theoretic logic, so a psychological or behavioral phenomenon can be measured based on the existence and density of the information within a conceptual set (i.e., mind, environment, social interactions). For example, variables C7 and C8 help measure a respondent's access to safe driving information from their friends, who are regarded as information sources, and the density of that accessible information.
One of the fundamental assumptions of the Mindsponge Theory is that the human mind tends to be influenced by the information absorbed from trusted external sources. To test this assumption, we examined whether safe driving behaviors are affected by information promoting safe driving that is absorbed from friends. To measure the safe driving behaviors of the respondents, we created the composite variable SafeDriving by averaging variables B1 to B7. The internal reliability of these seven variables is acceptable, with a Cronbach's alpha of 0.943.
Friends are selected as external information sources because respondents tend to trust people they consider friends. There are two ways information promoting safe driving can be absorbed from friends: active and passive absorption. Active absorption is measured by questions in the sub-category "Friends' role model," which reflects the degree to which the respondents observe the safe driving behaviors of their friends. FriendRoleModel is the composite variable generated by averaging variables C7 and C8. The internal reliability of these two variables is acceptable, with a Cronbach's alpha of 0.826 [20].
Meanwhile, passive absorption is measured by questions in the sub-category "Friends' support," which reflects the degree to which the respondents were supported to drive safely by friends. FriendsSupport is the composite variable generated by averaging variables C1 to C6. The internal reliability of these six variables is acceptable, with a Cronbach's alpha of 0.933.
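To make the construction of the composite variables and the reliability check concrete, the following minimal sketch averages a respondent's item scores and computes Cronbach's alpha with its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score). The function names and the five toy respondents are hypothetical and are not drawn from the dataset; the snippet is not the authors' analysis code.

```python
from statistics import mean, variance


def composite_score(items):
    """Average one respondent's item scores (e.g., B1 to B7) into a composite."""
    return mean(items)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 respondents with the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed score across respondents.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert answers to seven items for five respondents.
toy_rows = [
    [5, 5, 4, 5, 5, 4, 5],
    [3, 3, 3, 2, 3, 3, 3],
    [4, 4, 5, 4, 4, 4, 4],
    [2, 1, 2, 2, 1, 2, 2],
    [5, 4, 5, 5, 5, 5, 4],
]

safe_driving = [composite_score(r) for r in toy_rows]
alpha = cronbach_alpha(toy_rows)
print([round(s, 2) for s in safe_driving], round(alpha, 3))
```

The same two functions would apply unchanged to the two-item (C7, C8) and six-item (C1 to C6) composites, since only the number of items per row differs.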
In general, we tested the following model (Model 1):

$$SafeDriving \sim \mathrm{normal}(\mu, \sigma)$$
$$\mu_i = \beta_0 + \beta_{FriendRoleModel} \cdot FriendRoleModel_i + \beta_{FriendsSupport} \cdot FriendsSupport_i$$
$$\beta \sim \mathrm{normal}(M, S)$$

The probability around $\mu$ is determined by the form of the normal distribution, whose width is specified by the standard deviation $\sigma$. $\mu_i$ indicates respondent $i$'s degree of safe driving behavior; $FriendRoleModel_i$ indicates respondent $i$'s degree of actively absorbing information promoting safe driving from friends; $FriendsSupport_i$ indicates respondent $i$'s degree of passively absorbing information promoting safe driving from friends. Model 1 has four parameters: the coefficients, $\beta_{FriendRoleModel}$ and $\beta_{FriendsSupport}$, the intercept, $\beta_0$, and the standard deviation of the "noise", $\sigma$. The coefficients of the predictor variables are distributed as a normal distribution around the mean denoted $M$ and with the standard deviation denoted $S$. All the estimated results of Model 1 are shown in Table 6. The effective sample size (n_eff) is larger than 1000, and the shrink factor (Rhat) is equal to 1 for all parameters. These statistics suggest that Model 1's Markov chains have converged well. Visually, the Markov chains shown in the trace plots fluctuate around a central equilibrium, which also confirms the convergence of Model 1 (see Fig. 1). As the Markov chains are convergent, the estimated results are qualified for interpretation.
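As an illustration of what a shrink factor close to 1 means, the sketch below computes the classic (non-split) Gelman-Rubin statistic for a single parameter from several chains. This is an assumption on our part: the software used for the analysis may compute Rhat with a different variant (for example, a split or rank-normalized version). The chains here are simulated toy draws, not the posterior draws of Model 1.

```python
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin shrink factor for one parameter.

    chains: list of m equally long lists of posterior draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two equally long chains")
    within = mean([variance(c) for c in chains])           # W
    between_over_n = variance([mean(c) for c in chains])   # B / n
    pooled = (n - 1) / n * within + between_over_n         # pooled variance
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four toy chains sampling the same distribution: Rhat should be near 1.
mixed = [[rng.gauss(0.29, 0.02) for _ in range(1000)] for _ in range(4)]
# Four toy chains stuck around different values: Rhat should be well above 1.
stuck = [[rng.gauss(center, 0.02) for _ in range(1000)]
         for center in (0.20, 0.20, 0.35, 0.35)]

print(round(gelman_rubin_rhat(mixed), 3), round(gelman_rubin_rhat(stuck), 3))
```

Values near 1 indicate that the between-chain variance is negligible relative to the within-chain variance, which is the pattern the text reports for all parameters.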
According to the estimated results, FriendRoleModel and FriendsSupport are positive predictors of SafeDriving ($M_{FriendRoleModel}$ = 0.29 and $S_{FriendRoleModel}$ = 0.02; $M_{FriendsSupport}$ = 0.67 and $S_{FriendsSupport}$ = 0.02). The posterior distributions of all the parameters, with their Highest Posterior Density Intervals (HPDI) at 89%, are shown in Fig. 2. In all cases, the HPDI is entirely located on the positive side of the x-axis (i.e., > 0), indicating that the positive associations of FriendRoleModel and FriendsSupport with SafeDriving are highly reliable.
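For readers unfamiliar with the HPDI, the sketch below finds the narrowest interval that contains a given share of the posterior draws and checks whether it lies entirely above zero. The function name is ours, and the draws are simulated from a normal distribution with the reported mean (0.29) and standard deviation (0.02) purely as a stand-in; the actual posterior draws are not part of the text.

```python
import math
import random


def hpdi(samples, prob=0.89):
    """Narrowest interval containing at least `prob` of the draws."""
    if not 0 < prob <= 1 or not samples:
        raise ValueError("need draws and 0 < prob <= 1")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


rng = random.Random(1)
# Stand-in draws: normal approximation using the reported M and S (assumption).
draws = [rng.gauss(0.29, 0.02) for _ in range(4000)]
low, high = hpdi(draws, prob=0.89)
print(f"89% HPDI: [{low:.3f}, {high:.3f}]; entirely positive: {low > 0}")
```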

Ethics Statements
This study was approved by the Institutional Review Board (IRB) of the China University of Political Science and Law on March 18, 2022. The IRB ensures that all ethical standards have been met and that all participants' rights, welfare, and well-being have been fully considered and protected. All collected data have been anonymized and comply with relevant ethical standards and data protection regulations.

Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
A dataset of Chinese drivers' driving behaviors and socio-cultural factors related to driving (Original data) (OSF).