Dataset on commuting patterns and mode-switching behavior under prospective policy scenarios for public transport

This paper describes a methodology widely used in travel behavior research to determine the individual and alternative-specific variables that influence the choice of transportation mode for commuting trips. The data used in the analysis were obtained in July 2015 by means of a computer-assisted telephone interview (CATI) survey conducted in the Cluj Metropolitan Area, Romania. The survey collected a wide range of day-to-day travel patterns, socioeconomic data, and attitudes and perceptions toward urban transportation services. Given the lack of studies from emerging, post-socialist countries, the survey included a section dedicated to an alternative ticketing policy for public transport services in order to evaluate the willingness of commuters to switch to more sustainable transportation through non-coercive interventions. A revealed preference – stated preference modelling methodology was adopted to reveal the role of socioeconomic characteristics, along with features of transport supply and the built environment, in explaining commuting patterns and forecasting sustainable modal splits. Both the survey and the methodology are scalable and flexible enough to be adapted and applied to a wide range of transport policies regarding modal shift strategies.


Data
The dataset described in this paper contains socio-economic and demographic characteristics, travel behavior, associated spatial features, and attitudinal indicators. Data were collected using a stratified sample representing the daily, morning rush-hour commuters from the Metropolitan Area of Cluj, Romania (Fig. 1). Several previous studies [1,2] collected individual-level data to assess sustainable urban mobility in post-socialist urban areas [3], but data were scarce and the studies were exploratory. Toșa et al. [4] conducted a comprehensive study revealing generational differences and their demographic, socioeconomic, and attitudinal characteristics in order to quantitatively explain commuting patterns within the Cluj Metropolitan Area, Romania. To these were added elements of transport supply and the built environment that contribute to the refinement of the choice process. The associated data and methodology are described below. The cleaned and processed dataset is drawn from a total of 1079 respondents who participated in the data collection process, and it comprises a quota of 544 individuals (50.42%).
The socio-economic and demographic section of the questionnaire collected information about the respondent, such as gender, family size, age, education level, occupation, marital status, driver's license, household type, and income. Table 1 summarizes these characteristics by their associated levels and shows the percentage distribution. For each variable, the sample sizes of the levels sum to the final quota of respondents, i.e. 544 individuals.
Specifications Table. Subject: Social Sciences. Specific subject area: Transportation, Travel Behavior Analysis.
Value of the Data. The data contain commuter-level information on current and prospective travel behavior, socioeconomic characteristics, derived spatial features, and attitudinal data for transport services in emerging urban areas. Stakeholders with an interest in transport policies can gain important insights into the public response to non-coercive actions regarding sustainable transportation in urban areas. The empirical study could serve as a reliable framework for testing transportation policies, using the same or different modal attributes of interest. The data bring additional value in explaining travel behavior in emerging urban areas in the context of post-communist countries from Central and Eastern Europe.
The commuting behavior section of the questionnaire asked which transport modes were used for commuting and their corresponding weekly frequency. The transport modes included both motorized modes, such as public transport, car, motorcycle, or taxi, and non-motorized modes, such as bicycle and walking. These travel modes were selected for the questionnaire in order to emphasize their impact on traffic congestion and level of service. Modes such as motorcycle, car, and taxi represent personal motorized modes, and walking and bicycle represent non-motorized modes. The frequency data were processed to obtain the representative transport mode used for commuting, i.e. the mode with the highest weekly frequency. The merged modes considered were named (1) Non-motorized, (2) Private motorized, and (3) Public transport. The modal split data are shown in Table 2. As in Table 1, for each variable, the sample sizes of the levels sum to the final quota of respondents.
Information on the place of residence and work was requested, but these data were considered sensitive and are therefore not shared within the dataset. Nevertheless, the associated spatial features, which include distances, take into account the location of each respondent's origin and destination within the study area relative to the downtown area and the bus station locations. Accordingly, population density and the accessibility of public transport stations from the respondent's home and workplace were extracted from the geographic information system (GIS) model and are shared in the dataset. This information is summarized in Table 3. For each variable, the sample sizes of the levels sum to the final quota of respondents.
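The rule for deriving the representative commuting mode (the mode with the highest weekly frequency, then merged into one of three classes) can be illustrated with a minimal sketch. The mode keys, the function name, and the tie-breaking behavior are assumptions made for illustration; the text does not state the dataset's column names or how ties were resolved.

```python
# Hypothetical mode keys mapped to the three merged classes named in the text.
MODE_GROUPS = {
    "walking": "Non-motorized",
    "bicycle": "Non-motorized",
    "car": "Private motorized",
    "motorcycle": "Private motorized",
    "taxi": "Private motorized",
    "public_transport": "Public transport",
}


def representative_mode(weekly_trips):
    """Return the merged class of the mode with the highest weekly frequency.

    weekly_trips maps a mode key to the number of commuting trips per week.
    Assumption: on a tie, the first mode listed in weekly_trips wins.
    """
    if not weekly_trips:
        raise ValueError("no commuting trips reported")
    top_mode = max(weekly_trips, key=weekly_trips.get)
    return MODE_GROUPS[top_mode]


# A respondent who mostly rides the bus is classified as a public transport user.
example = representative_mode({"car": 3, "public_transport": 5, "walking": 1})
print(example)  # Public transport
```

Selecting the single most frequent mode first and merging afterwards follows the order in which the text describes the processing; aggregating frequencies by class before taking the maximum would be a different rule.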
Attitudinal questions were selected from the questionnaire to capture opinions related to the adequacy of the public transport network, car dependency, and issues related to traffic congestion within the metropolitan area (see Table 4).

Experimental design, materials, and methods
We employed a combined estimation method that draws on both Revealed Preferences (RP) and Stated Preferences (SP). While RP-data models describe real-life choices and represent actual travel behavior, SP-data-based choice experiments set hypothetical alternatives and record individuals' preferences [5]. The adopted methodology uses disaggregated data on respondents' travel behavior and related individual data and provides rich behavioral predictions [6–9]. This modelling methodology has been shown to be highly effective in determining the role of selected variables in the choice process and in identifying the effects of new policy interventions within the transportation sector [10,11].
When RP and SP are modeled as random utility models with discrete choices, the utility associated with each transport mode can be expressed as an additive function of regressor vectors describing the characteristics X_i of an individual i, the characteristics Y_ij and Y_ik of the transport alternatives j or k, and the specific effect of the SP experiment (Z_ik), scaled by the corresponding parameter vectors α_i, β_ij, β_ik, and γ_ik, respectively. The RP and SP models can be jointly estimated by maximizing a joint log-likelihood function whose terms represent the marginal probabilities of selecting transport mode j or k in the RP and SP models, respectively. The unknown parameters within the three mode-choice models (RP, SP, and the combined RP–SP) were estimated by using the GAUSS econometric software (version 3.2.32).
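For orientation, one common way to write such a specification is sketched below. This is not the equation from the original article, which did not survive in this text. The error terms ε, the choice indicators d (equal to 1 when individual i chooses the given alternative and 0 otherwise), and the probability symbols P^RP and P^SP are assumptions introduced here; any scale parameter between the RP and SP error variances is omitted.

```latex
\begin{align}
U^{RP}_{ij} &= \alpha_i^{\top} X_i + \beta_{ij}^{\top} Y_{ij} + \varepsilon^{RP}_{ij}, \\
U^{SP}_{ik} &= \alpha_i^{\top} X_i + \beta_{ik}^{\top} Y_{ik} + \gamma_{ik}^{\top} Z_{ik} + \varepsilon^{SP}_{ik}, \\
\ln L &= \sum_{i} \sum_{j} d^{RP}_{ij} \ln P^{RP}_{ij} + \sum_{i} \sum_{k} d^{SP}_{ik} \ln P^{SP}_{ik}.
\end{align}
```

Under this reading, the first sum collects the observed (RP) choices and the second the stated (SP) choices, so that parameters shared by both utilities are informed by both data sources.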
The commuting data employed in this paper were gathered from a cross-sectional survey conducted in the Cluj-Napoca Metropolitan Area (CNMA) in July 2015 by means of a computer-assisted telephone interview (CATI). The SP section was based on an experimental design intended to test the respondent's likelihood of changing current commuting habits following the introduction of the alternative ticketing policy. The alternatives are characterized by a set of relevant attributes and must offer clear and simple choices to the respondent [12,13]. The proposed policy consisted of a monthly pass described by three attributes: (1) type, with 2 assigned levels, (2) price, with 3 assigned levels, and (3) incentive bonus, with 3 assigned levels. The types of monthly public transport pass considered were the two-line and the all-line passes. For each type of monthly pass, three price levels were considered: low, medium, and high. The bonus values were defined as a percentage of the monthly cost and set to 2%, 5%, and 15%. A full factorial design was employed to generate 18 cases of preference (2 × 3 × 3) as combinations of the attribute levels, as seen in Table 5. Out of the total 18 scenarios for the transport policy alternative, each respondent was presented with a single case to accept or reject during the interview.
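The full factorial design described above can be enumerated with a short sketch. The attribute labels, function names, and the random assignment of one scenario per respondent are assumptions for illustration; the text gives the levels only as low, medium, and high prices and does not state how a scenario was allocated to each respondent.

```python
import random
from itertools import product

# Attribute levels as described in the text; price levels are labels only,
# since the actual ticket prices are not given here.
PASS_TYPES = ("two-line", "all-line")
PRICE_LEVELS = ("low", "medium", "high")
BONUS_PERCENT = (2, 5, 15)


def full_factorial():
    """Enumerate every combination of pass type, price level, and bonus."""
    return [
        {"pass_type": t, "price": p, "bonus_percent": b}
        for t, p, b in product(PASS_TYPES, PRICE_LEVELS, BONUS_PERCENT)
    ]


def assign_scenario(scenarios, rng):
    """Pick the single scenario shown to one respondent.

    Assumption: uniform random assignment; the source does not specify this.
    """
    return rng.choice(scenarios)


scenarios = full_factorial()
print(len(scenarios))  # 18, i.e. 2 x 3 x 3

rng = random.Random(2015)  # fixed seed so the sketch is reproducible
shown = assign_scenario(scenarios, rng)
```

Because each respondent evaluates only one of the 18 cases, the SP observations form a between-respondent design, and each scenario is answered by only a fraction of the sample.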
The combined estimation of RP and SP models reveals the values and significance levels of certain coefficients and helps identify the role of certain variables in the choice process [14]. This study assesses differences between generations and their role in tailoring travel demand in emerging urban areas. In this way, several models can be customized by specific socio-economic and demographic features, in order to reveal idiosyncrasies between groups of interest.

Acknowledgments
The author acknowledges financial support for data collection from the KAKENHI Grant-in-Aid for Scientific Research (numbers 2604707 and 15F14707) received from the Japan Society for the Promotion of Science (JSPS).