Abstract

In the context of countries in the so-called Global South, where passenger railway services are either nonexistent or perform poorly, discrete choice models are useful to identify the attributes that affect users’ choices and provide insights on their behaviour in regional long-distance trips. Several theories and models have been proposed to understand travel behaviour for effective strategic decision-making in the transport field. The well-known Random Utility Maximization (RUM) approach has been widely used for such purposes, while the Random Regret Minimization (RRM) approach has been recently explored in the literature. However, the magnitude of the difference in the levels of the attributes, or the stimulus perception, may affect the results of such models and bias the estimations. Therefore, this paper aims to assess the stimulus perception in mode choice to compare conventional rail (CR) and high-speed rail (HSR) services for passenger transport in intercity trips in Brazil. Estimations of RUM and RRM models were performed with a dataset from a stated preference survey comparing two railway technologies (CR and HSR) with other modes of transport (car, bus, and airplanes) for long-distance trips in the Southeast region of Brazil. Findings provide useful insights about the impacts of travel costs, travel times, and frequency of services, as well as sociodemographic characteristics of users. From the modelling outputs, it was found that users are affected by the magnitude of travel costs, time, and frequency only in business trips by HSR in the Brazilian context.

1. Introduction

Despite being one of the main aspects of the Sustainable Development Goals (SDGs) in transport, modal shift in liberal democracies depends on the attitudes and behaviour of potential users towards railways. This is even more pronounced in the so-called Global South, where passenger railway services are either nonexistent or used for lack of alternatives regardless of their quality. In the literature, several theories and models have been proposed to understand travel behaviour for more effective strategic decision-making. Discrete choice models have been used to identify the attributes that affect users’ choices and provide insights on their behaviour in regional long-distance trips.

The random utility maximization (RUM) approach has been dominant in the field, while several other theories and approaches have been proposed aiming at better explaining travel behaviour. These include the use of machine learning techniques [1], and discrete choice modelling in the random regret minimization (RRM) approach and decision field theory [2, 3].

The random regret minimization models have been proposed for several applications within the transport literature, for instance, road safety [4], freight transport [5–7], traffic allocation and route decision [2, 8–10], demand for recreational activities [11], traffic calming schemes [12], and passenger mode choice [13]. Please refer to Chorus et al. [14] and Jing et al. [15] for further references and applications.

Despite the extensive discussions to date, such noncompensatory theories still require investigation. For example, it is argued that the stimulus perception, i.e., the magnitude of the difference in the levels of the attributes, may affect the estimations of RRM models and bias the results [16], thus requiring further investigations in different contexts. Therefore, we look for evidence on how the stimulus perception affects mode choice decisions between different railway technologies in long-distance trips in the Global South.

This paper assesses the stimulus perception in mode choice modelling under the random regret minimization approach to compare conventional rail (CR) services with average speeds of 150 km/h and high-speed rail (HSR) services of up to 300 km/h for passenger transport. We estimated conventional discrete choice models based on the RUM and RRM approaches, and regret-based models that consider the effects of the magnitude of the levels of attributes proposed by Jang et al. [16]. Estimations were performed with a dataset from a stated preference survey comprising the two railway technologies (CR and HSR) and other modes of transport (car, bus, and airplanes) in long-distance regional trips in the Southeast region of Brazil.

The contribution of this paper is twofold. First, we compare and discuss the formulations available in the literature that account for the stimulus perception in discrete choice models from the perspective of the RRM approach. Second, we compare the effects of the magnitude of the levels of the attributes of different railway technologies that provide distinct services to users in long-distance regional trips in the Global South context.

The outline of the paper is as follows: Section 2 presents the literature on the RUM and RRM formulations, and applications in mode choice modelling. Section 3 describes the method, including the dataset and the proposed models, and Section 4 reports the results of the estimations and discussions. Finally, Section 5 discusses the findings of the study, and provides conclusions and avenues for further research.

2. Modelling Framework

The random utility maximization (RUM) approach assigns a utility $U_{in}$ to individual $n$ in relation to alternative $i$. It comprises a deterministic component $V_{in}$ defined by the combination of the attributes of each alternative available to the individual ($x_{imn}$) weighted by their respective parameters $\beta_m$, and a stochastic term $\varepsilon_{in}$ as follows:

$$U_{in} = V_{in} + \varepsilon_{in} = \sum_{m} \beta_m x_{imn} + \varepsilon_{in} \tag{1}$$

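The probability $P_{in}$ of alternative $i$ being chosen by individual $n$ collapses to the multinomial logit (MNL) model, assuming that the error terms are independent and identically distributed (IID) and follow an extreme value (EV) Type I distribution [17], as follows:

$$P_{in} = \frac{\exp(V_{in})}{\sum_{j} \exp(V_{jn})} \tag{2}$$

where $V_{in}$ is the deterministic component of the utility of alternative $i$ for individual $n$.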
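To make the logit form concrete, the following is a minimal sketch of MNL choice probabilities computed from linear-in-parameters utilities. The function names, attribute names, parameter values, and attribute levels are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def linear_utility(attributes, betas, asc=0.0):
    """Deterministic utility: constant plus the sum of beta_m * x_m."""
    return asc + sum(betas[name] * value for name, value in attributes.items())

def mnl_probabilities(utilities):
    """MNL probabilities for a dict {alternative: deterministic utility}."""
    # Subtracting the maximum utility avoids overflow and leaves the ratios unchanged.
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

# Hypothetical parameters and attribute levels (assumptions, not study results).
betas = {"time_h": -0.30, "cost_brl": -0.01}
alternatives = {
    "CAR": {"time_h": 5.0, "cost_brl": 150.0},
    "BUS": {"time_h": 6.5, "cost_brl": 90.0},
    "CR": {"time_h": 4.0, "cost_brl": 120.0},
}

utilities = {alt: linear_utility(x, betas) for alt, x in alternatives.items()}
probabilities = mnl_probabilities(utilities)
for alt, p in probabilities.items():
    print(f"{alt}: V = {utilities[alt]:.2f}, P = {p:.3f}")
```

With these illustrative values, the alternative with the highest deterministic utility receives the largest choice probability, and the probabilities sum to one.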

Alternatively, discrete choice models estimated on the basis of regret theory were first addressed by Chorus et al. [18] and Chorus [19], where individuals make binary comparisons between the attributes of the chosen and the nonchosen alternatives to minimize regret. The formulation of the regret function proposed by Chorus [19] is as follows:

$$R_{in} = \sum_{j \neq i} \sum_{m} \ln\left(1 + \exp\left[\beta_m \left(x_{jmn} - x_{imn}\right)\right]\right) \tag{3}$$

where $R_{in}$ refers to the sum of differences between the considered and the competing alternatives, $\beta_m$ is the parameter related to attribute $m$, and $x_{imn}$ and $x_{jmn}$ are the levels of attribute $m$ for alternatives $i$ and $j$, respectively.

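Assuming that the regret includes stochastic terms that follow IID EV Type I distributions, MNL models are also used to estimate choice probabilities as follows:

$$P_{in} = \frac{\exp(-R_{in})}{\sum_{j} \exp(-R_{jn})} \tag{4}$$

where $R_{in}$ is the regret associated with alternative $i$ for individual $n$.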
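As an illustration of the pairwise comparisons behind the regret function, the sketch below computes the classical regret of each alternative and the resulting logit probabilities on the negative of the regret. The helper names, parameters, and attribute levels are hypothetical and serve only as an example.

```python
import math

def classical_regret(alt, alternatives, betas):
    """Sum over competitors j and attributes m of ln(1 + exp(beta_m * (x_jm - x_im)))."""
    regret = 0.0
    for other, x_other in alternatives.items():
        if other == alt:
            continue  # an alternative is not compared with itself
        for name, beta in betas.items():
            diff = x_other[name] - alternatives[alt][name]
            regret += math.log1p(math.exp(beta * diff))
    return regret

def rrm_probabilities(alternatives, betas):
    """Logit probabilities computed on the negative of the regret."""
    regrets = {alt: classical_regret(alt, alternatives, betas) for alt in alternatives}
    r_min = min(regrets.values())  # shift for numerical stability
    exp_neg = {alt: math.exp(-(r - r_min)) for alt, r in regrets.items()}
    total = sum(exp_neg.values())
    return regrets, {alt: e / total for alt, e in exp_neg.items()}

# Hypothetical parameters and attribute levels (assumptions, not study results).
betas = {"time_h": -0.30, "cost_brl": -0.01}
alternatives = {
    "CAR": {"time_h": 5.0, "cost_brl": 150.0},
    "BUS": {"time_h": 6.5, "cost_brl": 90.0},
    "CR": {"time_h": 4.0, "cost_brl": 120.0},
}

regrets, probabilities = rrm_probabilities(alternatives, betas)
for alt in alternatives:
    print(f"{alt}: R = {regrets[alt]:.3f}, P = {probabilities[alt]:.3f}")
```

Because every attribute comparison contributes a strictly positive term, even the best alternative carries some regret; the alternative with the lowest total regret is the most likely to be chosen.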

Previous research shows that RRM models are sensitive to the choice set composition and to the compensatory effect in the decision-making [20–22].

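van Cranenburgh et al. [23] extended the classical RRM model [19] to the $\mu$RRM model by including the scale of regret parameter $\mu$ as follows:

$$R_{in} = \sum_{j \neq i} \sum_{m} \mu \ln\left(1 + \exp\left[\frac{\beta_m}{\mu} \left(x_{jmn} - x_{imn}\right)\right]\right) \tag{5}$$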

The CRRM and Pure-RRM (P-RRM) models are special cases of the $\mu$RRM when $\mu = 1$ and $\mu \to 0$, respectively. Choice behaviour is equally represented by the RUM when $\mu \to \infty$ [23].
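The role of the scale of regret can be checked numerically. The sketch below, with hypothetical parameters and attribute levels, evaluates the regret of one alternative for three values of the scale: 1 (classical regret), a value close to 0 (only the disadvantages of the considered alternative generate regret), and a very large value (regret becomes linear in the attribute differences, as in a utility-based model, up to a constant).

```python
import math

def softplus(z):
    """Numerically stable ln(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def mu_regret(alt, alternatives, betas, mu):
    """Regret of `alt` with scale of regret `mu` (mu > 0)."""
    regret = 0.0
    for other, x_other in alternatives.items():
        if other == alt:
            continue
        for name, beta in betas.items():
            diff = x_other[name] - alternatives[alt][name]
            regret += mu * softplus(beta * diff / mu)
    return regret

# Hypothetical parameters and attribute levels (assumptions, not study results).
betas = {"time_h": -0.30, "cost_brl": -0.01}
alternatives = {
    "CAR": {"time_h": 5.0, "cost_brl": 150.0},
    "BUS": {"time_h": 6.5, "cost_brl": 90.0},
    "CR": {"time_h": 4.0, "cost_brl": 120.0},
}

classical = mu_regret("CR", alternatives, betas, mu=1.0)    # classical regret
near_pure = mu_regret("CR", alternatives, betas, mu=1e-6)   # approaches pure regret
large_mu = mu_regret("CR", alternatives, betas, mu=1e4)     # approaches a linear form

# Number of pairwise attribute comparisons made for one alternative.
n_terms = (len(alternatives) - 1) * len(betas)
linear_part = large_mu - n_terms * 1e4 * math.log(2.0)
print(classical, near_pure, linear_part)
```

In this example, the only disadvantage of CR is its fare relative to BUS, so the regret for a scale close to zero reduces to that single term, while for a large scale the regret minus a constant approaches half the sum of the weighted attribute differences.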

Despite the extensive literature with applications of the RRM in the past decade, Jang et al. [16] argued that the predictive success of regret-based models would be enhanced by incorporating different perceptions in the differences between the attributes depending on their magnitude and range. Therefore, a nonlinear representation was proposed to include the assumption that the perception of stimuli (attributes) is proportional to their absolute values. The formulation of the general nonlinear representation assumes that individuals perceive the differences in the attributes based on a generalized interpretation of Weber’s law [24, 25] that collapses to the “paired logarithmic based on the generalized Weber’s law” (PLGW) model when the logarithmic specification is applied:where indicates the perceived stimuli changes. If from the point of view of statistical significance, the regret model expressed in equation (6) becomes the original formulations proposed by Chorus [19]; otherwise, the regret is generated by Weber’s law [25] in some level defined by the value of , which comprises the perception of the attribute proportional to its absolute value.
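The idea of scaling attribute differences by the magnitude of the attribute can be sketched as follows. This is only one possible reading of a generalized Weber's law specification: the exponent name `alpha`, the use of the considered alternative's own attribute level as the reference magnitude, and all numerical values are assumptions made for illustration and are not taken from the source.

```python
import math

def weber_regret(alt, alternatives, betas, alpha):
    """Regret with attribute differences divided by the reference level raised to `alpha`.

    Assumed form: ln(1 + exp(beta_m * (x_jm - x_im) / x_im ** alpha)),
    which requires strictly positive attribute levels.
    """
    regret = 0.0
    x_alt = alternatives[alt]
    for other, x_other in alternatives.items():
        if other == alt:
            continue
        for name, beta in betas.items():
            perceived_diff = (x_other[name] - x_alt[name]) / (x_alt[name] ** alpha)
            regret += math.log1p(math.exp(beta * perceived_diff))
    return regret

# Hypothetical parameters and attribute levels (assumptions, not study results).
betas = {"time_h": -0.30, "cost_brl": -0.01}
alternatives = {
    "CAR": {"time_h": 5.0, "cost_brl": 150.0},
    "BUS": {"time_h": 6.5, "cost_brl": 90.0},
    "CR": {"time_h": 4.0, "cost_brl": 120.0},
}

for alpha in (0.0, 0.3, 1.0):
    value = weber_regret("CR", alternatives, betas, alpha)
    print(f"alpha = {alpha}: regret of CR = {value:.4f}")
```

Under this assumed form, an exponent of zero leaves the attribute differences unscaled, so the classical regret is recovered, whereas an exponent of one expresses every difference relative to the attribute level of the considered alternative.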

Huang et al. [26] proposed an extension of the assuming the regret function of in the so-called-Weber model (hereby ) to explore the empirical performance of Weber’s law:

If , then the model collapses to the RUM, and if for a given attribute, it collapses to the .

Few applications of these models are found in the transport literature. Jang et al. [16] studied daily commuting travel choices and consumers’ choices of shopping centre. The attribute differences when judging regret were analysed by Jang et al. [27] using two datasets with mixed regret-rejoice models, motivating further exploration of their application in more complex choice contexts. They used a revealed preference dataset on mode choice behaviour in the Noord Brabant area of the Netherlands. Therefore, further analysis of the stimulus perception in regret-based models should be conducted in different contexts, such as regional long-distance mode choice in Global South countries.

3. Methods

3.1. Data

The dataset was obtained from a stated preference (SP) survey comprising a choice set with the most common modes used for intercity trips in Brazil. Private vehicles (CAR) are usually used by single individuals, mostly for business purposes, or by groups of people for nonbusiness activities. On the other hand, trips by coaches (BUS) are usually overnight services, and airplanes (AIR) are mainly used for trips between the largest cities. In the experiment, we considered two railway alternatives that are not operated in the country, conventional (CR) and high-speed (HSR) rail. For the surveys, CR and HSR were never available simultaneously in a given scenario due to the differences between them. Conventional rail has lower fares and higher travel times, and its punctuality is less reliable than that of HSR.

The survey focused on four states (São Paulo, Espírito Santo, Rio de Janeiro, and Minas Gerais) that make up the Southeast Region of Brazil (see Figure 1). The region is home to approximately 44% of the Brazilian population, distributed by density per city according to Figure 2. Note that most of the population is concentrated along the coast near the capitals of the states of Rio de Janeiro and São Paulo.

Scenarios simulating real-world choice situations were limited to long-distance trips where railways are more competitive [28, 29]. In the studied context, it meant cities with more than 200,000 inhabitants situated between 100 km and 1,000 km from each other.

3.2. Experiment Specification

The experiment was developed to handle the combinations of alternatives and their respective attributes and levels. Figure 3 shows the 14 origins and destinations with the largest populations in the states of Southeast Brazil, chosen to define the scenarios considering that all of them have airport infrastructure.

Beyond the geographic delimitations, the survey was divided by trip purpose (business and nonbusiness) based on the experience of the respondent about a journey in the past. Depending on the location of the residence and the destination, the respondent faced one out of four scenarios of modal choice given the following combinations: SC1, with CAR, BUS, and CR; SC2, with CAR, BUS, CR, and AIR; SC3, with CAR, BUS, and HSR; or SC4, with CAR, BUS, HSR, and AIR (see Figure 4).

Scenarios one (SC1) and three (SC3) did not include trips by airplane to enable extending the results of the models presented in this paper to the regional connections in which cities do not operate commercial flights, despite the fact that all the cities considered in the survey had such services available to users.

Each alternative included the most important attributes that could affect the mode choices. Travel time, petrol cost, and toll cost were used to represent the trips by CAR, while the remaining alternatives (BUS, CR, HSR, and AIR) were described by travel time, fare, and frequency. Three levels were assigned to each attribute (high, medium, and low). Travel times were defined by the average speeds of each mode as shown in Table 1, while frequency was set in intervals of 12 hours (low), 6 hours (medium), and 3 hours (high) except for CAR.

Travel costs were defined per alternative: petrol costs from data of the Brazilian National Petrol Agency [31]; toll costs based on the average toll price per kilometer in Southeast Brazil; bus fares from prices regulated by the Brazilian Land Transport Agency [32]; CR and HSR fares according to benchmark studies carried out in Europe [33, 34], because intercity passenger railway transport does not exist in Brazil; and finally, air fares based on prices regulated by the Brazilian National Aviation Agency [35]. All the costs were calculated according to the distance between origins and destinations as summarized in Table 2. For CAR, the levels of cost were calculated as the average price of petrol in Brazilian currency per liter (BRL/liter) multiplied by the average distance between cities in kilometers (km) and divided by the average fuel consumption of a vehicle (km/liter), plus the distance multiplied by the average toll cost in Brazilian currency per kilometer (BRL/km). The costs of the remaining alternatives (BUS, CR, HSR, and AIR) were calculated as the average distance between cities multiplied by the respective values of fare in Brazilian currency per kilometer (BRL/km). For reference, 1 US Dollar (USD) was equivalent to 5 Brazilian Reais (BRL) at the time of the survey (2013).
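The cost rules described above can be written as two small functions. This is a minimal sketch that assumes the petrol price is expressed per liter so that the units reduce to BRL; the distance, price, fuel economy, toll, and fare values are hypothetical and do not come from Table 2.

```python
def car_cost_brl(distance_km, petrol_price_brl_per_liter,
                 fuel_economy_km_per_liter, toll_brl_per_km):
    """Car trip cost: fuel cost plus distance-based toll cost."""
    fuel_cost = petrol_price_brl_per_liter * distance_km / fuel_economy_km_per_liter
    toll_cost = distance_km * toll_brl_per_km
    return fuel_cost + toll_cost

def fare_brl(distance_km, fare_brl_per_km):
    """Fare of a public mode (BUS, CR, HSR, or AIR): distance times fare per kilometer."""
    return distance_km * fare_brl_per_km

# Hypothetical inputs for a 400 km trip (assumptions for illustration only).
car = car_cost_brl(distance_km=400.0, petrol_price_brl_per_liter=3.0,
                   fuel_economy_km_per_liter=10.0, toll_brl_per_km=0.10)
rail = fare_brl(distance_km=400.0, fare_brl_per_km=0.25)
print(f"CAR: BRL {car:.2f}, rail: BRL {rail:.2f}")
```

In the experiment, the low, medium, and high levels of each cost attribute would be obtained by varying the corresponding unit values around figures of this kind.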

The attributes varied according to an orthogonal fractional factorial design involving 27 profiles for each combination of available alternatives (SC1, SC2, SC3, and SC4). The choice sets were further grouped into three blocks such that each respondent answered nine choice tasks. Therefore, 216 observations would be obtained in one replication of the experiment comprising the four combinations of available alternatives (SC1, SC2, SC3, and SC4) per trip purpose (business and nonbusiness).

The questionnaire flow consisted of an explanation of the purposes of the survey and its privacy terms, followed by a set of cities randomly chosen from Figure 3 and shown over a map, from which respondents were requested to choose their origin (Figure 5(a)). The survey terminated if none of the available options matched the respondent’s place of residence; otherwise, the survey proceeded to another map, exemplified in Figure 5(b), in which the respondent selected, from another randomly chosen set of cities, a destination to which he/she had already traveled. For that trip, the respondent had to declare the chosen transportation mode, purpose, and perceived travel times (access and egress). If the respondent had never traveled to any of the available destinations, the survey was also terminated.

The following page of the survey showed the overall characteristics of the alternatives: availability and travel time reliability of CAR and BUS; and check-in requirements, operating times over the day, and reliability for CR or HSR and AIR. Next, the respondent was faced with the scenarios to choose the preferred mode to travel between the chosen origin and destination as exemplified in Figure 6. Finally, the respondents were asked to share their sociodemographic information (age, employment status, average household income and number of members, and driver license ownership).

An online pilot involving 37 respondents was carried out between September 2, 2013 and September 20, 2013, to assess the effectiveness of the questionnaire. The final version of the survey was carried out online from October 24, 2013 to November 5, 2013, with respondents selected from a database of a specialized survey company. Care was taken to identify nonprofessional respondents living in one of the states of the Southeast region in Brazil that had already traveled at least once from the origin to the destination shown in the hypothetical scenarios. Moreover, at the beginning of the survey, the respondent was advised about the contents and the risks of the survey, and agreed to participate in the experiment. The scenarios were randomized at the beginning of the survey and 580 respondents completed the questionnaire in its final version, resulting in 5,220 observations (mode choices) that were used in the models presented in this paper.
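The counts reported for the experimental design and the final sample follow from simple arithmetic, reproduced in the short check below using only figures stated in the text (variable names are chosen for illustration).

```python
profiles_per_combination = 27   # orthogonal fractional factorial design
blocks = 3                      # choice sets grouped into three blocks
combinations = 4                # SC1, SC2, SC3, and SC4
trip_purposes = 2               # business and nonbusiness

# Each respondent answers one block of one combination.
tasks_per_respondent = profiles_per_combination // blocks

# One full replication covers every profile of every combination and purpose.
observations_per_replication = profiles_per_combination * combinations * trip_purposes

# Final sample: respondents who completed the questionnaire.
respondents = 580
total_observations = respondents * tasks_per_respondent

print(tasks_per_respondent, observations_per_replication, total_observations)
```

The result matches the nine choice tasks per respondent, the 216 observations per replication, and the 5,220 observations used in the models.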

Table 3 details the sociodemographic distribution of respondents. The number of household members is concentrated between two and four, and the overall income is concentrated between BRL 1,500 and BRL 3,500. The choice set combinations (SC1, SC2, SC3, and SC4) were also balanced among the participants, with proportional distribution per trip purpose in each combination. Please refer to Figure 3 for the location of the cities in the Brazilian Southeast region.

Table 4 shows the percentage of tasks answered per scenario of CR and HSR with and without AIR per trip purpose (business or nonbusiness, noted as B and NB, respectively).

3.3. Models

We estimated several utility and regret-based multinomial logit (MNL) models (RUM, CRRM, μRRM, CPLW, μPLW, CPLGW, and μPLGW) for business and nonbusiness purposes, considering the attributes of each alternative and the sociodemographic attributes, to assess the differences between the choice behaviour approaches and the stimulus perception in the attributes (time, cost, and frequency) between the CR and the HSR technologies in long-distance trips in Brazil.

For the model with the PLGW component (see equation (8)), we estimated one model with generic and . We estimated RMU models with generic and specific scale parameters for the attributes processed by the RRM approach, and fixed scale parameters of the attributes processed by the RUM approach equal to 1 ().

The models were estimated for each railway technology (HSR and CR) and trip purpose (business, B, and nonbusiness, NB), resulting in 28 models (7 specifications per technology and per trip purpose). The final models were specified using the likelihood ratio test with specific and generic parameters for each attribute based on equations (1), (3), and (5)–(7), except for the parameters referring to the stimulus perception and the scale of regret ($\mu$), which were set generic for cost, time, and frequency for all alternatives. The parameters were estimated by likelihood maximization using the Apollo package [36] implemented in R [37]. The railway alternatives are the reference for the specification of the sociodemographic parameters.

4. Results

4.1. Goodness-of-Fit

The results of the estimates are presented in the following. The final loglikelihood, likelihood ratio (LR) test, adjusted $\rho^2$, Akaike information criterion (AIC), and Bayesian information criterion (BIC) of the models estimated for the CR and HSR for business and nonbusiness trips are shown in Table 5. Note that the LR test statistics are greater than the critical $\chi^2$ values at the 95% confidence level for the corresponding degrees of freedom for all the models, i.e., they explain the choices better than the models without parameters.
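For readers less familiar with these measures, the sketch below computes them from the loglikelihood values using common textbook definitions. The loglikelihoods, number of parameters, and sample size are hypothetical, the adjusted rho-squared is assumed to be measured against the model without parameters (equal choice shares), and the critical chi-square value for 10 degrees of freedom at the 95% level (18.307) is a standard tabulated figure.

```python
import math

def fit_statistics(ll_final, ll_null, n_params, n_obs):
    """Common goodness-of-fit measures for a model estimated by maximum likelihood."""
    return {
        # Likelihood ratio statistic against the model without parameters.
        "lr": -2.0 * (ll_null - ll_final),
        # Adjusted rho-squared penalizes the number of estimated parameters.
        "adj_rho2": 1.0 - (ll_final - n_params) / ll_null,
        "aic": -2.0 * ll_final + 2.0 * n_params,
        "bic": -2.0 * ll_final + n_params * math.log(n_obs),
    }

# Hypothetical example: 1,000 observations, three equally likely alternatives
# under the model without parameters, and 10 estimated parameters.
n_obs, n_alternatives = 1000, 3
ll_null = n_obs * math.log(1.0 / n_alternatives)
stats = fit_statistics(ll_final=-900.0, ll_null=ll_null, n_params=10, n_obs=n_obs)

chi2_critical_95_df10 = 18.307  # tabulated chi-square critical value, 10 d.o.f.
model_beats_null = stats["lr"] > chi2_critical_95_df10
print(stats, model_beats_null)
```

Lower AIC and BIC values and higher adjusted rho-squared values indicate a better trade-off between fit and number of parameters, which is why these measures tend to follow the pattern of the final loglikelihood when the specifications differ by only one or two parameters.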

The estimations for the business trips by CR resulted in a maximum absolute difference of 32.63 points of loglikelihood between the RUM and the remaining models. The RRM approach does not improve the performance of the models in any case, although the difference between the RUM and the RRM is only 3.11 points of loglikelihood. The PLW and PLGW also underperform compared to the RUM, with minimum and maximum differences of 3.75 (PLGW) and 32.63 (CPLW) points of final loglikelihood, respectively. The values of the adjusted $\rho^2$, AIC, and BIC follow the same pattern in the models.

The estimates of the HSR experiment for business trips show that all the models outperformed the RUM, and the PLGW performed best compared to the others, with respective minimum and maximum differences of 6.09 (RRM) and 17.0 (RUM) points of final loglikelihood with two additional parameters. The random regret models (RRM) performed well compared to the PLGW, with a maximum difference of 6.09 points of loglikelihood. The CPLW, CPLGW, and PLW resulted in similar performance, with a minimum difference of 9.54 points of loglikelihood between the CPLGW and the PLGW. The adjusted $\rho^2$, AIC, and BIC also follow the results of the final loglikelihood.

The measures of performance of the models for nonbusiness trips by CR show that the RUM also outperformed the other models, as in the case of the business trips, with minimum and maximum differences of 8.46 (CRRM) and 41.29 (CPLW) points of loglikelihood, respectively. All the PLW and PLGW models underperformed compared to both RUM and RRM models. Finally, the results for the HSR experiments show that the models have similar performance, except for the PLW models. The RUM model performs best compared to the others, with minimum and maximum differences of 1.46 (PLGW) and 40.46 (CPLW) points of final loglikelihood. The RRM models have similar performance, with a maximum difference in final loglikelihood of 2.06 (CRRM) compared to the RUM. The PLGW models also have similar results, with a maximum difference of 1.73 (CPLGW) points of final loglikelihood compared to the RUM. The results for the adjusted $\rho^2$, AIC, and BIC are also similar to the results of the final loglikelihood in both CR and HSR experiments.

4.2. Parameters
4.2.1. Business Trips

Tables 6–9 present the estimates of parameters for business trips by CR and HSR. The alternative specific constants (ASCs) are positive for CAR and BUS in most of the models (except for CAR in the RRM and the PLGW of the HSR, though it is not significant at the 95% confidence level in the PLGW). Conversely, they are negative for AIR, showing that either CR or HSR is, in general, preferable to AIR and less preferred than CAR and BUS. Parameters related to travel time, cost, and frequency are negative in all the models, as expected, and most of them are significant at the 95% confidence level for the CRRM, RRM, and PLW models in the experiments of CR and HSR. The exceptions are a few parameters in the CRRM, PLW, and RRM models for the CR experiment and in the PLGW models for the HSR experiment. The parameters referring to age and income are mostly significant at the 95% confidence level and vary across the models and alternatives. However, in general, older people with higher income are more likely to choose cars instead of the other alternatives in business trips. It should be noted that we do not intend to find the best model over the combinations of attributes per alternative but to provide insights on the values and significance of the attributes.

More importantly, the estimations of the scale of regret and stimulus perception parameters provided interesting results. For the scale of regret, all the parameters are positive and significant at the 95% confidence level (except for the RRM and PLW models in the CR experiment), whereas for the HSR experiment, the values are higher than 1.00 (minimum of 1.325 and maximum of 2.183), which means that individuals attach equal importance to gains and losses in the level of service attributes.

In the case of the stimulus perception parameter for the CR experiments, the values are all negative, against the expectations, because the range of this parameter should be between 0 and 1; however, they are not significant at the 95% confidence level. For the HSR experiment, the results show positive and significant values for such parameters. This means that stimulus perception affects mode choice in high-speed rail services and not in conventional rail. In both cases the parameter values are around 0.3, indicating that individuals perceive different degrees of regret for each alternative [16].

4.2.2. Nonbusiness Trips

Tables 10–13 present the values of the estimated parameters for nonbusiness trips by CR and HSR. In such models, the ASCs are also positive for CAR and BUS, and negative for AIR (except the negative value for CAR in the PLGW of the CR, though it is not significant at 95% confidence level), showing that CR and HSR are more likely to be chosen than AIR, but less preferred than CAR and BUS in nonbusiness long-distance trips.

Parameters related to travel time, cost, and frequency are negative in all the models, except for some parameters of the PLGW models in the CR experiment, which were not significant. The remaining estimations that are not significant at the 95% confidence level occur in the PLW models for the CR experiment and in the CRRM, PLGW, μRRM, μPLW, and μPLGW models for the HSR experiment.

The sociodemographic attributes are mostly negative and significant at 95% confidence level for age and income, except for the age parameters of AIR, which are positive and suggest that older people are more likely to travel by airplanes instead of railways in nonbusiness trips.

The estimations of the scale of regret and stimulus perception parameters also provide interesting results. For the scale of regret, all the parameters are positive and significant at the 95% confidence level. For the CR experiment, the estimations are approximately equal to 1.00 (minimum of 0.882 and maximum of 1.045), showing that individuals, in general, attach equal importance to gains and losses in travel times. The HSR experiments resulted in values lower than 1.00 (minimum of 0.390 and maximum of 0.913), such that individuals impose more regret in these cases, especially in the PLW.

In the case of the stimulus perception parameter for the CR experiments, the values are all positive, significant at the 95% confidence level, and greater than one; however, these values are against the expectations because they should vary between 0 and 1. We did not impose constraints in the estimation procedures to guarantee the correct range for this parameter; consequently, these models should not be used to estimate the choice probabilities because they would provide biased results. For the HSR experiment, the results are positive but not significant. This means that stimulus perception does not affect mode choice in CR and HSR services in the context of nonbusiness trips.

4.3. Discussion

Our research adds a Global South perspective to the analyses of new railway services. Several studies have found the relevant attributes that affect the rail mode choice in different markets [38]. Koppelman and Wen [39] concluded that travel time is the most important mode choice determinant by car, train, and air transport serving the connection between Toronto and Montreal. The HSR choice between Madrid and Barcelona is essentially affected by fare, travel time, frequency, and trip purpose [40], while fare is the most important mode choice attribute in the Seoul–Daejon market [41]. In Chile, a hypothetical HSR service between Santiago and Concepción is mostly affected by fare, travel time, and service delay [42].

In this paper, we found models where travel time, cost, and frequency are attributes that affect the mode choice when railway services are supposedly available to Brazilian travelers, for business and nonbusiness purposes in both high-speed and conventional rail markets. In addition, our findings show that elderly users performing business trips are, in general, less likely to choose rail options over their alternatives. In nonbusiness trips, older users show different behaviour, preferring air travel over rail counterparts in some cases. Moreover, individuals with higher income are, in general, less likely to choose alternatives to driving when on business trips. Yet, they are more willing to choose railways in nonbusiness trips.

The estimations suggest that travelers are more sensitive to travel time in HSR than in CR for business and nonbusiness trips. On the other hand, lower frequency (i.e., greater interval between trains) affects the choices for HSR more than for CR, either for business or nonbusiness trips. The effects of the cost attributes depend on the model and show that individuals are more sensitive to the costs of private vehicles and the fares of bus and air transport when HSR is available for nonbusiness trips. However, the estimations show that travelers are equally sensitive to costs when CR is available, either for business or nonbusiness trips. Such findings are in line with the evidence found in the aforementioned research.

5. Conclusions

This paper investigates the stimulus perception to variations in the levels of the attributes of different railway services (conventional rail and high-speed rail) compared to other alternatives (bus, car, and air transport) in long-distance trips in Brazil. We provide evidence on the potential effects of the differences between the travel times of several alternatives for passenger transport between the largest cities in Southeast Brazil using data of a stated preference survey.

Our research contributes to mode choice literature by providing an overview of the effects of attributes and sociodemographic variables using different travel behaviour theories and models. This can offer valuable insights into the discussion on how the differences in the weighting of the attributes between alternatives may affect mode choice.

When comparing random utility maximization with extensions of the random regret minimization approach, our findings show that the RUM models perform best in most of the cases. However, the differences in their final loglikelihood are not high (varying from 17.0 in the HSR experiment for business trips to 41.29 in the CR experiment for nonbusiness trips). Regardless of these differences, it should be noted that the main goal of this research was to assess the stimulus perception of users considering both railway technologies rather than finding the best model for predictions. In this sense, we point out that users are affected by the magnitude of travel costs, time, and frequency in business trips by HSR in the Brazilian context.

This research could be further extended in several directions. More research should look into the estimation of models considering other behavioural theories such as decision field theory [43], quantum choice models [44], and Bayesian belief networks [45], as well as the use of artificial intelligence techniques. Finally, such models may be used in the appraisal of new railway lines in countries of the Global South with similar sociodemographic characteristics to shape future networks comprising different services in terms of speed, frequency, and vehicle layout. This would require further investigations on the costs and benefits of conventional and high-speed railway systems and the implementation of network design models through optimization and simulation.

Despite the insights provided by the data and models about new railway services in Southeast Brazil, the research has a few shortcomings that are worth addressing. First, the dataset refers to a survey carried out in 2013 and may not fully reflect current choices for long-distance trips in Brazil; however, new railway services for such purposes have not been implemented in the time span between the survey and the current application of the data, and the overall sociodemographic conditions of the population did not change significantly over the past few years. Second, models based on approaches other than discrete choice (e.g., artificial intelligence techniques such as neural networks) could also be used.

Finally, we argue that the conclusions drawn from the dataset and models presented herein are valid to support long-term policy making for the Brazilian railway market. For example, measures of performance, such as the value of travel time for different railway technologies, would be useful in project analysis.

Data Availability

The information utilized in this study refers to responses from a Stated Preference Survey and is not publicly available due to privacy policies that applied during the time of collection.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the São Paulo Research Foundation (FAPESP) (grant no. 2019/07428-2).