Origin-Destination-Based Travel Time Reliability under Different Rainfall Intensities: An Investigation Using Open-Source Data

Origin-destination (O-D)-based travel time reliability (TTR) is fundamental to next-generation navigation tools that aim to provide both travel time and reliability information. While previous works mostly focus on route-based TTR and use either ad hoc data or simulation in their analyses, this study uses open-source Uber Movement and Weather Underground data to systematically analyze the impact of rainfall intensity on O-D-based TTR. The authors classified three years of travel time data in downtown Boston into one hundred origin-destination pairs and integrated them with the weather (rain) data. A lognormal mixture model was applied to fit travel time distributions and calculate the buffer index. The median, trimmed mean, interquartile range, and one-way analysis of variance were used to quantify the characteristics. Some results agree with previous findings in the literature, namely that rain in general reduces O-D-based TTR, while others appear to be unique and worthy of discussion. First, although the reduction in TTR generally gets larger as rainfall intensity increases, the change is more significant when rainfall intensity changes from light to moderate and fairly marginal when it changes from normal to light or from moderate to extremely intensive. Second, regardless of normal or rainy weather, the O-D-based TTR and its consistency across O-D pairs with similar average travel time tend to improve as average travel time increases. In addition to the technical findings, this study contributes to the state of the art by promoting the application of real-world, publicly available data in TTR analyses.


Introduction
Travel time reliability (TTR) plays a vital role in various applications, such as evaluating network performance [1], measuring the improvement brought by traffic operations and management strategies [2], quantifying service quality [3], improving travelers' route choice experience [4], and identifying freeway bottlenecks [5].
Among route-level (microscopic), origin-destination (O-D)-level (mesoscopic), and network-level (macroscopic) studies, route-level TTR analyses have received much more attention in the past. Besides demand from the practical side, the relative ease of obtaining route-based data is likely another reason. For instance, using data from California State Route 91, one study found that travelers' route choice was more sensitive to TTR than to travel time [6]. Chepuri et al. assessed the performance of various TTR measures with bus route data collected in Chennai, India. They recommended using the 95th percentile travel time and buffer time as reliability indicators for bus routes [7]. Some recent route-based studies can be found in [8-12]. Because route-based studies usually focus on one or a few specific routes, the data were usually project-specific and most were discarded upon completion of a project, which makes continuous investigation difficult and sometimes impossible.
It is not uncommon for route-based and O-D-based analyses to produce similar or identical results under certain circumstances, because a route is associated with at least one origin and one destination and thus can be viewed as a particular case of an O-D-based study. Comparison studies are limited, though. In [13], the authors concluded that there were no significant differences between O-D-based and route-based estimates in most of the studied time periods. In [14], the researchers found that adding an alternative path tends to decrease the O-D-based TTR. Network-level TTR studies are mostly simulation-based for lack of real-world data. Notable studies include, but are not limited to, [15-18]. Studies based on traffic simulation are sometimes subject to serious errors caused by underlying problems of the simulation model. A detailed discussion of simulation-based approaches is beyond the scope of this study.
Many factors, such as connected vehicles, traffic incidents, weather, work zones, special events, types of traffic control, and the dynamics of traffic flow, have impacts on TTR. Accordingly, the study of these impacts has become one of the prominent topics in the TTR field [19]. For instance, the authors of [20] attempted to quantify the contribution of various features to TTR and found that demand-capacity imbalance and accidents are the two factors that affect TTR the most. In [21], researchers found that the deployment of connected vehicles improves TTR in the work zone environment, with higher benefits at higher market penetration levels. Additionally, the impact of rain on speed, travel time, and route-level TTR has been well studied, and some results are consistent. For example, studies have found that speed reduction could vary from 10% to 25% on rainy days in general [22] and that an average increase in travel time of 11% might be expected in peak hours under a certain level of precipitation [23]. Adverse weather degrades TTR, especially during peak periods [24]. However, some findings are contradictory and need further investigation, especially those related to TTR. Chien and Kolluri found that TTR would diminish when the weather changes from dry to rain, as indicated by a 16% increase in the buffer index [10], while in another study [25], the authors suspected that rain and snow might have lowered the standard deviation and coefficient of variation of travel time and thus increased TTR. While it is understandable that different studies may produce conflicting results, ad hoc data might have played a role.
In a review of the literature, we found that studies on the impact of rain on O-D-level TTR are limited; moreover, most of the data used in previous studies were project-specific and covered only a short period of time, which was insufficient for a full-scale statistical analysis.

Research Objectives.
The availability of open-source data in recent years has made a detailed investigation of O-D-based TTR possible. A major thrust of this study is to use Uber Movement data and Weather Underground data to systematically analyze the impact of rain on O-D-based TTR. Uber Movement and Weather Underground data provide an ideal, and probably the only, opportunity for applying real-world data in such studies because the Uber data are O-D-specific and cover a much longer time span, while the Weather Underground data provide very detailed weather records. A significant benefit of using publicly available data is that the results can be easily verified. Moreover, compared with studies that use ad hoc data or computer-aided simulation, studies based on real-world data generally have greater value in the literature. Additionally, O-D-based TTR is fundamental to next-generation navigation tools that aim to provide both travel time and reliability information. This paper focuses only on the impact of rain, but many related topics deserve further investigation, such as the impact of other weather events and the combined effects of weather and work zones.
In this study, the authors investigated the impact of rain at various levels of intensity on O-D-based TTR through the analysis of three years of travel time and weather data for a hundred O-D pairs in downtown Boston. A general lognormal mixture model was adopted to fit distributions and calculate the buffer index values. While a portion of the results aligned with previous studies, some turned out to be unique. The rest of the paper is organized as follows: Section 2 introduces the data used in this research, which include the O-D-based travel time data from Uber Movement and historical weather data collected from the Weather Underground website; Section 3 describes the typical TTR measures and the analytical approach developed based on the lognormal mixture model; Section 4 presents the results; and Section 5 summarizes the findings and concludes the paper with discussions and future research.

Uber Movement and Weather Data.
The O-D-based travel time data used in this research are from Boston, the United States, retrieved from the Uber Movement website (https://movement.uber.com). The website provides detailed information on average travel time (ATT), classified by five time intervals during a day: early morning (00-07 h), AM peak (07-10 h), midday (10-16 h), PM peak (16-19 h), and evening (19-24 h). To make the results statistically sound, three years of data were used, spanning from 1/1/2016 to 12/31/2018, and a hundred O-D pairs were selected. Uber already classifies ATT ranges by 5-minute intervals. Considering that a large amount of data seems to be missing in the dataset for the ATT range of 25 minutes and beyond, we selected five categories for the analysis: (a) 0-5 mins, (b) 5-10 mins, (c) 10-15 mins, (d) 15-20 mins, and (e) 20-25 mins. Figure 1 shows all the origin and destination nodes included in this study. Table 1 lists the O-D pairs classified into five groups based on the ATT. Note that travel time in the table is directional and one-way (e.g., 2-16 indicates node 2 to node 16) because of the limited availability of two-way travel time between the O-D pairs. Twenty O-D pairs were selected in each ATT category. Table 2 presents descriptive statistics for the ATT data from Uber, including the mean and standard deviation (SD) in different periods. The studied area is in level terrain.
The corresponding weather data for the subject area were collected from Weather Underground at https://www.wunderground.com/history. In this study, weather conditions with no precipitation, such as clear, cloudy, or overcast, are classified as normal weather. Fog and haze conditions were excluded so that the focus could be placed on the impact of rain. The rain condition was defined as rainy weather that caused effective precipitation.
In alignment with the 24-hour travel time data, the sum of rainfall in each matched period (e.g., 00-07 h) was calculated and converted into a 24-hour value. Table 3 summarizes the definition of the data used in this research.
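The conversion and classification described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear scaling of a period total to a 24-hour value is an assumption (the paper does not state the conversion rule), the function and key names are hypothetical, and the category thresholds are taken from the rainfall ranges listed in the Conclusions, with boundary values assigned to the higher category as a further assumption.

```python
# Length in hours of each Uber Movement time-of-day period.
PERIOD_HOURS = {
    "early_morning": 7,  # 00-07 h
    "am_peak": 3,        # 07-10 h
    "midday": 6,         # 10-16 h
    "pm_peak": 3,        # 16-19 h
    "evening": 5,        # 19-24 h
}


def to_24h_equivalent(period_rain_mm, period):
    """Scale a period rainfall total to a 24-hour value.

    Assumption: linear scaling by period length.
    """
    return period_rain_mm * 24.0 / PERIOD_HOURS[period]


def classify_rain(rain_mm_24h):
    """Map a 24-hour rainfall value (mm/24 h) to an intensity category."""
    if rain_mm_24h <= 0.0:
        return "normal"
    if rain_mm_24h < 10.0:
        return "light"
    if rain_mm_24h < 25.0:
        return "moderate"
    if rain_mm_24h <= 50.0:
        return "heavy"
    return "extreme"
```

Under these assumptions, 3.0 mm recorded during the AM peak scales to 24.0 mm/24 h and would be treated as moderate rain.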

Data Screening.
The travel time data have already been preprocessed and filtered by Uber before being uploaded to the Internet. In general, the data were well prepared, and the work for outlier removal was rather simple. There are a couple of null cells without any data within the ATT range of 20-25 mins, such as from Lincoln Road (ID: 1) to Harborwark (ID: 26); these cells were removed. Additionally, Boston has considerable snowfall in winter months, which may affect travel time for an extended period even after the snowfall ends. According to the analysis in [26], the impact of snow on travel time was associated with the severity of snow and road conditions, and it usually takes at least six hours after the snow for travel time to stabilize. In this study, the data recorded one day after regular snow (≤ 5.0 mm/24 h) and two days after heavy snow (> 5.0 mm/24 h) were excluded. As a result, around 540,000 valid records, including nearly 68,000 light rain records, 18,000 moderate rain records, 11,000 heavy rain records, and 13,000 extreme rain records, were included in the study.
Figure 1: The origin and destination nodes.

Measures of Travel Time Reliability.
In addition to the conventional measures (e.g., mean, standard deviation, and coefficient of variation), there are other TTR measures, such as travel time variability (TTV), planning travel time index (PI), and buffer index (BI). Among these measures, BI has been widely used in the existing literature [7,8,13,27,28] and, as concluded in [29], is highly consistent with the coefficient of variation and is thus well suited as a measure of TTR. The authors adopted this idea and took BI as the primary TTR measure. Then, we used the interquartile range (IQR), the median, and the trimmed mean of BI-values, as well as the analysis of variance for the BI variation ratio, to quantify the impact of rainfall intensity. The buffer index can be generally formulated as follows:
$$\mathrm{BI} = \frac{T_p - T_c}{T_c},$$
where $T_p$ is the percentile travel time and $T_c$ is the contrastive travel time (e.g., mean travel time, median travel time, or free-flow travel time).
Obviously, the higher the BI-value is, the less reliable the travel time will be. In this study, the 95th percentile travel time $T_p$ and the mean travel time $T_c$ were adopted as the BI parameters. The interquartile range is the distance between the 75th and 25th percentiles, and the trimmed mean excludes the 5% highest and the 5% lowest data to reduce the error caused by extreme values.
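As an illustration of these measures, the sketch below computes the buffer index with the 95th percentile and the mean, together with the interquartile range and the 5% trimmed mean. It is a simplified example under stated assumptions: the percentile is taken empirically with linear interpolation, whereas the study derives it from a fitted mixture model, and all function names are hypothetical.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def buffer_index(travel_times, q=95):
    """BI = (T_p - T_c) / T_c with T_p the q-th percentile and T_c the mean."""
    t_p = percentile(travel_times, q)
    t_c = statistics.fmean(travel_times)
    return (t_p - t_c) / t_c


def interquartile_range(values):
    """Distance between the 75th and 25th percentiles."""
    return percentile(values, 75) - percentile(values, 25)


def trimmed_mean(values, trim=0.05):
    """Mean after dropping the lowest and highest `trim` share of the data."""
    xs = sorted(values)
    k = int(len(xs) * trim)
    return statistics.fmean(xs[k:len(xs) - k])
```

In the study, the interquartile range and trimmed mean are applied to the BI-values of the O-D pairs within each ATT range rather than to raw travel times.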

Lognormal Mixture Model.
With respect to the calculation of the percentile value in BI, in earlier literature [30], the author calculated the percentile value directly from the available data without considering statistical regression, which was easily subject to statistical errors (e.g., regression to the mean). In later studies, various regression methods were applied, such as multiple linear regression [20], as well as continuous probability distribution functions, such as the Weibull distribution [31], lognormal distribution [32], and generalized Pareto distribution [33]. Multiple regression was found to be more suitable for multiparameter impact studies, while for single-factor analysis (e.g., rainfall intensity), probability distribution fitting seems to be more desirable. However, the complexity of the problem makes it difficult for real data to fit a traditional single distribution, such as the lognormal distribution, well. Recent studies have attempted to use multistate models, such as the Gaussian mixture model, lognormal mixture model, and gamma mixture model, for better results [8,13,14,16,17,34-36]. Among these methods, the lognormal mixture model (LMM) performs best and has been recommended by many researchers [8,34,36]. When LMM is applied, the best fit usually occurs at a low K-value (e.g., K = 2 or 3), which may also help improve computational efficiency. Accordingly, LMM was selected for this research.
LMM is essentially a linear combination of multiple lognormal distributions whose weights sum to 1. The general formula of LMM is as follows:
$$f(t) = \sum_{k=1}^{K} \omega_k \, L(t \mid \mu_k, \sigma_k),$$
where $t$ is the travel time; $\omega_k$, $\mu_k$, and $\sigma_k$ are the weight, mean, and standard deviation of the $k$th lognormal distribution, respectively; and $L$ is the lognormal probability density. The equation is subject to $\sum_{k=1}^{K} \omega_k = 1$.
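The mixture density above, and the corresponding cumulative distribution function, can be evaluated with a few lines of code. The following is a minimal sketch; it assumes that the component parameters are the mean and standard deviation of the logarithm of travel time (the usual lognormal parameterization), and the function names are hypothetical.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Lognormal density with log-scale parameters mu and sigma."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Lognormal cumulative distribution function."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def lmm_pdf(t, weights, mus, sigmas):
    """Weighted sum of K lognormal densities; weights are assumed to sum to 1."""
    return sum(w * lognormal_pdf(t, m, s) for w, m, s in zip(weights, mus, sigmas))


def lmm_cdf(t, weights, mus, sigmas):
    """Weighted sum of K lognormal CDFs."""
    return sum(w * lognormal_cdf(t, m, s) for w, m, s in zip(weights, mus, sigmas))
```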

Expectation-Maximization Algorithm.
Since mixture models (like LMM) involve latent variables, the maximum likelihood estimate (MLE) cannot be used directly to estimate the parameters. Presently, the expectation-maximization (EM) algorithm is the most commonly used approach for multimodal parameter estimation: an expectation (E) step calculates the expected log-likelihood using the current parameter estimates, and a maximization (M) step maximizes the expected log-likelihood obtained in the E step. Algorithm 1 depicts the complete process of the EM algorithm.
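A compact sketch of the E and M steps for an LMM is given below. It is not a reproduction of Algorithm 1. It relies on the fact that the logarithm of a lognormal mixture variable follows a Gaussian mixture, so the updates are carried out on log travel times. The deterministic quantile-based initialization, the convergence tolerance, and all names are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density, evaluated on log travel times."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(log_t, weights, mus, sigmas):
    """LMM log-likelihood; the -sum(log_t) term is the lognormal Jacobian."""
    total = 0.0
    for x in log_t:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        total += math.log(max(dens, 1e-300))
    return total - sum(log_t)


def fit_lmm_em(travel_times, k=2, max_iter=500, tol=1e-8):
    """Fit a K-component LMM by EM; returns weights, mus, sigmas, log-likelihood."""
    log_t = sorted(math.log(t) for t in travel_times)
    n = len(log_t)
    mean = sum(log_t) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in log_t) / n) or 1.0
    # Assumed initialization: equal weights, means at evenly spaced quantiles.
    weights = [1.0 / k] * k
    mus = [log_t[int((j + 0.5) * n / k)] for j in range(k)]
    sigmas = [spread] * k
    prev = log_likelihood(log_t, weights, mus, sigmas)
    cur = prev
    for _ in range(max_iter):
        # E step: responsibility of each component for each observation.
        resp = []
        for x in log_t:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, log_t)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, log_t)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-8))
        cur = log_likelihood(log_t, weights, mus, sigmas)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return weights, mus, sigmas, cur
```

A randomized initialization repeated over several fits, with the results averaged, would be closer to the repeated-fitting procedure reported later in this section; the deterministic start is used here only to keep the sketch reproducible.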

Supplement Algorithms.
Before the application of LMM and EM, two issues need to be addressed: the optimal K-value and the inverse function of the cumulative distribution function (CDF) of LMM. The former can be resolved by referring to the method in [34], where the K-value was determined by the minimum Akaike information criterion (AIC) estimation with the null hypothesis not rejected by the one-sample Kolmogorov-Smirnov (K-S) test. AIC is defined as
$$\mathrm{AIC} = 2C - 2\ln(\mathrm{Li}),$$
where C is the number of parameters and Li is the likelihood function.
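The AIC-based choice of K can be sketched as follows. This is an illustrative example only: it assumes that a K-component LMM has 3K - 1 free parameters (K means, K standard deviations, and K - 1 independent weights), it omits the K-S test that the method in [34] also requires, and the log-likelihood values in the usage note are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2C - 2 ln(Li)."""
    return 2 * n_params - 2 * log_likelihood


def lmm_param_count(k):
    """Assumed free-parameter count of a K-component LMM."""
    return 3 * k - 1


def select_k(log_likelihoods):
    """Pick the K with minimum AIC.

    `log_likelihoods` maps each candidate K to its maximized log-likelihood.
    """
    scores = {k: aic(ll, lmm_param_count(k)) for k, ll in log_likelihoods.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

For example, with hypothetical log-likelihoods of -1250.0, -1200.0, and -1199.0 for K = 1, 2, and 3, the AIC values are 2504, 2410, and 2414, so K = 2 would be selected.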
For the second issue, since the CDF of LMM has no closed-form inverse, the percentile value cannot be obtained by solving the inverse function analytically. For this reason, the bisection method was adopted, with a stop threshold of 0.00001. The complete TTR estimation framework is presented in Figure 2. We summarized all calculations into five groups according to the ATT range, that is, 0-5 mins, 5-10 mins, 10-15 mins, 15-20 mins, and 20-25 mins, including three location measures (median, trimmed mean, and interquartile range) and the one-way analysis of variance. MATLAB was used to run the algorithms, and the final buffer index is presented as the average value calculated after 50 fittings. Figure 3 shows an example fitting of LMM from Harrison Ave (ID: 11) to Huntington Ave (ID: 10) in the ATT range of 0-5 mins, which shows a higher TTR under light rain and a lower TTR under the other rainfall intensities. Moreover, the impact on the O-D-based TTR increased with rainfall intensity (the BI-value varied from 0.3332 to 0.4379). Overall, it shows that rain reduced the O-D-based TTR (see Figure 3(f)).
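The percentile lookup by bisection mentioned in this subsection can be sketched as follows. This is a minimal example under several assumptions: the stop threshold of 0.00001 is applied to the width of the search interval, the contrastive travel time is taken as the analytical mean of the fitted mixture, the component parameters are on the log scale, and all names are hypothetical.

```python
import math


def lmm_cdf(t, weights, mus, sigmas):
    """CDF of a lognormal mixture with log-scale parameters."""
    if t <= 0.0:
        return 0.0
    return sum(
        w * 0.5 * (1.0 + math.erf((math.log(t) - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, mus, sigmas)
    )


def lmm_percentile(p, weights, mus, sigmas, tol=1e-5):
    """Find t with CDF(t) = p by bisection; weights are assumed to sum to 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Expand the upper bound until it brackets the target probability.
    while lmm_cdf(hi, weights, mus, sigmas) < p:
        hi *= 2.0
    # Halve the bracket until it is narrower than the stop threshold.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lmm_cdf(mid, weights, mus, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lmm_mean(weights, mus, sigmas):
    """Analytical mean of the mixture."""
    return sum(w * math.exp(m + 0.5 * s * s) for w, m, s in zip(weights, mus, sigmas))


def lmm_buffer_index(weights, mus, sigmas, p=0.95):
    """BI from a fitted LMM: (T_p - T_c) / T_c."""
    t_p = lmm_percentile(p, weights, mus, sigmas)
    t_c = lmm_mean(weights, mus, sigmas)
    return (t_p - t_c) / t_c
```

Because the mixture CDF is monotonically increasing in t, bisection is guaranteed to converge once the target probability is bracketed, which makes it a robust choice when no closed-form inverse exists.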

Increasing TTR Reduction Impact by Rain.
More analyses were conducted for further investigation. Figure 4 and Table 4 (median of BI-values) present the results in terms of the median value, which show that rain reduced the O-D-based TTR in each ATT range. Additionally, four out of five subfigures in Figure 4 demonstrate an increasing trend in TTR reduction as rainfall intensity increases, with the only exception in the ATT range of 20-25 mins, where the BI-value under heavy rain (0.3489) is lower than that under moderate rain (0.3588). The global mean values of the median under different rainfall intensities (the 8th row in Table 4) also reveal the trend.
Likewise, the trimmed mean of BI-values in Table 4 shows that rain has adverse effects on O-D-based TTR and that, in terms of the global mean (the 14th row), the reduction grows with rainfall intensity. In terms of ATT ranges, three out of five (0-5 mins, 10-15 mins, and 20-25 mins) show the increasing adverse effect.
Regarding the O-D-based results (see the 8th column in Table 5), thirty-five out of a hundred O-D pairs strictly followed this pattern (an average of seven O-D pairs in each range). Meanwhile, ninety-one out of a hundred O-D pairs (see the 7th column in Table 5) show that rain reduces the O-D-based TTR.
So far, not surprisingly, a dominating feature is that rain reduces O-D-based TTR, which aligns with people's perception as well as previous research at the route level [10,22,23,37,38]. Notably, the low-probability anomalies (a positive effect of rain), though not sufficient to negate the conclusion, may result from a combination of multiple factors in a real environment, for example, a combination of rain, accidents, and work zones. This counterintuitive phenomenon is discussed specifically in a subsequent subsection. More importantly, the results reveal that the negative impact grows with the increase in rainfall intensity. This trend may seem doubtful, partly because of the exceptions in Table 4 and partly because only 35 percent of O-D pairs strictly conform to it. The reasons are generally twofold: (1) the interference of the complicated environment and (2) more importantly, the rather insignificant difference in impact among moderate, heavy, and extreme rain (this characteristic is discussed further in a subsequent subsection), which makes it harder for individual O-D pairs to satisfy the trend strictly. Nonetheless, the current results still reveal the trend effectively.

Significant Impact from Light Rain to Moderate Rain.
A unique finding was revealed from the analysis; that is, the impact is a lot more significant when rainfall intensity changes from light to moderate, while other changes among the rainfall intensity categories seemed to cause only moderate impacts on the O-D-based TTR.
Comparing the variation ratio values in the parentheses in Table 4, we found that the average increase from light to moderate is up to 17.2% (median) and 14.6% (trimmed mean). In contrast, the average BI variation ratios between the other conditions are small: 2.7% (normal to light), 1.1% (moderate to heavy), and 5.8% (heavy to extreme) for the median, and 3.3%, 3.6%, and 4.0%, respectively, for the trimmed mean. The one-way analysis of variance (ANOVA) was used to demonstrate the statistical significance of the BI variation ratio between different rainfall intensity categories. Consistent with the trimmed mean used in the analysis, we also trimmed 10% of the values for the test. The analysis was conducted at a significance level of 0.05. As presented in Table 6, the null hypothesis is rejected with a P value of 3.5874e-16, far less than 0.05, in the 2-column source, which indicates significant variation between the light and the moderate rain conditions. In contrast, the null hypothesis cannot be rejected in the 3-column source (P value > 0.05), which supports the previous analysis that there is no significant difference among moderate rain, heavy rain, and extreme rain. The finding suggests that drivers are more sensitive to the change from light rain to moderate rain. While further investigation of driver behavior may be needed to fully explain this phenomenon, this finding is helpful in conducting more detailed and in-depth O-D-based TTR analysis.
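For readers who wish to reproduce this kind of test, the F statistic of a one-way ANOVA can be computed as in the sketch below. It is an illustration only: the function name is hypothetical, the sample values in the usage note are made up, and the P value, which requires the CDF of the F distribution, is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With the made-up groups [1, 2, 3], [2, 3, 4], and [5, 6, 7], the between-group sum of squares is 26 and the within-group sum of squares is 6, which gives F = 13 with 2 and 6 degrees of freedom.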

Other Findings regarding O-D-Based TTR.
Another notable finding from this research is that the O-D-based TTR tends to improve when ATT is longer, in both normal and rainy weather. Table 4 (columns 3 to 8) shows that four out of six columns for the median and five out of six columns for the trimmed mean indicate that the O-D-based TTR increased along with the increase in average travel time.
Further investigation was conducted on the O-D pairs with the same ATT. The authors calculated the interquartile range of each travel time range, as illustrated in Table 7 and Figure 4 (the areas of the rectangles). The areas of the rectangles shrink from Figure 4(a) to Figure 4(e). Referring to the columns in Table 7, two out of six columns follow this trend. For the rest of the columns, there is only one exception in each column (e.g., the last cell in column 3). The results demonstrate that the consistency of the O-D-based TTR across O-D pairs with a similar ATT range tends to improve as ATT gets longer. The authors speculate that this may be attributable in large part to travel time fluctuations, whose effect decreases as ATT gets longer. For instance, a one-minute fluctuation exerts a greater influence on an O-D pair with a five-minute ATT than on one with a ten-minute ATT. In practice, knowing this trend may significantly improve the accuracy of TTR prediction.

A Counterintuitive Phenomenon.
Based on the statistics in Table 5, a so-called counterintuitive phenomenon (e.g., [10,25]) was also found in this study: the O-D-based TTR improved under rainy weather (the global probability is 16%). Although the negative effect of rain on TTR is still the dominating conclusion considering the low probability of the positive effect, this phenomenon remains an issue to be investigated. According to [20], rain likely affects the effects of ...

Algorithm 1 (EM algorithm), Step 1: Initialize K-value and LMM.

Conclusions
This research uses open-source data to study the effects of varying rainfall intensity on O-D-based travel time reliability. The intensity covers light rain (between 0 and 10.0 mm/24 h), moderate rain (between 10.0 and 24.9 mm/24 h), heavy rain (between 25 and 50.0 mm/24 h), and extreme rain (>50.0 mm/24 h). An algorithm based on the lognormal mixture model was adopted for analyzing the probability distribution functions of the O-D-based travel time data. Then the buffer index, the three location measures, and the one-way analysis of variance were used for detailed analysis.
In general, rain lowers O-D-based travel time reliability, and the negative impact grows with the increase in rainfall intensity. The study also confirmed the existence of the so-called counterintuitive phenomenon mentioned in previous work [25], in which TTR may be enhanced in rainy weather in some cases. Given the large number of results in this study, the phenomenon occurs with low probability; it cannot be taken as a general conclusion but deserves in-depth investigation. In particular, we found that O-D-based TTR was more sensitive when rainfall intensity changes from light to moderate but less so when it changes among other categories, such as from no rain to light rain and from moderate rain to heavy rain. The study also demonstrates that the consistency of the O-D-based TTR across O-D pairs with a similar ATT range tends to improve as ATT gets longer.
This study helps disclose the characteristics of the O-D-based TTR under varying rainfall using open-access data.
This study is helpful in enhancing current applications by providing more accurate O-D-based travel time information under rain conditions; for example, the trend that the consistency of the O-D-based TTR tends to improve as ATT increases can help improve the accuracy of TTR prediction when sufficient travel time information is not available. Meanwhile, the research helps promote the use of publicly available data in such investigations so that the results are verifiable and the studies are sustainable. This paper focuses only on the impact of rain, but many more topics deserve further investigation along the line of O-D-based TTR analysis. Future work will focus on the impact of other weather events and the combined effects of weather and other factors such as work zones.

Data Availability
The data used to support the findings of this study are available from the Uber Movement website (https://movement.uber.com) and the Weather Underground website (https://www.wunderground.com/history).