Modeling hourly weather-related road traffic variations for different vehicle types in Germany

Weather has a substantial influence on people’s travel behavior. In this study we analyze whether meteorological variables can improve predictions of hourly traffic counts at 1400 stations on federal roads and highways in Germany. Motorbikes, cars, vans and trucks are distinguished. We evaluate to what extent the mean squared error of Poisson regression models for hourly traffic counts is reduced by using precipitation, temperature, cloud cover and wind speed data. Motorbike counts in particular are shown to be strongly weather-dependent: on federal roads the mean squared error is reduced by up to 60% in models with meteorological predictor variables compared to models without them. A detailed analysis of the models for motorbike counts reveals non-linear relationships between the meteorological variables and motorbike counts. Car counts are shown to be especially sensitive to weather in touristic regions such as seaside resorts and nature parks. The findings allow for several potential applications, such as improved route planning in navigation systems, implementation in traffic management systems, day-ahead planning of visitor numbers in touristic areas, or use in road crash modelling.


Introduction
There is strong evidence that weather has a substantial influence on people's travel behavior. However, both strength and direction of the relationship between weather parameters and travel behavior can vary between different locations, depending on characteristics of the local climate or region-specific travel culture [1]. In the mid-latitudes in particular, temperatures can change from adverse to pleasant conditions within the year. Higher temperatures generally lead to an increase of outdoor activities [2-4] and an increasing use of bicycles [1,5,6]. However, very high temperatures above 25 to 30°C can be disadvantageous for outdoor activities [7] and cycling [8,9]. Low temperatures can lead to reduced car traffic; for trucks, however, the impact is less pronounced [10].
Precipitation generally leads to reduced outdoor activities [11-13]. Car traffic is also reduced during rainfall [14,15], which appears to be particularly the case at weekends [16]. A possible reason is that shopping and leisure trips are canceled, or that the mode of transportation or the destination changes, due to rainfall [17]. Considerable traffic reductions are reported with snowfall [14,18-22]. In general, truck traffic is less affected than car traffic [18], because commercial vehicles are less likely to divert trips due to adverse weather [23]. In urban areas precipitation can lead to switching from active (open-air) to motorized (sheltered) transport modes [24], leading to higher levels of transit ridership [25] and public transportation [26].
Compared to precipitation and temperature, wind speed is often overlooked in traffic studies [1]. Some studies document negative impacts of wind speed on cycling [27,28]. In the case of motorized road traffic, some studies show that wind speed decreases traffic counts [17], while other studies find mostly non-significant impacts of wind speed [29].
Becker et al. European Transport Research Review (2022) 14:16
Boecker et al. [1] find substantial differences between the outcomes of different studies addressing the impact of weather on travel behavior. They conclude that the existing literature presents an "incomplete and fragmented picture", identify gaps and suggest ideas for further research. Based on their findings, we address the following points in an approach to model the impact of weather on hourly traffic counts:
• Traffic and meteorological data need to be matched accurately in time and space to study their relationships. This can be difficult if traditional weather station data is used, because stations might be located far away from the location of the traffic measurement. This is particularly relevant in the case of precipitation, which can vary strongly in time and space and might not be captured well by station data. Therefore, we use reanalysis and radar-based precipitation products to derive meteorological parameters from high-resolution gridded data sets.
• Existing studies make use of a wide variety of multivariate modeling techniques. However, many studies assume linear relationships between weather and different types of travel behavior, although not all effects seem to be linear in all situations [1]. By applying a stepwise predictor selection procedure, we explore non-linear relationships in a controlled setting.
• While most studies focus on weather impacts on bicycles, cars, or trucks, little is known about weather impacts on motorbike usage. By analyzing a comprehensive database of long-term traffic measurements in Germany that includes motorbike counts, we can fill this gap.
This study aims to quantify to which extent meteorological parameters can improve the predictive skill of models for hourly traffic counts of different vehicle types. This is particularly relevant for application purposes, where an accurate estimation of traffic flow is important. Fields of application are, for example, road crash models, where traffic flow is the dominant factor for crash risk [30,31], travel-demand and mode-change modeling, traffic management, route planning in navigation systems, and air pollution management.

Traffic data
The German Federal Highway Research Institute (Bundesanstalt für Straßenwesen, BASt) operates a traffic measurement network on federal highways (Autobahn) and federal roads (Bundesstraßen). Federal highways usually have two or three lanes per direction and driving speeds of 100 km/h and more, while federal roads usually have one lane per direction and driving speeds of 100 km/h and less. At the traffic counting stations the hourly number of passing vehicles is registered separately for the two directions of travel. Since it was shown that driving direction is not relevant regarding weather effects [17], the sum of the hourly counts of both directions is used for the analyses at each station. The data set provides counts for different vehicle types. The vehicle types and corresponding abbreviations used in this study are motorbikes (mot), cars (car), vans (van), and trucks (trk). Count data from 2005 to 2018 is considered in this study. However, many of the available measurement stations were installed after 2005 or show periods with missing data. Therefore, only stations for which at least five years of data are available are used. This ensures that enough data is available for the modeling procedure. Based on these criteria, 696 stations on highways and 704 stations on federal roads are selected for the analyses.

Reanalysis data
The fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) global atmospheric reanalysis (ERA5) is a synthesis of various heterogeneous observational data and model simulations, which is produced using a physical model together with a data assimilation scheme [32]. ERA5 contains different atmospheric and surface variables on a global grid with a spatial resolution of 30 km and an hourly temporal resolution. The advantage of ERA5 over station-based observations is the spatial and temporal homogeneity. But it should be noted that local station measurements can deviate from the gridded ERA5 values.
For each traffic counting station the corresponding ERA5 grid cell is identified and the hourly time series of temperature at 2 m height, maximum wind gusts, and total cloud cover are extracted. Using the hourly weather parameters directly as predictor variables is problematic, in particular in the case of temperature. Both temperature and traffic volume are high during the day and low at night, not because of a causal relationship between the two variables, but because both depend on the elevation of the sun. To exclude this spurious relationship from the regression models, the daily maximum temperature, daily maximum wind gusts and daily average cloud cover are used for further analyses.
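The daily aggregation described above can be illustrated with a minimal sketch. It assumes that the hourly series cover whole days and are already extracted for one grid cell; the function name `daily_predictors`, the dictionary keys and the synthetic numbers are hypothetical and not taken from the study.

```python
import numpy as np

HOURS_PER_DAY = 24


def daily_predictors(temp_2m, gust, cloud_cover):
    """Turn hourly series into daily predictors on the hourly time axis.

    Each input is a 1-D sequence covering whole days (length n_days * 24).
    """
    def by_day(values):
        arr = np.asarray(values, dtype=float)
        if arr.size % HOURS_PER_DAY != 0:
            raise ValueError("series must cover whole days")
        return arr.reshape(-1, HOURS_PER_DAY)

    daily = {
        "tmax": by_day(temp_2m).max(axis=1),             # daily maximum temperature
        "gust_max": by_day(gust).max(axis=1),            # daily maximum wind gust
        "cloud_mean": by_day(cloud_cover).mean(axis=1),  # daily mean cloud cover
    }
    # Repeat every daily value 24 times so it lines up with the hourly counts.
    return {name: np.repeat(v, HOURS_PER_DAY) for name, v in daily.items()}


# Two synthetic days of hourly values (made-up numbers).
temp = np.arange(48, dtype=float)
gust = np.full(48, 8.0)
gust[30] = 21.0
cloud = np.concatenate([np.zeros(24), np.ones(24)])
predictors = daily_predictors(temp, gust, cloud)
```

Because every hour of a day receives the same value, the diurnal cycle of temperature can no longer act as a proxy for the diurnal cycle of traffic.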

Radar data
The RADOLAN data set [33] provided by the German Meteorological Service contains hourly precipitation sums on a grid with a spatial resolution of 1 × 1 km for the area of Germany. RADOLAN combines radar reflectivities, measured by the 16 C-band Doppler radars of the German weather radar network, and ground-based precipitation gauge measurements. Since the precipitation amount at the ground cannot be inferred directly from radar reflectivity, observations from rain gauges are used to calibrate the precipitation amounts estimated from the radar reflectivity in an online procedure. The RADOLAN data set thus combines the benefits of the high spatial resolution of the radar network and the accuracy of gauge-based measurements. While the other meteorological predictor variables are aggregated in time, precipitation is included in the model in the form of hourly values. All RADOLAN grid points within a radius of 10 km around a traffic station are selected and the spatial average of the hourly precipitation sum is calculated. This results in a predictor variable that is representative of a larger area around a traffic station. This is reasonable, since the travel behavior of drivers passing a traffic station does not depend solely on the precipitation directly at the station.
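The spatial averaging within a 10 km radius can be sketched as follows. This is an illustration only: it assumes that each grid point comes with geographic coordinates (RADOLAN itself is delivered on a projected grid), it uses the haversine great-circle distance as a stand-in for whatever distance measure was used in the study, and the names `haversine_km` and `area_mean_precip` are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, arrays are broadcast."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def area_mean_precip(station_lat, station_lon, grid_lat, grid_lon,
                     precip, radius_km=10.0):
    """Average hourly precipitation over all grid points within radius_km.

    precip has shape (n_hours, n_points); the result has shape (n_hours,).
    """
    dist = haversine_km(station_lat, station_lon,
                        np.asarray(grid_lat), np.asarray(grid_lon))
    inside = dist <= radius_km
    if not inside.any():
        raise ValueError("no grid point within the search radius")
    return np.asarray(precip, dtype=float)[:, inside].mean(axis=1)


# Three made-up grid points: at the station, about 5.6 km away, about 56 km away.
grid_lat = [52.00, 52.05, 52.50]
grid_lon = [13.00, 13.00, 13.00]
precip = [[0.0, 2.0, 100.0],   # hour 1
          [1.0, 3.0, 100.0]]   # hour 2
station_precip = area_mean_precip(52.0, 13.0, grid_lat, grid_lon, precip)
```

In this toy case the distant third point is excluded, so only the first two columns contribute to the hourly averages.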

Population data
To analyze whether the impact of weather on traffic flow differs between urban and rural areas, population density data from the German census (Zensus 2011) is used [34]. The data set provides the number of inhabitants in Germany on a grid with a resolution of 1 × 1 km. The number of inhabitants per grid cell is provided as a discrete variable with seven classes. Each class corresponds to a certain range of inhabitant numbers (Table 1). For simplicity, we assume that the actual inhabitant number in a grid cell corresponds to the average of the class range. Since class 7 has no upper bound, the lower bound is used. For each traffic station, all grid cells within a radius of 10 km around the station are selected and the average population density is computed.

Linear regression and breakpoint detection
The standard linear regression model is a well-known technique to relate a target variable $y_i$ to a linear combination of $l$ predictor variables $X_i = (X_{i1}, \dots, X_{il})$:

$$y_i = \alpha + X_i^\top \beta + \varepsilon_i, \qquad i = 1, \dots, n, \tag{1}$$

where $\beta = (\beta_1, \dots, \beta_l)$ are the corresponding model parameters, $\alpha$ is the intercept, $\varepsilon_i$ is the residual error and $n$ is the number of available observations. Predictor variables can be continuous or categorical. Interaction terms can be used when the effect of a predictor variable on the target variable changes, depending on the value of other predictor variables [35].
In Eq. 1, $\alpha$ and $\beta$ are usually assumed to be constant with respect to $i$. However, in the case of traffic count data, modifications of the road network in the vicinity of a traffic station can lead to abrupt changes of traffic characteristics. Such breakpoints in the time series can be caused, for example, by construction sites, road closures or the opening of new roads. In this case, the relationship between $X_i$ and $y_i$ may change and the assumption of constant $\alpha$ and $\beta$ is no longer valid.
The foundation for estimating single breakpoints in linear regression models was given by Bai [36] and was subsequently extended to multiple breaks [37-39]. To identify breakpoints in the traffic count time series, we use the R package strucchange [40,41], which implements the algorithm described in Bai and Perron [42] for simultaneous estimation of multiple breakpoints. Eq. 1 is extended to

$$y_i = \alpha_j + X_i^\top \beta_j + \varepsilon_i, \qquad i = i_{j-1} + 1, \dots, i_j, \quad j = 1, \dots, m + 1, \tag{2}$$

where $j$ is the segment index, $J_{m,n} = \{i_1, \dots, i_m\}$ denotes the set of the $m$ breakpoints, and by convention $i_0 = 0$ and $i_{m+1} = n$. For a given set of breakpoints $i_1, \dots, i_m$ the least-squares estimates for the $\alpha_j$ and $\beta_j$ can be obtained. The resulting minimal residual sum of squares is given by

$$RSS(i_1, \dots, i_m) = \sum_{j=1}^{m+1} rss(i_{j-1} + 1, i_j),$$

where $rss(i_{j-1} + 1, i_j)$ is the minimal residual sum of squares in the $j$th segment. The R package strucchange applies an efficient algorithm to find the breakpoints $\hat{i}_1, \dots, \hat{i}_m$ that minimize the objective function

$$(\hat{i}_1, \dots, \hat{i}_m) = \underset{(i_1, \dots, i_m)}{\arg\min} \; RSS(i_1, \dots, i_m)$$

over all partitions with $i_j - i_{j-1} \geq n_h$, where $n_h$ is the minimum length of a segment, which is specified by the user.
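To make the minimization of the segmented residual sum of squares more concrete, the following sketch solves a deliberately simplified version of the problem by dynamic programming. It is not the strucchange implementation: each segment is fitted with a constant (intercept-only) model instead of a full regression, and the function name `optimal_breakpoints` and the toy series are hypothetical.

```python
import numpy as np


def optimal_breakpoints(y, m, h):
    """Find m breakpoints minimizing the total RSS of a piecewise-constant fit.

    Every segment must contain at least h observations (h >= 1). A returned
    breakpoint b means that one segment ends with y[b - 1] and the next one
    starts with y[b].
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < (m + 1) * h:
        raise ValueError("series too short for m breakpoints with minimum length h")
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def rss(a, b):
        # RSS of the best constant fit to y[a:b], from cumulative sums.
        s = cs[b] - cs[a]
        return (cs2[b] - cs2[a]) - s * s / (b - a)

    # cost[k, e]: minimal RSS for y[:e] split into k + 1 segments.
    cost = np.full((m + 1, n + 1), np.inf)
    prev = np.zeros((m + 1, n + 1), dtype=int)
    for e in range(h, n + 1):
        cost[0, e] = rss(0, e)
    for k in range(1, m + 1):
        for e in range((k + 1) * h, n + 1):
            for s in range(k * h, e - h + 1):
                c = cost[k - 1, s] + rss(s, e)
                if c < cost[k, e]:
                    cost[k, e] = c
                    prev[k, e] = s

    # Backtrack from the full series to recover the breakpoint positions.
    breaks, e = [], n
    for k in range(m, 0, -1):
        e = int(prev[k, e])
        breaks.append(e)
    return sorted(breaks), float(cost[m, n])


# Toy series with level shifts after 10 and 20 observations.
series = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
breaks, total_rss = optimal_breakpoints(series, m=2, h=3)
```

The triple loop is quadratic in the series length for each additional breakpoint, which illustrates why long series require considerable computational effort.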

Poisson regression
If $y$ is a count variable, the Poisson regression model can be applied, which belongs to the family of generalized linear models and uses the exponential function as the inverse link function to ensure that the modeled $y_i \geq 0$ [43]. $\beta$ is estimated using the iteratively reweighted least squares method [44].
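As an illustration of the estimation method named above, the following sketch fits a Poisson regression with a log link by iteratively reweighted least squares. It is a textbook-style implementation under simple assumptions (no offsets, no safeguards against numerical overflow), not the code used in the study; the function name `fit_poisson_irls` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Poisson regression with log link; returns [intercept, coefficients...]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(X, dtype=float)])
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-12)  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta              # linear predictor
        mu = np.exp(eta)            # inverse link: expected counts
        z = eta + (y - mu) / mu     # working response
        XtW = X.T * mu              # working weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic counts with known parameters (intercept 1.0, slope 0.5).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
y = rng.poisson(np.exp(1.0 + 0.5 * x))
beta_hat = fit_poisson_irls(x, y)
```

Each iteration is a weighted least-squares fit, so categorical predictors and interaction terms only need to be encoded as additional columns of `X`.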

Assessing model performance
The mean squared error

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (f_i - o_i)^2$$

is a common metric to evaluate model performance by comparing the modeled values $f_i$ to the observed values $o_i$. The squared difference leads to a strong penalization of predictions with larger errors. A skill score is a relative measure of how a model performs compared to a reference model. The mean squared error skill score is defined as

$$MSESS = 1 - \frac{MSE_f}{MSE_r},$$

where $MSE_f$ is the score of the model under evaluation and $MSE_r$ is the score of the reference model. Positive values of the MSESS indicate an improvement compared to the reference model.
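A short numerical example may help to read the skill score. The helper names `mse` and `msess` and all numbers below are made up for illustration.

```python
import numpy as np


def mse(forecast, observed):
    """Mean squared error between modeled and observed values."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.mean((f - o) ** 2))


def msess(mse_model, mse_reference):
    """Skill score: fraction of the reference MSE removed by the model."""
    return 1.0 - mse_model / mse_reference


observed = [10, 20, 30]
reference = [12, 18, 33]   # squared errors 4, 4, 9 -> MSE = 17/3
model = [11, 20, 31]       # squared errors 1, 0, 1 -> MSE = 2/3
score = msess(mse(model, observed), mse(reference, observed))
```

Here the score is 1 - 2/17, i.e. the model removes roughly 88% of the reference error; a model that is worse than the reference would yield a negative value.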
Cross-validation is applied by estimating model coefficients using a training data set and computing scores on an independent testing data set. Here, we split the data randomly into 10 sets. Parameters are estimated on 9 sets and the score is calculated on the remaining set. This is repeated 10 times such that for each set the resulting score is computed. These are then averaged and used for model comparison.
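The random 10-fold split can be sketched generically. The sketch assumes that a model is available as a pair of callables, `fit(X, y)` and `predict(model, X)`; these names, the function `kfold_mse` and the climatology-style toy model are hypothetical.

```python
import numpy as np


def kfold_mse(X, y, fit, predict, k=10, seed=0):
    """Average test MSE over k random folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(y.size), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])            # estimate on 9 folds
        errors = predict(model, X[test]) - y[test]  # evaluate on the held-out fold
        scores.append(np.mean(errors ** 2))
    return float(np.mean(scores))


# Toy model: always predict the training mean.
def fit_mean(X, y):
    return y.mean()


def predict_mean(model, X):
    return np.full(len(X), model)


X_toy = np.zeros((100, 1))
cv_constant = kfold_mse(X_toy, np.full(100, 5.0), fit_mean, predict_mean)
cv_trend = kfold_mse(X_toy, np.arange(100.0), fit_mean, predict_mean)
```

Scores obtained this way for a candidate model and a reference model can then be combined into a cross-validated MSESS.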

Model selection procedure
It is infeasible to manually inspect the functional relationships between traffic counts and various meteorological and non-meteorological predictor variables at all 1400 traffic stations. Therefore we apply an automatic procedure that selects relevant predictor variables based on objective criteria and allows the evaluation of the benefit of including meteorological variables compared to a model without meteorological variables. The following three steps are applied successively for each traffic station and for each of the four vehicle types.

Step 1: Breakpoint detection
Breakpoints are detected in the traffic count time series as described above to identify systematic changes of traffic characteristics, e.g. due to modifications of the road network in the vicinity of a station. Although an efficient algorithm is used for estimating the locations of breakpoints, a considerable computational effort is required for long time series like the hourly traffic counts used in this study. Furthermore, since the breakpoint detection is based on linear regression, the method assumes that residual errors are normally distributed. However, this is not the case due to the nature of the count data. Both issues are solved by applying the breakpoint detection to daily instead of hourly sums of traffic counts. Firstly, the amount of data is reduced significantly. Secondly, by aggregating the data the distribution of the residual errors becomes approximately normal, which we tested using the Shapiro-Wilk test and the Anderson-Darling test [45]. The month of the year and the day of the week are included as categorical predictor variables in Eq. 2 to account for an annual and weekly cycle of traffic counts. The minimum length of a segment $n_h$ is set to 300 days to avoid too many and too short segments. The number of breakpoints $m$ is selected by iteratively increasing it from 0 to 4. If an increase of $m$ does not improve the RSS by more than 1%, the iteration is stopped and $m$ is selected. Finally, a categorical variable with hourly resolution is generated, in which each segment corresponds to one category, based on the identified breakpoints. This variable is included in the model selection process described below. Note that the daily traffic data is only used to determine the dates of the breakpoints and that the following modeling steps are carried out with hourly data.
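The stopping rule for the number of breakpoints and the construction of the segment variable can be sketched as follows. Two assumptions are made that the text does not spell out: the 1% improvement is measured relative to the RSS of the previous value of m, and the selected m is the last value that still produced such an improvement. The function names and numbers are hypothetical.

```python
import numpy as np


def select_number_of_breaks(rss_by_m, min_rel_gain=0.01):
    """Pick m from RSS values listed for m = 0, 1, 2, ... breakpoints."""
    m = 0
    for candidate in range(1, len(rss_by_m)):
        previous, current = rss_by_m[candidate - 1], rss_by_m[candidate]
        if previous <= 0 or (previous - current) / previous <= min_rel_gain:
            break  # gain of at most 1%: stop and keep the smaller model
        m = candidate
    return m


def hourly_segment_labels(n_days, break_days):
    """Categorical segment index for every hour, from breakpoints given in days.

    A breakpoint d means that day d (0-based) is the first day of a new segment.
    """
    day_labels = np.searchsorted(np.asarray(break_days), np.arange(n_days),
                                 side="right")
    return np.repeat(day_labels, 24)


# Made-up RSS values for m = 0..4: the second break improves the fit by only 0.3%.
m_selected = select_number_of_breaks([100.0, 60.0, 59.8, 30.0, 29.0])
labels = hourly_segment_labels(n_days=5, break_days=[2])
```

The resulting label vector has one entry per hour and can be passed to the regression as a categorical predictor.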

Step 2: Model without meteorological variables
After the identification of breakpoints based on daily aggregated traffic counts, Poisson regression models for hourly traffic counts are estimated. The BASt uses daily, weekly and annual cycles to classify the characteristics of individual traffic stations and distinguishes between periods with and without holidays [46]. We adopt this approach to develop a model NO_MET using only non-meteorological predictor variables and relevant interaction terms (see Table 2). NO_MET is used as a benchmark to quantify the improvement achieved by including meteorological variables later. Predictor variables are added to NO_MET in a step-wise procedure.
Starting with an intercept-only model, all remaining non-meteorological variables and interactions are added to the model individually and the MSESS is computed using 10-fold cross-validation with random samples. The variable that leads to the largest improvement with respect to the MSESS is added to the model if the MSESS is larger than 0.01, indicating a reduction of the MSE of more than 1%. The iteration is repeated with all remaining variables. If the MSESS is smaller than or equal to 0.01, the iteration is stopped.
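The stepwise procedure corresponds to a greedy forward selection, sketched below. The sketch assumes that the MSESS of a candidate is computed against the current model as reference, and it takes the cross-validated MSE as a user-supplied callable `cv_mse`; this callable, the function `forward_select` and the toy error values are hypothetical.

```python
def forward_select(candidates, cv_mse, start=(), threshold=0.01):
    """Greedy forward selection driven by the cross-validated MSESS.

    cv_mse(list_of_variables) must return the cross-validated MSE of a model
    containing exactly these variables.
    """
    selected = list(start)
    remaining = [c for c in candidates if c not in selected]
    current = cv_mse(selected)
    while remaining and current > 0:
        # Try each remaining variable on its own and keep the best one.
        best_mse, best = min((cv_mse(selected + [c]), c) for c in remaining)
        gain = 1.0 - best_mse / current   # MSESS relative to the current model
        if gain <= threshold:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_mse
    return selected


# Toy error model: each variable multiplies the MSE by a fixed factor.
FACTORS = {"hour": 0.3, "dow": 0.8, "month": 0.995}


def toy_cv_mse(variables):
    value = 100.0
    for name in variables:
        value *= FACTORS[name]
    return value


chosen = forward_select(["month", "dow", "hour"], toy_cv_mse)
```

In the toy example `hour` and `dow` are added in this order, while `month` is rejected because it would reduce the error by only 0.5%. Passing an already selected model via `start` mirrors the idea of continuing the selection from an existing model.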

Step 3: Model with meteorological variables
The iterative model selection procedure is repeated as described in step 2, but this time starting with the model NO_MET and iteratively adding meteorological predictor variables (Table 2). This model (MET) is used to quantify the improvement of traffic count predictions by including meteorological variables compared to NO_MET using the MSESS. To allow non-linear functional relationships between meteorological predictors and traffic counts, the meteorological variables are considered in the selection procedure with different exponents. Temperature, cloud cover and wind are considered with exponents k and precipitation with the exponents 1/k, with k ∈ {1, 2, 3, 4}. In the case of precipitation, the fractional exponent allows for a sudden increase or decrease of traffic counts with the onset of precipitation. This has already been successfully applied in a previous study for modeling the relationship between precipitation and road crash probabilities [47]. Additionally, each meteorological variable is included in the selection procedure as an interaction term with the categorical variable weekend, which has the three categories working day (Monday to Friday), Saturday and Sunday. This allows, for example, precipitation to have a different effect on traffic on a Sunday than on a working day.
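The set of candidate terms can be generated mechanically, as in the following sketch. The function names, the term labels and the encoding of the day of the week (0 = Monday, ..., 6 = Sunday) are assumptions made for this illustration.

```python
import numpy as np


def candidate_terms(temp, cloud, wind, precip, ks=(1, 2, 3, 4)):
    """Build power-transformed candidate predictors.

    Temperature, cloud cover and wind get exponents k,
    precipitation gets the fractional exponents 1/k.
    """
    temp, cloud, wind, precip = (np.asarray(v, dtype=float)
                                 for v in (temp, cloud, wind, precip))
    terms = {}
    for k in ks:
        terms[f"temp^{k}"] = temp ** k
        terms[f"cloud^{k}"] = cloud ** k
        terms[f"wind^{k}"] = wind ** k
        terms[f"precip^(1/{k})"] = precip ** (1.0 / k)
    return terms


def weekend_interaction(term, day_of_week):
    """Split one term into working day, Saturday and Sunday columns."""
    dow = np.asarray(day_of_week)
    groups = {"workday": dow <= 4, "saturday": dow == 5, "sunday": dow == 6}
    return {name: np.where(mask, term, 0.0) for name, mask in groups.items()}


precip = np.array([0.0, 1.0, 16.0])
terms = candidate_terms(temp=[-2.0, 10.0, 25.0], cloud=[0.0, 0.5, 1.0],
                        wind=[5.0, 10.0, 20.0], precip=precip)
by_day_type = weekend_interaction(terms["precip^(1/4)"], day_of_week=[0, 5, 6])
```

With the exponent 1/4, precipitation values of 0, 1 and 16 mm map to 0, 1 and 2, which shows how the root transform compresses large amounts and reacts steeply close to zero.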

Statistics of meteorological variables
Before studying the effect of meteorological parameters on traffic volume, the occurrence frequencies and correlations of the meteorological parameters are analyzed. For each traffic station the probability density function of each meteorological variable is computed. The probability density function of daily maximum temperature, averaged over all stations, shows that temperature varies mainly between 0 and 30°C (Fig. 1a). In the case of hourly precipitation the distribution is strongly skewed towards low values (Fig. 1b). On average 71% of all hourly time steps show a precipitation of 0 mm, 19% show a precipitation between 0 and 0.1 mm and the remaining 10% correspond to precipitation amounts above 0.1 mm. The probability density for mean daily cloud cover is highest at cloud covers of 100%. Days with lower cloud covers are less frequent (Fig. 1c). Daily maximum wind gusts occur most frequently within the range between 5 and 20 m/s. The probability density functions of temperature and wind gusts vary considerably between the different stations, while for precipitation and cloud cover the variability between the stations is much smaller.
To determine the strength and direction of potentially non-linear, monotonic relationships between the different meteorological variables, Spearman's rank-order correlations [48] are computed for each combination of the four variables at each traffic station (Table 3). This step is important to be aware of potential multicollinearity when estimating regression models. The strongest correlation of −0.37 is found between daily maximum temperature and daily mean cloud cover, indicating that low cloud cover correlates with high temperatures. Furthermore, positive correlations around 0.2 are found between cloud cover and precipitation, as well as between daily maximum wind gusts and both precipitation and cloud cover. These correlations are reasonable and physically meaningful. However, they are small enough to justify the use of all four variables in the model selection process.
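Spearman's rank-order correlation is the Pearson correlation of the ranks, which the following sketch makes explicit. Ties, which are frequent for hourly precipitation because of the many dry hours, receive average ranks. The helper names `average_ranks` and `spearman` are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks; tied values share the mean of their rank positions."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)        # last rank position of each tie group
    first = last - counts + 1       # first rank position of each tie group
    return ((first + last) / 2.0)[inverse]


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# A monotonic but non-linear relationship gives a rank correlation of 1.
rho_up = spearman([1, 2, 3, 4], [1, 4, 9, 16])
rho_down = spearman([1, 2, 3, 4], [4, 3, 2, 1])
tied = average_ranks([10, 20, 20, 30])
```

Because only the ordering matters, the coefficient is unaffected by monotonic transformations such as the power terms used in the model selection.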

Selection of predictor variables
The model selection procedure described above is executed to develop models for hourly counts of different vehicle types at each traffic station by identifying those variables and interaction terms that improve the predictive skill of the model. Table 4 shows how frequently the predictor variables listed in Table 2 are selected at the different stations. For all vehicle types, hour, dow (day of the week) and month are selected at 100% of the stations. hour and dow are always selected as an interaction term, indicating that the diurnal cycle of traffic counts changes on different days of the week. The variable break, which indicates breakpoints, is selected as an interaction with a linear trend or hour in most cases. In the case of motorbikes and cars, temperature is selected at almost all stations, both on highways and federal roads. In the case of vans, temperature is selected twice as often on highways as on federal roads. In most cases, temperature is selected as an interaction term with weekend, indicating that the effect of temperature on traffic counts is different on working days, Saturdays and Sundays. Temperature is mostly selected as a linear term without an exponent. However, in the case of motorbikes higher-order terms are also selected, indicating a more complex functional relationship.
Cloud cover seems to have an important effect on motorbike counts, in particular on federal roads. For cars on federal roads, cloud cover is also selected at 45% of all stations. Wind speed and precipitation are selected at the majority of federal road stations in the case of motorbikes, but not in the case of cars. In the case of trucks, meteorological variables are rarely selected.

Skill scores
For each vehicle type at each traffic station the cross-validated MSESS of MET is computed, with NO_MET as the reference. The MSESS quantifies the improvement of the model predictions that results from including meteorological predictor variables in the regression models. It should be noted that due to the setup of the model selection procedure no negative MSESS values occur, because predictors are only added to the model if they improve the MSESS. The largest improvements due to meteorological variables occur in models for motorbike counts on federal roads with a median MSESS of 0.35, which corresponds to a reduction of the MSE by 35% compared to a model without meteorological variables (Fig. 2). At 25% of all federal road stations the MSESS for motorbike counts is larger than 0.42, which constitutes a considerable reduction of the model error due to the inclusion of meteorological information in the model. The median MSESS, and thus the improvement against the model without meteorological variables, is about 3 times larger on federal roads than on highways. The median MSESS of car counts is 0.04, which is considerably smaller than the MSESS of motorbikes. However, at individual stations the MSESS values for cars reach more than 0.3. For vans on federal highways the MSESS values are almost as large as for cars. For vans on federal roads and for trucks in general the improvement due to weather predictors is zero or negligibly small, which is consistent with the previous observation that in most cases no meteorological predictor variables were added to these models.
The spatial distribution of the MSESS values of motorbike counts shows that in case of stations on federal highways the largest MSESS occur in areas with high population density, like Berlin, Munich or the Ruhr area, Cologne and Bonn (Fig. 3a). On federal roads the spatial distribution is more homogeneous (Fig. 3b). In case of cars most stations show a relatively low MSESS, but some stations with considerably larger MSESS values stand out, which are closely linked to touristic regions. For example, MSESS values of more than 0.2 are found on routes from cities like Hamburg and Bremen towards seaside resorts at the North Sea and Baltic Sea (Fig. 3c, d). The largest MSESS for cars of about 0.3 is found on the highway from Munich towards touristic areas in the Bavarian Alps (Fig. 3c). In case of cars on federal roads, stations with large MSESS values are located at roads leading to recreation areas and nature parks like Sauerland, Eifel, Swabian Alb and Franconian Switzerland (Fig. 3d).
The visual inspection of the spatial distribution of MSESS values of models for motorbike counts indicated a larger relevance of weather in densely populated areas. To quantify this relationship, the Spearman correlations between the MSESS values and the population density within a radius of 10 km around the specific traffic stations are computed (Table 5). The largest correlation of 0.53 is found for motorbikes on highways, indicating that in regions with high population densities meteorological predictor variables lead to the largest improvement of models for motorbike counts. For cars the correlations are smaller in magnitude and negative, indicating that meteorological predictor variables lead to larger improvements of the prediction of car counts in regions with low population densities.
For a more detailed analysis of weather impacts on model performance, the cross-validated MSESS is computed separately for the hours of the day, the days of the week and the months of the year. In the case of motorbike counts on federal roads the largest MSESS values occur during daytime, in particular in the afternoon hours, where median MSESS values of almost 0.4 are reached (Fig. 4). Between 0 and 5 AM the MSESS values show almost no improvement, and at some stations they are even negative. Furthermore, on Saturdays and Sundays the MSESS values are generally higher than on workdays. The largest improvements in the course of the year are found during the transitional seasons, in particular in March and October, where the median MSESS reaches almost 0.5. In contrast, in the winter months December and January the median MSESS values are almost zero. This is likely because conditions for motorbiking are generally bad in winter due to low temperatures, so that adding weather predictors to the model brings no benefit compared to simply providing the information of climatologically low temperatures through the month of the year. However, in the transitional months the weather can change frequently between fair and adverse conditions, and the climatology given by the month of the year is not a good predictor. Thus, the availability of weather predictors in the models is beneficial to differentiate between these situations. In the case of motorbike counts on federal highways the patterns are similar, but the MSESS values are smaller compared to federal roads.
In case of models for car counts, the MSESS values are again smaller than for motorbikes. However, at some stations a considerable improvement is evident during weekends and in the afternoon, with maximum MSESS values of more than 0.4 (Fig. 5). An interesting difference compared to motorbikes are the relatively high MSESS values in January and low values in April. It could play a role here that a car, as a sheltered mode of transport, can be easily used at low temperatures and otherwise fair weather conditions. Motorbike rides at low temperatures, however, might be unpleasant, or seasonal licenses, which are common in Germany, might prohibit the use of motorbikes in winter.

Functional relationships
The iterative predictor selection procedure chooses from a set of relevant meteorological parameters with different exponents. This allows non-linear functional relationships between the meteorological predictor variables and traffic flow. To study these functional relationships, one specific meteorological predictor variable is varied, while all other variables are held constant (see Table 6 for details). The variables are chosen to represent weather situations typical for the summer season. This is done separately for Mondays, Saturdays and Sundays to assess the differences between the functional relationships on working days and weekends. Tuesday to Friday are comparable to Mondays and are therefore not shown here. To compare the model predictions of traffic counts at the different stations, the traffic counts are rescaled, so that 0 and 1 correspond to the average daily minimum and maximum hourly traffic flow at the specific station. For visualization of the functional relationships the modeled rescaled traffic counts of all stations are averaged (thick colored lines in Fig. 6). Additionally the 0.1 and 0.9 quantiles are computed to show the variability between the different stations (shaded areas in Fig. 6).
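The rescaling and the summary across stations can be sketched as follows. The sketch assumes that the observed hourly counts of a station cover whole days; the function names and all numbers are hypothetical.

```python
import numpy as np


def rescale_counts(predicted, observed_hourly):
    """Rescale predictions so that 0 and 1 match the station's average
    daily minimum and maximum hourly count."""
    daily = np.asarray(observed_hourly, dtype=float).reshape(-1, 24)
    low = daily.min(axis=1).mean()    # average daily minimum
    high = daily.max(axis=1).mean()   # average daily maximum
    return (np.asarray(predicted, dtype=float) - low) / (high - low)


def summarize_stations(curves):
    """Mean curve and 0.1 / 0.9 quantiles over stations (rows of curves)."""
    curves = np.asarray(curves, dtype=float)
    q10, q90 = np.quantile(curves, [0.1, 0.9], axis=0)
    return curves.mean(axis=0), q10, q90


# Two made-up days: minima 0 and 2, maxima 10 and 14 vehicles per hour.
day1 = np.linspace(0.0, 10.0, 24)
day2 = np.linspace(2.0, 14.0, 24)
observed = np.concatenate([day1, day2])
rescaled = rescale_counts([1.0, 6.5, 12.0], observed)
mean_curve, q10, q90 = summarize_stations([[0.0, 0.5, 1.0], [0.2, 0.7, 1.2]])
```

Values above 1 or below 0 are possible after rescaling, since individual predictions can exceed the average daily extremes of a station.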
Table note: For one station a meteorological variable can be selected multiple times, e.g. with different exponents. Rows with italic font show the fraction of stations for which a specific meteorological variable was selected at least once with any exponent or in an interaction term.
In the case of motorbike counts on federal roads, the station-average traffic counts are highest on Sundays, followed by Saturdays and Mondays (Fig. 6a-d). This indicates that motorbikes are often used for leisure activities. The traffic flow as a function of daily maximum temperature shows that motorbike counts decrease strongly at lower temperatures (Fig. 6a). The maximum is reached at about 25°C. Higher temperatures lead to a reduction of motorbike counts. Motorbike counts as a function of hourly precipitation show the highest values when there is no precipitation (Fig. 6b). During hours without precipitation motorbike counts are almost 5 times larger on Sundays compared to Mondays. An increase of hourly precipitation leads to a sudden drop in motorbike counts and an almost asymptotic flattening of the curve where precipitation exceeds 2 mm/h. Precipitation of 2 mm/h leads to a reduction of motorbike counts by 50% compared to hours without precipitation. This reasonable non-linear functional relationship between precipitation and motorbike counts is established by using 1/k as an exponent for precipitation, with k ∈ {1, 2, 3, 4} (see Table 2). One could argue that a sharp break between "no precipitation" and "precipitation", which could be introduced by using a binary variable, would be more appropriate. However, the smooth transition better reflects the uncertainties related to the precipitation data and the model formulation. For example, due to the lack of an unambiguous relationship between radar echo and the actual precipitation amount, RADOLAN data may show precipitation although there was no precipitation on the ground. A potential time-lagged effect of the onset of precipitation on motorbike counts is also not included in the model. The relationship between motorbike counts and daily average cloud cover reveals particularly large motorbike counts in cloud-free situations and a reduction of motorbike counts with increasing cloudiness (Fig. 6c). On cloud-free Sundays the motorbike counts are almost twice as large as on cloudy Sundays. However, one should be aware of the correlation between cloud cover and precipitation and between cloud cover and temperature, which could affect the results. Furthermore, the variability between the different traffic stations is relatively large at low cloud covers. Motorbikes are especially vulnerable to strong winds and one can expect that motorcyclists avoid trips under windy conditions. This is also reflected by the models. Increasing daily maximum wind gusts lead to a strong reduction of motorbike counts (Fig. 6d). Extreme wind gusts of more than 25 m/s lead to the lowest motorbike counts, also when compared to the effects of the other meteorological parameters. Such wind speeds occur, for example, in summer during thunderstorms or in winter in conjunction with extratropical cyclones.

Discussion
While previous studies have addressed weather impacts on bicycle, car or truck traffic, there has been little research on the direct effect of weather on motorcycle traffic. This study presents evidence that motorcycle traffic, as a non-sheltered mode of transport, depends strongly on weather conditions. The findings that motorcycle flow increases with temperature and decreases with precipitation are in line with a number of studies addressing bicycle travel behavior [5,6] and outdoor activities in general [2,3]. Cloudiness and wind speed are mostly not considered in studies of traffic and outdoor activity. We showed that low cloud cover and low wind speeds coincide with a higher motorbike traffic flow, which is in line with the general finding that fair weather increases open-air activity [1]. We could also show that high temperatures above 25°C lead to a decline in motorbike counts, which is similar to bicycle usage [8,9] and outdoor activities in general [7].
Figure caption (beginning truncated): … Table 6). Thick lines represent the mean value of all traffic stations. The shaded area indicates the 0.1 and 0.9 quantiles of the values of all traffic stations. The gray dashed lines at 0 and 1 indicate the average daily minimum and maximum of motorbike counts, which are used for the rescaling procedure.
Results of previous studies, which analyzed individual traffic stations, suggested that traffic flow in recreational areas is more dependent on weather than in urban areas [17]. By analyzing a large number of stations and specific vehicle types, we can confirm that this is particularly true for car traffic. At the majority of traffic stations we found that including meteorological variables improves the prediction of hourly car counts only slightly. However, traffic stations along routes towards seaside resorts and nature parks showed a substantial improvement and thus a pronounced dependence on meteorological variables.
As suggested by previous research [1], we established non-linear relationships between meteorological predictor variables and traffic flow by choosing from meteorological predictor variables with different exponents in an automatic selection procedure. Another methodology specifically designed to describe non-linear relationships is the generalized additive model (GAM), which has been applied, for example, to predict crash frequencies [49,50]. GAMs use smooth functions such as cubic splines to find the optimum functional relationship between predictor and target variables. As a test, we also applied GAMs to our data and found that they lead to unrealistic behavior at the extreme ends of the distributions. In addition, the strong drop of traffic flow at the onset of precipitation led to considerable overshooting of the splines. GAMs therefore appear unsuitable for an automatic procedure applied to a large number of stations, where a detailed evaluation of each individual model is infeasible. However, it may be suitable to apply GAMs to individual traffic stations in a detailed study, where fine-tuning of the model is possible.
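The idea of choosing among predictors with different exponents can be sketched as follows. This is a deliberately reduced illustration, not the selection procedure of this study: it assumes a single precipitation predictor, noise-free synthetic counts generated with exponent 1/3, a Poisson regression fitted by iteratively reweighted least squares, and in-sample mean squared error as the selection criterion. All function names and data values are hypothetical.

```python
import math

def fit_poisson_1d(x, y, n_iter=50):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from the observed counts, shifted to avoid log(0).
    eta = [math.log(yi + 0.5) for yi in y]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response of the Poisson GLM with log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares with weights mu (2x2 normal equations).
        sw = sum(mu)
        sx = sum(m * xi for m, xi in zip(mu, x))
        sxx = sum(m * xi * xi for m, xi in zip(mu, x))
        sz = sum(m * zi for m, zi in zip(mu, z))
        sxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * sxx - sx * sx
        b1 = (sw * sxz - sx * sz) / det
        b0 = (sz - b1 * sx) / sw
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1

def mse(x, y, b0, b1):
    """Mean squared error of the fitted expected counts."""
    return sum((yi - math.exp(b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

def select_exponent(precip, counts, ks=(1, 2, 3, 4)):
    """Return the k whose predictor precip**(1/k) gives the lowest MSE."""
    scores = {}
    for k in ks:
        x = [p ** (1.0 / k) for p in precip]
        b0, b1 = fit_poisson_1d(x, counts)
        scores[k] = mse(x, counts, b0, b1)
    return min(scores, key=scores.get), scores

# Synthetic, noise-free example (assumed values): true exponent is 1/3.
precip = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 3.0, 5.0, 8.0]
counts = [100.0 * math.exp(-0.7 * p ** (1.0 / 3.0)) for p in precip]
best_k, scores = select_exponent(precip, counts)
print(best_k, scores)
```

With real, noisy counts an out-of-sample criterion would be a more cautious choice than the in-sample error used here, and several transformed predictors could be retained at once.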
Böcker et al. [1] suggested considering interactions between different meteorological variables. For example, the impact of wind speed on motorbike counts may be different on days with precipitation compared to days without. Under rainy conditions motorcyclists already refrain from making trips, so that additional high wind speeds make no difference. We have included the interaction of precipitation as a categorical variable with the other meteorological predictor variables. However, in general no major improvement of the model was found. The change of the MSE was less than 1% in most cases. Therefore the results were not included in this paper. Due to the increasing complexity of the models when using interactions, future research in this direction could focus more on individual stations that have been shown to be strongly affected by weather.
The Poisson regression model assumes equality of mean and variance of the count data. In our case this assumption does not hold due to overdispersion of the data. We have tested whether the use of a negative binomial regression model would improve the predictive skill, but that was not the case. Instead, the predictive skill decreased, in particular at hours with high traffic volume. Therefore, we decided to use the Poisson model, which is acceptable because the overdispersion mainly affects the estimation of standard errors, which were not the focus of this study.
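A common way to check the equality of mean and variance assumed by the Poisson model is the Pearson dispersion statistic, i.e. the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below is a generic illustration with made-up counts and an intercept-only model; it is not the diagnostic used in this study, and the function name `pearson_dispersion` is hypothetical.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Values close to 1 are consistent with the Poisson assumption
    (variance equals mean); values well above 1 indicate overdispersion.
    """
    n = len(counts)
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / (n - n_params)

# Made-up hourly counts with an intercept-only model (fitted mean = sample mean).
hourly_counts = [2, 15, 7, 30, 1, 22, 9, 40]
mean_count = sum(hourly_counts) / len(hourly_counts)
dispersion = pearson_dispersion(hourly_counts, [mean_count] * len(hourly_counts), n_params=1)
print(f"dispersion = {dispersion:.2f}")
```

Overdispersion of this kind leaves the estimated means of a Poisson regression usable for prediction but makes the nominal standard errors too small, which matches the reasoning given above for retaining the Poisson model.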

Conclusions
We have shown that the use of meteorological predictor variables can substantially increase the predictive skill of models for hourly traffic flow, although the magnitude of the improvement depends strongly on vehicle type and location of the traffic station. A particular result was that motorbike counts are strongly weather-dependent and showed a highly non-linear relationship to the meteorological variables. Mean squared errors of motorbike counts could be reduced by up to 60% by including meteorological variables in the models. This is plausible, since motorbikes are a non-sheltered transportation mode, frequently used for leisure activities and less frequently for commercial purposes. In the case of cars the analysis showed mixed results. As a sheltered mode of transportation, which is used for commuting, leisure activities and commercial purposes, cars showed a tendency to be less sensitive to weather in urban areas, but car counts were strongly weather-dependent in touristic regions like seaside resorts and nature parks. Lastly, counts of delivery vans and trucks, which are mainly used for commercial purposes, showed only low weather dependence.
These findings open up several potential applications of such models. First, analyses of weather impacts on crash probabilities can be improved by including weather-related variation of traffic flow as a predictor variable. Second, taking into account weather effects in traffic flow predictions could improve route planning in navigation systems and could help traffic management systems to compensate for or redistribute high traffic volumes, in particular in touristic regions. Furthermore, predictions of traffic volumes that take weather forecasts into account would allow day-ahead planning of visitor numbers in touristic areas.