A battery electric bus energy consumption model for strategic purposes: Validation of a proposed model structure with data from bus fleets in China and Norway

In this paper, an energy model for battery electric buses (Ebus) is proposed. The model is developed based on established models for longitudinal dynamics, using event-based low-frequency data. Since the energy model is able to provide relatively accurate estimates of Ebus energy consumption with limited input requirements, it can be easily applied for future bus route planning. In addition, we have introduced a comprehensive model of the auxiliary systems, which contribute significantly to the total energy consumption of a bus. The model for auxiliary systems includes heating, ventilation, air conditioning, and other electrical components. To evaluate the model, data were collected from 3266 trips with Ebuses operated in China and Norway. The results show that the model is able to predict the energy consumption at the trip level.


Introduction
The transport sector is currently in an energy transition phase with clear incentives for switching to renewable energy carriers to reach climate goals related to CO2 emissions (Du et al., 2019). With combating climate change as one of the UN Sustainable Development Goals, the European Union has quantified its contributions to include at least a 40% decrease in greenhouse gas emissions (GHGs), a 32.5% increase in energy efficiency and at least a 32% share of renewable energy sources by 2030 (European Commission, 2018).
A substantial share of global GHGs stem from the transport sector, and phasing out conventional fuel vehicles for electric vehicles can therefore contribute to reducing GHGs if the electricity is generated by renewable sources. Mahmoud et al. (2016) argue that Ebuses are a better solution than alternative powertrains to provide net zero emissions. The market for electric cars has reached a high level of maturity where manufacturers provide vehicles that are in the same functional and economical class as conventional vehicles. However, the corresponding situation for heavier vehicles and other modes of transport remains in an earlier phase. One major obstacle is the limitation in energy densities of batteries, as heavier electric vehicles such as Ebuses typically require large batteries to enable them to operate along tight schedules. Hence, electric power may not be suitable for every bus route before battery technologies have been further advanced.
The paper is outlined as follows. In the next section, we provide the theoretical formulation of the model. The datasets and the methods used to analyse and verify the model are presented in Section 3. Results from the estimation and validation are presented in Section 4, where we also present a sensitivity analysis and discussion of the results. Finally, in Section 5 we give some concluding remarks.

Theoretical basis for the longitudinal dynamics model and auxiliary power
The total power needed for an Ebus is typically separated into two parts: a) the power needed for vehicle motion, $P_{\mathrm{ldm}}$, and b) the power needed for auxiliary systems, $P_{\mathrm{aux}}$:
$$P_{\mathrm{tot}} = P_{\mathrm{ldm}} + P_{\mathrm{aux}}. \tag{1}$$
Here, $P_{\mathrm{ldm}}$ is mainly dependent on the distance and route travelled, whereas $P_{\mathrm{aux}}$ mainly depends on time.
The power required for a vehicle to drive at a specific speed $v$ can be represented by the following LDM equation (Wu et al., 2015; Asamer et al., 2016; Gao et al., 2017; Lajunen, 2018),
$$P_{\mathrm{ldm}} = \left( m a + m g \sin\alpha + m g C_r \cos\alpha + \tfrac{1}{2} \rho A_{\mathrm{front}} C_d v^2 \right) v, \tag{2}$$
where $m$ (kg) is the gross mass of the vehicle, $a$ (m/s$^2$) is its acceleration, $g = 9.81$ m/s$^2$ is the standard acceleration of gravity, $\alpha$ is the slope, $C_r$ is the coefficient of rolling resistance, $\rho = 1.2$ kg/m$^3$ is the density of air, $A_{\mathrm{front}}$ (m$^2$) is the frontal area of the vehicle, and $C_d$ is the drag coefficient.
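As an illustration, the LDM power can be evaluated as in the following minimal sketch. It assumes the standard longitudinal-dynamics form with inertia, gradient, rolling resistance and aerodynamic drag terms; the function name `ldm_power`, its argument names, and the example vehicle (15 t, 8 m$^2$ frontal area) are hypothetical and not taken from the datasets.

```python
import math

G = 9.81        # standard acceleration of gravity, m/s^2
RHO_AIR = 1.2   # density of air, kg/m^3


def ldm_power(v, a, m, slope, a_front, c_r=0.01, c_d=0.7):
    """Return the traction power in W at speed v (m/s) and acceleration a (m/s^2).

    m is the gross mass in kg, slope is the road gradient angle in radians,
    and a_front is the frontal area in m^2.
    """
    inertia = m * a                                # force needed to accelerate the mass
    grade = m * G * math.sin(slope)                # gradient resistance
    rolling = m * G * c_r * math.cos(slope)        # rolling resistance
    drag = 0.5 * RHO_AIR * a_front * c_d * v ** 2  # aerodynamic drag
    return (inertia + grade + rolling + drag) * v


# Hypothetical bus cruising at 10 m/s on flat ground (illustrative values only).
print(ldm_power(v=10.0, a=0.0, m=15000.0, slope=0.0, a_front=8.0))
```

A negative return value, for instance during braking, corresponds to power that is potentially available for recuperation.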
The auxiliary power is a combination of the HVAC system and other components, such as lighting and electronics,
$$P_{\mathrm{aux}} = P_{\mathrm{HVAC}} + P_{\mathrm{other}} = P_{\mathrm{heating}} + P_{\mathrm{ventilation}} + P_{\mathrm{other}}. \tag{3}$$
Here $P_{\mathrm{heating}}$ is the power required for heating and cooling, $P_{\mathrm{ventilation}}$ is the power required to drive air through the ventilation system, and $P_{\mathrm{other}}$ represents lighting and other auxiliary electronics. As a first-order approximation, the power for heating can be estimated as
$$P_{\mathrm{heating}} = \rho c_p \left( Q_{\mathrm{ventilation}} + H_{\mathrm{doors}} Q_{\mathrm{doors}} \right) |\Delta T|, \tag{4}$$
where $c_p = 1.005$ kJ/(kg K) is the specific heat capacity of air, $Q_{\mathrm{ventilation}}$ and $Q_{\mathrm{doors}}$ represent the volume flow rates of the air exchange between the bus and its exterior, $H_{\mathrm{doors}}$ is a Heaviside step function that is 1 if the doors are open and 0 otherwise, and $\Delta T = T_{\mathrm{bus}} - T_{\mathrm{exterior}}$. In the following, we will assume that $P_{\mathrm{ventilation}} = 0.5$ kW. We will further assume that the ventilation system exchanges the air in the bus eight times per hour based on Engineering ToolBox (2005), which entails that $Q_{\mathrm{ventilation}} = V_{\mathrm{bus}} / 450\,\mathrm{s}$.
Air exchange through various orifices such as doors has been studied extensively. The following model is based on a correlation proposed by Wilson and Kiel (1990),
$$Q = \frac{K}{3} A_o \sqrt{g' H}, \tag{5}$$
where $Q$ is the volume flow rate, $K = C_{\mathrm{dis}} (1 - C_m)$ is the orifice coefficient, $A_o$ is the orifice area, $H$ is the height of the orifice, and $g' = g \Delta\rho / \rho_a \simeq g \Delta T / T_a$ is the effective acceleration of gravity. $\rho_a$ and $T_a$ denote the average density and temperature, respectively. $C_{\mathrm{dis}}$ is the discharge coefficient, which varies substantially with the type of orifice. $C_m$ is an interfacial mixing coefficient, which accounts for exiting indoor air that flows back inside. Wilson and Kiel (1990) state that the orifice coefficient $K$ is typically between 0.4 and 0.6 for normal doorways.
The airflow is also affected by the passing of passengers through the doors. To account for this, we add a contribution based on Kalliomäki et al. (2016), who studied airflow patterns through doors in hospitals,
$$Q_{\mathrm{doors}} = \frac{K}{3} A_o \sqrt{g' H} + \dot{n} V_{\mathrm{passage}}, \tag{6}$$
where $\dot{n}$ is the passage rate at which passengers leave or enter the bus and $V_{\mathrm{passage}} = 0.25$ m$^3$ is the exchanged volume due to each passage. In the following, we assume a passage rate of $\dot{n} = 0.25$ s$^{-1}$, which corresponds to five people boarding or alighting through one set of bus doors during a period of 20 s. With $K = 0.5$, we find
$$Q_{\mathrm{doors}} = \frac{1}{6} A_o \sqrt{\frac{g H |\Delta T|}{T_a}} + 0.0625\ \mathrm{m^3/s}. \tag{7}$$
Combined, we obtain the following expression for the auxiliary power needed for the Ebus:
$$P_{\mathrm{aux}} = \rho c_p \left( Q_{\mathrm{ventilation}} + H_{\mathrm{doors}} Q_{\mathrm{doors}} \right) |\Delta T| + P_{\mathrm{ventilation}} + P_{\mathrm{other}}. \tag{8}$$
For a specific time interval and trajectory, we may calculate the total energy consumption as
$$E_{\mathrm{tot}} = \int_{t_0}^{t_1} \left( \frac{P_{\mathrm{ldm}}}{\eta_{\mathrm{ebus}}} + \frac{P_{\mathrm{aux}}}{\eta_{\mathrm{battery}}} \right) \mathrm{d}t, \tag{9}$$
where $\eta_{\mathrm{ebus}}$ accounts for the total losses during conversion of energy from the battery to vehicle motion. As shown by Hjelkrem et al. (2020), the efficiency may be represented as a function of $P_{\mathrm{ldm}}$. However, as the efficiency for electric motors is relatively constant (Hjelkrem et al., 2020), and because of a lack of relevant data, the efficiency will be treated as constant in this study.
Since the auxiliary power is provided directly from the battery pack and not converted to motion as for $P_{\mathrm{ldm}}$, we represent the efficiency for this process with $\eta_{\mathrm{battery}}$.
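The auxiliary power model can be sketched as follows. This is a minimal illustration assuming the heat load scales with $\rho c_p Q |\Delta T|$, that the door flow follows the Wilson and Kiel (1990) correlation $Q = (K/3) A_o \sqrt{g' H}$ plus a passage term, and that $T_a$ is the mean of the inside and outside temperatures in kelvin. The function names, the door dimensions, the bus volume and the value passed for `p_other` are hypothetical.

```python
import math

G = 9.81            # standard acceleration of gravity, m/s^2
RHO_AIR = 1.2       # density of air, kg/m^3
C_P_AIR = 1005.0    # specific heat capacity of air, J/(kg K)


def door_airflow(delta_t, t_avg_k, door_area, door_height,
                 k=0.5, passage_rate=0.25, v_passage=0.25):
    """Volume flow (m^3/s) through open doors: buoyancy term plus passenger passages."""
    g_eff = G * abs(delta_t) / t_avg_k  # effective acceleration of gravity
    buoyancy = (k / 3.0) * door_area * math.sqrt(g_eff * door_height)
    return buoyancy + passage_rate * v_passage


def auxiliary_power(t_bus, t_ext, v_bus, doors_open, door_area, door_height,
                    p_other, p_ventilation=500.0):
    """Auxiliary power in W; temperatures in deg C, v_bus in m^3, lengths in m."""
    delta_t = t_bus - t_ext
    t_avg_k = 0.5 * (t_bus + t_ext) + 273.15   # assumed definition of T_a
    q_vent = v_bus / 450.0                     # eight air changes per hour
    q_doors = 0.0
    if doors_open:
        q_doors = door_airflow(delta_t, t_avg_k, door_area, door_height)
    p_heating = RHO_AIR * C_P_AIR * (q_vent + q_doors) * abs(delta_t)
    return p_heating + p_ventilation + p_other


# Hypothetical 90 m^3 bus, 2 m^2 door opening of 2 m height, 10 K temperature difference.
print(auxiliary_power(22.0, 12.0, 90.0, doors_open=True,
                      door_area=2.0, door_height=2.0, p_other=1000.0))
```

With equal inside and outside temperatures, the sketch reduces to the constant terms `p_ventilation + p_other`, regardless of the door status.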
In practical applications, the vehicle operational description may be of a discrete character, e.g. based on GPS data, and Eq. (9) can be approximated by a sum of elements:
$$E_{\mathrm{tot}} \approx \sum_i \left( \frac{P_{\mathrm{ldm},i}}{\eta_{\mathrm{ebus}}} + \frac{P_{\mathrm{aux},i}}{\eta_{\mathrm{battery}}} \right) \Delta t_i. \tag{10}$$
As the time resolution decreases, the inaccuracy of a direct application of Eq. (10) increases. This is due to possible fluctuations in speed during a specific time interval. During a time period of one second, the acceleration is relatively constant, and the required power can be derived from the LDM. However, as the time interval increases to e.g. 20 seconds, the speed profile will most likely be far from linear in urban areas. As shown in Qi et al. (2018), the energy consumption as a function of average speed varies greatly. Thus, care must be taken when applying Eq. (2) to data with low temporal resolution.
In the case of a negative value of $P_{\mathrm{ldm},i}$, e.g. during negative accelerations and/or negative gradients, and when the Ebus is equipped with recuperative brakes, kinetic energy from the bus can be converted to electric energy in the battery through recuperative braking. However, a minimum speed is required for recuperation due to low electromotive force. Asamer et al. (2016) recommend a value of 15 km/h for the minimum speed. Therefore, we add the following constraint to the model formulation:
$$P_{\mathrm{ldm},i} = 0 \quad \text{if } P_{\mathrm{ldm},i} < 0 \text{ and } v_i < 15\ \mathrm{km/h}, \tag{11}$$
where $v_i$ is the average speed in the $i$th time interval.
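The discrete summation and the recuperation constraint can be combined as in the sketch below. It is an illustration only: the function name `trip_energy_kwh` is hypothetical, and applying the recuperation efficiency as a multiplicative factor on negative traction power is an assumption, since the text does not state how that efficiency enters the sum.

```python
def trip_energy_kwh(p_ldm, p_aux, v, dt,
                    eta_ebus=0.82, eta_battery=0.9, eta_recup=0.82,
                    v_min_recup=15.0 / 3.6):
    """Sum battery energy (kWh) over intervals.

    p_ldm, p_aux: traction and auxiliary power per interval (W)
    v: average speed per interval (m/s); dt: interval lengths (s)
    """
    total_joule = 0.0
    for p_l, p_a, v_i, dt_i in zip(p_ldm, p_aux, v, dt):
        if p_l >= 0.0:
            traction = p_l / eta_ebus          # losses increase the battery draw
        elif v_i >= v_min_recup:
            traction = p_l * eta_recup         # assumed: only part of the energy is recovered
        else:
            traction = 0.0                     # below 15 km/h: no recuperation
        total_joule += (traction + p_a / eta_battery) * dt_i
    return total_joule / 3.6e6


# Hypothetical three-interval trajectory sampled every 10 s.
print(trip_energy_kwh(p_ldm=[60000.0, -40000.0, -20000.0],
                      p_aux=[3000.0, 3000.0, 3000.0],
                      v=[8.0, 7.0, 2.0],
                      dt=[10.0, 10.0, 10.0]))
```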

Methodology and datasets used for model verification
As mentioned in the introduction, the main purpose of the model is to predict energy consumption from an Ebus in urban areas with limited information about the route and operation. The required inputs for the model are the parameters and coefficients in Eq. (10) and its parts in Eqs. (2) and (8), combined with timestamped trajectory data that contain at least the distance traveled, speed, and a door status indicator.
In contrast to other model types, in particular black box models, our model requires limited calibration. The parameter values could be fitted to observed data by statistical methods, but we chose to determine the parameter values from similar vehicles found in the literature. This implies that we favor usability at the expense of more accurate and case-specific parameter values, in the sense that the model framework can be readily applied with a relatively low effort. To assess and validate our approach, two datasets with varying degree of sparsity have been collected. Dataset 1 is collected from buses operating in China between 2017 and 2018, while dataset 2 is collected from buses operating in Norway between 2019 and 2020. A description of the raw data can be found in Table 1.
Dataset 1 includes time of registration, speed, longitude, latitude, percentage changes in state of charge (SOC), and instantaneous measurements of voltage V and current I from motor and battery. The Chinese bus data was logged systematically at 10-second intervals. The inside setpoint temperature is not known and is assumed to be 22 °C.
Dataset 2 includes the speed, odometer with a 10 m precision, geographical coordinates, altitude, battery voltage, and event descriptions. As opposed to the systematic logging frequency in dataset 1, the Norwegian data logging was event-based, which means that data was recorded only when certain events occurred. These events included 1) stopping (when speed reaches 0 km/h), 2) starting (when speed reaches a threshold of 5 km/h), 3) doors opening and closing, 4) gear shifting, 5) percentage changes in SOC, 6) harsh braking, and 7) entering a geo-referenced zone such as a charging zone or bus depot. The inside setpoint temperature is reported by the bus operator to be 18 °C during winter and 22 °C during summer. Table 2 shows an overview of bus characteristics for the buses that were used in China and Norway, respectively. The frontal area and the area of the door openings are estimated from scaled drawings, while the other properties are provided by the bus manufacturer.
An overview of the parameters required for the model in Eq. (9) and their values is presented in Table 3. The values are estimated from literature. The rolling resistance coefficient ($C_r$) represents the amount of friction between the tyres and ground. In good conditions, $C_r$ can be as low as 0.006, but the value is usually set to 0.01 in city-bus energy consumption models (Lajunen, 2014; Gao et al., 2017; Holmberg et al., 2014; Vepsäläinen et al., 2019). The rolling resistance can be expressed as a function of e.g. vehicle speed (Wang and Rakha, 2016a), but is here taken to be a fixed coefficient with a value of 0.01. Similarly, the drag coefficient ($C_d$) usually lies between 0.6 and 0.8 in the literature (Lajunen, 2014; Gao et al., 2017; Holmberg et al., 2014; Wang and Rakha, 2016a), and we therefore set it to 0.7. As mentioned above, the efficiencies of the battery and motor are treated as constants, although high voltages might induce higher losses, thus reducing the efficiency. Based on Asamer et al. (2016), we set the battery efficiency to 0.9 and the powertrain efficiency to 0.82. The efficiency of the recuperated energy is similar to the efficiency of the powertrain, and these are here assumed equal.

Data preparation
The datasets were prepared by adding model inputs that were not available in the original data and by upsampling the data frequency. Both datasets were augmented with observed ambient temperature and a synthetic speed profile, while dataset 1 was further augmented with altitude, bus-stop event information, and passenger weight. In the following, we present the methodology used to augment the data and to increase the frequency.

Data fusion
The coordinates in dataset 1 were matched to a raster layer with altitude values on a 100 m × 100 m grid. For short bus routes in hilly areas, this could result in large errors in the energy estimation, as the gradient resistance can be a significant factor in energy consumption. However, the bus routes covered by the dataset are operated in relatively flat areas, and are assessed to be sufficiently long to include most of the gradients.
Neither dataset included ambient temperature, but this is readily available as historical data. Each row in the datasets was matched to hourly weather data that included the temperature. The Norwegian data covered four days that were chosen to reflect a range of different outdoor temperatures: 2019-08-28, 2019-10-22, 2019-11-22 and 2020-02-27. The average daily temperatures (between 07:00 and 23:00) for these days were 21 °C, 8 °C, 3 °C, and −7 °C, respectively, as reported by the Norwegian Meteorological Institute. Likewise, the trips in dataset 1 cover a temperature range between 9 °C and 36 °C.

Trip identification
Dataset 1 was provided as a semi-continuous time series where individual bus trips were not trivially distinguishable. To identify the individual bus trips we applied a sequence of filtering algorithms. First, data for days when buses deviated substantially from the regular route were excluded from the analysis. This happened once or twice a month, likely as buses were undergoing maintenance work. For the remaining days, the data points were split into segments corresponding to consecutive trips (i.e. from first to last bus stop of the scheduled route, northbound or southbound). This was done by identifying periods of time where buses stood still (zero speed) for a period of 20 ten-second intervals (3 min 20 s). These intervals were assumed to correspond to breaks between consecutive trips, and the choice of 20 intervals was determined experimentally by analysing the data. As such, the first instance of each of these identified idle periods marked the end of an undertaken trip. Consequently, the first subsequent data point with speed > 0 marked the beginning of a new trip. In some cases, the pause time between two consecutive trips lasted less than the 20-interval threshold, implying that two trips were essentially undertaken back-to-back. The cause of this may be voluntary, e.g. tightly scheduled trips, or involuntary, e.g. a bus running late due to congestion. This resulted in some trips lasting two or three times longer than average. These trips were filtered out by only selecting trips that lasted between 30 and 90 min. The final number of trips available for analysis was 2022.
Individual trips also had to be identified for the Norwegian data. This was done by filtering out time periods between charging. Because of imprecisions in the geofencing process, each stopover at the charging station often led to multiple "charging zone: enter" and "charging zone: exit" events. To avoid including minor trips in the vicinity of a charging station, a filtering algorithm was applied so that each new trip started at the last such "exit" event. Similarly, a trip was assumed to end at the first "enter" event.
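The idle-period splitting used for dataset 1 can be illustrated with the following sketch. It is not the implementation used in the study; the function names are hypothetical, the speed series is assumed to be a plain list sampled every 10 s, and the duration window is read here as 30 to 90 min.

```python
def split_trips(speeds, idle_threshold=20):
    """Return (start, end) index pairs of trips, both inclusive.

    A trip starts at the first sample with speed > 0 and ends at the first
    sample of a run of at least `idle_threshold` zero-speed samples.
    """
    trips = []
    start = None      # index of the first moving sample of the current trip
    zero_run = 0      # length of the current run of zero-speed samples
    for i, speed in enumerate(speeds):
        if speed > 0:
            if start is None:
                start = i
            zero_run = 0
        else:
            zero_run += 1
            if start is not None and zero_run == idle_threshold:
                trips.append((start, i - idle_threshold + 1))
                start = None
    if start is not None:
        trips.append((start, len(speeds) - 1))  # series ended while a trip was open
    return trips


def filter_by_duration(trips, step_s=10, min_minutes=30, max_minutes=90):
    """Keep trips whose duration lies within the accepted window."""
    kept = []
    for start, end in trips:
        minutes = (end - start) * step_s / 60.0
        if min_minutes <= minutes <= max_minutes:
            kept.append((start, end))
    return kept


# Toy series with a shortened idle threshold of 3 samples for readability.
print(split_trips([0, 0, 5, 5, 0, 5, 0, 0, 0, 3, 3], idle_threshold=3))
```

Short stops, such as the single zero-speed sample in the toy series, do not split a trip because they stay below the idle threshold.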
The SOC is reported in steps of 1% in both datasets. However, for dataset 2, the time span between data points can be quite large, which means that the SOC uncertainty can introduce a relatively large error. This uncertainty is reduced by restricting the trips in dataset 2 to subsets of the data where the SOC is well defined at the start and end of the trip.

Bus stop identification
The data in dataset 1 did not include information on whether the doors were open or closed during the trips. Therefore, an algorithm was developed to determine the status of the doors during the trips (open = 1; closed = 0). First, coordinates for the physical bus stops along the route were gathered manually from the Chinese map service BaiDu. Fig. 1 illustrates the selected routes and their bus stops.
Since the bus route differed depending on the direction (northbound or southbound), an additional step consisted in determining if each individual bus trip was bound north or south by crosschecking the coordinates of the trip starting point with the coordinates of the bus stops. Then, instances where the bus speed was equal to zero were extracted, assuming doors only opened when buses stood still. The coordinates of these instances were compared to the coordinates of the identified physical bus stops along the identified bus route (northbound or southbound). If the distance between the physical bus stop and a data point where the speed was zero was less than 50 m, the bus was assumed to be stopped at the bus stop with open doors (a value of 1). Otherwise, the value was set to 0. The threshold of 50 m was determined ad hoc by visually studying the bus position and assessing the probability that a zero-speed event corresponded to a physical bus stop. A threshold of at least 50 m was chosen due to an observed volatility in the coordinates as well as physical bus stops in the specific area being relatively long. The distance between two points was calculated using the haversine formula:
$$d = 2 r \arcsin\left( \sqrt{ \sin^2\left( \frac{\phi_2 - \phi_1}{2} \right) + \cos\phi_1 \cos\phi_2 \sin^2\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right), \tag{12}$$
where $\phi_1$ and $\lambda_1$ are the latitude and longitude of point 1, $\phi_2$ and $\lambda_2$ are the latitude and longitude of point 2, and $r$ is the radius of the Earth (approximated to 6373 km).
Hjelkrem et al. Transportation Research Part D 94 (2021) 102804
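The door-status assignment can be sketched as below, using the haversine distance with an Earth radius of 6373 km and the 50 m threshold. The function names and the example coordinates are hypothetical; the actual bus stop coordinates are not reproduced here.

```python
import math

EARTH_RADIUS_M = 6373.0e3  # Earth radius used in the text, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def door_status(speed, lat, lon, stops, threshold_m=50.0):
    """Return 1 if the bus stands still within threshold_m of any stop, else 0."""
    if speed != 0:
        return 0
    near_stop = any(haversine_m(lat, lon, s_lat, s_lon) < threshold_m
                    for s_lat, s_lon in stops)
    return int(near_stop)


# Hypothetical stop roughly 29 m east of the zero-speed observation.
stops = [(30.0, 120.0003)]
print(door_status(0, 30.0, 120.0, stops))
```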

Passenger flow estimation
A significant part of the energy required to operate the Ebus arises from the passenger flow. First, the total weight of the bus will change according to the number of passengers on board, and therefore affect the power demand according to Eq. (2). In dataset 2, this is accounted for since the measured axle weights at each data event are given. However, the passenger flow in dataset 1 is unknown. Here, the gross weight of the bus is calculated by adding the passenger weight to the curb weight of the bus, such that
$$m = m_c + n w, \tag{13}$$
where $m_c$ is the curb weight of the bus provided by the manufacturer, $n$ is the number of passengers on board and $w$ is the average weight of a passenger. We set the average weight of passengers to the default value of 68 kg (Xu et al., 2015). The number of passengers is estimated to be 75% of passenger capacity, as we assume the buses operate with high occupancy in the trips reported in the dataset. Secondly, the passenger flow will affect the energy required by the HVAC, as shown in Eq. (7). Neither dataset includes the passenger flow rate, and we therefore estimate this value to be 0.25/s, as previously mentioned.
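The gross weight estimate for dataset 1 amounts to a one-line calculation, sketched here with a hypothetical helper `gross_mass`. The curb mass and passenger capacity in the example are illustrative placeholders, not the values from Table 2.

```python
def gross_mass(curb_mass_kg, passenger_capacity, occupancy=0.75, passenger_mass_kg=68.0):
    """Curb mass plus the assumed passenger load, in kg."""
    n_passengers = occupancy * passenger_capacity  # 75% of capacity by default
    return curb_mass_kg + n_passengers * passenger_mass_kg


# Hypothetical bus: 12 t curb mass and room for 80 passengers.
print(gross_mass(12000.0, 80))
```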

Speed profile estimation
Both datasets included instantaneous speed of the buses. However, the time intervals between two consecutive registrations are often too long for calculating the acceleration or deceleration needed for the energy consumption model. Although advanced methods exist for constructing fully synthetic speed profiles, we developed a simple algorithm to construct a continuous speed profile based on registered speeds and the average speed between data points.
First, the algorithm reads the instantaneous speed at the beginning of the interval and assumes that it increases or decreases with a steady acceleration or deceleration rate until it reaches the average speed. The maximum acceleration/deceleration rate for buses is set to 1.5 m/s$^2$, which is slightly above what is considered the maximum comfortable rate for public transport vehicles (Martin and Litwhiler, 2008; Hjelkrem and Foss, 2016). The algorithm dynamically adjusts the average speed such that the calculated distance driven matches the registered distance in the datasets. Finally, the speed starts increasing or decreasing again with a steady rate until it reaches the registered instantaneous speed at the end of the interval. An example of a constructed speed profile is shown in Fig. 2.
Fig. 2. Constructed speed profile compared to registered instantaneous speed and the average speed between two observations.
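One possible way to realise such a profile is sketched below: a ramp from the start speed to a cruise speed, a constant-speed phase, and a ramp to the end speed, with the cruise speed found by bisection so that the covered distance matches the registered distance. The bisection step and all function names are assumptions made for illustration; the text only states that the average speed is adjusted dynamically.

```python
def profile_distance(v0, v1, vc, duration, a_max):
    """Distance (m) covered by ramp -> cruise at vc -> ramp within `duration` s."""
    t_in = abs(vc - v0) / a_max
    t_out = abs(v1 - vc) / a_max
    t_cruise = duration - t_in - t_out
    return 0.5 * (v0 + vc) * t_in + vc * t_cruise + 0.5 * (vc + v1) * t_out


def cruise_speed(v0, v1, duration, distance, a_max=1.5, tol=1e-9):
    """Bisect for the cruise speed (m/s) whose profile covers `distance`."""
    if abs(v1 - v0) > a_max * duration:
        raise ValueError("end speed not reachable within the interval")
    lo = max(0.0, 0.5 * (v0 + v1 - a_max * duration))
    hi = 0.5 * (v0 + v1 + a_max * duration)
    d_lo = profile_distance(v0, v1, lo, duration, a_max)
    d_hi = profile_distance(v0, v1, hi, duration, a_max)
    if not d_lo <= distance <= d_hi:
        raise ValueError("distance not reachable with this acceleration limit")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_distance(v0, v1, mid, duration, a_max) < distance:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def speed_profile(v0, v1, duration, distance, a_max=1.5, step=1.0):
    """Return speeds (m/s) sampled every `step` seconds over the interval."""
    vc = cruise_speed(v0, v1, duration, distance, a_max)
    t_in = abs(vc - v0) / a_max
    t_out = abs(v1 - vc) / a_max
    sign_in = 1.0 if vc >= v0 else -1.0
    sign_out = 1.0 if v1 >= vc else -1.0
    speeds = []
    for k in range(int(round(duration / step)) + 1):
        t = k * step
        if t < t_in:                         # initial ramp towards the cruise speed
            speeds.append(v0 + sign_in * a_max * t)
        elif t <= duration - t_out:          # constant-speed phase
            speeds.append(vc)
        else:                                # final ramp towards the end speed
            speeds.append(vc + sign_out * a_max * (t - (duration - t_out)))
    return speeds


# Hypothetical 20 s interval between two standstill events, 96 m apart.
print(speed_profile(0.0, 0.0, 20.0, 96.0))
```

The distance covered is monotonic in the cruise speed over the feasible range, which is why a simple bisection suffices in this sketch.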

Methodology for model performance
Reflecting the novel contributions of this paper, we aim to assess the model by firstly examining the model for auxiliary load, and then by analyzing the predictive power of the total model in the case of event-based low frequency data.
Neither dataset contained explicit information about the auxiliary load. However, dataset 1 included instantaneous readings of voltage V and current I from both the motor and battery pack. Theoretically, the difference between the electric power P observed at the motor and at the battery pack is the electric power used by auxiliary systems. We therefore base the validation of $P_{\mathrm{aux}}$ on this difference. A possible drawback of this approach is the low resolution of observations (10 s), as the values for I and V can fluctuate substantially between observations, at least for the readings from the motor. For the auxiliary load, we assume that it will be relatively constant between observations, and therefore suitable for comparison purposes.
Because of the low resolution of the energy consumption reported in the datasets, the comparison is mainly done on a trip level. With n = 3266 trips in total, this provides a solid base for assessing the proposed model framework. To determine the goodness of fit for the model, we focus on investigating the residuals and the sum of squared errors, where $\hat{y}_i$ represents the model outcome and $y_i$ represents the observed reference outcome. In addition to exploring the distribution of residuals to assess variations, we use the normalised residuals for quantifying the goodness of fit and discussing the validity through sensitivity analyses.
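A trip-level residual analysis can be sketched as follows. The residual is taken here as observed minus predicted, so that a negative value indicates overestimation by the model; this sign convention is an assumption. The normalisation applied in the paper is not specified in the text, so only the plain sum of squared residuals is shown, and the numbers in the example are made up.

```python
def residuals(observed, predicted):
    """Per-trip residuals y_i - y_hat_i (negative means the model overestimates)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def sum_of_squared_residuals(observed, predicted):
    """Plain (un-normalised) sum of squared residuals."""
    return sum(r * r for r in residuals(observed, predicted))


# Made-up trip energies in kWh, for illustration only.
observed = [10.0, 12.0]
predicted = [11.0, 9.0]
print(residuals(observed, predicted), sum_of_squared_residuals(observed, predicted))
```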
As the main objective of the paper is to introduce a white-box model of the auxiliary load, we also compare the results to a black-box model designed for this purpose. In this model, the contribution from longitudinal dynamics is identical to our model, as shown in Eq. (2). However, the auxiliary load accounts for a fixed share of the total energy losses. This share is set to 25%, a value taken from the literature, which gives the following estimate for the energy consumption of a single trip:

Results and discussion
For all trips in the datasets, the energy consumption was calculated using the model presented in Eq. (10). Since the main objective is to evaluate the performance of a model based on predefined input, we focus on the deviation between observed and predicted values. For each dataset, we start with a trip level validation, and thereafter present an aggregated comparison of all recorded trips.

Single trip validation
A random trip from dataset 1 was selected to demonstrate the difference in energy consumption required for auxiliary systems and propulsion only. The latter is observed through voltage and current readings from the electric motor, while the former is observed through the difference between readings from the battery and the motor. Thus, the auxiliary power is here defined as the power required for everything but the electric motor, in line with Eq. (1). The selected trip has a length of 16.8 km and a duration of 48 min, and the bus has a total of 10 stops along the trip. The average ambient temperature was 27.45 °C. As shown in Fig. 3, there are some differences in predicted and observed auxiliary energy consumption. The main difference is in the fluctuation over time. The observed energy consumption varies significantly, whereas the predicted energy consumption is relatively constant. The latter can be explained by a low difference between inside and outside temperature, leaving only the constant term in Eq. (7), $P_{\mathrm{ventilation}}$, even though the doors open at bus stops. This might be an indication of some variations not included by our proposed model. Another detail about the fluctuations in observed energy consumption is the occurrence of negative values. This might indicate some delay or inaccuracy in the measurements, especially regarding the difference in readings from the motor and battery pack.
For the same trip, Fig. 3 shows a comparison of the predicted energy consumption to the observed energy consumption from sensor readings of the voltage and current. Although the trip level difference in energy consumption for auxiliary systems is low, there is some deviation in the energy consumption from propulsion. This also explains the difference in total energy consumption. The deviation in energy consumption for VSP is likely due to the sampling rate being too low to account for the power fluctuations over a 10 s period. Hence, we will use the reported SOC as a measure for total energy consumption by the Ebuses in dataset 1 in the rest of the paper.
In addition, a random trip was selected from dataset 2 (Norway). The selected trip has a length of 25.4 km and a duration of 69.3 min, and the bus has a total of 32 stops along the trip. The ambient temperature was 25 °C. The profiles of the cumulative observed and predicted energy consumption are shown in Fig. 4. The observed reference values are changes in SOC, and as can be seen, these are reported with a low frequency. The graph shows that, as for Fig. 3, the shapes of the curves representing the total estimated and observed energy are similar. However, for dataset 2, the energy use is underestimated.

Energy model goodness of fit
As for the single trips in the previous section, we calculate the predicted energy consumption for all trips in the two datasets. These are compared to the observed energy consumption to assess the goodness of fit of the energy model. First, we evaluate the model for auxiliary energy, which is only explicitly observed in the dataset from the Chinese buses. Fig. 5 shows the predicted and observed auxiliary energy consumption on a trip level, where each dot represents one trip. The figure is colour coded by temperature to demonstrate the effect of outside temperature. Here we see that the cloud of observations is somewhat skewed, which implies an underestimation of energy consumption for auxiliary systems in high temperatures.
In Figs. 6 and 7, the total energy consumption for all trips is shown. Fig. 6 shows a scatter plot of the predicted energy consumption per trip compared to the estimated energy consumption from voltage and current data. For the Chinese dataset, the normalized sum of squared residuals is 12.8. The value for the Norwegian dataset is 115.0. Fig. 8 shows histograms of the residuals from Figs. 6 and 7, respectively. In both cases, we see that the shapes of the histograms resemble a normal distribution, with a slight skewness in the histogram in Fig. 8b. The peak of the histogram is [−2.4, −1.8] for dataset 1 and [3.9, 6.6] for dataset 2, which confirms that the model is overestimating energy consumption for the Chinese buses, and underestimating the energy consumption for the Norwegian buses.

Sensitivity Analysis and Discussion of model applicability
The results show that it is possible to predict the energy consumption of an Ebus based on a theoretical approach within a reasonable error margin. When studying the distribution of residuals, which appear to be close to normally distributed, the average energy consumption is overestimated for dataset 1 and underestimated for dataset 2. This suggests that the model is able to replicate the studied phenomena with good accuracy. The calculated normalized sum of squared residuals suggests that the model has a better goodness of fit for dataset 1.
A part of the deviation between observed and estimated energy consumption can be explained by issues with the collected data, specifically the logging frequency. Although the size and level of detail of the datasets render them highly valuable for scientific purposes, some approximations had to be made. In the case of speed profiles, the approach we adopted is a best-case scenario, thus leading to underestimated energy consumption. In real-life driving, a speed profile will probably have a higher degree of fluctuations from interaction with e.g. other vehicles in the traffic flow. Additionally, the absence of passenger flow data and door opening events increases the level of synthesis in dataset 1 (China). When studying the results, it is possible to explain some of the deviations by an overestimation of passenger occupancy. A lower occupancy rate would decrease both total passenger weight and the number of times the bus stops to open its doors, thus reducing the estimated energy consumption. Aggregated for all trips, we see that the average share of estimated energy demand from the auxiliary systems is 25% and 36% for dataset 1 and 2, respectively. This shows that HVAC is a significant contribution to energy consumption, and should therefore be properly included in Ebus energy models. For dataset 1, where we are able to separate energy consumption for auxiliary systems from total energy consumption, the average observed share is 44%. As shown in Fig. 5, the underestimation of auxiliary energy consumption is predominantly for high ambient temperatures, suggesting that the energy consumption for cooling is not sufficiently handled in Eq. (8).
One of the novel contributions from this work is the representation of vehicle dynamics and auxiliary systems in equal shades of grey in the white-box/black-box model sphere. In this regard, an effort was made to increase the degree of physical models while limiting the number of input parameters. However, the set of values initially chosen in this study appears to be less than optimal. This reflects the situation of any application of the model in a planning situation, where a significant part of the input parameters are uncertain. Therefore, a sensitivity analysis was done where the impact of each parameter value is studied, as shown in Table 4. The sensitivity analysis serves two purposes: first, to display which parameter values could be optimized, and second, to show the impact of setting the wrong parameter value. It is apparent that some parameters have less impact than others, specifically the parameters concerning airflow from passenger flow ($\dot{n}$, $K$, and $V_{\mathrm{passage}}$). In addition, the power needed for ventilation has a low impact. However, for both $P_{\mathrm{ventilation}}$ and $P_{\mathrm{other}}$, the results suggest that the initial values are too high for the bus in dataset 1 and too low for the bus in dataset 2. The results for the setpoint temperature $T_{\mathrm{bus}}$ are only calculated for dataset 1, as the value for dataset 2 is explicitly stated by the operator of the buses in Norway. Although lowering the value to 18 °C for dataset 1 will decrease the goodness of fit, the effect is limited. The final, and most influential, parameter values related to HVAC are the airflow from ventilation $Q_{\mathrm{ventilation}}$ and the power needed for other auxiliary systems $P_{\mathrm{other}}$.

Table 4. Sensitivity analysis of various parameters such as passenger occupancy, rolling resistance, and power demand from lighting and other electrical components. For each parameter value, the normalised sum of square residuals is calculated (Eq. (14)).
The parameter values linked to vehicle dynamics are $C_r$, $C_d$ and passenger occupancy. Of these, the drag coefficient has the lowest impact. Regarding the rolling resistance coefficient, an inaccurate value will have large consequences in both cases. For dataset 1, a $C_r$ in the lower range will provide a better fit, while the initial value for dataset 2 could be increased. The passenger occupancy value will directly affect the total vehicle weight, and should therefore have a large impact on the total energy consumption. As seen for dataset 1, this is also the case, and could help to explain some of the observed overestimation in Figs. 6 and 7. For dataset 2, the exact weight of the bus is known, and this parameter is therefore not used.
For the efficiency parameters η ebus , η battery and η recup , the battery and recuperation efficiencies have the lowest impact. For η ebus , a low parameter value will significantly reduce the goodness of fit for dataset 1. Similarly, for most efficiency parameters, the initial values seem to underestimate the actual efficiency, with the exception of the recuperation efficiency in the Chinese buses and η ebus in dataset 2. This also suggests that a constant value for η ebus is not optimal and should be replaced with an efficiency function, as mentioned in the introduction.
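The one-at-a-time procedure behind this sensitivity analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the exact normalisation of Eq. (14) is not reproduced here, so the sketch divides the sum of square residuals by the sum of squared observations, and `toy_model`, the trip fields, and the parameter names are hypothetical placeholders for the full energy model.

```python
def normalised_ssr(observed, predicted):
    """Sum of square residuals divided by the sum of squared observations (assumed normalisation)."""
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return ssr / sum(o ** 2 for o in observed)


def sweep_parameter(model, trips, observed, base_params, name, values):
    """Vary one parameter while keeping all others at their initial values."""
    results = {}
    for value in values:
        params = dict(base_params, **{name: value})
        predicted = [model(trip, params) for trip in trips]
        results[value] = normalised_ssr(observed, predicted)
    return results


def toy_model(trip, params):
    """Hypothetical stand-in for the energy model: a traction term plus an auxiliary term (kWh)."""
    traction = params["c_r"] * trip["distance_km"] * 100.0
    auxiliary = params["p_other_kw"] * trip["hours"]
    return traction + auxiliary


if __name__ == "__main__":
    trips = [{"distance_km": 10.0, "hours": 0.5}, {"distance_km": 25.0, "hours": 1.2}]
    true_params = {"c_r": 0.008, "p_other_kw": 2.0}
    observed = [toy_model(trip, true_params) for trip in trips]  # synthetic "observations"
    for value, score in sweep_parameter(
        toy_model, trips, observed, true_params, "c_r", [0.006, 0.008, 0.010]
    ).items():
        print(f"c_r={value}: normalised SSR={score:.4f}")
```

In this synthetic setup the score is lowest at the parameter value used to generate the observations; with real trip data, the location of the minimum indicates whether an initial value is likely too high or too low.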
As described in Section 3.2, an alternative model is used to compare our white-box model of the auxiliary load to a white-box representation where the auxiliary load accounts for 25% of the energy losses. In Table 5, the values for SSR in each dataset are presented. For dataset 1, the proposed method has similar accuracy as the model developed by . However, for dataset 2, the results are significantly improved.
The literature suggests that the relationship between the energy efficiency of EVs and ambient temperature has a U shape, where energy efficiency increases with temperature up to a point and then starts decreasing once the optimal temperature is exceeded (Wang et al., 2017, Fig. 1). In Fig. 9 we plot the share of energy consumption for HVAC as a function of ambient temperature. This concurs with the findings from , and further underlines the need for a proper HVAC representation when planning for Ebuses in areas with a large variation in ambient temperatures.
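The qualitative temperature dependence can be reproduced with a deliberately simplified sketch. It is not the HVAC model of this paper: it assumes a single lumped heat-transfer coefficient, a constant power demand for everything except HVAC, and the same conversion factor for heating and cooling. The function name and all parameter values are hypothetical.

```python
def hvac_share(t_ambient_c, t_setpoint_c=20.0, ua_kw_per_k=0.5, other_kw=15.0, cop=1.0):
    """Share of total power used by HVAC for a lumped steady-state heat balance (toy model)."""
    # Thermal demand grows with the gap between setpoint and ambient temperature.
    hvac_kw = ua_kw_per_k * abs(t_setpoint_c - t_ambient_c) / cop
    return hvac_kw / (hvac_kw + other_kw)


if __name__ == "__main__":
    for t_ambient in (-20, -10, 0, 10, 20, 30):
        print(f"{t_ambient:>4} °C: HVAC share = {hvac_share(t_ambient):.2f}")
```

Under these assumptions the HVAC share vanishes at the setpoint temperature and grows on both sides of it, which corresponds to the efficiency optimum at moderate ambient temperatures described above.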

Concluding remarks and recommendations for further research
We have introduced an energy model for Ebuses which includes a detailed representation of the auxiliary systems in addition to the energy demand from propulsion. The main application of the model is to predict route-level energy consumption for planning purposes, using limited information that is expected to be available in a planning phase, such as route details, outside temperature, battery capacity, and passenger flow. A key success factor for a further increase in the share of Ebuses is to reduce the uncertainty about operational characteristics, such as delays due to unexpectedly long charging sessions, low passenger thermal comfort caused by energy saving, and difficulties with optimizing the use of a mixed bus fleet. Hence, a priori knowledge about the expected performance of the Ebuses will be important when purchasing new vehicles, concerning battery pack size and properties, weight profile (including passenger occupancy), and operational properties of the passenger comfort parts of the auxiliary system. In addition, the placement and characteristics of charging infrastructure need to be harmonized with the planned bus fleet and schedule. This includes the geographical placement as well as the available power, number of outlets, and charging technology.
In terms of auxiliary use, we see that the HVAC energy consumption can be a significant share of the total energy consumption. This demonstrates a potential for energy savings by reducing the amount of air exchange through fewer and shorter door openings, or by investing in technology that reduces heat transfer when doors open.
A comparison to observed energy consumption in 3266 trips shows that the proposed model is able to give reasonable estimates of energy consumption on a trip level. However, there are some areas where the model can be further improved. From the sensitivity analysis, it is clear that passenger occupancy has a large impact, which calls for better models for estimating passenger flow in urban areas. While the approach chosen in this paper relies on a constant occupancy, more detailed heuristics and time-dependent behavior could be implemented to refine the model. Similarly, the estimation of speed profiles could be further refined by introducing methodologies for capturing the effects of route-specific disturbances, such as traffic flow, pedestrian crossings and driver-induced fluctuations. As this will add to the share of accelerations and decelerations in the speed profile, our approach is relatively conservative regarding estimation of the added energy consumption due to driver behavior. We expect that the recent increase of Ebuses in the market will lead to more datasets with lower sparsity and a higher level of detail. This will enable further research on energy models for Ebuses, both for validation purposes and for the introduction of more detailed energy models.