Distance-based emission factors from vehicle emission remote sensing measurements Science of the Total Environment

power that uses commonly measured or easily obtainable vehicle information such as vehicle speed, acceleration and mass. We test the approach against 55 independent comprehensive PEMS measurements for Euro 5 and 6 gasoline and diesel vehicles over a wide range of driving conditions and find good agreement between the method and PEMS data. The method is applied to individual vehicle model types to quantify distance-based emission factors. The method will be appropriate for application to larger vehicle emission remote sensing databases, thus extending real-world distance-based vehicle emissions information.


Highlights
• A method for deriving distance-based (g/km) emission factors from vehicle emissions remote sensing has been developed.
• The method has been comprehensively evaluated against independent PEMS data.
• Applications to several remote sensing campaigns are demonstrated.
• While demonstrated for CO2 and NOx, the method is applicable to any pollutant species.


Introduction
Road vehicle emissions contribute significantly to a wide range of air pollution problems, particularly in urban areas. The European Environment Agency estimates that in 2017, 86% of its monitoring stations which reported NO2 concentrations above the World Health Organisation Air Quality Guidelines were traffic stations (EEA, 2019). Important primary combustion products from vehicles include NOx (NO + NO2) and particulate matter (PM). Additionally, NOx emissions act as an ozone precursor and are an important contributor to secondary particulate formation. Emissions of these species have been shown to have considerable deleterious effects on human health (Mannucci et al., 2015; Kar Kurt et al., 2016; An et al., 2018), with premature deaths in Europe having been attributed to poor air quality owing to exceedances of road transport type approval limits (Jonson et al., 2017; Chossière et al., 2017, 2018). Recently, Schraufnagel et al. (2019) suggested that air pollution could cause chronic damage to potentially every organ in the human body.
Robust emissions data are required to ensure that policies aiming to mitigate air pollution are effective. In the case of road vehicle emissions, robust quantification poses considerable challenges. Vehicle emissions vary by manufacturer, vehicle model, emission standard, engine size, fuel type and many other factors. Even nominally identical vehicles which share all these characteristics can vary in their mileage, their levels of maintenance, driver behaviour, the added weight of their passengers and cargo, the auxiliary systems being employed, and the ambient conditions in which they are driven. With tens of millions of road vehicles in the United Kingdom alone, it is challenging to robustly quantify the contribution of road transport to air quality.
In recent years there has been an increased focus on emissions under "real-world" conditions in addition to laboratory-based quantification. Historically, testing vehicles for Type Approval regulations has been conducted solely under controlled laboratory conditions on chassis dynamometers over drive cycles such as the New European Driving Cycle (NEDC). Originally introduced in 1996, the NEDC is criticised for poorly reflecting real driving conditions. To replace the NEDC, the Worldwide Harmonised Light Vehicles Test Procedure (WLTP), which is more representative of real-world driving, was introduced in Europe starting in 2017, alongside the Real Driving Emissions (RDE) test. The RDE test is conducted on roads in real traffic, with vehicles measured using Portable Emissions Measurement Systems (PEMS) while undergoing a specified variety of driving conditions (urban, rural, and motorway) (Mock, 2017).
Remote sensing is in many ways complementary to PEMS. PEMS has some clear benefits: the full journey of a single vehicle can be measured under almost any driving condition, from idling in traffic through to motorway driving. However, it can be expensive and time consuming to measure a large number of vehicles in this way and to capture important variations due to ambient conditions, vehicle age profiles and the potential effects of vehicle deterioration. Moreover, it is also challenging to measure a broad range of vehicle types, including urban buses and the wide range of heavy duty diesel vehicles (HDV) that exist. The growing databases of PEMS measurements are strongly dominated by measurements of passenger cars.
On the other hand, vehicle emission remote sensing cannot measure an entire drive cycle; it measures only a snapshot (typically 0.5 s) of a given vehicle's journey. Nevertheless, important advantages of remote sensing are the much larger sample size measured in a short space of time, full fleet coverage with little selection bias, and the unobtrusive nature of the measurement. Applications of the technique have included the instantaneous identification of potential high emitters (Huang et al., 2018; OPUS, 2019) and investigations into longer term trends in fleet emissions (Bishop and Stedman, 2015; Carslaw et al., 2016). Remote sensing data have also been used to analyse real-world conditions which can influence vehicle emissions, two examples being altitude (Bishop et al., 2001) and ambient temperature (Grange et al., 2019).
A key limitation of remote sensing in terms of emission factor development, however, is that only a molar ratio of a pollutant to CO2 is measured. This is a consequence of measuring in a dispersing plume in the atmosphere rather than measuring emissions directly at the tailpipe. The concentrations of pollutants in a plume may change as it dilutes, but their ratios to CO2 should remain the same for unreactive pollutants (Bishop and Stedman, 1996). With a few basic assumptions about the combustion of hydrocarbon fuels, it is straightforward to calculate fuel-based emission factors, most commonly expressed as grams of emission per kg of fuel burnt (Burgard et al., 2006).
Fuel-based emission factors have been argued to vary less with engine load than distance-based equivalents (Stedman et al., 1994; Singer and Harley, 1996). Lee and Frey (2012) went as far as to suggest that remote sensing site-specific fuel-based emission factors could be representative of area-wide emission rates if the distribution of vehicle specific power (VSP) values were similar between the measurement site and routes in the area of interest. However, the vehicle emissions type approval process and the emission factors used in the development of emissions inventories instead express emissions as distance-based factors, i.e. grams per mile or kilometre.
Previous studies have already attempted to generate distance-based emission factors from remote sensing data. Carslaw et al. (2011) used UK emission factor estimates of CO2 in g km⁻¹ and measured NOx:CO2 ratios to generate NOx g km⁻¹ emission factors, a major assumption being the accuracy and representativeness of the CO2 estimates. Similarly, Bernard et al. (2018) combined average fuel-based emission factors, the carbon content of fuel, and distance-based CO2 emission factors estimated from type-approval information linked to number plates, augmented by the average fuel economy reported by consumers under real-world conditions. The authors note that this method is to be used with caution because the real-world variance of CO2 g km⁻¹ values is not reflected in the type-approval values.
More commonly, fuel consumption is used directly to transform fuel-based emission factors into distance-based ones. In some cases, the approach relies on pre-existing measurements of fuel consumption. Aguilar-Gómez et al. (2009) estimated fuel consumption based on fuel economy databases available from maintenance programs in Mexico, where their study took place, and Zhou et al. (2014) relied on fuel consumption information derived from an earlier PEMS study by Wang et al. (2014). A natural drawback of methods such as these is that remote sensing is restricted to locations where these external data sets exist and are publicly available.
Other studies have chosen to model fuel consumption based on roadside measurements. For example, Chan and Ning (2005) used work presented by Tong et al. (2000) to model fuel consumption based on instantaneous vehicle speed. Later, Zhou et al. (2007) modelled fuel consumption based on both binned vehicle specific power (VSP) and vehicle speed to better reflect real-world driving conditions, with each binned fuel consumption value adjusted by vehicle mass. Only four vehicles were used in the fuel economy testing to feed into this model, however, limiting its applicability.
The primary focus of this work is the development and validation of a method to estimate the instantaneous fuel consumption of a vehicle measured using remote sensing, which can then be used to estimate distance-based emission factors. To estimate fuel consumption, vehicle specific power (VSP) is first estimated using kerbside measurements and vehicle technical data, and is then used to model fuel consumption through relationships established using the Passenger Car and Heavy Duty Emission Model (PHEM). The derived distance-based emission factors are compared to PEMS data of 55 Euro 5 and 6 passenger cars and light duty vans. The comparison is made between the emissions of NOx measured over a real-world driving test (similar to an RDE test) and emissions derived using the emissions model based on remote sensing data.
In order to demonstrate the methods in this work, certain assumptions have been made, for example relating to the power demands on vehicle engines or the molecular formula of fuel. The methods are sufficiently modular that if more specific values are known, or if alternative assumptions are preferred, they can be used in place of the assumptions presented here.

Calculation of vehicle power
The aim of the emissions model is to estimate the instantaneous fuel consumption of a vehicle at the time the remote sensing measurement is made. The approach is based on an estimate of the vehicle power demand at the point in time at which the remote sensing measurement is made. To calculate the VSP (Jimenez-Palacios, 1998), it is necessary to sum the power demands for a vehicle, given in Eq. 1. These include the power to accelerate the vehicle (P_accel), to overcome rolling resistance from the road (P_roll), to overcome air resistance (P_air), to climb the road gradient (P_grad) and to operate auxiliary devices (P_aux), accounting for power losses in the transmission (P_trans).
The total vehicle power demand (in Watts) is given by Eq. 2. The terms used in Eq. 2 and subsequent equations are defined in Table 1.
To arrive at Eq. 2, the following assumptions were made: the power to accelerate rotational accelerated mass is equivalent to 4% of the power for translational accelerated mass; the power losses in the transmission are equal to 8% of the power at the driven wheels; and the power demand of auxiliaries is taken to be a fixed value of 2.5 kW (Borken-Kleefeld et al., 2018; Hausberger, 2003). g is taken to be 9.81 m s⁻² and ρ to be 1.2 kg m⁻³, the density of air at 20 °C and 1 atm of pressure.
To calculate VSP in kW t⁻¹, Eq. 2 is divided by mass to arrive at Eq. 3.
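The equations themselves are not reproduced in this text, so the following is only a minimal sketch of how the stated assumptions could combine into a total power demand and a VSP. The arrangement of terms, the function and parameter names, and the units chosen for R0 (N) and R1 (N per m s⁻¹) are assumptions, and the coefficient values in the example are illustrative rather than taken from Table 2.

```python
G = 9.81        # gravitational acceleration, m s^-2 (value given in the text)
RHO_AIR = 1.2   # air density, kg m^-3 (value given in the text)


def total_power_w(speed, accel, gradient, mass, r0, r1, cd_a,
                  p_aux=2500.0, rot_factor=1.04, trans_factor=1.08):
    """Total power demand in W (an assumed form of Eq. 2).

    speed in m/s, accel in m/s^2, gradient as rise over run,
    mass in kg, r0 in N, r1 in N/(m/s), cd_a in m^2.
    """
    f_roll = r0 + r1 * speed                     # rolling resistance
    f_air = 0.5 * RHO_AIR * cd_a * speed ** 2    # aerodynamic drag
    f_accel = rot_factor * mass * accel          # translational mass plus 4% rotational
    f_grad = mass * G * gradient                 # road gradient (small-angle form)
    p_wheels = (f_roll + f_air + f_accel + f_grad) * speed
    # Transmission losses taken as 8% of wheel power; auxiliaries are a fixed load.
    return trans_factor * p_wheels + p_aux


def vsp_kw_per_t(speed, accel, gradient, mass, r0, r1, cd_a, **kwargs):
    """VSP in kW/t; W divided by kg is numerically equal to kW per tonne."""
    return total_power_w(speed, accel, gradient, mass, r0, r1, cd_a, **kwargs) / mass


# Illustrative, made-up coefficients for a mid-sized car at 36 km/h.
example_vsp = vsp_kw_per_t(speed=10.0, accel=0.5, gradient=0.0,
                           mass=1500.0, r0=150.0, r1=1.0, cd_a=0.6)
```

Under these assumptions a stationary vehicle still has a non-zero VSP, because the fixed auxiliary load remains when every speed-dependent term vanishes.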
Coefficients R0, R1 and CdA are provided in Table 2 on a per-vehicle segment basis, as well as for average cars, vans and both cars and vans. The segmentation used is that of the European Commission (1999), with vehicle segments defined to group vehicles with similar characteristics together and make the analysis tractable. Vehicle segments are each given letters and names, with A corresponding to minis, B small cars, C medium-sized cars, D large cars, E executive cars, F luxury cars and J sports utility vehicles. Van I-III refer to increasing sizes of van. Segmentation is inexact, being based on factors such as price and accessories as well as vehicle size and shape; in principle, no segmentation is required, but it is especially useful for grouping vehicles with similar drag coefficients where individual measured values of Cd are absent.

Modelling instantaneous fuel consumption
The Passenger Car and Heavy Duty Emission Model (PHEM) simulates fuel consumption and emissions from vehicles in any driving situation based on engine maps and vehicle longitudinal dynamics simulation (Hausberger, 2003). PHEM is able to model fuel consumption values over a range of driving conditions. For the purposes of estimating the fuel consumption of vehicles measured by remote sensing, it provides relationships between fuel consumption and engine power. This relationship can be normalised by dividing both variables by vehicle mass, effectively creating a relationship between normalised fuel consumption in (g h⁻¹) t⁻¹ and VSP. VSP can therefore be converted to fuel consumption using Eq. 4, where M and C are the dimensionless parameters of the linear relationship. These parameters are provided in Table 2 on a per-vehicle segment basis, as well as for average cars, vans and both cars and vans.
A consequence of using a linear equation such as Eq. 4 to model fuel consumption is that some modelled fuel consumption values are negative; these are set to zero because they have no physical basis. Using Eq. 5, fuel consumption can be converted from grams per hour driven to grams per kilometre travelled through division by vehicle speed in kilometres per hour.
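The two steps described above can be sketched as follows. Eq. 4 is not reproduced in this text, so the form used here (normalised fuel consumption linear in VSP, scaled back up by vehicle mass in tonnes) is an assumption, as are the function names; the slope and intercept in the example are hypothetical and do not come from Table 2.

```python
def fuel_consumption_g_per_h(vsp, mass_kg, m_slope, c_intercept):
    """Fuel consumption in g/h from VSP (an assumed form of Eq. 4)."""
    normalised = m_slope * vsp + c_intercept     # (g/h) per tonne
    fc = normalised * mass_kg / 1000.0           # scale by mass in tonnes
    return max(fc, 0.0)                          # negative values have no physical basis


def fuel_consumption_g_per_km(fc_g_per_h, speed_kmh):
    """Eq. 5 as described in the text: divide g/h by speed in km/h."""
    if speed_kmh <= 0:
        raise ValueError("distance-based fuel consumption is undefined at zero speed")
    return fc_g_per_h / speed_kmh


# Hypothetical parameters: slope 200 and intercept 300 for a 1500 kg vehicle.
fc_h = fuel_consumption_g_per_h(vsp=8.0, mass_kg=1500.0, m_slope=200.0, c_intercept=300.0)
fc_km = fuel_consumption_g_per_km(fc_h, speed_kmh=36.0)
```

Raising an error at zero speed is a design choice for this sketch; a production pipeline might instead filter out very low speeds before converting.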
Table 1. Definitions of terms, including units (columns: Term, Definition, Unit).

Table 2. Generic coefficients (R0, R1, CdA) and dimensionless parameters (M, C) to be used in Eqs. 3 and 4, listed by segment for diesel and petrol vehicles. The coefficients are average values taken from the test data base used for the Handbook Emission Factors for Road Transport (HBEFA) v3.3. The parameters were determined from characteristic fuel flow curves for different engines calculated using PHEM, again using the HBEFA 3.3 test data base and the Common Artemis Driving Cycle (CADC) (Hausberger, 2003; Keller et al., 2017; Borken-Kleefeld et al., 2018). Fuel flow curves all showed excellent linearity (R² > 0.99).

With access to modelled instantaneous fuel consumption from Eqs. 4 and 5, Eqs. 6, 7 and 8 allow for the creation of emission factors by combination with remote sensing data. First, fuel-based emission factors are generated using pollutant ratios through Eq. 6, where P corresponds to the pollutant being measured (NOx, CO2, HC, etc.). The molecular weight of petroleum-derived fuel, MW_fuel, is taken to be that of CH2 (14 g mol⁻¹).
The fuel-based emission factors from Eq. 6 can then finally be combined with the modelled fuel consumption from Eqs. 4 and 5 to create duration- and distance-based emission factors using Eqs. 7 and 8.
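As a hedged illustration of this chain, the sketch below uses the general carbon-balance form found in the fuel-based remote sensing literature; it is not necessarily identical to Eq. 6, which is not reproduced in this text. The function names, the `hc_carbon_factor` default and the example ratio are assumptions made for illustration only.

```python
MW_FUEL = 14.0  # g/mol, molecular weight of CH2 as stated in the text


def fuel_based_ef(q_pollutant, mw_pollutant, q_co=0.0, q_hc=0.0, hc_carbon_factor=6.0):
    """Grams of pollutant per kg of fuel from molar ratios to CO2.

    q_* are molar ratios to CO2. hc_carbon_factor converts the measured
    HC ratio to moles of carbon and is an assumed, instrument-specific value.
    """
    carbon_per_co2 = 1.0 + q_co + hc_carbon_factor * q_hc   # mol C per mol CO2
    mol_pollutant_per_mol_c = q_pollutant / carbon_per_co2
    # One mole of fuel carbon corresponds to MW_FUEL grams of fuel.
    return mol_pollutant_per_mol_c * mw_pollutant / (MW_FUEL / 1000.0)


def duration_ef(ef_g_per_kg, fc_g_per_h):
    """g/s: fuel-based factor times fuel burnt per second (in kg)."""
    return ef_g_per_kg * (fc_g_per_h / 1000.0) / 3600.0


def distance_ef(ef_g_per_kg, fc_g_per_km):
    """g/km: fuel-based factor times fuel burnt per kilometre (in kg)."""
    return ef_g_per_kg * fc_g_per_km / 1000.0


# Hypothetical plume: NOx (as NO2, 46 g/mol) to CO2 molar ratio of 0.002.
ef_nox_g_per_kg = fuel_based_ef(q_pollutant=0.002, mw_pollutant=46.0)
ef_nox_g_per_s = duration_ef(ef_nox_g_per_kg, fc_g_per_h=2850.0)
ef_nox_g_per_km = distance_ef(ef_nox_g_per_kg, fc_g_per_km=2850.0 / 36.0)
```

Setting the pollutant to CO2 itself (ratio 1, 44 g/mol) with no CO or HC returns about 3143 g per kg of fuel, which is the complete-combustion value for CH2 and serves as a quick sanity check.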
Modelling fuel consumption is not necessary for PEMS data. As PEMS instruments report the flow rate of the exhaust, it is straightforward to calculate emission factors. Eq. 9 demonstrates a method to calculate a duration-based emission factor, and Eq. 10 a transformation from duration- to distance-based emission factors. V_m is taken to be 24.1 L mol⁻¹ (the molar volume at a temperature of 20 °C and pressure of 1 atm).
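A minimal sketch of the PEMS calculation follows, assuming the pollutant is expressed as a mole fraction and the exhaust flow in litres per second at the same reference conditions as the molar volume. Eqs. 9 and 10 are not reproduced in this text, so the exact form, the function names and the example values are assumptions.

```python
V_M = 24.1  # L/mol, molar volume at 20 degrees C and 1 atm (value given in the text)


def pems_duration_ef(mole_fraction, exhaust_flow_l_per_s, mw_pollutant):
    """g/s: moles of pollutant leaving the tailpipe per second times molar mass."""
    mol_exhaust_per_s = exhaust_flow_l_per_s / V_M
    return mole_fraction * mol_exhaust_per_s * mw_pollutant


def duration_to_distance_ef(ef_g_per_s, speed_kmh):
    """g/km: grams per hour divided by kilometres per hour."""
    if speed_kmh <= 0:
        raise ValueError("distance-based factor is undefined at zero speed")
    return ef_g_per_s * 3600.0 / speed_kmh


# Hypothetical second of data: 100 ppm NO2 in 24.1 L/s of exhaust at 36 km/h.
no2_g_per_s = pems_duration_ef(100e-6, 24.1, 46.0)
no2_g_per_km = duration_to_distance_ef(no2_g_per_s, 36.0)
```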

Journey average emission factors
For a vehicle completing a drive cycle of a known distance, the average distance-based emission factor can be determined from a 1 Hz PEMS data set via the sum of all duration-based emission factors divided by the distance covered in the journey in kilometres, shown in Eq. 11.
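The journey average described here can be sketched for 1 Hz data as below. The function name and the use of speed in m s⁻¹ to obtain distance are assumptions for illustration; each sample is taken to represent exactly one second.

```python
def journey_average_g_per_km(ef_g_per_s, speed_m_per_s):
    """Journey-average g/km from 1 Hz series of emissions (g/s) and speed (m/s)."""
    if len(ef_g_per_s) != len(speed_m_per_s):
        raise ValueError("series must have the same length")
    total_g = sum(ef_g_per_s)                    # one sample per second, so g/s sums to g
    distance_km = sum(speed_m_per_s) / 1000.0    # m/s sums to metres at 1 Hz
    if distance_km == 0:
        raise ValueError("no distance covered")
    return total_g / distance_km


# Four seconds of hypothetical data, including one second of idling.
avg = journey_average_g_per_km([0.01, 0.02, 0.0, 0.01], [10.0, 10.0, 0.0, 20.0])
```

Seconds spent idling add to the emitted mass without adding distance, which is why a journey average can differ from the mean of instantaneous g km⁻¹ values.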
While Eq. 8 is a simple way to transform remote sensing g s⁻¹ emission factors into g km⁻¹ ones, these g km⁻¹ factors are potentially biased because remote sensing typically measures vehicles under load. Large parts of journeys taken by vehicles, particularly in urban centres, may involve idling and braking, conditions that remote sensing is not suited to measuring.
To overcome this issue, relationships between snapshot g s⁻¹ emission factors and VSP may be determined from remote sensing and then, in principle, used to predict emissions over any drive cycle where VSP can be estimated. Generalised Additive Models (GAMs) can be used for this purpose. GAMs offer several advantages in this respect in that they are 'data-driven' and handle non-linear relationships between variables. GAMs relating NOx g s⁻¹ to VSP were fitted using the gam function in the mgcv R package (Wood, 2017) using remote sensing data constrained to positive, i.e. non-zero, NOx g s⁻¹ values. The default parameters of the gam function were used throughout. In this study the drive cycle over which emissions are predicted is taken from a PEMS test, described further in Section 2.4.
Predicting g s⁻¹ factors for VSP values outside the range of measured VSPs requires extrapolation of the GAM, which can lead to unreliable predictions. For this reason, GAMs are only fitted using a VSP range between 0 and the 99th percentile of remote sensing VSPs, and then only used to predict over elements of the on-road drive cycle within the same VSP range. For elements of the drive cycle above the 99th percentile of the remote sensing data, the emissions and distance covered were disregarded in calculations, effectively truncating the drive cycle as a whole, to ensure a like-for-like comparison. With larger remote sensing data sets that cover a greater range of VSPs, truncating drive cycles should not be necessary.
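The truncation logic can be illustrated with a deliberately simpler stand-in for the GAM: mean g s⁻¹ per VSP bin. This is not the mgcv model used in the study; the binning, the bin width, the helper names and the choice to drop drive-cycle seconds that fall in empty bins are all assumptions made so that the example stays self-contained.

```python
def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def fit_binned_means(vsp, ef_g_per_s, bin_width=2.0, upper_pct=99.0):
    """Mean g/s per VSP bin, fitted only for 0 <= VSP <= the upper percentile."""
    vsp_max = percentile(vsp, upper_pct)
    sums, counts = {}, {}
    for v, e in zip(vsp, ef_g_per_s):
        if 0.0 <= v <= vsp_max and e > 0.0:      # positive emissions only, as in the text
            b = int(v // bin_width)
            sums[b] = sums.get(b, 0.0) + e
            counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}, vsp_max


def predict_cycle_g_per_km(means, vsp_max, cycle_vsp, cycle_speed_m_per_s, bin_width=2.0):
    """Predict a 1 Hz drive cycle, dropping out-of-range seconds entirely."""
    total_g, total_m = 0.0, 0.0
    for v, s in zip(cycle_vsp, cycle_speed_m_per_s):
        b = int(v // bin_width)
        if v < 0.0 or v > vsp_max or b not in means:
            continue                             # discard both emissions and distance
        total_g += means[b]
        total_m += s
    if total_m == 0.0:
        raise ValueError("no in-range distance left after truncation")
    return total_g / (total_m / 1000.0)


# Hypothetical remote sensing snapshots and a four-second drive cycle.
means, vsp_max = fit_binned_means([1, 1, 3, 3, 50], [0.1, 0.3, 0.5, 0.7, 9.0])
cycle_ef = predict_cycle_g_per_km(means, vsp_max, [1, 3, 60, -2], [10, 20, 30, 5])
```

In the example the single high-VSP snapshot lies above the 99th percentile and is excluded from fitting, and the drive-cycle seconds at VSP 60 and -2 contribute neither emissions nor distance.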

Portable Emissions Measurement System (PEMS) data
The UK Department for Transport, prompted by the Volkswagen emissions scandal, started an investigation into commonly used diesel vehicles in 2015 (DfT, 2016). The Vehicle Emissions Testing Programme focused on three different types of measurements: first, in-lab testing using variations of the New European Driving Cycle (NEDC); second, track testing using PEMS instrumentation, attempting to replicate the NEDC as closely as possible; and third, on-road testing on a test route approximating the not yet fully defined Real Driving Emissions (RDE) test, including urban, rural and motorway driving. The third data set is used in this study.
After being augmented with the similar Vehicle Market Surveillance Unit Programme in 2017, the full PEMS data set contained 19 Euro 5 diesel cars, 17 Euro 6 diesel cars, 14 Euro 6 petrol cars, 4 Euro 5 diesel vans and a single Euro 6 diesel van, for a total of 55 vehicles (DVSA, 2017). Vehicles were each tested only once for an average of 95 min, with the shortest test being 90 min and the longest 106 min. The PEMS equipment was validated against a laboratory emissions measurement system. More detailed information about the ways the PEMS tests were conducted is available from the Department for Transport and Driver and Vehicle Standards Agency (DVSA) web pages (DfT, 2016; DVSA, 2017).
We considered whether applying a time offset to the PEMS data was necessary to synchronise variables such as CO2, NOx, vehicle speed and acceleration. A range of time offsets was applied to seek the best agreement between the PEMS CO2 and that predicted by the developed method. Agreement between PEMS and modelled data was judged using the correlation coefficient, r, and the root mean squared error (RMSE), seeking a maximum and a minimum, respectively. We also considered applying a rolling mean of 3 to 5 s to the data to reduce the effect of any time offsets. However, the best overall agreement was found without applying time offsets for the data sets considered.
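An offset scan of this kind might look like the sketch below. The function names, the sign convention (a positive offset pairs a later measured sample with an earlier modelled one) and the synthetic series are assumptions; the study's own implementation is not described in this text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean squared error between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def scan_offsets(measured, modelled, offsets):
    """Return {offset: (r, RMSE)}, pairing measured[i + offset] with modelled[i]."""
    results = {}
    for k in offsets:
        if k >= 0:
            a, b = measured[k:], modelled[:len(modelled) - k]
        else:
            a, b = measured[:k], modelled[-k:]
        results[k] = (pearson_r(a, b), rmse(a, b))
    return results


# Synthetic check: the measured series lags the modelled one by one second.
modelled = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
measured = [0.0] + modelled[:-1]
scan = scan_offsets(measured, modelled, range(-2, 3))
best = min(scan, key=lambda k: scan[k][1])   # offset with the lowest RMSE
```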
The DfT route can be split into urban, rural and motorway driving by changes in the speed profile of the vehicle as it continues through its journey, mainly changes in maximum speeds and frequency of braking. Each vehicle is driven over a similar trip, so an example for just one is provided in Fig. 1.
The PEMS data sets already include the majority of the variables required for the calculation of instantaneous fuel consumption, but some required additional processing. Vehicle speed was estimated based on the measured distance throughout the test. An on-board GPS provided second-by-second altitude in metres. A cubic smoothing spline was fitted using the default parameters of the smooth.spline function of the R stats package (R Core Team, 2019) to remove noise from the GPS altitude signal; the second-by-second change in the smoothed altitude was then divided by the second-by-second difference in distance to derive the road gradient. Acceleration was taken to be the second-by-second difference in the speed of the vehicle. Ratios of pollutants to CO2 required for Eq. 6 were calculated using the instantaneous measured concentrations of each in %/ppm.
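The differencing steps can be sketched as follows, assuming the altitude series has already been smoothed upstream (the spline fit is not reproduced here) and that distance is cumulative in metres. The function names and the `min_step_m` guard for stationary seconds are assumptions added for illustration.

```python
def differences(series):
    """Second-by-second differences of a 1 Hz series."""
    return [b - a for a, b in zip(series, series[1:])]


def road_gradient(smoothed_altitude_m, cumulative_distance_m, min_step_m=0.1):
    """Rise over run for each second; zero when the vehicle barely moves."""
    gradients = []
    for dz, dx in zip(differences(smoothed_altitude_m),
                      differences(cumulative_distance_m)):
        # Guard against division by (near) zero while the vehicle is stationary.
        gradients.append(dz / dx if dx >= min_step_m else 0.0)
    return gradients


def acceleration(speed_m_per_s):
    """m/s^2: at 1 Hz the difference between samples is already per second."""
    return differences(speed_m_per_s)


# Four seconds of hypothetical data with one stationary second.
grads = road_gradient([100.0, 100.5, 100.5, 101.0], [0.0, 10.0, 10.0, 20.0])
accels = acceleration([0.0, 2.0, 5.0, 5.0])
```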
The PEMS data set provided measurements of carbon monoxide, CO2, water vapour and NOx (as well as its individual components, nitric oxide and nitrogen dioxide), but did not provide measurements of total hydrocarbons. This means that the HC:CO2 ratio in Eq. 6 is omitted from the final calculations. This omission is likely to have a negligible effect on calculated emissions for diesel vehicles due to their low emissions of hydrocarbons (Reşitoğlu et al., 2015), and studies have shown that even the newest petrol vehicles emit little HC relative to other carbon-containing pollutants (Wang et al., 2014).
The only variables that could not be estimated from data within the PEMS data sets were the masses of the vehicles, their vehicle segments and the road load and aerodynamic drag coefficients. Masses and vehicle segments were found using online research tools intended for car buyers, such as Parker's Car Guides, with 150 kg added to each mass to approximate the added weight of the driver and PEMS instrumentation. The coefficients, alongside the M and C parameters, were taken from the data outlined in Table 2 on a per-segment basis.
Two sets of emission factors were then calculated. First, Eqs. 3-8 were applied to the PEMS data set to generate emission factors through modelling fuel consumption. Second, Eqs. 9 & 10 were applied to generate emission factors using the fuel consumption data contained within the PEMS data set. Comparing these two sets of emission factors allows the fuel consumption model to be validated. To do so, GAMs were fitted to create smooth trends in CO2 emission factors, according to both the PEMS and modelled fuel consumption values, as functions of speed and of VSP. Owing to the additional asymptotic effect of low speed values on distance-based emission factors (i.e. as speed tends towards 0, fuel consumption per unit distance and therefore emissions per unit distance tend towards infinity), very low speeds are filtered out for the g km⁻¹ figure. In practice this meant that these models used data which corresponded to VSP values of 0 to 30 kW t⁻¹ for both GAMs, speeds of 0 to 111 km h⁻¹ for the g s⁻¹ GAM and speeds of 5 to 111 km h⁻¹ for the g km⁻¹ GAM.
Eq. 11 was also applied to each vehicle in the PEMS data set for comparisons with distance-based emission factors calculated from the remote sensing data set. These g km⁻¹ factors were calculated for the journey as a whole as well as the individual urban, rural and motorway components.

Remote sensing data
To demonstrate an application of the duration- and distance-based emission factor generation method outlined in Eqs. 3-8, remote sensing data were used. The data were acquired using the Fuel Efficiency Automobile Test (FEAT) instrument, the remote sensing (RS) device developed by the University of Denver. Its principles of operation have been described in detail elsewhere (Bishop and Stedman, 1996; Burgard et al., 2006), but a brief overview is provided here.
The FEAT instrument consists of a UV/IR light source and detector for the measurement of exhaust gases, a set of laser-based speed bars for the measurement of speed and acceleration, a camera for photographing number plates, and a control computer. The UV/IR detector and the detecting speed bar are positioned on the kerbside, with the light source and emitting speed bar positioned directly opposite across a single lane carriageway. Pollutants in the exhaust plumes of passing vehicles interact with the collinear beam of non-dispersive IR and dispersive UV light produced by the source, permitting the measurement of CO, CO2, hydrocarbons (HC), SO2, NH3, NO, NO2 and a background reference. Based on the blocking and unblocking of the two parallel lasers, the speed bars allow the speed and acceleration of the vehicle to be calculated. Number plate photographs are cross-referenced with vehicle databases to obtain further vehicle technical information, in this case obtained from a commercial supplier (CDL Vehicle Information Services Limited).
The remote sensing data set combines data from measurement campaigns in two UK cities, York and London, conducted in 2017 and early 2018, with earlier measurements made in 2012/2013 (Carslaw and Rhys-Tyler, 2013; Carslaw et al., 2018). The data set consists of 37,421 measurements of Euro 5 and 6 light duty vehicles. The number of relevant measurements contained within the remote sensing data set is summarised in Table 3 alongside some statistical information pertaining to VSP, speed and road gradients.
Eqs. 3-8 were applied to the remote sensing data set to generate emission factors. As the model is designed to be used with remote sensing data, its application is straightforward as most of the variables are already present in the data set. The mass of vehicles measured using remote sensing is also unknown but is estimated by adding 150 kg to the unladen weight of the vehicle, which is provided in the vehicle technical data. This uncertainty is explored further in Section 3.2.
One omission in the remote sensing data used is a lack of market segment information, which was overcome with simple regression tree modelling based on the manually assigned market segments of the vehicles in the PEMS data set. Fig. 2 shows the distributions of the vehicle frontal surface area (approximated simply by multiplying vehicle height by width) and mass for each vehicle segment in the UK Department for Transport PEMS data set. While vehicle dimensions are commonly available in remote sensing data sets, in this case the same online research tools used to find the vehicle segments and masses were used to determine width and height. Also shown is a simple decision tree for the segmentation of vehicles generated through the rpart R package, which utilises the Classification and Regression Trees (CART) algorithm to generate trees (Therneau and Atkinson, 2019). The decision tree presented in Fig. 2 is based on a relatively small set of vehicles, albeit vehicles chosen for their high market share, so may be further refined by the addition of more vehicle data. However, it does demonstrate that partitioning vehicles into market segments is viable with a relatively simple method and, as discussed previously, the availability of aerodynamic drag coefficients for individual vehicles would largely avoid the need to consider vehicle segments anyway.

Fig. 1. The speed profile of one of the passenger cars undergoing the Department for Transport's on-road test. The journey has been partitioned into motorway, urban and rural based on clear changes in the speed profile, including maximum speeds and frequency of braking.

Table 3. The numbers of measurements (n) in the remote sensing data set by vehicle type, fuel type and Euro classification, alongside some measurement statistics. The data set also contains measurements of vehicles with other Euro classifications and of other vehicle types (e.g. HDVs and hybrid vehicles); these are not used in this study and so were not included when generating these statistics.
One of the benefits of using vehicle emissions remote sensing data for estimating aggregate (e.g. Euro class, fuel type, vehicle model) emissions is that an uncertainty can be calculated. When aggregating the g kg⁻¹ emissions derived directly from individual vehicle emission measurements, the 95% confidence interval in the mean can be calculated. To account for the non-normal nature of vehicle emissions distributions, the 95% confidence interval is robustly estimated using bootstrap resampling approaches using the openair R package (Carslaw and Ropkins, 2012). The calculated uncertainties encompass many sources of variation, including the uncertainty of the measurement itself but also issues related to the sampling conditions, such as sample size, ambient conditions and variation in vehicle dynamics.
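For readers unfamiliar with the idea, a generic percentile bootstrap of the mean is sketched below. It is not the openair implementation, whose details are not given in this text; the function name, the number of resamples, the fixed seed and the sample values are assumptions.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=1):
    """Percentile-bootstrap confidence interval for the mean of a sample."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower_idx = round((1.0 - level) / 2.0 * n_resamples)
    upper_idx = round((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Hypothetical, right-skewed g/kg emission factors for one vehicle group.
sample = [0.5, 0.8, 1.1, 1.3, 2.0, 2.4, 3.9, 7.5, 12.0, 30.0]
ci_low, ci_high = bootstrap_mean_ci(sample)
```

Because no distributional form is assumed, the resulting interval can be asymmetric about the mean, which suits skewed emission distributions.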
The estimated uncertainties also provide a guide to whether two populations are statistically different from one another. For example, when considering the differences between individual vehicle manufacturers or vehicle models, the uncertainty helps to determine whether there is evidence for clear differences in the emission performance of vehicles. Such information is difficult to determine using PEMS, and uncertainty information is rarely provided.
Uncertainty estimates can also be derived through GAMs relating VSP to the emissions of NOx. In this case, the estimated uncertainty in the GAM itself can be used to express an emissions uncertainty when applied to drive cycles over which predictions are made. The benefit of this approach is that where the original data have poor coverage, e.g. owing to a lack of measurements over high VSP conditions, the corresponding uncertainty estimated as part of the GAM development will also be higher. Consequently, the uncertainty in the prediction of emissions over different drive cycles will reflect the coverage of the original measurement data.
While our analysis does not explicitly include hybrid vehicles, the remote sensing measurements do provide insight into their operation. A vehicle plume is only considered valid if there is a measurement of CO2. The absence of a valid CO2 plume provides some indication of whether or not a hybrid vehicle was using its internal combustion engine. For hybrid passenger cars, 27% of the measurements do not have a valid CO2 plume, compared with only 2% of conventional vehicle measurements. The data therefore suggest that hybrid vehicles operate in battery mode approximately 25% of the time. In principle it would be possible to apply the methods developed in this study only to the proportion of hybrid vehicle measurements with a valid plume and to assume zero emissions otherwise.

Validation with a PEMS data set
Vehicle emission factors are typically not expressed at an individual vehicle model level but are aggregated in some way. For example, COPERT's emission factors separate passenger cars by Euro standard, fuel type and broad engine size. For simplicity, the vehicles studied were aggregated into three categories: Euro 5 diesel, Euro 6 diesel and Euro 6 petrol (there being no Euro 5 petrol vehicles in the PEMS data set). GAMs of the two sets of emission factors calculated using the PEMS data set are overlaid in Fig. 3, with the lines labelled "PEMS" showing the factors calculated using Eqs. 9 & 10 and "Modelled" showing the factors calculated using modelled fuel consumption detailed in Eqs. 3-8.
CO 2 emissions in g km −1 are shown as a speed-emission curve. In general, the emission factors generated from the modelled fuel consumption data correspond well with those generated from the PEMS fuel consumption, particularly in the case of the Euro 6 diesel vehicles. When both curves were used to predict over a sequence of speeds from 5 to 110 km h −1 , the RMSE values between the two sets of predicted values were 28.2 (Euro 5 Diesel), 11.6 (Euro 6 Diesel) and 50.4 (Euro 6 Petrol). The modelled values for the Euro 5 diesel and Euro 6 petrol vehicles show some underestimation at lower speeds, though the gap rapidly shrinks and is closed by around 15 km h −1 in both cases; indeed, the RMSE values drop to 18.5 and 14.3 respectively when predictions are made only over 15 to 110 km h −1 . There is also slight underestimation at higher speeds for the Euro 5 diesel vehicles.

CO 2 emissions in g s −1 are shown as a linear power-emission relationship, which demonstrates the overall concurrence between modelled and PEMS fuel consumption. A shared characteristic of all three of these curves is some deviation between the methods at higher engine powers, around 40 kW. There are fewer data at higher engine powers, which may explain this observation. The curves were used to predict over a sequence of engine powers from 1 to 70 kW (diesel vehicles) and from 1 to 50 kW (petrol), giving RMSE values of 0.293 (Euro 5 Diesel), 0.579 (Euro 6 Diesel) and 0.514 (Euro 6 Petrol).

Fig. 2. Box plots showing the range of surface areas and masses for each vehicle segment present in the UK Department for Transport PEMS data set, with a simple decision tree which could be used for the segmentation of vehicles based on kerb weights (mass, in tonnes), frontal surface areas (area, in m 2 ) and UNECE type approval categories (type, for which passenger cars belong to the M1 classification and vans N1). Note that there are no E- or F-Segment vehicles in the PEMS data set, reflective of their niche status in the UK fleet, so these segments are not featured.
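The RMSE comparison between the PEMS-derived and modelled speed-emission curves can be illustrated with a minimal Python sketch. The two curve functions below are hypothetical stand-ins for the fitted GAMs (their coefficients are invented for illustration only), and the 1 km h −1 step of the speed sequence is an assumption, as the step used in the study is not stated here.

```python
import math

def rmse(predicted_a, predicted_b):
    """Root mean square error between two equal-length sequences."""
    if len(predicted_a) != len(predicted_b):
        raise ValueError("sequences must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(predicted_a, predicted_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical stand-ins for the two fitted speed-emission curves (g/km).
# In the study these would be GAM predictions, not closed-form functions.
def pems_curve(speed_kmh):
    return 120.0 + 900.0 / speed_kmh + 0.004 * speed_kmh ** 2

def modelled_curve(speed_kmh):
    return 118.0 + 700.0 / speed_kmh + 0.004 * speed_kmh ** 2

# Predict over 5 to 110 km/h (assumed 1 km/h step), then over 15 to 110 km/h.
speeds = list(range(5, 111))
rmse_full = rmse([pems_curve(v) for v in speeds],
                 [modelled_curve(v) for v in speeds])

speeds_trimmed = [v for v in speeds if v >= 15]
rmse_trimmed = rmse([pems_curve(v) for v in speeds_trimmed],
                    [modelled_curve(v) for v in speeds_trimmed])

print(f"RMSE 5-110 km/h:  {rmse_full:.1f} g/km")
print(f"RMSE 15-110 km/h: {rmse_trimmed:.1f} g/km")
```

Because the invented curves differ most at low speeds, excluding speeds below 15 km h −1 lowers the RMSE, mirroring the behaviour described for the Euro 5 diesel and Euro 6 petrol vehicles.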

Model sensitivity
In practice, the application of the methods outlined in Sections 2.1 & 2.2 depends on several assumptions concerning the vehicles measured using remote sensing. There are variables needed by the model for which direct measurements are not available. The mass of an unladen vehicle is obtainable from vehicle databases, but the true laden mass of a vehicle is unknown and will depend on factors such as the number of passengers and the cargo carried. The auxiliary power component is entirely estimated. While vehicle speed and acceleration can be measured accurately with speed bars, there will be some uncertainty over the location that is best suited to make the measurements (Jimenez-Palacios, 1998; Rushton et al., 2018).
To examine the sensitivity in emission factors related to the uncertainty in individual model parameters, a single vehicle was taken from the UK Department for Transport PEMS data set. A single vehicle was judged to be sufficient for this analysis as it is expected that the sensitivity of the model will be roughly consistent regardless of the vehicle to which it is being applied. The chosen vehicle was a D-Segment Euro 6 diesel passenger car, chosen for having a very good agreement between measured and modelled journey average CO 2 g km −1 values (calculated using Eq. 11). The model outlined in Eqs. 3-7 was applied to this vehicle repeatedly to produce 1 Hz CO 2 g s −1 emission factors, with variations in the following parameters: C d A, R 0 /R 1 , auxiliary power, acceleration, speed, road gradient and mass. The impact on journey average CO 2 g km −1 values for the vehicle is visualised in Fig. 4.
Auxiliary power has been shown to vary considerably in on-road driving (Carlson et al., 2016). The range of auxiliary powers investigated here (0.25 to 3 kW) induces a large change in estimated emissions of CO 2 , particularly in urban driving. This behaviour is expected for urban driving conditions where there is a greater proportion of driving in lower power conditions, meaning that the auxiliary power accounts for a greater proportion of the total power consumption of the engine.
Uncertainty in vehicle mass also has a greater effect under urban driving conditions, which can be explained by the greater amount of acceleration and deceleration in urban driving. The opposite trend is seen in the air resistance parameter (C d A), with very little change observed in urban driving conditions. This behaviour is expected owing to the lower vehicle speeds under urban driving conditions, with P air being proportional to the cube of vehicle speed. A similar but less extreme trend is seen for R 0 /R 1 .
A different trend is seen when varying the road gradient: little change is seen in both urban and motorway conditions, but a large effect is seen in hillier rural driving. The overall influence of gradient uncertainty in this analysis is relatively small compared to other parameters, but it would likely be greater and therefore more important for vehicle emission measurements taken in hillier regions.
Focusing on urban-type driving conditions, where vehicle emissions remote sensing measurements are most commonly made, the variables to which estimated CO 2 emissions are most sensitive are seen to be vehicle mass, speed, acceleration, and auxiliary power demand.
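The one-at-a-time sensitivity analysis described above can be sketched in Python under stated assumptions. The power model below is a generic road-load formulation and is not a reproduction of the paper's Eqs. 3-7; the clipping of negative tractive power to zero, the linear CO 2 factor, the parameter values and both driving traces are hypothetical and chosen only to illustrate the procedure.

```python
import math

AIR_DENSITY = 1.2        # kg m^-3 (assumed)
GRAVITY = 9.81           # m s^-2
CO2_G_PER_KW_S = 0.2     # hypothetical linear CO2 factor (g per kW per second)

def engine_power_kw(speed, accel, gradient, mass, cda, r0, r1, p_aux_kw):
    """Illustrative road-load power demand in kW (not the paper's Eqs. 3-7).

    speed in m/s, accel in m/s^2, gradient as rise over run, mass in kg,
    cda in m^2, r0 in N, r1 in N per (m/s), p_aux_kw in kW.
    """
    rolling = r0 + r1 * speed
    aero = 0.5 * AIR_DENSITY * cda * speed ** 2
    inertia = mass * accel
    grade = mass * GRAVITY * math.sin(math.atan(gradient))
    tractive_kw = (rolling + aero + inertia + grade) * speed / 1000.0
    # Assumption: no fuel is used to supply negative tractive power.
    return max(tractive_kw, 0.0) + p_aux_kw

def journey_co2_g_per_km(trace, params):
    """Journey average CO2 (g/km) over a 1 Hz trace of (speed, accel, gradient)."""
    total_g = 0.0
    distance_m = 0.0
    for speed, accel, gradient in trace:
        total_g += CO2_G_PER_KW_S * engine_power_kw(speed, accel, gradient, **params)
        distance_m += speed  # 1 s time step
    return total_g / (distance_m / 1000.0)

def percent_change(trace, base, name, value):
    """Percentage change in g/km when one parameter is altered from the base case."""
    base_ef = journey_co2_g_per_km(trace, base)
    altered_ef = journey_co2_g_per_km(trace, dict(base, **{name: value}))
    return 100.0 * (altered_ef - base_ef) / base_ef

# Hypothetical base case and traces (accelerate, cruise, decelerate vs steady cruise).
base = {"mass": 1650.0, "cda": 0.65, "r0": 150.0, "r1": 1.0, "p_aux_kw": 2.5}
urban = ([(float(v), 1.0, 0.0) for v in range(2, 13)]
         + [(12.0, 0.0, 0.0)] * 20
         + [(float(v), -1.0, 0.0) for v in range(11, 1, -1)])
motorway = [(30.0, 0.0, 0.0)] * 60

for label, trace in (("urban", urban), ("motorway", motorway)):
    print(label,
          "CdA +10%:", round(percent_change(trace, base, "cda", base["cda"] * 1.1), 1),
          "P_aux 0.25 kW:", round(percent_change(trace, base, "p_aux_kw", 0.25), 1),
          "mass +250 kg:", round(percent_change(trace, base, "mass", base["mass"] + 250.0), 1))
```

Even with these invented inputs, the sketch reproduces the qualitative pattern discussed above: auxiliary power and mass matter more for the urban trace, whereas the aerodynamic term matters more at motorway speeds.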
An alternative way to consider uncertainty, rather than through the uncertainty of individual parameters, is through the misattribution of vehicle segments. Assuming inaccessibility of market segment information and the use of a decision tree similar to that which is described in Section 2.5, there will be unavoidable misattribution for vehicles that are uncharacteristically heavy or light for their market segment, or have an atypical frontal area. On an aggregate level this is unlikely to be a cause for concern; conversely, it may be of benefit, as an atypically shaped vehicle's 'true' C d A, R 0 and R 1 values may be closer to those given for the segment to which it has been incorrectly assigned. Table 4 summarises the effect of both misattributing the segments and applying the 'average car' parameters to the D-Segment vehicle. The greatest difference is seen when the vehicle is attributed to the E, F or J Segment, corresponding to an increase in CO 2 of 22 g km −1 relative to a correct D-Segment attribution.

Fig. 3. Generalised additive models (GAM) of CO 2 emissions (g s −1 and g km −1 ) taken from the PEMS data set as functions of both power demand and speed. "PEMS" refers to emission factors calculated using Eqs. 9 & 10 and "Modelled" the factors calculated using modelled fuel consumption detailed in Eqs. 3-8.

Method application to remote sensing data
These methods can be used to estimate emission factors in g s −1 based on remote sensing data, which can then be directly compared with those of other measurement techniques, such as PEMS. Fig. 5 illustrates that similar relationships between NO x g s −1 emission factors and VSP are seen in both remote sensing and PEMS; for example, both show increasing NO x emissions with engine load. Fig. 6 shows truncated journey average g km −1 emission factors from remote sensing determined using the GAM fitting methods outlined in Section 2.3, and truncated journey average g km −1 emission factors determined using PEMS. To ensure a fair comparison, only vehicles present in both the PEMS and remote sensing data sets were used in GAM fitting (43 vehicles: 14 Euro 5 diesel cars, 11 Euro 6 diesel cars, 12 Euro 6 petrol cars, and 4 Euro 5 diesel vans). This corresponds to 7939 remote sensing measurements. For this purpose, a 'vehicle' is defined by its make, engine size, fuel type, Euro classification and type approval category. Note that, for fairer comparison, the PEMS data set was constrained to the same VSP range over which the GAMs were fitted.
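The calculation of a truncated journey average g km −1 factor from a g s −1 versus VSP relationship can be sketched as follows. This is a minimal illustration only: a binned mean is used as a crude stand-in for the GAM fit described in Section 2.3, "truncated" is interpreted here as dropping drive cycle seconds whose VSP falls outside the fitted range, and all function names, bin widths and data values are hypothetical.

```python
import math
from collections import defaultdict

def fit_binned_means(vsp_values, rates_g_s, bin_width=5.0):
    """Mean emission rate (g/s) per VSP bin; a crude stand-in for a GAM."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vsp, rate in zip(vsp_values, rates_g_s):
        key = math.floor(vsp / bin_width)
        sums[key] += rate
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def truncated_journey_average(model, cycle, bin_width=5.0):
    """Journey average g/km over a 1 Hz cycle of (speed in m/s, VSP) pairs.

    Seconds whose VSP lies outside the fitted bins are dropped from both
    the emission total and the distance (assumed meaning of 'truncated').
    """
    total_g = 0.0
    distance_m = 0.0
    for speed, vsp in cycle:
        key = math.floor(vsp / bin_width)
        if key not in model:
            continue  # outside the VSP range covered by the measurements
        total_g += model[key]   # g emitted in this 1 s step
        distance_m += speed     # m travelled in this 1 s step
    if distance_m == 0.0:
        raise ValueError("no part of the cycle lies within the fitted VSP range")
    return total_g / (distance_m / 1000.0)

# Hypothetical snapshot measurements: VSP and NOx emission rate (g/s).
vsp_obs = [1.0, 2.0, 6.0, 8.0, 12.0]
nox_obs = [0.001, 0.003, 0.004, 0.006, 0.010]
model = fit_binned_means(vsp_obs, nox_obs)

# Hypothetical 1 Hz drive cycle; the last second exceeds the fitted VSP range.
cycle = [(10.0, 2.0), (10.0, 7.0), (20.0, 12.0), (30.0, 40.0)]
print(round(truncated_journey_average(model, cycle), 3), "g/km")
```

Dropping out-of-range seconds from the distance as well as the emission total keeps the ratio consistent; applying the same constraint to the PEMS data is what makes the two sets of factors comparable.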
Overall, there is good agreement between the emission factors from PEMS and remote sensing for the passenger cars. Note also that the error bars showing 95% confidence intervals overlap for all columns in Fig. 6. There is a much larger disparity seen in the emissions of the vans, however, particularly in urban driving. This disparity may be a consequence of having relatively few vans in the PEMS data set, as well as vans likely being more laden in real-world use than in the PEMS RDE test. There are also instances in which the relative order of the driving conditions differs: in Euro 6 diesel cars, for example, remote sensing suggests that motorway driving has the lowest emission factor whereas PEMS suggests that it is rural driving.
Journey average NO x g km −1 values can also be calculated for individual vehicle models, shown in Fig. 7. In this instance only urban and rural driving conditions were considered i.e. similar conditions to those experienced for the remote sensing measurements. However, the Emission Detection And Reporting system (Edar) (HEAT, 2017) shows promise for use in motorway conditions (Ropkins et al., 2017). Of the diesel vehicles, the root mean square error (RMSE) between the PEMS and remote sensing (RS) emission factors varies from 0.230 (Euro 6 cars) to 0.616 (Euro 5 vans). A low RMSE is not necessarily expected; each RS emission factor reflects over a hundred individual vehicles whereas the PEMS data represents single vehicle measurements over a single drive cycle. Other work has shown significant variance in PEMS emission measurements for single vehicles tested multiple times, partly due to variance in testing conditions and procedures (Baldino et al., 2017).
A strength of remote sensing is its ability to measure large numbers of vehicles non-obtrusively in a short space of time. In practice this means that even in a relatively modest remote sensing data set there is likely a sufficient range of measurements over a large enough range of VSPs for GAMs to be fitted on an individual manufacturer or vehicle basis. Fig. 8 shows urban-rural journey average g km −1 values from the remote sensing data set for individual vehicles, with a vehicle defined in the same way as in Figs. 6 and 7. Only vehicles with at least 100 measurements were used to ensure sufficient data to fit a GAM relating the NO x emission and VSP.

Fig. 8 demonstrates the wide variation in individual vehicle emissions even within a single Euro class. In the Euro 5 diesel cars category, for example, the cleanest vehicle is associated with a 0.55 g km −1 emission, 0.79 g km −1 lower than the highest at 1.34 g km −1 . Similarly, for the Euro 6 diesel cars category, the cleanest vehicle is at 0.17 g km −1 and the highest at 0.80 g km −1 , a range of 0.63 g km −1 . The vans show similar variation, both for Euro 5 (0.69 to 1.93 g km −1 ) and Euro 6 (0.23 to 1.32 g km −1 ). The variation shown in NO x emissions provides an indication of the extent to which emissions could be reduced if 'best in class' emissions performance was achieved. Furthermore, the differences observed between vehicle manufacturers and models provide information that is useful for understanding the expected variation in NO x emissions resulting from different vehicle fleet compositions.

Fig. 4. The percentage uncertainty in the D-Segment Euro 6 diesel passenger car CO 2 g km −1 induced by changes in model parameters. C d A, R 0 and R 1 were changed by ± 10%, acceleration by ± 5%, gradient by ± 20% and speed by ± 2 km h −1 . The range in mass is the kerb weight (lower) to the kerb weight plus 400 kg (higher). The range in P aux is 250 W (lower) to 3 kW (higher). Percentage changes are relative to the base case, defined as the g km −1 factor determined using correct generic parameters for a D-segment diesel vehicle, unaltered speed, acceleration and gradient, kerb weight plus 150 kg, and a P aux of 2.5 kW.

Conclusions
Remote sensing data offers large data sets of road vehicle emission measurements with good fleet coverage and little selection bias. However, without a measurement of instantaneous fuel consumption it is difficult to transform fuel-based to distance-based emission factors. As the vehicle type approval process and emission inventory development both rely on distance-based emission factors, this difficulty presents a limitation for the use of remote sensing data. Furthermore, comparisons with other commonly used road transport emission measurement techniques (chassis dynamometers, PEMS, etc.) are more limited without expressing emissions in this way.

A method to model fuel consumption from kerbside measurements and vehicle technical data was developed, and is sufficiently general to be applied to any emission species measured using remote sensing and indeed any point-sampling measurement method that provides a pollutant to CO 2 ratio. In the current work, a relatively modest data set of remote sensing data was used to develop and demonstrate the method. However, there has been a considerable increase in the number of vehicle emission remote sensing data campaigns in recent years (The Real Urban Emissions Initiative, 2014; Bernard et al., 2019; Ropkins et al., 2017). Large databases such as these would enable the methods outlined in this study to be used to calculate g km −1 emissions for a large range of vehicle models and driving conditions.

Fig. 5. Trends in NO x g s −1 emission factors as a function of vehicle specific power taken from the whole PEMS and remote sensing (RS) data sets. The g s −1 factors from PEMS are calculated from 1 Hz measurements, and those from RS are taken from individual snapshot measurements. A normalised VSP density of a VSP-based Urban-Rural RDE drive cycle is shown in grey, used later in Fig. 8.
Arguably the main benefit of the approach is that it can in principle be applied to any vehicle drive cycle. This development is of importance for the analysis of vehicle emission remote sensing data, where measurements tend to be made of vehicles mostly (but not always) under load. The potential to re-calculate emissions for more representative full drive cycles therefore addresses the potential issue of remote sensing site selection bias, where measured emissions would on average be higher than those over a typical full drive cycle. Indeed, with the increasing amounts of drive cycle data available, there is the potential to apply the method to large databases of actual vehicle activity over a large range of conditions.
A common shortcoming of current remote sensing data sets is a lack of measurements under high speed and VSP conditions, making its application to motorway portions of vehicle drive cycles inappropriate. As remote sensing technology advances, however, this gap in measurements should decrease and allow for emissions to be modelled over full drive cycles.

[Figure panel labels: Diesel, Petrol]

Fig. 7. A comparison between truncated journey average NO x g km −1 values derived from remote sensing (RS) and PEMS data. Each point represents an individual vehicle, defined as being a unique manufacturer-engine size combination with at least 100 measurements in the remote sensing data set. The solid grey line shows the 1:1 relationship. The remote sensing (RS) factors are taken from the predictions of GAMs relating NO x g s −1 to VSP over a VSP-based drive cycle taken from an Urban-Rural real driving emissions (RDE) test. The PEMS factors are the truncated journey average emission factors for the corresponding vehicle. The error bars show the 95% confidence interval of the mean for the remote sensing emission predictions.

Fig. 8. NO x g km −1 values generated from the predictions of remote sensing (RS) fitted GAMs over a truncated urban-rural on-road drive cycle on a per-vehicle basis. Vehicles have been anonymised, but each is taken to be a unique manufacturer-engine size combination with at least 100 measurements. Error bars show the 95% confidence interval. Blue dashed lines show the mean NO x g km −1 values in each vehicle category. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.