Using snapshot measurements to identify high-emitting vehicles

Policy makers have long been interested in detecting ‘high-emitters’, a supposedly small fraction of vehicles that make disproportionately large contributions to total fleet emissions. However, existing identification schemes often rely exclusively on snapshot measurements (i.e. emissions within less than a second), and thus simply identify vehicles with high instantaneous emissions, instead of vehicles with high average emissions over a driving period as regulated by emission standards. We design a comprehensive scheme to address this challenge by combining fleetwide remote sensing measurements with detailed second-by-second emission measurements from individual vehicles. We first determine the trip-average NOx emission rates of individual vehicles in a Euro-5 diesel fleet measured across European locations; this then allows us to calculate the fraction and emission contributions of high-emitters based on trip-average emissions. We demonstrate that the identification of high-emitters is quite uncertain as long as it is based on single snapshots only; but 80% of the high-emitters can be identified with over 75% precision with five or more repeated measurements of the same vehicle. Compared to the conventional detection schemes, our scheme can increase the identified high-emitters and associated emission reductions by over 140%. Our method is validated and shown to be superior to the conventional interpretation of snapshot measurements.


Introduction
Motor vehicles are a key contributor to air pollutant emissions, most notably nitrogen oxides (NOx), which pose a substantial threat to air quality, public health, and climate worldwide [1][2][3][4][5][6]. It has been previously suggested that a small fraction of vehicles (i.e. 'high-emitters') contribute disproportionately to total fleet emissions, due to lack of maintenance, tampering, and/or usage of defeat devices [7][8][9][10][11]. Therefore, identifying and then repairing or removing these high-emitters is often viewed as a cost-effective strategy to reduce on-road vehicle emissions [12][13][14]. However, the existing inspection programs in many countries often fail to detect high-emitters in the real world [15][16][17][18][19][20][21]. Increasing effort has been invested in measuring on-road emissions from in-use vehicles (e.g. the in-use surveillance tests by the US Environmental Protection Agency [22]). However, these tests only cover a limited number of vehicles, as testing every in-use vehicle constitutes a significant logistical and financial burden to both vehicle owners and authorities.
Remote sensing (RS) offers great potential for identification of high-emitters, as RS devices can remotely measure the emission rates of a large number of vehicles during pass-by without interrupting traffic [23][24][25]. However, only a few countries/regions have used or proposed to use onroad RS measurements to assist the identification of high-emitters (e.g. mainland China [26], Hong Kong [27], Scotland [28], and certain states in the US [25]).
One key issue is that RS devices only take snapshot measurements as vehicles pass by the sensor (equivalent to emissions within less than a second), and therefore cannot characterize the average emission level of individual vehicles over a representative driving period [29][30][31]. Existing identification programs have used excessively high identification thresholds (e.g. 1500 ppm for NO emission for heavy-duty trucks in China, ∼9× the China 5 standard [32,33]); then high instantaneous emission rates do indeed suggest the identified vehicles have high average emissions, but such high identification thresholds may result in a large number of undetected potential high-emitters.
Most previous studies define high-emitters and estimate their contributions to total fleet emissions purely based on snapshot measurements derived from RS. These studies find highly skewed distributions of the instantaneous emissions and calculate that the dirtiest 10% of vehicles contribute over 50% of the total pollutant emissions, mainly focusing on gasoline fleets [8,11,30,34-37]. However, the distributions of instantaneous emissions measured by RS have been shown to be very different from the distributions of average emission factors calculated with high-frequency emission measurements from chassis dynamometers or portable emissions measurement systems (PEMS) [31]. In contrast to RS measurements, measurements from chassis dynamometers or PEMS consist of continuous records of vehicle emissions with a resolution of one measurement per second for an extended period of time (e.g. half an hour), but are only available for a very limited number of vehicles due to the measurement cost [38].
Only a few studies define high-emitters based on average vehicle emissions and combine high-frequency measurements and snapshot measurements to identify them. Huang et al define high-emitters as vehicles whose average emission factors are above two times the emission limits, and then use the 99th percentile of the instantaneous emissions from chassis dynamometer tests of the vehicles that passed tests as the threshold for high-emitter identification [39]. Researchers have also used statistical models such as neural network methods to predict high-emitters using snapshot measurements, speed, vehicle age, and meteorology information [40,41]. However, these approaches do not establish a relationship between snapshot emission and average emission, and often just focus on the binary label (whether a vehicle is a high-emitter or not). They also fail to explore the advantage of using repeated instantaneous measurements, and lack rigorous evaluation of the uncertainty and effectiveness of the identifications. To date, very little is known about the fraction of high-emitters (defined based on their average emission factors) in real-world fleets, their contributions to total emissions, and the associated precision and effectiveness of identifying these high-emitters with snapshot measurements.
Here, we develop a comprehensive scheme which combines second-by-second PEMS and chassis dynamometer data with a large RS dataset, to identify vehicles with high average emission factors. We focus on NOx emissions from diesel passenger cars certified to the Euro 5 emission standards (EU-5D), due to the importance of NOx from diesel vehicles for air quality and the relatively low measurement error of the RS devices [42]. Applying our method to over 130 000 RS measurements of EU-5D passenger cars from nine European cities, we estimate the distribution of average emission factors and calculate the fraction of high-emitting and clean vehicles for each fleet. We then quantify the accuracy and uncertainty of identifying individual high-emitting vehicles with a few snapshot RS measurements, as a function of the number of repeated RS measurements and the classification rules. Finally, we present a versatile method to validate our algorithm for estimating the average emission factors, demonstrating that our algorithm is significantly superior to the conventional interpretations of snapshot measurements. We conclude by discussing the implications for policy makers and practitioners of using RS snapshot measurements to identify high-emitters.

Method
Throughout our analysis, a high-emitting vehicle is defined as having an average emission factor higher than an absolute threshold value (here, usually two times or more the emission limit value for EU-5D). To identify them with RS measurements, we first design an iterative algorithm that can estimate the distribution of vehicle-level average emission factors in the measured fleet. To do this, we use an extensive set of second-by-second PEMS/chassis measurements to establish the relationship between instantaneous emissions and the average emission of the test cycle, i.e. the variability of instantaneous emissions around the mean. With these variability relationships, our algorithm then iteratively estimates the distribution of the average emission factors that can reproduce the set of observed instantaneous RS measurements. We then use the derived distribution of average emission factors to simulate a set of instantaneous emissions with known average vehicle emissions. Using this simulation dataset, we validate our algorithm and quantify the precision and effectiveness of identifying individual high-emitting and clean vehicles, under different numbers of repeated RS measurements and classification rules. We further perform several sensitivity analyses to explore how varying algorithm assumptions and modifications of the second-by-second measurements influence our results (see supplementary methods available online at stacks.iop.org/ERL/17/044045/mmedia).

On-road remote sensing (RS) measurements
We use a collection of 131 284 RS records of EU-5D passenger cars measured in nine cities (Antwerp, Basel, Bruges, Ghent, Gothenburg, London, undisclosed cities in Spain, Stockholm, and Zurich) during 2011-2019 [24]. Due to the strong association between ambient temperature and vehicle emissions [43], we differentiate RS records into three categories based on ambient temperature: below 10 °C, 10 °C-20 °C, and above 20 °C. To ensure the comparability between RS measurements and emission limit values established in laboratory test environments, we only focus on vehicles whose instantaneous vehicle specific power (VSP, calculated following [44]) is in the range of 3-22 kW per metric ton. We calculate fuel-based NOx emission factors (unit: g NOx kg−1 fuel) as the product of the NOx/CO2 ratio (unit: ppm/ppm), the ratio of the molecular weights of NOx and CO2, and the CO2 intensity of diesel (3.13 kg CO2 kg−1 diesel following [45]). We omit the highest and lowest records (0.1% each) to remove outliers of instantaneous measurements. Our final sample consists of 79 576 RS records. A summary of the testing conditions and characteristics of measured fleets can be found in table S1.

For each PEMS/chassis test cycle, we calculate the fuel-based NOx emission factors in a way similar to the RS data. Average NOx emission factors of test vehicles (over the test cycle) range from 2.2 g NOx kg−1 fuel to 25.2 g NOx kg−1 fuel, covering both clean and high-emitting vehicles. Following recommendations from researchers directly working with these data, we calculate the three-second moving averages of NOx and CO2 emissions to address the potential issue of misaligned NOx and CO2 measurements. The highest and lowest (0.5% each) emission factors are omitted for PEMS/chassis measurements to remove the outliers of the measurements.
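The fuel-based emission factor calculation and the VSP filter described above can be illustrated with a minimal sketch. The function names are hypothetical, and expressing NOx as NO2 equivalents (46.01 g mol−1) is an assumption of this sketch, since the text does not state the convention used. Only the CO2 intensity (3.13 kg CO2 kg−1 diesel) and the VSP window (3-22 kW per metric ton) are taken from the text.

```python
# Molar masses in g/mol. Treating NOx as NO2 equivalents is an assumption
# of this sketch; the text does not specify the convention.
M_NOX = 46.01
M_CO2 = 44.01
CO2_INTENSITY = 3.13  # kg CO2 per kg diesel, as given in the text


def fuel_based_ef(nox_to_co2_ratio: float) -> float:
    """Convert a molar NOx/CO2 ratio (ppm/ppm) to g NOx per kg fuel."""
    g_nox_per_g_co2 = nox_to_co2_ratio * M_NOX / M_CO2
    # kg CO2 per kg fuel -> g CO2 per kg fuel
    return g_nox_per_g_co2 * CO2_INTENSITY * 1000.0


def in_vsp_window(vsp_kw_per_t: float, low: float = 3.0, high: float = 22.0) -> bool:
    """Keep only records whose vehicle specific power lies in the analysed range."""
    return low <= vsp_kw_per_t <= high
```

Under these assumptions, a NOx/CO2 ratio of about 0.001 ppm/ppm corresponds to roughly 3.3 g NOx kg−1 fuel, i.e. close to the fuel-based type approval limit used later in the analysis.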

Decomposition of RS emission records
For a vehicle $i$, an observed RS record of instantaneous emission can be decomposed into the sum of its (unknown) average emission factor and the variability around its average emission:

$$RS_i = AE_i + f_i$$

where $RS_i$ denotes the instantaneous emission of vehicle $i$ measured by RS, $AE_i$ denotes the unknown average emission factor of vehicle $i$ over some average driving conditions (characterized by the test cycles in our database), and $f_i$ is the underlying instantaneous variability of this measurement. $f_i$ can be viewed as a draw from an underlying distribution of the instantaneous variability of vehicle $i$ ($f_i \sim F_i$), conditioned on the VSP value. The mean value of $F_i$ is always zero, but the shape of $F_i$ describes the variability of instantaneous emissions around the average emission factor of vehicle $i$. $F_i$ is difficult to derive from RS data, as most vehicles were only measured a few times and the small number of repeated measurements (usually below 5) is insufficient to constrain the variability distribution. Therefore, we use PEMS/chassis measurements to derive information on the variability distributions.
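As a minimal illustration of this decomposition, the variability around the cycle average can be computed directly from one second-by-second profile. The profile values below are hypothetical and chosen only to make the arithmetic easy to follow; the conditioning on VSP is omitted.

```python
from statistics import mean


def variability(profile):
    """Return the cycle average and the deviations f = instantaneous - average."""
    avg = mean(profile)
    return avg, [x - avg for x in profile]


def share_below_average(profile):
    """Fraction of instantaneous records lying below the cycle average."""
    avg = mean(profile)
    return sum(x < avg for x in profile) / len(profile)


# Hypothetical second-by-second emission factors (g NOx per kg fuel).
profile = [1.0, 2.0, 2.0, 3.0, 4.0, 18.0]
avg, f = variability(profile)
# The deviations sum to zero by construction, mirroring the zero-mean
# property of the variability distribution.
```

In this toy profile a single high record pulls the average up, so most records fall below it, which is the kind of skewness that makes a single snapshot a poor proxy for the average.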

Iterative algorithm to estimate the distribution of average emission factors
We design an iterative algorithm to estimate the distribution of average emission factors using a set of RS measurements in which every vehicle is measured only once. Given a set of variability distributions derived from PEMS/chassis measurements, our algorithm estimates the distribution of average emission factors which produces a set of instantaneous emission records whose distribution is indistinguishable from the original distribution of the RS records. In this section, we only provide a high-level summary of the algorithm, with more details reported in the SI.
The main idea of the algorithm is to match each RS record (and therefore each vehicle in the RS data) with a second-by-second measurement profile (i.e. a test cycle). The matched second-by-second profile characterizes the potential relationship between instantaneous emissions and the average emission of the matched vehicle measured in RS data. The average emission factor of the associated second-by-second profile is used to estimate the average emission factor of the matched vehicle in the RS data. The algorithm performs the matching process in an iterative manner. It starts with an initial estimate of the average emission factor for each vehicle in the RS data. At each iteration, the algorithm matches each vehicle in the RS data with one second-by-second measurement profile of similar average emission factor. After the matching step, the algorithm simulates a set of instantaneous emission measurements by sampling one instantaneous emission record from each matched second-by-second profile. Only instantaneous emission records with a VSP value similar to that of the original RS record are eligible for sampling. The algorithm then compares the distribution of the simulated instantaneous emissions with the distribution of the RS data, and updates the estimated average emission factor of each vehicle in the RS data to reduce the distance between the two distributions. The algorithm terminates when no further improvement can be made to reduce this distance.
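The matching loop can be sketched in simplified form as follows. This is not the implementation described in the SI: the update rule (a random re-matching of one vehicle, accepted only if it lowers the Kolmogorov-Smirnov distance), the fixed per-vehicle sampling quantiles, the stopping parameter `max_stall`, and all function names are assumptions of this sketch. The conditioning on VSP is omitted for brevity.

```python
import random
from bisect import bisect_right
from statistics import mean


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between samples a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def estimate_average_efs(rs, profiles, max_stall=500, seed=0):
    """Return (estimated average EFs, initial distance, final distance)."""
    rng = random.Random(seed)
    cycles = [sorted(p) for p in profiles]
    means = [mean(p) for p in profiles]
    # A fixed quantile per vehicle keeps simulated draws comparable
    # across iterations (an assumption of this sketch).
    u = [rng.random() for _ in rs]

    def draw(k, q):
        return cycles[k][int(q * len(cycles[k]))]

    # Initial estimate: treat each snapshot as the vehicle's average and
    # match it to the cycle with the closest average emission factor.
    match = [min(range(len(means)), key=lambda k: abs(means[k] - x)) for x in rs]
    sim = [draw(k, q) for k, q in zip(match, u)]
    start = best = ks_distance(sim, rs)

    stall = 0
    while stall < max_stall:
        i = rng.randrange(len(rs))       # pick one vehicle
        k_new = rng.randrange(len(cycles))  # propose a different matched cycle
        k_old, s_old = match[i], sim[i]
        match[i], sim[i] = k_new, draw(k_new, u[i])
        d = ks_distance(sim, rs)
        if d < best:                     # keep the update only if it helps
            best, stall = d, 0
        else:                            # otherwise revert
            match[i], sim[i] = k_old, s_old
            stall += 1
    return [means[k] for k in match], start, best
```

The returned list holds, for each RS record, the average emission factor of its matched cycle; the distribution of these values is the quantity of interest, rather than any single vehicle's estimate.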

Validation of the algorithm
We design the following experiment to validate our algorithm. We create a set of instantaneous emission records by sampling from the PEMS/chassis measurement data, as proxies for RS measurements. By doing this, the average emission factors associated with these instantaneous emissions are known to us. We then apply the iterative algorithm to the simulated test set to estimate the distribution of average emission factors and compare the results with distributions of true average emission factors of the test set. We randomly select half of the PEMS/chassis measurements (that cover both clean and high-emitting vehicles) to generate the test dataset and use the other half as inputs of our algorithm.
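The construction of the validation data can be sketched as follows. The function names, the number of simulated vehicles, and the uniform sampling of cycles and records are assumptions of this sketch; VSP matching is again omitted.

```python
import random
from statistics import mean


def make_validation_set(profiles, n_vehicles, seed=0):
    """Split cycles in half; sample snapshot proxies with known averages."""
    rng = random.Random(seed)
    shuffled = profiles[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    test_cycles, input_cycles = shuffled[:half], shuffled[half:]

    snapshots, true_avgs = [], []
    for _ in range(n_vehicles):
        cycle = rng.choice(test_cycles)
        snapshots.append(rng.choice(cycle))  # proxy for one RS record
        true_avgs.append(mean(cycle))        # known ground-truth average
    return snapshots, true_avgs, input_cycles


def fraction_below(values, limit):
    """Share of values below a limit, e.g. the fraction of 'clean' vehicles."""
    return sum(v < limit for v in values) / len(values)
```

Comparing `fraction_below(snapshots, limit)` with `fraction_below(true_avgs, limit)` gives the bias of the naive interpretation, while the held-out `input_cycles` serve as the input to the estimation algorithm.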

Identification of individual high-emitting and clean vehicle
To identify individual high-emitting and clean vehicles, we first simulate a hypothetical fleet using the estimated distribution of average emission factors and the PEMS/chassis measurements. We illustrate the idea with the fleet measured in Zurich with ambient temperature above 20 °C, because it has the largest number of records available. The average emission factor of each vehicle in the hypothetical fleet is randomly drawn from the distribution of the estimated average emission factors for Zurich (derived from our iterative algorithm). Each vehicle is then matched with a PEMS/chassis cycle with a similar average emission factor. For each vehicle, we then simulate ten independent instantaneous emission records with ten independent random draws from the matched PEMS/chassis cycle. In practice, the repeated instantaneous emission measurements for one vehicle can be obtained if a vehicle either drives through one device multiple times (which occasionally occurred in real-world measurement campaigns), or is measured by multiple deployed RS devices (in a future campaign).
We focus on the identification of 'super high-emitters', defined here as vehicles with average emission factors more than five times the type approval limit (5 × 3.5 = 17.5 g NOx kg−1 fuel), since more than 90% of EU-5D passenger cars already have average on-road NOx emissions above the type approval limit. We convert the distance-based emission limit value over the test cycle (here the NEDC for the Euro 5 cars) to a fuel-based emission limit using the cycle average fuel economy, which has been measured well for classes of vehicles. For each potential cut-off threshold of RS measurement, we calculate the resulting identification precision (i.e. correctly identified high-emitters/identified high-emitters) and the identified fraction (i.e. correctly identified high-emitters/actual high-emitters, or 'recall'). We also apply a similar idea to the identification of 'clean vehicles' with average on-road emissions below the type approval limit (3.5 g NOx kg−1 fuel).
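The two evaluation metrics can be written out as a short sketch. The function name and the example values are hypothetical; only the 17.5 g NOx kg−1 fuel definition of a super high-emitter comes from the text.

```python
def precision_recall(snapshots, true_avgs, cutoff, high_emitter_limit=17.5):
    """Precision and recall of flagging vehicles whose snapshot exceeds `cutoff`.

    A vehicle is an actual super high-emitter if its true average emission
    factor exceeds `high_emitter_limit` (g NOx per kg fuel).
    """
    flagged = [s > cutoff for s in snapshots]
    actual = [a > high_emitter_limit for a in true_avgs]
    true_pos = sum(f and a for f, a in zip(flagged, actual))
    n_flagged, n_actual = sum(flagged), sum(actual)
    precision = true_pos / n_flagged if n_flagged else float("nan")
    recall = true_pos / n_actual if n_actual else float("nan")
    return precision, recall


# Hypothetical snapshots and true averages for five vehicles.
snapshots = [30.0, 20.0, 5.0, 40.0, 10.0]
true_avgs = [25.0, 12.0, 20.0, 30.0, 8.0]
p, r = precision_recall(snapshots, true_avgs, cutoff=17.5)
```

In this toy case three vehicles are flagged, of which two are actual super high-emitters, and one actual super high-emitter is missed because its snapshot happened to be low, so both precision and recall equal 2/3.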
With repeated instantaneous measurements of one vehicle, we evaluate two simple classification rules. The first rule classifies a vehicle as a high-emitter/clean vehicle if the average of the repeated instantaneous emissions is above/below a threshold ('average rule'). The second rule counts all measurements and classifies a vehicle as a high-emitter/clean vehicle only if all repeated measurements are above/below some threshold ('count all rule'). The two rules are selected based on their simplicity to operate. The 'count all rule' can be overly stringent under a large number of repeated measurements (e.g. very few vehicles may have all five repeated measurements above a certain threshold), but the high precision levels might be appealing to policy makers as it puts a light burden on drivers of normal vehicles.

Figure 1 shows the second-by-second measurements and the derived distributions of instantaneous variability for two vehicles with different average emission factors, as illustrative examples. Over an extensive driving period, we find that the instantaneous emissions are highly skewed towards lower values. On average for the 163 test cycles, 65% of the instantaneous records are below the average emission rate of the respective trip. Therefore, it is more likely than not that a random snapshot RS measurement has an emission rate less than the average value of the same vehicle. At the same time, the few extremely high instantaneous emission records are also not representative of the average behavior of the vehicle; the 99th percentile of the instantaneous emissions is 1.6-9.8 times the average emission factor. We also find large differences in the shape of variability distributions between clean and high-emitting vehicles. The variability distributions of high-emitting vehicles are flatter with a more negative mode and a longer tail, despite all distributions having zero mean. This highlights the importance of having an extensive set of second-by-second measurement data which can differentially characterize the variability of high-emitting and clean vehicles.

Determining the distribution of average emission factors
Differences between instantaneous and average emission shown in the second-by-second measurements are reflected in the analysis performed on EU-5D fleets measured across Europe. Figure 2 shows the distributions of instantaneous RS measurements and the derived average emission factors for each city in our RS dataset. Consistent with previous literature on EU-5D vehicles, we observe excessively high instantaneous NOx emissions with mean RS emission rates of 9.2-20.1 g NOx kg−1 fuel (2.6-5.7 times the type approval limit for EU-5D) across different cities and temperature conditions. These high instantaneous emissions are associated with vehicles with excessively high average emission factors. However, distributions of the estimated average emission factors (solid lines in figure 2) differ substantially from distributions of the RS emission rates (dashed lines in figure 2). The distribution of average emission factors peaks at larger values (approximately 10 g NOx kg−1 fuel) compared to the distribution of RS records (approximately 5 g NOx kg−1 fuel); in addition, there are significantly fewer extreme values (both low and high). This suggests that there are in fact substantially fewer vehicles with extremely low or high average on-road emissions than previous estimates purely based on single instantaneous RS records. For example, we estimate that only 2.2% of the measured fleet in Zurich have an average emission factor below the type approval limit, while 11% of the instantaneous RS records are below the same limit value (an overestimation of 413%). Across nine cities, only 0.1%-8.7% of measured vehicles have an average emission factor below the type approval limit, while 8%-20% of the RS records are below the same limit value. We observe similar differences between the estimated average emission factors and instantaneous measurements across all temperature conditions and ages of the measured vehicles (see figure S1). These results are largely independent of the algorithm assumptions and treatment of PEMS/chassis measurements (see supplementary methods).
There is substantial variability of the estimated average emission factors of individual vehicles within each city and between cities. The average emission factors decrease as the ambient temperature increases (see figure S1); this reflects the 'thermal window' mechanism reported earlier that the emission control devices are optimized for the temperature conditions of the type approval testing procedures (around 24 °C) [43]. There is no evident relationship between average emission factors and ages of the vehicles, extending earlier analysis [47]: newer EU-5D vehicles do not have lower emission rates than older vehicles (see figure S1). We also find distinctly different emission rates of different vehicle brands (figure S2). For example, in Zurich, the cleanest brand (BMW, n = 2373) has a median average emission factor of 11 g NOx kg−1 fuel, while the dirtiest brand (Renault, n = 952) has a median average emission factor of 31 g NOx kg−1 fuel (or 2.9 times the cleanest brand). However, there is large variability of vehicles' average emission factors within each brand as well, with most vehicles emitting excessively high NOx over a representative test cycle.

Identification of individual high-emitting and clean vehicles
Next, we evaluate the uncertainty and effectiveness of using RS snapshot measurements to identify individual high-emitting vehicles with inferred average emission factors. As an illustration, we focus on the Zurich fleet measured with ambient temperature >20 °C. Our algorithm estimates that 28% of the measured vehicles in Zurich are super high-emitters with average emission factors more than 5 times the type approval limit, which account for 76% of the total emissions of the fleet (calculated with their average emission factors). The identification of high-emitting vehicles is highly uncertain if based on one RS record alone. For instance, if the cut-off threshold is set at 17.5 g NOx kg−1 fuel (i.e. the car will be naively classified as a super high-emitter based on this one record), there is only a 63% chance that this car is actually a super high-emitter (figure 3(A)). Increasing the cut-off threshold will increase the precision of detecting actual super high-emitters. But surprisingly, the identification precision of a super high-emitter levels off at about 77%, even if the cut-off threshold is increased further. This is because vehicles with relatively low average emissions can also have quite high instantaneous emissions, making it impossible to distinguish between super high-emitters and vehicles with less high levels of average emissions based on only one instantaneous measurement.
The precision increases substantially, however, if the same vehicle is measured several times; this is shown by simulations of multiple independent instantaneous emissions based on RS and PEMS/chassis data, not from the rather limited repeated measurements in the RS data. Here, we evaluate two classification rules: one based on the average of repeated instantaneous emissions ('average rule', solid lines in figure 3) and the other that counts all measurements ('count all rule', dashed lines in figure 3).
Already with two repeated RS records that are both five times above the type approval limit value, the precision of identifying super high-emitters increases to 83%; with five RS records the identification is 99% accurate ('count all rule'). An increasing number of repeated RS records also allows lowering the RS threshold while maintaining the precision level. For instance, if 75% precision is required for detection of super high-emitters, the cut-off threshold for a single RS record would need to be 30 g NOx kg−1 fuel (or 8.6 times the type approval limit value). The threshold could substantially decrease to 14 g NOx kg−1 fuel or 8.7 g NOx kg−1 fuel respectively with two or five repeated records (using the 'count all rule'). Under the same RS thresholds and same number of repeated measurements, the 'average rule' is less stringent than the 'count all rule', and hence results in a lower precision level for the same threshold (the difference between solid and dashed lines in figure 3(A)).
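The two classification rules can be stated compactly as a sketch for the high-emitter case (the clean-vehicle case mirrors it with the inequalities reversed). The function names and the example records are hypothetical.

```python
from statistics import mean


def average_rule(records, threshold):
    """Flag a high-emitter if the mean of repeated snapshots exceeds the threshold."""
    return mean(records) > threshold


def count_all_rule(records, threshold):
    """Flag a high-emitter only if every repeated snapshot exceeds the threshold."""
    return all(r > threshold for r in records)


# Hypothetical repeated snapshots of one vehicle (g NOx per kg fuel).
records = [30.0, 12.0, 25.0]
flag_avg = average_rule(records, 17.5)    # mean is about 22.3, so flagged
flag_all = count_all_rule(records, 17.5)  # 12.0 is below 17.5, so not flagged
```

If every record exceeds the threshold, so does their mean; any vehicle flagged by the 'count all rule' is therefore also flagged by the 'average rule' at the same threshold, which is why the latter is the less stringent of the two.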
Similar relations hold for the identification of clean vehicles (figure 3(C)): only 11% of vehicles with an instantaneous RS record below the emission limit are actually clean vehicles with average emissions below the type approval limit. To ensure a 75% precision with a single record, the RS threshold would need to be set as low as 0.2 g NOx kg−1 fuel; this low value is however not operational as it is at the detection limit of common RS devices. Using a higher cut-off threshold (e.g. 1 g NOx kg−1 fuel) reduces precision to 34%. However, the precision value increases to 85% or 99% if two or five RS measurements for the same vehicle, respectively, are all below 1 g NOx kg−1 fuel.

Figures 3(B) and (D) show the trade-offs between the identification precision and identified fraction with varying cut-off thresholds, for different numbers of repeated measurements. With one measurement, one can only identify 29% of the total super high-emitters and 36% of the total clean vehicles at a 75% precision level. With repeated measurements, less stringent RS thresholds can be adopted for the same precision level and more super high-emitters or clean vehicles could be identified. At the 75% precision level (using the 'average rule'), one can identify 40% of the total clean vehicles with two repeated measurements and 49% of the total clean vehicles with five repeated measurements. Improvements for super high-emitter identification are even more substantial: one can identify 58% of all super high-emitters with two repeated measurements and 80% of super high-emitters with five repeated measurements. A higher fraction of identified super high-emitters leads to substantial increases in NOx emission reductions. If all identified super high-emitters at the 75% precision level were replaced with vehicles with average emissions at the type approval limit, an identification program based on a single measurement could reduce total NOx emissions (of the Zurich fleet) by 15%, while a program based on five repeated measurements could reduce total NOx emissions by 36% (with the 'average rule') or 31% (with the 'count all rule'). We observe very similar performance from either classification rule in clean vehicle identification, but meaningful differences between the two classification rules in high-emitter identification. Using the 'count all rule' can achieve almost perfect identification precision at the cost of a smaller fraction of super high-emitters identified, while using the 'average rule' can identify almost all super high-emitters if a lower precision level is allowed (see figure 3(B)).
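The emission reduction calculation can be illustrated with a simplified sketch. It assumes that every flagged vehicle whose average exceeds the limit is replaced by a vehicle emitting exactly at the type approval limit, and that all vehicles burn the same amount of fuel, so that fuel-based emission factors can be summed directly. Both are assumptions of this sketch, as are the function name and the example values.

```python
def emission_reduction_share(true_avgs, flagged, limit=3.5):
    """Fractional fleet emission reduction after replacing flagged vehicles.

    Flagged vehicles emitting above `limit` are set to `limit`
    (g NOx per kg fuel); equal fuel use per vehicle is assumed.
    """
    before = sum(true_avgs)
    after = sum(min(a, limit) if f else a for a, f in zip(true_avgs, flagged))
    return (before - after) / before


# Hypothetical fleet of four vehicles, two of which are flagged.
true_avgs = [20.0, 10.0, 5.0, 35.0]
flagged = [True, False, False, True]
share = emission_reduction_share(true_avgs, flagged)  # (70 - 22) / 70
```

Because the calculation depends on the true averages of the flagged vehicles, a rule that flags more actual super high-emitters yields a larger reduction even at the same precision level.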

Algorithm validation
Our algorithm estimates the fraction of clean and high-emitting vehicles of measured fleets based on instantaneous RS emissions. The ideal way to validate a high-emitter identification scheme would be to do confirmatory testing of these vehicles with PEMS/chassis tests, and calculate their average emission factors and share of high-emitters in the fleet, preferably on the spot. In practice, this is too expensive and time-consuming to be feasible for a large number of vehicles. Here, we propose a simulation method for validating any such algorithm without a live experiment.

Figure 4. Fractions of clean vehicles (panel (A)) and high-emitting vehicles (panel (B)) estimated with our iterative algorithm (black dots) and the naïve approach which interprets instantaneous emissions as average emissions (orange triangles). The estimated fraction is shown on the y axis and the true fraction in the test data is shown on the x axis. The black dashed line indicates the 1:1 line. The error bar of the iterative algorithm estimates shows the 95% confidence interval estimated with the 20 random implementations of the algorithm. Clean vehicles are defined as vehicles whose average emission factors are below 3.5 g NOx kg−1 fuel. High-emitters are defined as vehicles whose average emission factors are above two times the emission limit (7.0 g NOx kg−1 fuel).
Using the PEMS/chassis measurements, we simulate a test dataset of instantaneous emissions as a proxy for RS measurements; the average emissions of the simulated vehicles are exactly known to us. Figure 4 compares the fraction of high-emitting and clean vehicles estimated by our algorithm with the true fraction in the test dataset. As shown in figure 4, our algorithm performs well in estimating the fraction of high-emitting and clean vehicles in the test datasets, and significantly outperforms the naive estimates that treat instantaneous emissions as the average emission factors. Across fleets with different fractions of high-emitting and clean vehicles, the naive approach estimates a higher fraction of clean vehicles and a lower fraction of high-emitting vehicles by 10%-30%, while the biases of our algorithm estimates are only 5%-10%. Our algorithm offers the biggest improvement when applied to the dirtiest fleet (which is closest to the real-world conditions of EU-5D). The improvement offered by our algorithm slightly decreases but remains significant as the simulated test fleet becomes cleaner. Figure S3 shows the estimated and true distributions of average emission factors. Across all simulated fleets, we observe that the distributions of estimated average emission factors are substantially more similar to the distributions of true average emission factors than the distributions of the instantaneous measurements are; the Kolmogorov-Smirnov distances to the true distribution (a standard metric that measures the distance between two distributions) are reduced by 40%-80%.
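For reference, the Kolmogorov-Smirnov distance between two samples is the largest absolute difference between their empirical cumulative distribution functions. The notation below is introduced here for illustration only: $\hat{F}_{\mathrm{true}}$, $\hat{F}_{\mathrm{est}}$ and $\hat{F}_{\mathrm{RS}}$ denote the empirical distribution functions of the true average emission factors, the estimated average emission factors, and the instantaneous records, respectively.

$$D_{\mathrm{est}} = \sup_x \left| \hat{F}_{\mathrm{est}}(x) - \hat{F}_{\mathrm{true}}(x) \right|, \qquad D_{\mathrm{RS}} = \sup_x \left| \hat{F}_{\mathrm{RS}}(x) - \hat{F}_{\mathrm{true}}(x) \right|$$

$$\text{relative reduction} = 1 - \frac{D_{\mathrm{est}}}{D_{\mathrm{RS}}}$$

A reduction of 40%-80% therefore means that $D_{\mathrm{est}}$ is between 0.2 and 0.6 times $D_{\mathrm{RS}}$.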

Discussion
Here we develop a scheme to combine instantaneous RS records with second-by-second measurements from PEMS or chassis dynamometer tests to identify vehicles with high average emission factors. With only one snapshot measurement for each vehicle, our algorithm can successfully determine the distribution of average emission factors of fleets measured by RS and subsequently calculate the fraction of vehicles below/above a chosen 'clean' or 'dirty' threshold, respectively. Our algorithm aims to identify vehicles with high average emission factors at varying thresholds, whether the fraction of high-emitters is over 50% as for EU-5D, or just a few percent as for gasoline cars or EU-6D. In its application to EU-5D passenger cars, we find that the instantaneous measurements significantly overestimate the fraction of vehicles in compliance. This is likely the case for all other vehicle classes and pollutants as well, since instantaneous emissions are highly skewed towards zero. Furthermore, the contributions of the dirtiest 10%-20% of vehicles are overestimated if one directly uses instantaneous measurements (see figure S4). This suggests that targeting only the dirtiest few percent of EU-5D fleets is misguided, and that the top 50% should rather be targeted, as their average emission factors are not much lower than those of the top 10%, at least in the case of NOx emissions from diesel cars.
Our analysis is highly relevant to policy discussions on RS-based identification programs globally. The trade-off between the precision and the identified fraction of such programs is quantified for the first time. It provides a rigorous way to design these programs by choosing the RS thresholds, number of repeated measurements, and classification rules depending on desired precision, identified fractions, and program budgets. Local decision makers can choose any point on the trade-off curves based on local priorities. Our analysis suggests that the popular high-emitter detection based on a single measurement is highly uncertain and ineffective for the current EU-5D fleets, while identification programs with repeated measurements of the same vehicle can yield substantially higher precision and would detect more high-emitters (or clean vehicles).
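The two quantities in this trade-off can be made concrete with a small simulation. This is a hedged sketch under assumed distributions, not the study's classification scheme: the rule "flag a vehicle when the mean of its repeated snapshots exceeds the RS threshold", the lognormal averages, the gamma multiplier, and the names `precision_and_identified_fraction` and `simulate_program` are all hypothetical.

```python
import numpy as np

def precision_and_identified_fraction(flagged, is_high):
    """Precision = flagged vehicles that are true high-emitters / all flagged vehicles;
    identified fraction = flagged true high-emitters / all true high-emitters."""
    flagged = np.asarray(flagged, dtype=bool)
    is_high = np.asarray(is_high, dtype=bool)
    true_pos = np.sum(flagged & is_high)
    n_flagged = np.sum(flagged)
    n_high = np.sum(is_high)
    precision = true_pos / n_flagged if n_flagged else float("nan")
    identified = true_pos / n_high if n_high else float("nan")
    return float(precision), float(identified)

def simulate_program(n_repeats, rs_threshold, avg_threshold=10.0,
                     n_vehicles=20_000, seed=0):
    """Toy program: flag a vehicle if the mean of its repeated snapshots
    exceeds `rs_threshold` (an assumed rule, for illustration only)."""
    rng = np.random.default_rng(seed)
    # Assumed true averages and skewed mean-1 snapshot multipliers.
    true_avg = rng.lognormal(mean=np.log(avg_threshold), sigma=0.5, size=n_vehicles)
    multipliers = rng.gamma(shape=0.5, scale=2.0, size=(n_vehicles, n_repeats))
    snapshots = true_avg[:, None] * multipliers
    flagged = snapshots.mean(axis=1) > rs_threshold
    return precision_and_identified_fraction(flagged, true_avg > avg_threshold)

# Sweep the number of repeated measurements at a fixed RS threshold.
for k in (1, 2, 5):
    precision, identified = simulate_program(n_repeats=k, rs_threshold=10.0)
    print(k, round(precision, 3), round(identified, 3))
```

Sweeping `rs_threshold` for each value of `n_repeats` traces out one trade-off curve per program design, from which a preferred operating point could be chosen. The numbers produced by this toy model are not comparable to the results reported in this paper.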
Joining RS data and high-frequency measurement data from PEMS or chassis dynamometers has great potential. Linking the RS data to detailed high-frequency measurements can help researchers analyze RS data down to individual vehicles, which can provide a much more comprehensive understanding of the emitting behavior of the measured fleet beyond instantaneous emissions. In fact, one can also try to understand, for example, extreme emissions or emissions under certain driving conditions with a similar framework. Conversely, RS data and the estimated distribution of average emission factors provide context for the detailed measurements of individual vehicles with PEMS or chassis dynamometers, for example, to understand whether the measured vehicle is representative of the local fleet.
Our analysis has several limitations. The iterative algorithm introduced here estimates a distribution of the average vehicle emission factors, which can recover the distribution of the measured RS records when combined with the variability distributions derived from second-by-second PEMS/chassis measurements. However, multiple solutions may exist and our algorithm is only able to find one of them. The impacts seem limited in our case, as our results are robust to many variants of the approach, including different ways of processing the PEMS/chassis measurement data, different numbers of candidate profiles, and different ways of initializing the algorithm (see supplementary method). More research is needed to better understand how different assumptions can influence the algorithm's performance in different contexts. Like most previous studies analyzing RS data, we only calculate the fuel-based emission factor and do not consider the potential heterogeneity in fuel economy. This issue is likely small in our case for EU-5D, as the biggest determinant of average vehicle emissions is not fuel economy but whether or not a vehicle has a defeat device installed. For future analysis, our algorithm could be extended to incorporate the fuel economy heterogeneity among vehicles by adopting identification threshold values as a function of vehicle sub-class or fuel economy.
Our framework can be further applied to gasoline cars and Euro-6 diesel (EU-6D) cars. With a reduction in overall NO x emissions and a lower fraction of super high-emitters, identification of high-emitters among gasoline and EU-6D cars may become more rewarding despite greater challenges. One thing that needs to be checked is whether the current RS and PEMS/chassis measurements are accurate enough at lower emission values to detect high-emitters with low absolute emission levels. We may expect that in such situations a few more repeated measurements may be needed to achieve the same precision as for the EU-5D fleet studied here. While second-by-second test data have been limited in the past, this is no longer an obstacle, as manufacturers are required to provide such data publicly as part of the Euro 6 legislation in Europe. This framework can also be easily adapted to study different pollutants with available RS and second-by-second measurement data, e.g. hydrocarbons.
Extending the algorithm to heavy-duty vehicles and emissions of particulate matter would be highly desirable for air quality management in the transportation sector.

Conclusion
Our research establishes a framework for using RS and high-frequency measurement data to develop the link between instantaneous and average emissions and to identify high-emitters, using the example of NO x emissions of EU-5D passenger cars. Our method shows that interpreting instantaneous emissions as the average emitting behavior can be highly misleading, while a combination of RS and second-by-second test data can help address this gap. Compared to the conventional method of high-emitter detection based on a single measurement, we demonstrate that programs based on 2-5 repeated measurements capture 98%-174% more high-emitting vehicles with the same precision, and lead to an 88%-140% increase in NO x emission reductions. Combining RS and high-frequency measurement data, we present a simulation approach that allows testing and verification of algorithms without (costly or complicated) live experiments.

Data availability statement
The code and sample data that support the findings of this study are openly available at the following URL/DOI: https://doi.org/10.5281/zenodo.6341957. The original data of remote sensing and PEMS/chassis measurements are available upon request.

[...] and innovation programme, as part of the CARES project under Grant Agreement No. 814966.

Author contributions
M Q developed the approach and analyzed the data; J B K designed the research; both wrote the paper together.