Development of a Driving Cycle for Fuzhou Using K-Means and AMPSO

The driving cycle is a speed-time curve, a fundamental technique in the automotive industry, and a basis for setting vehicle fuel consumption and emission standards. A driving cycle is developed based on firsthand driving data collected from fieldwork. First, bad data in the original dataset are preprocessed, the time-series standard smoothing algorithm is used to smooth the data, and Lagrange interpolation is used to fill in missing data. Next, rules for kinematic fragment extraction are set to divide the data into kinematic fragments. Last, an evaluation system of kinematic fragment feature parameters is built. On that basis, the K-means clustering method is used to cluster the dimensionally reduced data, and the adaptive mutation particle swarm optimization (AMPSO) algorithm is employed to select the optimal fragments from the candidate fragments to develop a driving cycle. The experimental results show that the developed driving cycle represents the kinematic features of the experiment car and provides a basis for the development of a driving cycle for Fuzhou.


Introduction
Energy consumption and exhaust emissions are two major issues the automotive industry needs to address [1]. Exhaust emissions are a major cause of air pollution and have drawn worldwide attention [1][2][3]. A car driving cycle is a curve representing the speed of a vehicle versus time; it is a common fundamental technique that the automotive industry employs to simulate actual traffic conditions [4], assess energy economy [5], and evaluate exhaust emissions [6]. Currently, typical driving cycles in the world include the Worldwide Light-duty Test Cycle (WLTC) [7], the Federal Test Procedure (FTP), the New European Driving Cycle (NEDC), and the Japanese JC08 Cycle.
NEDC is currently used in China to assess energy economy and emission levels, but it is a modal cycle constructed on ideal conditions that are unachievable in reality [9], consisting of fragments of constant-speed, constant-acceleration, and constant-deceleration modes [10]. Figure 1 shows the NEDC. The emission levels and fuel consumption data obtained with this ideal cycle are far removed from reality. André et al. [11] found that using one single cycle under different driving conditions leads to inaccurate assessment results. Research by Lin and Niemeier [12] revealed that regional driving differences result in significant driving cycle differences. Using one single driving cycle for emission estimation is therefore inadequate, and constructing a driving cycle that accords with a city's actual conditions requires research based on the actual driving data of that specific city. Many existing studies on driving cycles are based on local driving conditions. Ho et al. [13] developed a Singapore driving cycle (SDC) for roads and traffic conditions in the central business districts of Singapore. Fotouhi and Montazeri-Gh [14] developed a driving cycle for Tehran using K-means clustering; other studies have developed driving cycles for regions such as the Toronto waterfront area [15], Chennai in India [16], Hamburg [17], Tianjin [18], Edinburgh, and Abu Dhabi [19].
Two major methods are currently used for the development of driving cycles. The first is the clustering method [20], which divides the original data into kinematic fragments (microtrips), clusters these fragments, selects the fragments with the least deviation from the general condition, and combines these fragments into a driving cycle [21,22]. Typical clustering methods for driving cycle development include K-means clustering [23], hierarchical clustering [24], and fuzzy clustering [25]. The central step in clustering is the selection of proper microtrip fragments. Clustering requires less computation than random selection methods, but the determination of the value of k (the number of clusters) relies on previous experience. The other driving cycle development method is the Markov chain-based vehicle speed prediction method [26]. This method treats vehicle speed changes as a random process and builds a vehicle state transfer matrix to predict the vehicle's speed. However, this method requires large amounts of data, the accuracy of prediction relies on the number of iterations of the Markov algorithm [27,28], and the kinematic fragments lack continuity [29].
To develop a car driving cycle for Fuzhou, we analyzed the local driving features using the clustering method. We then constructed a microtrip fragment selection method that identifies the fragments with the least deviation from the average feature parameters and selected the kinematic fragments using adaptive mutation particle swarm optimization (AMPSO). We developed a typical car driving cycle based on driving data collected in Fuzhou. The statistical features of the driving cycle were also analyzed and compared to other cycles. The algorithm used in this study can provide a reference for driving cycle development, and the research results provide a basis for the design of vehicle estimation standards.

Methods
The sample data used in this study were obtained from field surveys on roads in Fuzhou, conducted by a university to study the driving cycle of light-duty cars. Thus, the sample data reflect the actual road and traffic conditions in Fuzhou. The dataset was used in the 2019 China Postgraduate Mathematical Modeling Competition, the top competition in mathematical modeling in China. Therefore, we believe that this dataset is reliable and reflects real road conditions. The sample consists of three data files collected by the same car over different time periods. The sampling frequency is 1 Hz, and the three files contain 185,726, 145,826, and 164,915 records, respectively. The collected data include the time, GPS vehicle speed, longitude and latitude, engine speed, engine torque percentage, and instantaneous fuel consumption. To develop a vehicle driving cycle, only the GPS speed is analyzed in this study. Other quantities, including the acceleration and driving distance, are obtained by differentiation and integration of the speed. Because of minor problems with the sampling device and the vehicle, the collected data contain errors. When the sample data are processed, the following five types of data are removed: first, discontinuous data caused by the loss of vehicle speed data; second, data with abnormal acceleration or deceleration; third, data collected while the vehicle is halted for a long time; fourth, abnormal data collected while the vehicle moves at a low speed for a long time; and last, data collected during an idling event longer than 180 s. Figure 2 shows the steps of the driving cycle's construction.

Hypotheses
(1) We assume that all the sampled data, except the aforementioned five types of abnormal or invalid data, are valid.
(2) If the vehicle moves at a speed below 10 km/h in an on-and-off state for over 30 s, traffic congestion is assumed, and this situation is regarded as an idling event.
(3) Data with a speed above 180 km/h are considered abnormal (i.e., the speed should not be 150% higher than the set maximum).
(4) If a data segment remains incomplete after data interpolation, the segment of data up to the starting time point of the next idling event is removed and not included in the kinematic fragments (microtrips).

Data Preprocessing.
GPS speed data are used as the indicator for driving cycle development in this study. The initial value of each data file at the starting time is set as 1. Figure 3 shows part of the speed-time chart obtained from the preprocessed data. Some bad data still exist in the preprocessed data files, such as discontinuous data, abnormal acceleration/deceleration, long-time halt, long-time traffic congestion, and long-time idling events. These bad data can be classified into two groups: abnormal data and missing data. [30]. Data on idling events that last longer than 180 s are deleted.
Short runs of missing data, which are few in the data files, are filled by interpolation when the gap is shorter than 3 s and the values on both sides of the gap are nonzero (to allow for idling situations). Data sets with many continuous missing records are deleted. Idling events are divided into two groups: long-time traffic congestion is treated as an idling event (speed set as 0), and data of long-time idling events are deleted. According to the definition of long-time congestion, runs of driving at a speed lower than 10 km/h for no less than 30 s are located and reset to 0.
After preprocessing, the three data files contain 178,753, 143,535, and 159,777 records, respectively.
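The two speed-based cleaning rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, missing samples are assumed to be marked as `None` in a 1 Hz series, two-point linear interpolation stands in for the Lagrange interpolation mentioned in the abstract, and the "shorter than 3 s" rule is read as at most two missing samples.

```python
from typing import List, Optional


def fill_short_gaps(speed: List[Optional[float]],
                    max_missing: int = 2) -> List[Optional[float]]:
    """Linearly fill runs of None that are short and bounded by nonzero speeds.

    max_missing is an assumed reading of the "gap shorter than 3 s" rule.
    """
    out = list(speed)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            j = i
            while j < n and out[j] is None:
                j += 1  # j is the first known sample after the gap
            bounded = i > 0 and j < n
            if bounded and (j - i) <= max_missing and out[i - 1] > 0 and out[j] > 0:
                left, right = out[i - 1], out[j]
                span = j - (i - 1)
                for k in range(i, j):
                    out[k] = left + (right - left) * (k - (i - 1)) / span
            i = j
        else:
            i += 1
    return out


def zero_congestion(speed: List[Optional[float]],
                    threshold: float = 10.0,
                    min_duration: int = 30) -> List[Optional[float]]:
    """Reset runs below `threshold` km/h lasting at least `min_duration` s to 0."""
    out = list(speed)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None and out[i] < threshold:
            j = i
            while j < n and out[j] is not None and out[j] < threshold:
                j += 1
            if j - i >= min_duration:
                for k in range(i, j):
                    out[k] = 0.0
            i = j
        else:
            i += 1
    return out
```

For example, `fill_short_gaps([10.0, None, 20.0])` fills the single missing sample with the midpoint, while a gap next to a zero speed is left untouched so that idling boundaries are not smeared.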

Kinematic Fragment Selection Methods.
A kinematic fragment (also called a "microtrip") is an excursion between two successive time points at which the vehicle is stopped, i.e., a trip between two idling events. A complete kinematic fragment consists of four segments: an idle segment, an acceleration segment, a cruise segment, and a deceleration segment. According to the standards of the WLTC, the following fragment extraction rules are set: (1) One fragment should last more than 10 s. In accordance with the four rules, the preprocessed driving data are segmented into kinematic fragments, as shown in Figure 4. The numbers of fragments for the three sample data files are 1476, 1020, and 1224, respectively.
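The segmentation described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function name `split_microtrips` is hypothetical, each fragment is taken as an idle run followed by the motion run up to the next stop, and only the one extraction rule that survives in the text (duration longer than 10 s) is applied.

```python
from typing import List, Tuple


def split_microtrips(speed: List[float], min_length: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for a 1 Hz speed series.

    Each fragment starts at the first idle sample (speed == 0) and ends just
    before the next idle run begins.
    """
    fragments = []
    n = len(speed)
    i = 0
    # Skip leading motion that is not preceded by an idle segment.
    while i < n and speed[i] > 0:
        i += 1
    while i < n:
        start = i
        while i < n and speed[i] == 0:  # idle segment
            i += 1
        moving_start = i
        while i < n and speed[i] > 0:   # acceleration, cruise, deceleration
            i += 1
        has_motion = i > moving_start
        ends_with_stop = i < n          # incomplete trailing trips are dropped
        if has_motion and ends_with_stop and (i - start) > min_length:
            fragments.append((start, i))
    return fragments
```

With a series such as two idle seconds, twelve seconds of motion, a stop, three seconds of motion, and a final stop, only the first fragment is kept because the second lasts fewer than 10 s.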

Extraction and Computation of Feature Parameters.
The major objective of this study is to develop a vehicle driving cycle with a 1200-1300 s speed time series. The driving features reflected by the cycle should be representative of the corresponding features of the source data. As shown in Table 1, 16 feature parameters are chosen in this study, following common practice in driving cycle construction.
In Table 1, the speed and time parameters are collected directly at a sampling frequency of 1 Hz. For any given time point $i$, $t_{i+1} - t_i = 1$, so the acceleration can be obtained by using the following equation:

$$a_{i,i+1} = \frac{v_{i+1} - v_i}{3.6\,(t_{i+1} - t_i)}, \tag{1}$$

Journal of Advanced Transportation
where $a_{i,i+1}$ is the acceleration from the $i$th second to the $(i+1)$th second (m/s$^2$), $v_i$ is the speed at the $i$th second (km/h), and $t_i$ is the time point of the $i$th second (s).
Parameters including $v_{max}$, $v_m$, $v_{me}$, and $v_{std}$ can be calculated by using equation (2). Other parameters including $a_{max}$, $a_{min}$, $a_a$, and $a_d$ can be calculated by using equation (3). For each fragment, the starting speed and the ending speed are 0, so the average acceleration is 0. Thus, the acceleration standard deviation is formulated as equation (4), where $T$ is the time length of a complete fragment and $T_a$ is the time duration when … Table 2 presents the feature parameters for the validity assessment of the kinematic fragments.
Parameters including $P_i$, $P_a$, $P_b$, and $P_c$ can be calculated by equations (5)-(8), and other parameters including $P_{1-10}$, $P_{10-20}$, ..., $P_{80+}$ can be obtained by using equations (9) and (10). After data cleansing, fragmentation, and parameter extraction, a total of 3720 records of 28-dimensional data were obtained. The clustering method is used to cluster the microtrips (kinematic fragments), but the high dimension of the data reduces the computation efficiency and undermines the clustering effect. Thus, the PCA method is employed to reduce the dimension of the data.
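As a rough illustration of how the speed and acceleration parameters of this subsection can be computed for one fragment, the sketch below uses the usual textbook definitions. Several details are assumptions rather than the paper's definitions: the function name `fragment_features`, the 0.1 m/s^2 cutoff that separates acceleration and deceleration from cruising, the population form of the standard deviation, and the idle share computed as the fraction of zero-speed samples.

```python
import math
from statistics import mean
from typing import Dict, List

ACC_THRESHOLD = 0.1  # m/s^2, assumed cutoff; the source does not state its value


def fragment_features(speed_kmh: List[float]) -> Dict[str, float]:
    """Illustrative feature parameters for one fragment sampled at 1 Hz."""
    # Speeds are in km/h and the time step is 1 s, so dividing by 3.6 gives m/s^2.
    acc = [(v1 - v0) / 3.6 for v0, v1 in zip(speed_kmh, speed_kmh[1:])]
    moving = [v for v in speed_kmh if v > 0]
    accel = [a for a in acc if a > ACC_THRESHOLD]
    decel = [a for a in acc if a < -ACC_THRESHOLD]
    v_m = mean(speed_kmh)
    return {
        "v_max": max(speed_kmh),
        "v_m": v_m,                                   # average speed, idle included
        "v_me": mean(moving) if moving else 0.0,      # average driving speed
        "v_std": math.sqrt(mean([(v - v_m) ** 2 for v in speed_kmh])),
        "a_max": max(acc),
        "a_min": min(acc),
        "a_a": mean(accel) if accel else 0.0,         # average acceleration
        "a_d": mean(decel) if decel else 0.0,         # average deceleration
        # Mean acceleration is 0 when a fragment starts and ends at rest.
        "a_std": math.sqrt(sum(a * a for a in acc) / len(acc)),
        "P_i": sum(1 for v in speed_kmh if v == 0) / len(speed_kmh),
    }
```

Calling `fragment_features([0, 3.6, 7.2, 7.2, 3.6, 0])` gives accelerations of 1, 1, 0, -1, and -1 m/s^2, which makes the individual entries easy to check by hand.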

Data Dimension Reduction Using PCA and Factor Analysis (FA)
(1) Principal component analysis: PCA is a method that transforms a set of complex variables into a few principal components by linearly combining the original variables, reducing the number of variables while minimizing the loss of information. This method simplifies the data and yields results with more effective information [31]. The eigendecomposition method is used in this study.
(2) Factor analysis: FA is an extension of PCA [32]. FA is a method that describes the relationships between individual variables of a dataset. When FA is employed for decomposition, the common factors $f = (f_1, f_2, \ldots, f_n)'$ and the special factors $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)'$ are obtained, and the original variables are then modeled as linear combinations of the common factors.
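The eigendecomposition route to PCA mentioned in item (1) can be sketched with NumPy. This is a generic illustration rather than the SPSS procedure used in the study; the function name `pca_eig` is hypothetical, the variables are standardized before the correlation matrix is decomposed, and components are retained when their eigenvalue exceeds 1, matching the retention rule reported later in the paper.

```python
import numpy as np


def pca_eig(X: np.ndarray, min_eigenvalue: float = 1.0):
    """PCA on the correlation matrix of X (rows: fragments, columns: parameters)."""
    # Standardize each feature so that units do not dominate the components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    # eigh is appropriate because the correlation matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # share of variance per component
    keep = eigvals > min_eigenvalue
    scores = Z @ eigvecs[:, keep]              # principal component scores
    return eigvals, explained, scores
```

The returned `scores` array has one row per fragment and one column per retained component, which is the form needed as input to a clustering step.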

Fragment Selection Based on the Minimum Deviation from the Average Feature Parameters
(1) After the kinematic fragments are divided by the clustering algorithm, all fragments are stored in a library of $k$ categories of fragments. The number of feature parameters of a microtrip is $i$, and $c_i$ is the number of fragments in category $i$ ($i = 1, 2, \ldots, k$).
(2) Standardized processing is performed on the feature parameters of the fragments, as shown in the following equation:

$$d_{pq,n} = \frac{x_{pq,n} - \min x_{q,n}}{\max x_{q,n} - \min x_{q,n}}, \tag{11}$$

where $x_{pq,n}$ is the $q$th parameter value of the $p$th fragment in category $n$, $\max x_{q,n}$ and $\min x_{q,n}$ are the maximum and minimum values of the $q$th parameter in category $n$, respectively, and $d_{pq,n}$ is the parameter value after standardized processing.
(3) The average of each nondimensionalized parameter in category $n$ ($n = 1, 2, \ldots, k$) is calculated, and the averages are summed up, as shown in the following equation:

$$\bar{x}_{q,n} = \frac{1}{c_n}\sum_{p=1}^{c_n} d_{pq,n}, \qquad Z_n = \sum_{q} \bar{x}_{q,n}, \tag{12}$$

where $\bar{x}_{q,n}$ is the average of the $q$th parameter over all fragments in category $n$ and $Z_n$ is the sum of these averages over all nondimensionalized feature parameters in category $n$.
(4) The time percentage of each category in the cycle is calculated, as shown in equations (13) and (14), where $P_{n1}$ and $P_{n2}$ are the time percentages of the category-$n$ fragments in the driving cycle to be obtained and $T_{p,n}$ is the time duration of the $p$th fragment in category $n$. $T_{e1} = 1200$ and $T_{e2} = 1300$ bound the time range of the final driving cycle.
(5) For each fragment, the difference $d_{p,n}$ between the sum of its feature parameters and the sum of all average parameters in its category is calculated, and its absolute value is adopted. The candidate driving cycle fragments are sorted by the size of these difference values and selected one after another in this order until the time duration of the selected fragments exceeds the value of $P_n$ in equations (13) and (14).
(6) The fragments selected from each category are combined to construct a representative driving cycle.
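The selection steps above can be sketched for a single category as follows. The sketch rests on assumptions: the function names are hypothetical, the time budget of a category is taken to be proportional to its share of the total fragment duration (one plausible reading of equations (13) and (14), which are not reproduced here), and fragments are picked in ascending order of deviation until the budget is reached.

```python
from typing import List


def category_time_budget(category_durations: List[float], cycle_length: float) -> List[float]:
    """Split the cycle length across categories in proportion to their total duration."""
    total = sum(category_durations)
    return [cycle_length * d / total for d in category_durations]


def select_fragments(features: List[List[float]],
                     durations: List[float],
                     target_time: float) -> List[int]:
    """Pick fragment indices of one category, closest to the category average first."""
    n_params = len(features[0])
    lo = [min(f[q] for f in features) for q in range(n_params)]
    hi = [max(f[q] for f in features) for q in range(n_params)]
    # Min-max normalization; a constant parameter is mapped to 0.
    d = [[(f[q] - lo[q]) / (hi[q] - lo[q]) if hi[q] > lo[q] else 0.0
          for q in range(n_params)] for f in features]
    # Sum over parameters of the per-parameter category averages.
    z = sum(sum(row[q] for row in d) / len(d) for q in range(n_params))
    deviation = [abs(sum(row) - z) for row in d]
    order = sorted(range(len(d)), key=lambda p: deviation[p])
    chosen, total = [], 0.0
    for p in order:
        if total >= target_time:
            break
        chosen.append(p)
        total += durations[p]
    return chosen
```

As a usage example, three categories with total durations of 600, 300, and 100 s and a 1200 s target cycle receive budgets of 720, 360, and 120 s; `select_fragments` is then called once per category with its own budget.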

Fragment Selection Using AMPSO.
In preliminary experiments, we found that the traditional PSO algorithm tends to converge prematurely or fall into a local minimum. After comparing the traditional PSO, AMPSO, and hybrid differential evolution particle swarm optimization (DEPSO), we found that AMPSO has the best global search ability. Therefore, the AMPSO algorithm was selected. AMPSO follows the fundamental steps of PSO [33]: first, the swarm is initialized with particles randomly placed in the feasible solution space, and the state of each particle is represented by three indicators: position, velocity, and fitness. After that, the best solution of each particle and the best solution of the whole swarm are tracked and updated through continuous iteration. When all particles stop searching in a dimension, the adaptive mutation strategy is triggered, the dimensional activity factor is introduced, and the high-quality particles are retained for subsequent iterations to avoid premature convergence and the local minimum problem. Section 2.6 proposed a selection criterion, and this section proposes a specific algorithm for implementing the selection. The optimization function of the algorithm is shown in equation (15), where $q_k$ is the sequence number of the selected fragment in category $k$, $Z_{q_k}^{k}$ is the average feature parameter of the $q_k$th fragment in category $k$, and $\varepsilon_1$ and $\varepsilon_2$ are constant coefficients (both are set as 0.5).
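To make the mechanism concrete, the following sketch shows a plain PSO loop with a simple per-dimension mutation trigger. It is a simplified stand-in for AMPSO, not the algorithm used in the study: the function name `pso_with_mutation`, the inertia and acceleration coefficients, the spread tolerance that decides when a dimension has stalled, and the fraction of particles that are re-randomized are all illustrative assumptions, and the objective is a generic continuous function rather than equation (15).

```python
import random
from typing import Callable, List, Tuple


def pso_with_mutation(objective: Callable[[List[float]], float],
                      dim: int,
                      bounds: Tuple[float, float],
                      n_particles: int = 30,
                      iters: int = 200,
                      w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                      spread_tol: float = 1e-3,
                      mutate_frac: float = 0.3,
                      seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective`; re-randomize part of the swarm in stalled dimensions."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        # Mutation trigger: if the swarm has collapsed along a dimension,
        # scatter some particles there; personal and global bests are kept.
        for d in range(dim):
            column = [p[d] for p in pos]
            if max(column) - min(column) < spread_tol:
                for i in rng.sample(range(n_particles), int(mutate_frac * n_particles)):
                    pos[i][d] = rng.uniform(lo, hi)
                    vel[i][d] = 0.0
    return gbest, gbest_val
```

Because the personal and global bests survive every mutation, the best objective value never gets worse from one iteration to the next, while the scattered particles keep exploring dimensions in which the swarm would otherwise stop moving.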

Analysis of Feature Parameters Using FA and PCA.
The values of the 15 feature parameters for the 3720 fragments obtained in Section 2.3 are calculated. PCA is performed on these 15 feature parameters using SPSS (SPSS Inc., 16.0). Typically, a correlation test should be performed before PCA.

Correlation Test.
The FA method can find common factors and latent representative factors in a group of variables. Correlation among the variables is a premise for factor analysis, so correlation tests must be made before factor analysis. Table 3 shows the correlation between feature parameters; only six feature parameters are listed here. Table 4 shows the results of the KMO and Bartlett's tests. The KMO value in Table 4 is 0.961 (>0.7), and Sig. is 0, which indicates strong correlation between the feature parameters, and thus, factor analysis is feasible.
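For readers who want to reproduce this kind of check outside SPSS, the sketch below computes the KMO measure and the Bartlett sphericity statistic from their standard textbook definitions. The function name `kmo_and_bartlett` is hypothetical, and the sketch returns only the chi-square statistic and its degrees of freedom; converting them to a significance value would need a chi-square distribution function, which is left out here.

```python
import numpy as np


def kmo_and_bartlett(X: np.ndarray):
    """Return (KMO, Bartlett chi-square, degrees of freedom) for data matrix X."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    # Partial correlations follow from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(p, dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + q2)
    # Bartlett's test of sphericity: is R distinguishable from the identity?
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) // 2
    return kmo, chi2, dof
```

A KMO value close to 1 means that the partial correlations are small compared with the raw correlations, which is the situation in which factor analysis is considered appropriate.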

Extraction of Principal Components and Common Factors.
Table 5 shows the result of PCA.
In the data used to develop the driving cycle, components 1, 2, 3, and 4 have eigenvalues above 1 and explain 31.651%, 20.236%, 13.220%, and 9.786% of the variance, respectively; the cumulative variance explained reaches 74.893%, which means that most of the information is retained. Thus, components 1 to 4 are extracted as the principal components, and the other components are discarded because they contain little information. Figure 5 shows the scree test result. Then, the component matrix is extracted, as shown in Table 6. The common factors are stored for subsequent operations, and regression is performed to obtain the component score coefficient matrix. Varimax rotation with Kaiser normalization is used for factor rotation. The new variables generated by SPSS are appended to the sample data as values of the common factors. The output factor coefficients are shown in Table 7, which gives the component matrix after fragment selection. Factor rotation minimizes the number of variables with high loadings on each factor and better explains the common factors.
As shown in Tables 6 and 7, the first principal component represents the cruise time, idle time, driving distance, maximum speed, average speed, and average driving speed. The second principal component reflects the maximum deceleration, average deceleration, and acceleration standard deviation. The third reflects the fragment duration, and the fourth reflects the acceleration time.
The four principal components cover all 15 constructed feature parameters and hence fully reflect the features of the fragments. By multiplying the normalized driving speed data matrix by the principal component matrix, we obtain the principal component score matrix. Equation (16) shows how the principal components are calculated. The scores of the four principal components are taken as the research object for clustering.

Clustering Result Analysis.
The scores of the four principal components are clustered, and the number of condensation points (cluster centers) is set as 3. The K-means clustering algorithm is run in software to divide the 3720 kinematic fragments into three categories, as shown in Figure 6. The driving features reflected in the microtrips in each category are further analyzed, and the parameters and equations in Section 2.4 are used to obtain the composite feature parameters of these categories, as shown in Table 8.
On the basis of Table 8, Figures 7-9 are obtained. As shown in Figure 8, there are patterns in the distribution of the composite feature parameters. According to these patterns, categories 1, 2, and 3 are considered the low-speed, medium-speed, and high-speed categories of fragments, respectively. The following patterns can be concluded:
(1) With regard to the number of kinematic fragments, the number of medium-speed fragments is the highest, reaching 1957, followed by 1176 low-speed fragments and 577 high-speed fragments.
(2) With regard to the driving time percentage, the low-speed fragments have the highest idle time percentage at 58.98%, and the high-speed fragments have the lowest idle time percentage at 4.39%. The constant-speed time percentage is the highest, at 35.16%, in the high-speed fragments and the lowest, at 9.28%, in the low-speed fragments.
(3) In terms of the speed distribution percentage, the percentage of speed at 0-10 km/h is the highest, at 72.08%, in the category of low-speed fragments, and the percentage of speed over 80 km/h is the highest, at 17.09%, in the category of high-speed fragments.
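The clustering step can be illustrated with a compact K-means (Lloyd's algorithm) in NumPy. This is a sketch under assumptions, not the software routine used in the study: the function names are hypothetical, and the cluster centers are seeded with a deterministic farthest-point rule, which may differ from the initialization used by the authors' software.

```python
import numpy as np


def farthest_point_init(points: np.ndarray, k: int) -> np.ndarray:
    """Seed centers: start from the first point, then repeatedly take the farthest point."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    return np.array(centers)


def assign(points: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Label each point with the index of its nearest center."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(points: np.ndarray, k: int = 3, max_iter: int = 100):
    """Cluster rows of `points` (e.g. principal component scores) into k groups."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers
```

Applied to a matrix of four principal component scores per fragment, `kmeans(scores, k=3)` returns one label per fragment, from which per-category fragment counts and composite feature parameters can be derived.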

Development of the Vehicle Driving Cycle and Analysis.
A proper vehicle driving cycle usually lasts between 1200 and 1300 s. The time duration of the fragments to be selected from each category is obtained from the clustering result, and AMPSO is employed to find the fragments with the minimum deviation from the composite feature parameters. As the optimal fragments obtained by AMPSO differ every time the algorithm is run, multiple experiments are conducted, and the fragments with the minimum deviation from the composite feature parameters are selected to obtain driving cycle 1, as shown in Figure 10. Moreover, the same number of experiments is conducted using the random selection method, which leads to driving cycle 2, as shown in Figure 11. These two cycles are compared. Their time lengths are 1216 s and 1261 s, respectively.

Constructed Driving Cycles vs. Existing Driving Cycles.
To better display the comparison result, Figures 12-14 are produced based on Tables 9 and 10.
As shown in Figures 12-14, driving cycle 1 has smaller errors than driving cycle 2 in four parameters: average speed, average driving speed, average acceleration, and average deceleration. Driving cycle 1 is also closer to the preprocessed experiment data than driving cycle 2 in terms of the acceleration percentage, deceleration percentage, constant-speed percentage, and idle percentage, as well as the driving speed frequency.
Figure 15 shows the final driving cycle. The NEDC, FTP75, WLTC, experiment data, and the final driving cycle developed in this study are compared in Table 11.
As the results show, the method proposed in this study improves on the random selection method and addresses its low reliability. Compared to typical driving cycles abroad, the driving cycle developed herein better suits the actual road and traffic conditions of the Fuzhou urban district in terms of the average speed, average acceleration, and percentages of driving modes. The vehicle driving features of the developed driving cycle are closer to the source data and are thus representative.

Conclusion
The driving cycle is an important benchmark for vehicle design and evaluation. A driving cycle is built in this study based on actual driving data in Fuzhou. Analysis of vehicle usage and driving features in Fuzhou reveals that the actual driving cycle in Fuzhou differs considerably from standard driving cycles. Therefore, it is necessary to develop and use specific driving cycles for the evaluation of car emissions.
With driving data collected by a minibus, this study preprocessed the raw data, performed K-means clustering on the fragments, screened the fragments using AMPSO, and developed a driving cycle for Fuzhou. The developed driving cycle is compared to existing driving cycles and reflects the actual driving conditions. Among the percentages of the four driving modes, the acceleration percentage shows the largest deviation, at 1.5%. The developed driving cycle can also be used for the evaluation of fuel consumption and emissions of vehicles in Fuzhou.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.