Predictive modeling for the quantity of recycled end-of-life products using optimized ensemble learners

The rapid development of machine learning algorithms provides new solutions for predicting the quantity of recycled end-of-life products. However, the Stacking ensemble model has seen little use in this field. To fill this gap, we propose a Stacking ensemble model that uses support vector regression, multi-layer perceptron, and extreme gradient boosting algorithms as base models and linear regression as the meta model. The k-nearest neighbor mega-trend diffusion method is applied to avoid the overfitting caused by a small sample data set. Grid search and time series cross validation are used to optimize the proposed model. To verify and validate the proposed model, data on China's end-of-life vehicles industry from 2006 to 2020 are used. The experimental results demonstrate that the proposed model achieves higher prediction accuracy and generalization ability than the single machine learning models it is compared against in predicting the quantity of recycled end-of-life products.


Introduction
In recent years, traditional supply chain management has transformed into sustainable supply chain management due to growing ecological awareness and legal regulations (Masoumi et al., 2019). Efficient management of the reverse supply chain plays a significant role in sustainable management (Lenort et al., 2021). To design an effective system for reverse supply chain management, it is important to manage end-of-life (EOL) products (Rashid et al., 2021). EOL products can be considered a vital source of secondary raw materials through recycling, reusing, and remanufacturing (Numfor et al., 2021). In fact, the amount of EOL products generated each year is increasing at an alarming rate. For example, the world currently generates around 50 million tons of waste electrical and electronic equipment (WEEE) (Andeobu et al., 2021) and 2.01 billion tons of municipal solid waste (MSW) (Namoun et al., 2022) yearly. It is challenging to manage EOL products due to their uncertainty in quality, quantity, and return time (Hao et al., 2018). If the quantity of EOL products can be predicted in advance, accurate information will help decision-makers and practitioners effectively regulate the resources necessary for designing the reverse supply chain.
Many conventional methods have been designed for predicting the quantity of EOL products, including the population balance model (Lin et al., 2018), the market supply method (Jain and Sareen, 2006), the distribution delay method (Polák and Drápalová, 2012), the structural equation model (Agrawal and Singh, 2019), the time series model (Ochotnicky et al., 2017), the gray model (Ene and Öztürk, 2017), and the graphical evaluation and review technique (Zhou et al., 2016). Machine learning (ML) methods currently stand out for their accuracy in predicting EOL products because they perform well on unstable nonlinear data samples and large feature sizes (Ni et al., 2021). In the field of ML, ensemble models have attracted much interest and have proven to be highly predictive in a variety of applications (Cui et al., 2021). However, the Stacking ensemble model is relatively underutilized in predicting the quantity of recycled EOL products.
To address this gap, we propose a novel Stacking-based ensemble model to predict the quantity of EOL products. Our approach combines multiple ML algorithms to improve prediction accuracy and generalization ability, leading to better management of sustainable reverse supply chains and increased sustainability of the recycling industry.
The paper is organized as follows. Section 2 includes a detailed literature review of related research. Section 3 proposes a Stacking-based prediction model. Section 4 details the empirical study of the proposed model and the experimental results. Finally, the conclusion and future research directions are presented in Section 5.

Literature review
In this section, the ML-based predictive methods for the quantity of EOL products are reviewed. The EOL products in this research refer to end-of-life vehicles (ELVs), medical waste (MW), MSW, and WEEE. The ML methods applied to predict EOL product recycling mainly include artificial neural network (ANN), support vector regression (SVR), k-nearest neighbor (KNN), decision tree (DT), gradient boosting regression tree (GBRT), extreme gradient boosting (XGBoost), and random forest (RF).

Support Vector Regression (SVR)
SVR maps the samples to a high-dimensional space through a kernel function and fits a hyperplane there for regression. This algorithm has a high learning capacity for high-dimensional small sample data, but it is overly reliant on the kernel function (Zhang et al., 2022). A hybrid model of fuzzy information granulation (FIG), genetic algorithm (GA), and SVR was proposed to predict the MSW generation per capita for Hubei province in China (Dai et al., 2020). Further, SVR optimized by the wavelet transform (WT) was used to forecast weekly MSW in Tehran and Mashhad (Abbasi et al., 2014).
K-Nearest Neighbor (KNN)
As a nonparametric and instance-based lazy learning algorithm, KNN is known for its stability in the presence of noise. The weighted KNN algorithm has been developed and successfully applied to forecast MSW generation in Australia (Abbasi and El Hanandeh, 2016).
Decision Tree (DT)
As a supervised ML algorithm, DT consists of root nodes, internal nodes, and leaf nodes. This algorithm shows great interpretability. DT has been employed to evaluate MSW generation in the city of Bogota, providing a possible decision-making strategy for waste disposal (Kannangara et al., 2018).
Gradient Boosting Regression Tree (GBRT)
Based on the boosting strategy, the GBRT algorithm makes a joint decision by iterating multiple trees; that is, each tree gets its predicted value by learning the conclusions and residuals of all previous trees. This algorithm has strong robustness to outliers but is unsuitable for high-dimensional sparse data (Lu et al., 2022). The combination of GBRT and ANN was applied to predict building-level MSW generation in New York (Kontokosta et al., 2018).
Extreme Gradient Boosting (XGBoost)
The XGBoost algorithm generates a tree according to feature splitting and continuously adds trees to fit the residual of the last prediction, so as to obtain new functions and improve model performance through gradual iteration. This algorithm can prevent overfitting effectively but is unsuitable for processing high-dimensional feature data and unstructured data (Zhang et al., 2022).
Random Forest (RF)
RF is one of the classification and regression tree (CART) models based on Bagging integration. This algorithm has high accuracy in training results and good parallelism, but it performs poorly on small data sets (Nguyen et al., 2021).
Ensemble Model
Ensemble learning is a type of hybrid ML model in which algorithms of the same or different types are combined to form a more powerful prediction model (Dasarathy and Sheela, 1979; Tan et al., 2019). Ensemble learning has three main ensemble models, namely Boosting, Bagging, and Stacking. Boosting is a serial method with strong dependence between individual learners: the learners must be generated sequentially, each one depending on the learner before it, so training cannot be parallelized (Freund and Schapire, 1997). Bagging is a parallel method in which learners can be generated simultaneously without strong dependence between them (Breiman, 1996). Stacking is a parallel, phased ensemble method that adds a meta model layer to multiple heterogeneous base models and then outputs the prediction results. A decomposition-ensemble-based model integrating variational mode decomposition (VMD), an exponential smoothing model (ESM), and the gray model (GM) was proposed for e-waste quantity prediction (Wang et al., 2021). Moreover, an ensemble voting regression algorithm based on RF, gradient boosting machine (GBM), and adaptive boosting (AdaBoost) was developed to predict the medical waste for Istanbul in Turkey (Erdebilli and Devrim-İçtenbaş, 2022).
Bagging and Boosting often use the same model type for all base models. The correlation between the models is therefore greater, and overfitting occurs easily. In contrast, Stacking selects different models as base models to capture the correlation between the predicted results and the actual data more effectively. However, Stacking is less widely used in the field of sustainable reverse supply chains. Thus, to overcome the weak generalization ability of a single model in the recycling field, this research proposes a novel Stacking ensemble model to predict the quantity of EOL products.

Method
The proposed method is described in this section. First, socioeconomic influence factors for EOL products are summarized from previous studies. The historical data for these variables are processed by z-score standardization and data augmentation (Section 3.1). Second, the proposed optimized Stacking ensemble model is developed (Section 3.2). Finally, three evaluation metrics, namely mean absolute error (MAE), mean square error (MSE), and R-squared (R²), are used to evaluate the prediction performance of the proposed model (Section 3.3). Anaconda-based Python programming (version 3.8) is used to analyze data and build ML-based predictive models.

Data preprocessing
To eliminate the influence of the differing scales of the data's attributes, the original values x and y are standardized based on the mean (μ) and standard deviation (σ), as shown in Eqs. (1) and (2).

$$x' = \frac{x - \mu_x}{\sigma_x} \tag{1}$$

$$y' = \frac{y - \mu_y}{\sigma_y} \tag{2}$$
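As a minimal sketch of the z-score standardization described above, the following Python snippet standardizes a toy series. The function name `zscore`, the toy values, and the use of the population standard deviation (`ddof=0`) are assumptions made for illustration, not details taken from the study.

```python
import numpy as np


def zscore(values):
    """Standardize each column to zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    mu = values.mean(axis=0)
    sigma = values.std(axis=0)  # population form (ddof=0); an assumption
    return (values - mu) / sigma, mu, sigma


# Hypothetical toy series, not data from the study.
x = np.array([2.0, 4.0, 6.0, 8.0])
x_std, mu, sigma = zscore(x)
```

In a forecasting setting it is usually safer to estimate μ and σ on the training portion only and reuse them for the test portion, so that no information from the test period leaks into preprocessing.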
To avoid overfitting on a small sample data set, the data are augmented using the k-nearest neighbor mega-trend diffusion (KNNMTD) method (Sivakumar et al., 2022). By increasing the number of samples and expanding the data set, the limited data can be effectively utilized to the maximum extent, and the generalization of the ML model can be improved.
Consider the data point $X^{(i,j)}$, the value of the jth attribute of the ith instance. First, the KNN algorithm iteratively finds the nearest neighbors of $X^{(i,j)}$, which serve as the input for mega-trend diffusion (MTD). Then, to obtain the subsample domain ranges, the diffusion coefficient is calculated as Eq. (3).
Here, the superscript (i,j) indicates the MTD parameter values that correspond to the jth attribute of the ith instance, $\hat{s}_x^2$ represents the sample variance, and k represents the sample size.
The estimated range of the diffused sample set is given by Eqs. (4)-(8). Here, the two counts used in these equations are the number of data points smaller than $u_{set}$ and the number of data points larger than $u_{set}$, and the minimum and maximum values of the neighboring subsamples of the (i,j)th instance are represented by $\min^{(i,j)}$ and $\max^{(i,j)}$, respectively.
When a and b exclude the minimum and maximum values, the lower bound (LB) and upper bound (UB) are calculated as Eqs. (11) and (12). The membership function (MF) is calculated as Eq. (13).
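The range estimation described above can be illustrated with a short Python sketch. It assumes the commonly cited MTD formulation (center $u_{set}$ at the midpoint of the range, skewness weights from the counts below and above the center, a diffusion term built from the sample variance and the constant ln(10^-20), and a triangular membership function). The paper's Eqs. (3)-(13) are not reproduced in this text and may differ in detail, so the function names `mtd_bounds` and `membership`, the constant, and the toy sample are all assumptions.

```python
import math


def mtd_bounds(sample):
    """Return (LB, UB, u_set) for a small numeric sample under the assumed MTD form."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    lo, hi = min(sample), max(sample)
    u_set = (lo + hi) / 2.0
    n_l = sum(1 for v in sample if v < u_set)  # points below the center
    n_u = sum(1 for v in sample if v > u_set)  # points above the center
    if n_l == 0 or n_u == 0:
        return lo, hi, u_set  # degenerate sample (all values equal): no diffusion
    mean = sum(sample) / n
    variance = sum((v - mean) ** 2 for v in sample) / (n - 1)
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    spread = -2.0 * variance * math.log(1e-20)  # positive, since log(1e-20) < 0
    a = u_set - skew_l * math.sqrt(spread / n_l)
    b = u_set + skew_u * math.sqrt(spread / n_u)
    # Keep the observed extremes inside the estimated range.
    return min(a, lo), max(b, hi), u_set


def membership(x, lb, ub, u_set):
    """Triangular membership value of x on [lb, ub] with its peak at u_set."""
    if x < lb or x > ub:
        return 0.0
    if x <= u_set:
        return 1.0 if u_set == lb else (x - lb) / (u_set - lb)
    return (ub - x) / (ub - u_set)


# Hypothetical neighbor values for one attribute, not data from the study.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
lb, ub, u_set = mtd_bounds(sample)
```

Under these assumptions, virtual values would be drawn from [LB, UB] and their membership values used to judge how plausible each one is.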
To measure how closely the artificial virtual data match the actual data, the pairwise correlation difference (PCD) is calculated using the Frobenius norm as Eq. (14).

$$\mathrm{PCD}(X_r, X_s) = \left\lVert \mathrm{corr}(X_r) - \mathrm{corr}(X_s) \right\rVert_F \tag{14}$$

where $X_r$ is the actual data matrix, $X_s$ is the artificial virtual data matrix, and corr denotes the Pearson correlation matrix of $X_r$ or $X_s$.
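The PCD described above can be computed in a few lines of NumPy. The sketch below assumes that rows are samples and columns are attributes; the function name and the randomly generated matrices are hypothetical and only illustrate the calculation.

```python
import numpy as np


def pairwise_correlation_difference(real, synthetic):
    """Frobenius norm of the difference between two Pearson correlation matrices."""
    corr_real = np.corrcoef(real, rowvar=False)        # columns are attributes
    corr_synth = np.corrcoef(synthetic, rowvar=False)
    return float(np.linalg.norm(corr_real - corr_synth, ord="fro"))


# Hypothetical data: 30 "real" rows and 100 noisy "virtual" rows, 3 attributes each.
rng = np.random.default_rng(0)
real = rng.normal(size=(30, 3))
virtual = real[rng.integers(0, 30, size=100)] + rng.normal(scale=0.1, size=(100, 3))
pcd = pairwise_correlation_difference(real, virtual)
```

A PCD close to zero indicates that the virtual samples preserve the correlation structure of the actual data.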

Model building
As a parallel ensemble learning strategy, Stacking contains multilayer learning structures. It essentially trains different ML algorithms on the data from the perspectives of different data spaces and data structures. The Stacking ensemble model structure consists of two learning layers: the first is the base layer, comprising multiple heterogeneous ML models, while the second is the meta model. The first layer employs the entire training set to train the base models and obtain their predicted values. The second layer is then trained on the true values and the predicted values obtained from the base models. Stacking can overcome the limited learning capacity of a single model, avoid redundancy in the prediction model, and ensure prediction accuracy.
The goal of parameter optimization is to find a set of parameters that brings the model's generalization error as close to zero as possible. A model that is too complex tends to overfit, and a model that is too simple tends to underfit; either way the generalization error is high. In this research, a combination of time series cross validation and the grid search method is used to find the optimal parameter group of the Stacking model. First, the grid search method enumerates all the model parameter combinations from the set of candidate parameter values. Then, the parameter combination with the highest average generalization score under time series cross validation is output.
(1) Grid search
ML-based methods commonly tune parameters using random search, Bayesian optimization, or grid search. Random search allows for manual control over the number of searches, but each search may yield different results. Bayesian optimization can use the previous search results to guide the next search, but it can easily fall into a local optimum instead of the global optimum. In comparison, grid search, although the most time-consuming, exhaustively evaluates every combination on the grid, and its results are the same every time.
Because the experimental data set is not very large, the grid search method is selected to tune the parameters of the Stacking model in this research. The steps of the grid search method are as follows:
Step 1: Initialize the mesh size, set the step distance, and define the initial parameter values;
Step 2: Loop through each parameter combination;
Step 3: Use the parameter values of each combination to train the Stacking model under time series cross validation and obtain the model's R-squared value. The first combination evaluated is recorded as the current best;
Step 4: If a better combination of hyperparameters is found, replace the previous best;
Step 5: Take the best hyperparameters as the optimal parameter set, train the final model, and output the optimal Stacking model.
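The grid search steps above can be sketched as a plain Python loop. In this illustration, `score_fn` is a hypothetical callable that would return the mean R-squared of a model under time series cross validation; here it is replaced by a toy function (`toy_score`) so the example runs on its own. The parameter names `C` and `gamma` and their candidate values are likewise assumptions, not the grid used in the study.

```python
import math
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, -math.inf
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)   # e.g. mean R-squared over the CV folds
        if score > best_score:     # keep the best combination seen so far
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated R-squared; peaks at C=1, gamma=0.1."""
    return 1.0 - math.log10(params["C"]) ** 2 - (params["gamma"] - 0.1) ** 2


grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
```

Because every combination is visited in a fixed order, repeated runs return the same result, which is the reproducibility property noted above. The cost grows with the product of the candidate list lengths.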
(2) Time series cross validation (TSCV)
To prevent overfitting and improve generalization ability while respecting the temporal order of the data set, the base models of the Stacking model are trained with time series cross validation (see Fig. 1), and their outputs are then used to train the meta model. The steps are as follows:
Step 1: Assume that the original data set is (X, Y), the training set is F = (X_train, Y_train), and the test set is T = (X_test, Y_test). The original training data set F is split in time order into five consecutive, nonoverlapping subsets F_i (i = 1, 2, …, 5);
Step 2: One subset F_i is held out as the test set, and the remaining four subsets form the training set for the base model M_i. The trained model M_i predicts the held-out subset F_i to give the result P_i (i = 1, 2, …, 5), and its prediction on the original test set T is denoted R_i (i = 1, 2, …, 5);
Step 3: The prediction results P_i are concatenated in chronological order to obtain the training data set P of the second-layer meta model, which has the same number of samples as the original training data set F;
Step 4: The predictions R_i are averaged to obtain the test set R of the meta model.
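The four steps above can be sketched with scikit-learn as follows. This is an illustrative reading of the procedure, not the authors' implementation: the function name `fit_stacking`, the synthetic data, and the hyperparameter values are assumptions, and `GradientBoostingRegressor` stands in for XGBoost so that the example depends only on scikit-learn. `KFold` with `shuffle=False` is used because it yields consecutive, nonoverlapping blocks in the original order.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR


def fit_stacking(base_models, meta_model, X_train, y_train, X_test, n_splits=5):
    """Build out-of-fold features P and averaged test features R, then fit the meta model."""
    folds = KFold(n_splits=n_splits, shuffle=False)   # consecutive blocks in time order
    P = np.zeros((len(X_train), len(base_models)))    # meta-model training features
    R = np.zeros((len(X_test), len(base_models)))     # meta-model test features
    for j, model in enumerate(base_models):
        test_preds = []
        for fit_idx, hold_idx in folds.split(X_train):
            fitted = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            P[hold_idx, j] = fitted.predict(X_train[hold_idx])  # Steps 2 and 3
            test_preds.append(fitted.predict(X_test))
        R[:, j] = np.mean(test_preds, axis=0)                   # Step 4
    meta = clone(meta_model).fit(P, y_train)
    return meta, P, R, meta.predict(R)


# Hypothetical synthetic data in time order, not data from the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

base_models = [
    SVR(C=10.0),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    GradientBoostingRegressor(n_estimators=50, random_state=0),  # stand-in for XGBoost
]
meta, P, R, y_pred = fit_stacking(base_models, LinearRegression(), X_train, y_train, X_test)
```

Note that holding out an early block, as the steps describe, means the corresponding base model is fitted on later observations. A stricter forward-chaining split such as scikit-learn's `TimeSeriesSplit` avoids this, but it leaves the earliest block without out-of-fold predictions, so P would then contain fewer rows than F.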

Statistical measures for model evaluation
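This research uses three metrics to evaluate the prediction error of the proposed model: mean absolute error (MAE), mean square error (MSE), and R-squared (R²), as shown in Eqs. (15)-(17).

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{15}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{16}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{17}$$

where $y_i$ and $\bar{y}$ are the original value and average value of variable Y, respectively, $\hat{y}_i$ represents the predicted value of variable Y, and n is the number of samples.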
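For readers who want to check the three metrics by hand, the following pure-Python sketch implements their standard definitions. The function names and the three-point toy example are illustrative assumptions.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not results from the study.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
scores = (mae(y_true, y_pred), mse(y_true, y_pred), r_squared(y_true, y_pred))
```

In this toy case one prediction is off by 1, so MAE and MSE both equal 1/3, while the residual sum of squares (1) is half the total sum of squares (2), giving R² = 0.5. Lower MAE and MSE and higher R² indicate better predictions.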

Empirical study
To verify and validate the effectiveness of the proposed model, we use the data related to China's ELVs industry from 2006 to 2020. This research simulates recycled ELVs generation using seven different ML models, namely SVR, XGBoost, light gradient boosting machine (LGBM), RF, multi-layer perceptron (MLP), GBRT, and DT, in order to find the best predictive base models with low correlation. Before commencing the modeling process, we use grid search and time series cross validation to determine the best structure for each base model by obtaining model parameters. These parameters vary according to the theory of each model, as discussed above. We develop Stacking ensemble models by combining the optimal base models with the meta model. Our main objective is to validate the prediction performance of our proposed Stacking model through empirical analysis and compare it to the other candidate models. The overall framework of this empirical study comprises five steps, as shown in Fig. 1.

Data collection
In general, to build a prediction model or make decisions for a problem, ML algorithms learn the relationships between input variables and output variables from empirical data (Erkinay Ozdemir et al., 2021). Based on previous studies (Hao et al., 2018; Hu and Kurasaka, 2013; Ochotnicky et al., 2017; Xin et al., 2018; Yano et al., 2015), eight socioeconomic factors that influence the quantity of recycled ELVs are selected: automobile production volume, passenger turnover, population, number of vehicle drivers, recycled material price, per capita income of urban residents, highway mileage, and number of ELV enterprises. Monthly historical data for these factors were extracted from the China Association of Automobile Manufacturers, the China National Resources Recycling Association, and the China National Bureau of Statistics.

Data augmentation
Augmented data are generated by integrating the original data set with artificial virtual samples in order to improve the generalization ability and prediction performance of ML models on small sample data sets. The artificial virtual samples are generated using the KNNMTD method. The artificial virtual sample size is set at 100 (Li et al., 2013), because an unreasonable increase in the artificial virtual sample size may lead to irrational virtual samples. The PCD is calculated for k values in the range [3, 10], and k = 4 is selected as the appropriate value. The MAE, MSE, and R² of the ML models with and without the KNNMTD method are presented in Table 1.

Stacking ensemble model
Based on the literature review analysis, the candidate base models of Stacking mainly include SVR, XGBoost, LGBM, RF, MLP, GBRT, and DT. The optimal combination of parameters for these single ML algorithms with the use of the KNNMTD method is found by using grid search and time series cross validation, as shown in Table 2.
The Stacking ensemble model requires that the base models be heterogeneous single ML algorithms with excellent learning performance. This is because the smaller the correlation between the base models, the lower the variance of the Stacking model. The meta model of the Stacking model requires strong robustness and generalization ability. To prevent model overfitting and improve prediction accuracy, the linear regression (LR) algorithm is selected as the meta model.
Table 1 shows that these single ML algorithms have a strong learning ability to predict the quantity of ELVs. Even though XGBoost, LGBM, GBRT, DT, and RF follow different algorithmic principles, they are all tree-based models with a similar data processing method. Thus, these tree-based models have a high correlation with each other. SVR and MLP are fundamentally different from these tree-based models, so the correlation between SVR, MLP, and the other models is low. Therefore, five candidate Stacking models are built, each combining SVR and MLP with one of the five tree-based models as base models. The final prediction performance of each Stacking model is shown in Table 3 and Fig. 2.
Considering the accuracy and difference in ML models, this research selects the Stacking 1 model, namely SVR, MLP, and XGBoost, as the base models and LR as the meta model to construct the Stacking ensemble model.
To further evaluate the performance of the proposed model, we use the learning curve to identify potential overfitting. In general, the learning curve plots the model's performance on both training data and testing data at different training set sizes. The learning curve usually consists of two lines representing the loss on the training data and the loss on the testing data, which is measured by the MSE in our research. When drawing the learning curve, the number of training examples is set as the horizontal coordinate, and the MSE of the training set and validation set is set as the vertical coordinate, as illustrated in Fig. 3. After data preprocessing, the generalization ability of the proposed Stacking model without the KNNMTD method is shown in Fig. 3(a), while that of the proposed Stacking model after data augmentation is shown in Fig. 3(b).
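A learning curve of this kind can be computed with a short loop that refits the model on growing prefixes of the training data and records the MSE on both the training prefix and a fixed validation set. The sketch below is an illustration under stated assumptions: the function name `learning_curve_mse`, the synthetic data, the chosen training sizes, and the use of `LinearRegression` as a stand-in estimator are all hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def learning_curve_mse(model, X_train, y_train, X_val, y_val, sizes):
    """Return training and validation MSE for each training set size in `sizes`."""
    train_mse, val_mse = [], []
    for n in sizes:
        fitted = clone(model).fit(X_train[:n], y_train[:n])  # first n samples in time order
        train_mse.append(mean_squared_error(y_train[:n], fitted.predict(X_train[:n])))
        val_mse.append(mean_squared_error(y_val, fitted.predict(X_val)))
    return np.array(train_mse), np.array(val_mse)


# Hypothetical synthetic data, not data from the study.
rng = np.random.default_rng(1)
X = rng.normal(size=(140, 3))
y = X @ np.array([1.5, -1.0, 0.5]) + rng.normal(scale=0.1, size=140)
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

sizes = [10, 25, 50, 100]
train_mse, val_mse = learning_curve_mse(LinearRegression(), X_train, y_train, X_val, y_val, sizes)
```

A persistent gap between the two curves suggests overfitting, whereas curves that converge as the training size grows suggest that the model generalizes well.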

Discussion
The aim of this research is to develop a Stacking-based ensemble model to predict the quantity of recycled EOL products for a sustainable reverse supply chain. The data related to China's ELVs industry from 2006 to 2020 are used to validate the proposed optimized Stacking model. The main research results are analyzed as follows.
First, the performance evaluation of seven ML models, namely SVR, XGBoost, LGBM, RF, MLP, GBRT, and DT, shows that they achieved good results in predicting the quantity of EOL products after using the KNNMTD method and optimizing parameters through grid search and time series cross validation (see Table 1). Note that GBRT and MLP perform better, which indicates that the relationship between the quantity of EOL products and its socioeconomic variables tends to be complex and nonlinear.
Second, the Stacking 1 ensemble model, which uses SVR, MLP, and XGBoost as the base models and LR as the meta model, performs best in predicting the quantity of EOL products. Table 3 shows the predictive performance of the Stacking model under different base learner combinations. The MAE and MSE of the Stacking 1 model are 0.0305 and 0.0016, respectively, which are lower than those of the other Stacking models.
In addition, the values predicted by the Stacking ensemble model move in the same direction as the real values overall, as demonstrated in Fig. 2. Some points with large fluctuations are also predicted accurately.
Third, the reduced errors and increased R-squared seen in Tables 1 and 3 clearly support the superiority of the proposed Stacking 1 model over a single base model. Compared with the worst single model, SVR, the Stacking 1 model decreased MAE and MSE by about 0.0471 and 0.0050, respectively, and increased R² by 0.0864. This indicates that the proposed Stacking model integrates the strengths of single ML models to capture information, reducing the influence of a variable environment and multiple operating conditions and improving the overall prediction accuracy and generalization ability. Note that even with the introduction of SVR, which has slightly lower precision, the performance of the Stacking 1 model remains superior to that of the single base models. There are three main reasons for this. First, SVR has unique advantages in handling regression problems with high dimensions and small samples. Second, XGBoost, as a single model, exhibits strong prediction performance, ensuring the prediction accuracy of the Stacking model. Finally, using algorithms with low correlation as the base models allows the Stacking 1 model to fully utilize the strengths of each algorithm, reducing the risk of falling into a local optimum and providing robust prediction performance.
Fourth, for a small sample data set, data augmentation can help make more robust and accurate predictions. The learning curves in Fig. 3 show that the error on the testing data set is higher than that on the training data set. On the original data set, the learning curve of the testing data set remains far from that of the training data set, suggesting that the Stacking 1 model slightly overfits (see Fig. 3(a)). To address the issue, we use the KNNMTD method to generate artificial virtual data and integrate it with the original data set to create a new expanded data set for model training. After training on the expanded data set, the training error and the testing error converge and approach each other, indicating that the generalization ability of the ensemble learning prediction model is improved by the KNNMTD method, as shown in Fig. 3(b). This means data augmentation can effectively reduce the overfitting risk caused by small sample data sets.
In theory, this study contributes to prediction techniques for small sample data sets. It proposes a novel Stacking ensemble model for predicting the quantity of EOL products, which uses SVR, MLP, and XGBoost as base models and LR as the meta model. It also applies data augmentation with the KNNMTD method to avoid the overfitting caused by the small sample data set and improve the generalization ability of the proposed model. In practice, this research contributes to prediction applications for industry. Accurate predictions of recycling quantity can help decision-makers and practitioners design sustainable reverse supply chains and production plans, which can reduce environmental pollution and improve the competitiveness of enterprises. More specifically, the proposed Stacking ensemble model can achieve greater prediction accuracy and generalization ability than single ML models, making it a valuable tool for decision-makers in the recycling industry. When dealing with small sample data sets, the application of the KNNMTD method is particularly significant. Overall, the research findings have substantial practical implications for the recycling industry's management and sustainability.
A limitation of this research is that trying different base model combinations manually is inefficient. Future research will focus on the following aspects. First, we plan to conduct more feature engineering on the input variables to generate problem-specific features. Second, we aim to establish a base model learning library and integrate it with an intelligent optimization algorithm to build a more intelligent prediction system. Third, we will compare the performance of the proposed Stacking ensemble model with other ensemble models. Fourth, we intend to incorporate factors such as policy and consumer preferences into the prediction model, as they influence the quantity of EOL products. Lastly, if the data set becomes larger in the future, we may adopt distributed computing methods to improve calculation speed.

Conclusions
The accurate prediction of the quantity of recycled EOL products is an important prerequisite for making effective short-term and long-term decisions and monitoring the overspill of hazardous waste. To improve the prediction accuracy and generalization ability, we propose a Stacking-based ensemble prediction model for the quantity of recycled EOL products. In the process of data preprocessing, data augmentation is used to avoid the overfitting problem caused by small sample data sets.
In the process of model training, based on the correlation and prediction abilities of the base models, a combination strategy of base models is proposed. In addition, the proposed model is validated with relevant data from China's ELVs industry. The results indicate that, compared with other Stacking ensemble models and single ML models, the Stacking 1 model proposed in this research has better performance in prediction accuracy and stability.
The application of the proposed Stacking-based model can be expanded to a global scope to examine the EOL product generation trends in different countries and regions. From a sustainability point of view, this research can be used by practitioners and decision-makers as the basis for the development of recycling programs, the construction of processing facilities, the optimization of resource allocation, as well as the establishment of waste management systems and sustainable reverse supply chains. Consequently, this research not only provides a new direction for predicting EOL product recycling but also adds economic, technical, and social benefits to sustainable environmental conservation and the circular economy.

Fig. 1. The overall framework of the empirical study.

Table 1. Prediction results of single ML algorithms.

Table 3. Prediction results of Stacking models.