Article

Telecom Churn Prediction System Based on Ensemble Learning Using Feature Grouping

Department of Computer Engineering, Chonnam National University, Yeosu 59626, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(11), 4742; https://doi.org/10.3390/app11114742
Submission received: 20 April 2021 / Revised: 14 May 2021 / Accepted: 19 May 2021 / Published: 21 May 2021
(This article belongs to the Topic Industrial Engineering and Management)

Abstract

In recent years, the telecom market has become highly competitive. Because the cost of retaining existing telecom customers is lower than that of attracting new ones, a telecom company needs to understand customer churn through customer relationship management (CRM), and CRM analyzers are therefore required to predict which customers will churn. This study proposes a customer-churn prediction system that uses an ensemble-learning technique consisting of a stacking model and soft voting. The Xgboost, logistic regression, decision tree, and naïve Bayes machine-learning algorithms are selected to build a two-level stacking model, and the three outputs of the second level are used for soft voting. Feature construction of the churn dataset includes equidistant grouping of customer behavior features to expand the feature space and discover latent information in the churn dataset. The original and new churn datasets are analyzed in the stacking ensemble model with four evaluation metrics. The experimental results show that the proposed system predicts customer churn with accuracies of 96.12% and 98.09% for the original and new churn datasets, respectively. These results are better than those of state-of-the-art churn prediction systems.

1. Introduction

Owing to fierce competition among telecom companies, customer churn is inevitable. Customer churn is the act of a customer ending a subscription to a service provider and choosing the services of another company.
Companies must reduce customer churn because it weakens their business. A survey showed that the annual churn rate in the telecom industry ranges from 20% to 40%, and the cost of retaining existing customers is 5–10 times lower than the cost of acquiring new ones [1]. The cost of predicting churn customers is 16 times lower than that of acquiring new customers [2], and decreasing the churn rate by 5% increases profits by 25% to 85% [3]. Customer-churn prediction is therefore important for the telecom sector, and telecom companies consider customer relationship management (CRM) an important factor in retaining existing customers and preventing churn.
To retain existing customers, CRM analyzers must predict which customers will churn and analyze the reasons for customer churn. Once the at-risk customers are identified, the company must perform marketing campaigns for churn customers to maximize the churn-customer retention. Therefore, customer-churn prediction is an important part of CRM [4].
The accuracy of the prediction systems used by CRM analyzers is important: if customer churn is predicted inaccurately, retention campaigns cannot be targeted effectively. Owing to recent advances in data science, data mining and machine-learning technologies offer solutions to the churn problem. However, existing models have several limitations. For example, logistic regression, a common churn-prediction model based on older data-mining methods, is relatively inaccurate, and feature construction [5] is often neglected during model development. A better churn prediction system is therefore necessary.
This study proposes a new customer-churn prediction system and feature construction to improve accuracy, and the contributions of this study can be summarized as follows:
(1)
A new prediction system based on ensemble learning with relatively high accuracy is proposed.
(2)
New features derived from equidistant grouping of customer behavior features are used to improve the system performance.
The rest of this paper is arranged as follows: Section 2 presents the literature review, Section 3 proposes the ensemble model and equidistant feature grouping, and Section 4 describes the prediction system and experimental results. Finally, Section 5 concludes the paper.

2. Literature Review

Many methods such as machine learning and data mining are used for churn prediction. The decision-tree algorithm is a reliable method for churn prediction [6]. In addition, a neural network method [7], data certainty [8], and particle swarm optimization [9] are used for churn prediction.
Moreover, artificial neural networks (ANNs) and decision trees have been compared for customer-churn prediction [10], and the comparison shows that the decision-tree algorithm outperforms ANNs.
A. T. Jahromi [11] studied the effect of customer loyalty on customer churn in prepaid mobile phone companies. In that study, features were segmented, and multiple algorithms, such as decision trees and neural networks, were used on the processed data; the results showed that a hybrid approach outperforms a single algorithm. KNN-LR is a hybrid approach that combines logistic regression and the K-nearest neighbor (KNN) algorithm [12]. Researchers compared KNN-LR, logistic regression, and the radial basis function (RBF) network and found that KNN-LR showed the best performance. Y. Zhang [13] proposed a distributed framework for data-mining techniques to predict customer churn, which improves the quality of service of CRM.
Ruba Obiedat proposed a hybrid genetic programming approach [14] that uses the K-means algorithm together with genetic programming to predict customer churn. Sahar F. Sabbeh compared existing machine-learning techniques for customer retention and found that AdaBoost, based on the boosting algorithm, gave the best results [15]. Hossam Faris proposed a hybrid swarm intelligent neural network model [16], combining particle swarm optimization with a feedforward neural network for churn prediction; the results show that the model improves the coverage of churn customers.

3. Materials and Methods

3.1. Dataset Preparation

The customer churn dataset is an open-source dataset [17] that contains 21 features and 3333 observations. The feature ‘Churn’ indicates whether a customer churned under the existing conditions: approximately 14.5% of the observations carry the ‘T’ (churn) label, and 85.5% carry the ‘F’ (non-churn) label. Table 1 describes the data features. In this experiment, 80% (2666 instances) and 20% (667 instances) of the dataset are used as the training and test sets, respectively.
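The split above can be reproduced in a few lines of scikit-learn. This is a minimal sketch, not the authors' code: the CSV file name and the lowercase 'churn' column label are assumptions based on the Kaggle download cited in the Data Availability Statement.

```python
# Minimal sketch of the 80/20 split described above. The file name and the
# 'churn' column label are assumptions based on the Kaggle download.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn-in-telecoms-dataset.csv")   # 3333 rows, 21 features
X = df.drop(columns=["churn"])                      # 20 predictor features
y = df["churn"].astype(int)                         # True/False -> 1/0

# 80% training (2666 instances), 20% test (667 instances), as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```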

3.2. Proposed Method

This study proposes a new customer-churn prediction system consisting of feature construction, a stacking model, and soft voting, as shown in Figure 1.
The original features describing customers’ consumption and behavior are equidistantly grouped to construct new features. The stacking model consists of two levels built from four algorithms: Xgboost (XGB), logistic regression (LR), decision tree (DT), and the naive Bayes classifier (NBC), chosen to achieve better prediction accuracy. The third step is soft voting: the outputs of the stacking model are fed into the soft-voting stage.

3.2.1. New Feature Construction with Equidistant Grouping

Feature engineering is important for data processing. Good feature selection and construction are essential for achieving high performance in machine-learning tasks [18]. Feature construction is the process of inferring or constructing additional features from the original features, and it uncovers missing information about the relationships between features. Feature construction transforms the original representation space into a new one to help better achieve data-mining objectives: improved accuracy, easier comprehensibility, truthful clusters, revealed hidden patterns, etc. [5].
Feature grouping correlates relevant features. Features from the same group are more related, compared to features from a different group. Therefore, it is possible to generate groups of correlated features that are resistant to sample-size variations [19].
Some features in the churn dataset take integer values over a wide range, for example, from 0 to 365. Customers with similar values show similar churn trends, so churn accuracy can be improved by dividing such features into equidistant groups.
Because customers with similar consumption–expense behaviors may churn similarly, this study uses an equidistant grouping method on the consumption–expense features to construct new features. The original observations of a feature are equidistantly grouped according to the range of the feature values, so that customers with similar consumption–expense patterns fall into the same group and receive identical values of the new feature. The process of feature grouping is shown in Figure 2: the instances of the original feature are grouped equidistantly based on the number of corresponding groups, and instances in the same group receive the same value in the new feature.
Sturges’ formula gives a method of choosing the optimal number of bins in a histogram [20] for a normally distributed dataset.
As shown in Figure 3, the histograms of some features in the dataset have a shape of normal distribution.
This study uses Sturges’ formula to determine the optimal number of groups, which is given by
$$K = 1 + \frac{\log(n)}{\log(2)} = 1 + 3.322\,\log_{10}(n) \quad (1)$$
where K represents the optimal number of groups and n is the largest feature value.
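A possible implementation of this grouping step is sketched below, assuming a pandas DataFrame df holding the dataset. The use of pd.cut and the rounding of K to the nearest integer are our assumptions, since the paper does not specify these details.

```python
# Sketch of equidistant grouping with the group count K from Equation (1).
# Rounding K to the nearest integer is an assumption; the paper does not
# state its rounding convention.
import math
import pandas as pd

def equidistant_group(series: pd.Series) -> pd.Series:
    """Map each value to one of K equal-width groups (labeled 1..K)."""
    n = series.max()                           # largest feature value, as above
    k = int(round(1 + 3.322 * math.log10(n)))  # Sturges' formula, Equation (1)
    # pd.cut splits the observed value range into k equal-width bins.
    return pd.cut(series, bins=k, labels=range(1, k + 1), include_lowest=True)

# Example: the new feature derived from 'total day calls' (column name assumed).
df["total day calls group"] = equidistant_group(df["total day calls"])
```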

3.2.2. Stacking Model

(1)
Classifiers
XGB is a decision-tree-based ensemble machine-learning algorithm that uses a gradient-boosting framework. It accurately predicts a target class by combining simple, weak models [21].
LR is a machine-learning method for binary classification that predicts class probabilities. LR is simple to implement and has strong explanatory power, and it is widely used in industry [22].
DT is currently a mainstream prediction and classification technique that resembles human decision making. It works recursively from top to bottom: each internal node compares attribute values, the tree splits downward from the node according to those values, and a conclusion is reached at a leaf [23].
In probability and statistics, the Bayesian rule builds on prior knowledge of event probabilities. In machine learning, NBC is a probabilistic classifier based on Bayes’ theorem. The classifier uses conditional independence assumptions and assigns the most probable category as the sample’s final category. The algorithm is simple, easy to implement, and less sensitive to missing data, showing small error and stable performance [24].
(2)
Stacking
The stacking model is a general method that uses a higher-accuracy algorithm to combine lower-accuracy algorithms and achieve greater predictive accuracy. The best results are obtained when the higher-accuracy algorithm is placed in the first level and the lower-accuracy algorithms in the second level [25]. In this study, the stacking model consists of two layers, level 1 and level 2, as shown in Figure 4. The higher-accuracy model is used in the first layer (level 1), while the lower-accuracy models are used in the second layer (level 2).
XGB, itself an ensemble method, achieved the best accuracy and is therefore chosen as the first-level classifier. LR, DT, and NBC have basic but distinct characteristics and can complement XGB on the samples it misclassifies, so they are chosen for the second level. The XGB outputs are combined with the original data to generate Data 1 for the second-level classifiers.
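The two-level structure can be sketched with scikit-learn and xgboost as follows. Appending the XGB churn probability to the original features to form Data 1 is our reading of Figure 4, not the authors' published code; hyperparameters are library defaults, and the features are assumed to be numerically encoded.

```python
# Sketch of the two-level stacking model: XGB at level 1, and LR/DT/NBC at
# level 2 trained on "Data 1" (original features plus the XGB output).
import numpy as np
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Level 1: XGB, the highest-accuracy classifier.
xgb = XGBClassifier(eval_metric="logloss").fit(X_train, y_train)

# "Data 1": the original features combined with the XGB churn probability.
train1 = np.column_stack([X_train, xgb.predict_proba(X_train)[:, 1]])
test1  = np.column_stack([X_test,  xgb.predict_proba(X_test)[:, 1]])

# Level 2: three complementary lower-accuracy classifiers trained on Data 1.
level2 = {"LR": LogisticRegression(max_iter=1000),
          "DT": DecisionTreeClassifier(),
          "NBC": GaussianNB()}
for clf in level2.values():
    clf.fit(train1, y_train)
```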

3.2.3. Soft Voting

Soft voting estimates class probabilities with different algorithms that take contrasting approaches, improving prediction accuracy. It assigns a larger weight to the more important classifiers, and the class with the highest weighted sum of predicted probabilities is selected. The soft-voting prediction is mathematically represented as follows:

$$\hat{y} = \arg\max_{i} \sum_{j=1}^{m} w_j\, p_{ij} \quad (2)$$

where the argmax function selects the class $i$ with the largest weighted probability sum, $w_j$ is the weight associated with classifier $j$, and $p_{ij}$ is the probability with which classifier $j$ predicts class $i$. In assigning weights for soft voting, high-confidence models are given more weight based on the importance and accuracy of the classifier [26].
The prediction ensemble system in this study uses soft voting to obtain the final prediction results. Soft voting usually requires different classifiers that compensate for each other’s drawbacks. Based on accuracy and algorithmic differences, this study chose the LR, DT, and NBC algorithms as level-2 classifiers, and their outputs, with different weights, are used for soft voting.
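A minimal sketch of this weighted vote over the three level-2 outputs is shown below, continuing the stacking sketch above; the weight values mirror the 0.4/0.3/0.3 assignment reported in Section 4.3.

```python
# Weighted soft vote of Equation (2) over the three level-2 classifiers.
import numpy as np

weights = {"LR": 0.4, "DT": 0.3, "NBC": 0.3}   # weights from Section 4.3

# Weighted sum of the class-probability vectors p_ij over the m = 3 classifiers.
proba = sum(w * level2[name].predict_proba(test1) for name, w in weights.items())
y_pred = np.argmax(proba, axis=1)              # argmax over classes i
```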

3.3. Evaluation Measures

In this study, the proposed ensemble system for predicting customer churn is evaluated using accuracy, precision, recall and F1 score.
Equation (3) calculates the accuracy metric, defined as the ratio of the number of samples correctly classified by the classifier to the total number of samples in a given test dataset.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

In Equation (3), TN is true negative, TP is true positive, FN is false negative, and FP is false positive. Equation (4) defines precision, the fraction of positive predictions that are actually positive.

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4)$$

Recall measures completeness, i.e., the fraction of actual positives the algorithm correctly identifies. It is calculated using Equation (5).

$$\text{Recall} = \frac{TP}{TP + FN} \quad (5)$$

The F1 score can be interpreted as the harmonic mean of precision and recall, reaching its best value at 1 and its worst at 0; the relative contributions of precision and recall to the F1 score are equal. It is calculated using Equation (6).

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (6)$$
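These four metrics correspond directly to scikit-learn's standard implementations; the sketch below evaluates the soft-voting predictions from the earlier sketch (y_test and y_pred carried over from there).

```python
# Equations (3)-(6) computed with scikit-learn on the soft-voting output.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```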

4. Results

4.1. Feature Construction

As shown in Table 1, 16 features in the dataset have numerical values and could be considered for equidistant grouping. However, ‘account length’ and ‘area code’ do not describe the customer’s daily behavior or consumption, and ‘total intl charge’ and ‘customer service calls’ cannot be grouped effectively because their value ranges are too small. The remaining 12 features have relatively large value ranges and can be used to mine hidden information in the data. Equation (1) determines the number of groups K for these 12 original features; the results are shown in Table 2.
For example, the feature ‘Total day calls’ takes integer values ranging from 0 to 165. Using Equation (1), it is divided equidistantly into 8 groups; the resulting feature construction is shown in Table 3. Instances that fall into the same group are assigned the same value in the new feature.
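As a check of this group count (the rounding convention here is our assumption), substituting the largest value n = 165 into Equation (1) gives

$$K = 1 + 3.322\,\log_{10}(165) \approx 1 + 3.322 \times 2.217 \approx 8.37,$$

which rounds to the K = 8 groups listed in Table 2, each spanning roughly 165/8 ≈ 21 call counts.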
As shown in Figure 5, the histogram of the new feature also approximates a normal distribution.
The purpose of feature construction is to improve accuracy by expanding the feature space; if the new features are not combined with the original dataset, the feature space does not grow and accuracy does not improve. As shown in Figure 6, 12 new features are derived from the original features through feature construction, and the new dataset combines the 21 original features with the 12 new ones.
We use the four machine-learning algorithms on the two datasets, the original dataset and the new dataset, to compare their performance. The results in Table 4 show that the new dataset yields better accuracy for all classifiers and that the proposed feature construction method improves the stacking model performance. The new feature grouping discovers latent factors in the churn dataset regarding the relationships between features, and it expands the feature space by creating additional features.
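The comparison in Table 4 can be reproduced with a loop of the following shape; ods_split and nds_split are assumed to be (X_train, X_test, y_train, y_test) tuples for the original and new datasets, and only accuracy is printed for brevity.

```python
# Sketch of the Table 4 comparison: each base classifier is trained and
# scored on both the original dataset (ODS) and the new dataset (NDS).
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

datasets = {"ODS": ods_split, "NDS": nds_split}   # assumed split tuples
for ds_name, (Xtr, Xte, ytr, yte) in datasets.items():
    models = {"LR": LogisticRegression(max_iter=1000),
              "DT": DecisionTreeClassifier(),
              "NBC": GaussianNB(),
              "XGB": XGBClassifier(eval_metric="logloss")}
    for name, clf in models.items():
        acc = accuracy_score(yte, clf.fit(Xtr, ytr).predict(Xte))
        print(f"{name} on {ds_name}: accuracy = {acc:.4f}")
```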

4.2. Stacking Model

The LR, DT, and NBC algorithms have relatively low accuracy compared to the XGB algorithm. However, stacking these three algorithms under the high-accuracy XGB improves the performance of all three machine-learning algorithms. Table 5 shows that the accuracy of the stacked classifiers on the new dataset increases by approximately 10%.
Table 6 shows that the stacking model accuracy of the new dataset increases by around 1% compared to the original dataset.

4.3. Soft Voting and Final Results

The stacking model outputs three different sets of model results, which are input into soft voting. For better results, models with high confidence and accuracy are given more weight in the soft-voting process [26]. Among the three models, LR has the best accuracy, as shown in Table 6, so it is used as the main classifier and assigned the largest weight of 0.4. The accuracies of NBC and DT are slightly lower than that of LR, and each is assigned a weight of 0.3. Although each of these weights is lower than that of the main classifier LR, their sum is greater than the weight of LR, which compensates for the shortcomings of LR and improves the accuracy of soft voting. Soft voting outputs the label with the highest probability. The results in Table 7 show that the accuracy of the proposed stacking ensemble system is 98.09% for the new dataset and 96.12% for the original dataset.

4.4. Comparison with Other Works

Table 8 compares the proposed ensemble system with other works. The proposed customer-churn prediction ensemble system shows the best accuracy.

5. Conclusions

Various machine-learning techniques have been used for customer churn in CRM. This study proposes a customer-churn prediction system based on a stacking ensemble of machine learning, which consists of XGB in level 1; LR, DT, and NBC in level 2; and soft voting. Feature construction is used to expand the feature space and discover latent information from implicit features, and the proposed feature construction through feature grouping improves the prediction accuracy compared with the original customer-churn dataset. The proposed system achieved the best accuracies, 96.12% and 98.09% for the original and new datasets, respectively, compared with other prediction systems. It can also identify important factors affecting customer purchasing behavior in the telecommunications industry.

Author Contributions

Conceptualization, T.X.; Data curation, T.X.; Formal analysis, T.X.; Funding acquisition, K.K.; Methodology, T.X.; Project administration, K.K.; Resources, K.K.; Software, T.X.; Supervision, K.K.; Validation, Y.M.; Visualization, T.X. and Y.M.; Writing—original draft, T.X.; Writing—review & editing, Y.M. and K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in a publicly accessible repository at https://www.kaggle.com/becksddf/churn-in-telecoms-dataset (accessed on 21 May 2021). The dataset accompanies the book “Discovering Knowledge in Data” by Daniel T. Larose [17].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. García, D.L.; Nebot, À.; Vellido, A. Intelligent data analysis approaches to churn as a business problem: A survey. Knowl. Inf. Syst. 2017, 51, 719–774. [Google Scholar] [CrossRef] [Green Version]
  2. Borja, B.; Bernardino, C.; Alex, C.; Ricard, G.; David, M.-M. The Architecture of a Churn Prediction System Based on Stream Mining. Front. Artif. Intell. Appl. 2013, 256, 157–166. [Google Scholar] [CrossRef]
  3. Kotler, P.T. Marketing Management: Analysis, Planning, Implementation and Control; Prentice-Hall: London, UK, 1994. [Google Scholar]
  4. Ngai, E.; Xiu, L.; Chau, D. Application of data mining techniques in customer relationship management: A literature review and classification. Expert Syst. Appl. 2009, 36, 2592–2602. [Google Scholar] [CrossRef]
  5. Motoda, H.; Liu, H. Feature Selection, Extraction and Construction; Communication of IICM (Institute of Information and Computing Machinery): Taipei, Taiwan, 2001; Volume 5, pp. 67–72. [Google Scholar]
  6. Edwards, R.A.H.; Šúri, M.; Huld, T.A.; Dallemand, J.F. GIS-Based Assessment of Cereal Straw Energy Resource in the European Union. Available online: http://citeseerx.ist.psu.edu/viewdoc/download? (accessed on 10 February 2020).
  7. Sharma, A.; Panigrahi, D.P.K. A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services. Int. J. Comput. Appl. 2013, 27, 26–31. [Google Scholar] [CrossRef]
  8. Amin, A.; Al-Obeidat, F.; Shah, B.; Adnan, A.; Loo, J.; Anwar, S. Customer churn prediction in telecommunication industry using data certainty. J. Bus. Res. 2019, 94, 290–301. [Google Scholar] [CrossRef]
  9. Vijaya, J.; Sivasankar, E. An efficient system for customer churn prediction through particle swarm optimization based feature selection model with simulated annealing. Clust. Comput. 2019, 22, 10757–10768. [Google Scholar] [CrossRef]
  10. Umayaparvathi, V.; Iyakutti, K. Applications of Data Mining Techniques in Telecom Churn Prediction. Int. J. Comput. Appl. 2012, 42, 5–9. [Google Scholar] [CrossRef]
  11. Jahromi, A.T.; Moeini, M.; Akbari, I.; Akbarzadeh, A. A Dual-Step Multi-Algorithm Approach for Churn Prediction in Pre-Paid Telecommunications Service Providers. J. Innov. Sustain. RISUS 2010, 1, 2179–3565. [Google Scholar] [CrossRef] [Green Version]
  12. Zhang, Y.; Qi, J.; Shu, H.; Cao, J. A hybrid KNN-LR classifier and its application in customer churn prediction. In Proceedings of the 2007 IEEE International Conference on Systems, Man and Cybernetics, Montréal, QC, Canada, 7–10 October 2007; pp. 3265–3269. [Google Scholar] [CrossRef]
  13. Reichheld, F.F.; Sasser, W.E. Zero defections: Quality comes to services. Harv. Bus. Rev. 1990, 68, 105–111. [Google Scholar] [PubMed]
  14. Obiedat, R.; Al-kasassbeh, M.; Faris, H.; Harfoushi, O. Customer churn prediction using a hybrid genetic programming approach. Sci. Res. Essays 2013, 8, 1289–1295. [Google Scholar] [CrossRef]
  15. Sabbeh, S.F. Machine-Learning Techniques for Customer Retention: A Comparative Study. Int. J. Adv. Comput. Sci. Appl. 2018, 9. [Google Scholar] [CrossRef] [Green Version]
  16. Faris, H. A Hybrid Swarm Intelligent Neural Network Model for Customer Churn Prediction and Identifying the Influencing Factors. Information 2018, 9, 288. [Google Scholar] [CrossRef] [Green Version]
  17. Larose, D.T.; Larose, C.D. Discovering Knowledge in Data: An Introduction to Data Mining; John Wiley & Sons: New York, NY, USA, 2014. [Google Scholar]
  18. Domingos, P. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87. [Google Scholar] [CrossRef] [Green Version]
  19. García-Torres, M.; Gómez-Vela, F.; Becerra-Alonso, D.; Melián-Batista, B.; Moreno-Vega, J.M. Feature grouping and selection on high-dimensional microarray data. In Proceedings of the 2015 International Workshop on Data Mining with Industrial Applications (DMIA), San Lorenzo, Paraguay, 14–16 September 2015. [Google Scholar]
  20. Sturges, H.A. The choice of a class interval. J. Am. Stat. Assoc. 1926, 21, 65–66. [Google Scholar] [CrossRef]
  21. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar] [CrossRef] [Green Version]
  22. Chatterjee, S.; Simonoff, J. Handbook of Regression Analysis with Applications in R. Logistic Regression; Wiley: Hoboken, NJ, USA, 2020; pp. 143–171. [Google Scholar] [CrossRef]
  23. Suzuki, J. Decision Trees. In Statistical Learning with Math and R; Springer: Singapore, 2020; pp. 147–170. [Google Scholar] [CrossRef]
  24. Larose, C.D.; Larose, D.T. NAÏVE BAYES CLASSIFICATION. In Data Science Using Python and R; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2019. [Google Scholar] [CrossRef]
  25. Ting, K.M.; Witten, I.H. Issues in Stacked Generalization. J. Artif. Intell. Res. 1999, 10, 271–289. [Google Scholar] [CrossRef] [Green Version]
  26. Zhou, Z. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012; ISBN 978-1-4398-3003-1. [Google Scholar]
Figure 1. The proposed customer-churn prediction system.
Figure 2. Example of constructing a new feature by equidistance.
Figure 3. Histograms for data distributions of 4 features.
Figure 4. The proposed stacking model.
Figure 5. Histogram for the new feature ‘total day calls group’.
Figure 6. The new dataset after feature construction.
Table 1. Dataset description.

Feature Name | Description | Type
State | Customer state | Object
Account length | Number of days the account has been used | Int64
Area code | Phone area code | Int64
Phone number | Customer phone number | Object
International plan | Whether the customer has an international plan | Object
Voice mail plan | Whether the customer uses the voice mail service | Object
Number vmail messages | Number of customer voice mail messages | Int64
Total day minutes | Total minutes of talk during the day | Float64
Total day calls | Number of calls during the day | Int64
Total day charge | Call charges during the day | Float64
Total eve minutes | Total minutes of talk in the evening | Float64
Total eve calls | Number of calls in the evening | Int64
Total eve charge | Charges for calls in the evening | Float64
Total night minutes | Total minutes of talk at night | Float64
Total night calls | Total number of calls at night | Int64
Total night charge | Total charge for calls at night | Float64
Total intl minutes | Total minutes of international calls | Float64
Total intl calls | Total number of international calls | Int64
Total intl charge | Total charges for international calls | Float64
Customer service calls | Number of calls to customer service | Int64
Churn | Whether the customer churned | Bool
Table 2. Twelve features and values for grouping.

Feature | Value | K
Number vmail messages | 0–51 | 6
Total day minutes | 0–350.8 | 10
Total day calls | 0–165 | 8
Total day charge | 0–59.64 | 6
Total night minutes | 0–363.7 | 10
Total night calls | 0–170 | 8
Total night charge | 0–30.91 | 7
Total eve minutes | 0–395 | 10
Total eve calls | 0–175 | 8
Total eve charge | 0–17.77 | 5
Total intl minutes | 0–20 | 5
Total intl calls | 0–20 | 5
Table 3. The grouping results of the feature ‘Total day calls’.

Instance (Index) | Original Feature | New Feature
1 | 110 | 6
2 | 123 | 7
3 | 114 | 6
4 | 71 | 4
5 | 113 | 6
6 | 98 | 5
7 | 88 | 5
8 | 79 | 4
... | ... | ...
3333 | 113 | 6
Table 4. Model performance comparison on different datasets.

Model | Accuracy (ODS) | Accuracy (NDS) | Precision (ODS) | Precision (NDS) | Recall (ODS) | Recall (NDS) | F1-Score (ODS) | F1-Score (NDS) | Time, s (ODS) | Time, s (NDS)
LR | 0.8482 | 0.8586 | 0.7978 | 0.8412 | 0.8576 | 0.8745 | 0.8054 | 0.8412 | 0.0327 | 0.0606
DT | 0.8513 | 0.8605 | 0.8176 | 0.8487 | 0.8029 | 0.8078 | 0.8173 | 0.8287 | 0.0423 | 0.0463
NBC | 0.8514 | 0.8575 | 0.8445 | 0.8773 | 0.8573 | 0.8631 | 0.8545 | 0.8664 | 0.0048 | 0.0053
XGB | 0.9543 | 0.9554 | 0.9467 | 0.9547 | 0.9528 | 0.9548 | 0.9523 | 0.9554 | 0.5853 | 0.7207

(ODS: original dataset, NDS: new dataset).
Table 5. Accuracy after stacking for three classifiers.

Model | Accuracy (Before Stacking) | Accuracy (After Stacking)
LR | 0.8586 | 0.9585
DT | 0.8605 | 0.9560
NBC | 0.8575 | 0.9535
Table 6. Accuracy of the stacking model with ODS and NDS.

Model | Accuracy (ODS) | Accuracy (NDS)
LR | 0.9463 | 0.9585
DT | 0.9493 | 0.9560
NBC | 0.9471 | 0.9535

(ODS: original dataset, NDS: new dataset).
Table 7. Soft voting results for different datasets.

Model | Accuracy (ODS) | Accuracy (NDS)
Proposed model | 0.9612 | 0.9809
Table 8. Comparison with other works.

Work | Model | Accuracy
Ruba Obiedat [14] | Hybrid genetic programming approach | 0.9140
Sahar F. Sabbeh [15] | AdaBoost | 0.9639
Hossam Faris [16] | Hybrid swarm intelligent neural network model | 0.9630
This study | Ensemble system | 0.9809
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
