Forecasting Subscriber Churn: Comparison of Machine Learning Methods
Abstract
To remain competitive in the telecommunications business, a company must identify customers who are dissatisfied with its services, which has made forecasting subscriber churn an essential problem in this area. This article reviews a range of machine learning techniques, including Decision Trees (DT), Naive Bayes Classifier (NB), Random Forest (RF), Artificial Neural Network (NN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and their ensembles (bagging and boosting), in order to demonstrate the superiority of the CatBoost method when the effectiveness of classifiers is gauged. To this end, the data were classified with each method, and the specific advantages of CatBoost over the other methods were identified from the results obtained. For the study, we analyzed four datasets: three are openly available and one was provided by a Russian mobile company. Such datasets are often high-dimensional, which leads to a number of problems (including class imbalance and correlated parameters); these are addressed with a dimensionality reduction method, Principal Component Analysis (PCA). The results are compared with one another and with the results reported by other researchers on the open datasets. The effectiveness of the classifiers is evaluated using measures such as the area under the curve (AUC), accuracy, F1-measure, and time.
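As an illustration of the evaluation measures named above, the following is a minimal sketch in plain Python of how accuracy, F1-measure, AUC, and prediction time could be computed for one binary churn classifier. It is not the authors' implementation: the function names, the 0.5 decision threshold, the choice to time only the scoring step, and the toy data are all assumptions made for this example, and the "classifier" is a hypothetical stand-in that returns a precomputed score.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (churn) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    churner is scored above a random non-churner (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(score_fn, X, y_true, threshold=0.5):
    """Score every row with score_fn, time the scoring, and report all measures.

    score_fn is a hypothetical callable mapping one row to a churn score in
    [0, 1]; the threshold of 0.5 is an assumption of this sketch.
    """
    start = time.perf_counter()
    scores = [score_fn(x) for x in X]
    elapsed = time.perf_counter() - start
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "auc": auc(y_true, scores),
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_measure(y_true, y_pred),
        "time_s": elapsed,
    }


if __name__ == "__main__":
    # Toy data (invented for illustration): each "row" is already a score.
    toy_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
    toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
    print(evaluate(lambda x: x, toy_scores, toy_labels))
```

The pairwise AUC above is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based formulation would be preferable on datasets of realistic size.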
This work is licensed under a Creative Commons Attribution 4.0 International License.