Document Type : Research Paper

Authors

1 Department of Information Technology Management, Faculty of Management and Economics, Tarbiat Modares University, Tehran, Iran

2 IT Management Department, Tarbiat Modares University

3 Department of Information Technology Management, Tarbiat Modares University, Tehran, Iran

Abstract

Customer churn is one of the most important issues facing Internet Service Providers (ISPs) in a competitive and rapidly saturating market. Because attracting new customers is costly, ISPs have turned to customer retention strategies that explicitly seek to reduce churn. This study examines the churn of internet service customers at one of the largest telecommunications companies in Iran. To predict churn, customer data were collected over six months, and churn behavior was then tracked over the following year. In addition to predicting churn, the study identifies the most important factors affecting it. In the preprocessing step, Random Under-Sampling (RUS) was used to balance the data set and the minimum-Redundancy Maximum-Relevance (mRMR) method was used for feature selection. The Random Forest (RF), Support Vector Machine, and K-Nearest Neighbors algorithms were then applied to classify churning and non-churning customers, and the evaluation criteria showed the superiority of the Random Forest algorithm. The final model, which combines balancing, feature selection, and classification and is called the RUS-mRMR-RF model, is an efficient model for predicting customer churn and identifying the most important factors affecting it. The results provide valuable insights for the company in developing customer retention strategies.
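The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes discrete (categorical or pre-binned) features, uses the difference form of the mRMR criterion (relevance minus mean redundancy), and omits the classification step. The function names `random_under_sample`, `mutual_information`, and `mrmr_select`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows until every class has the minority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(rows, labels, k):
    """Greedy mRMR: pick the feature with the best relevance minus mean redundancy."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    while len(selected) < min(k, n_features):
        best, best_score = None, -math.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(columns[j], columns[s]) for s in selected
                ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy usage (assumed data): column 1 duplicates column 0,
# while column 2 carries signal that is independent of column 0.
toy_rows = [
    [0, 0, 0], [0, 0, 0], [0, 0, 1], [1, 1, 0],
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [0, 0, 1],
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(mrmr_select(toy_rows, toy_labels, k=2))
```

On this toy data the duplicated column is passed over in favor of the column that adds new information, which is the behavior the redundancy term is meant to produce. In a full pipeline, the balanced and reduced data would then be passed to a classifier such as a random forest.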

Keywords
