Data science, intelligence and future analysis
Fariba Karimi; Ameneh Khadivar; Fatemeh Abbasi
Abstract
In recent years, the rapid growth of virtual space has led people to spend more of their time there, especially on social networks. This can be attributed to its notable features, including faster information exchange, easy and free access to information, and a wide variety of knowledge topics. As a result, the opinions that users record on these networks grow day by day and have become very important, and extracting users' opinions and sentiments supports more informed decision-making in businesses. At the same time, virtual reality technology has undergone technical changes over the past few decades that have improved immersion and the feeling of remote presence. The technology is used in fields such as education, tourism, health, sports, entertainment, architecture, and construction. Its steady progress has drawn many businesses into the field, but continuous market changes and the need for timely information require companies to adopt differentiation and growth strategies. To do so, they need to ask for users' opinions and grow and improve their business accordingly. Because users' comments are textual, reading and summarizing them is time-consuming and difficult. The aim of this research was therefore to categorize comments related to virtual reality technology using machine learning methods and a dictionary-based approach. About one million tweets on virtual reality technology were collected with a web crawler; after preprocessing, 480,432 samples remained, and latent Dirichlet allocation (LDA) topic modeling was applied to them.
LDA separates topics by examining the distribution of words in tweets: tweets with similar word distributions are assigned to the same topic. The number of topics with the highest coherence score was selected; nine topics gave the highest coherence, so the LDA model was run again with the number of topics set to nine, and the tweets were grouped into nine topics. Because the model yields a probability distribution, it was evaluated with the perplexity criterion, whose value was -9.44. The coherence score, which measures the semantic similarity between the words within a topic and the distinction between topics, was 0.47. The lower the perplexity and the higher the coherence score, the better the model. Using the keyword weights produced by LDA and an examination of at least five tweets from each topic, nine topics related to virtual reality technology were identified: "New Technology", "Creation and Make", "Technological Business", "Education", "Virtual Games", "Progress", "Gadget", "Metaverse", and "Indiegame". The topics were then analyzed with the help of several graphs.
We found that on topics such as "New Technology" and "Metaverse", neutral comments outnumber positive and negative ones, which indicates that users lack sufficient information about these technologies or do not yet use them; businesses in this field need to make more effort in this regard. Similarly, the graphs of "Virtual Games" and "Technological Business" change at almost the same rate across years, which means the two topics are related. Businesses should keep in mind that the same factors affect both, but users pay more attention to "Virtual Games". Consequently, if "Technological Business" creators focus specifically on "Virtual Games", they will grow more because of this greater attention. Game creators should also note that "Virtual Games" attracts more attention than "Indiegame". Users' attention to "Education" and "Gadget" in the field of virtual reality declined over time as they turned to other topics, so businesses operating in these areas would do better to advertise and attract users, or to change their target user segment if there is no growth.
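The abstract mentions a dictionary-based approach and compares counts of neutral, positive, and negative comments per topic. The following is a minimal sketch of how such labeling can work in principle. The lexicon, the function names, and the sample comments are hypothetical and are not taken from the study, which does not state which dictionary or scoring rule it used.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (assumption): +1 for positive words, -1 for negative.
# A real study would rely on a published sentiment dictionary.
LEXICON = {
    "great": 1, "love": 1, "immersive": 1, "amazing": 1,
    "bad": -1, "boring": -1, "expensive": -1, "nausea": -1,
}


def label_comment(text):
    """Label a comment by summing the lexicon scores of its tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_counts(comments):
    """Count how many comments fall into each sentiment class."""
    return Counter(label_comment(comment) for comment in comments)


# Hypothetical comments, for illustration only.
sample = [
    "I love this immersive headset",
    "The metaverse launches next year",
    "VR is boring and expensive",
]
print(sentiment_counts(sample))
```

A comment with no lexicon words scores zero and is labeled neutral, which is one plausible reason why topics that users know little about would show many neutral comments.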
Introduction
Constant changes in the market and the need for timely information force companies to use differentiation and growth strategies appropriate to the needs of customers (Sánchez, Folgado-Fernández, & Sánchez, 2022). Companies can check and analyze their customers' opinions through microblogging sites (Facebook, Twitter, etc.) and ultimately improve the relevant products or services (Ahmad, Aftab, Bashir, & Hameed, 2018). Today, users express their opinions and feelings and review products on online social networks. User comments and their analysis have therefore become a valuable resource for businesses (Kim et al., 2015; Loureiro et al., 2019).
Virtual reality and augmented reality have undergone technical developments in the past few decades that have improved immersion and the feeling of remote presence. Examples of their application can be found in stores, the tourism industry, hotels, restaurants, and elsewhere (Loureiro, Guerreiro, & Ali, 2020). Because of constant market changes and the need for timely information, companies should use differentiation and growth strategies. Today, owing to the rapid evolution of the Internet, users express their opinions on social networks, so companies no longer have to collect them through time-consuming and expensive methods such as questionnaires and interviews. This content is very useful for business development: companies can measure customers' feelings toward products and services, understand users' needs, and ultimately make appropriate decisions for growth. To use this content correctly, however, text mining and sentiment analysis techniques are required, and this has not yet been researched in Iran. Analysis of users' opinions and feelings about virtual reality technology can help businesses that operate in the fields of the metaverse, virtual game production, virtual education, virtual tourism, and similar areas to make better decisions and plans.
Literature Review
Social media generates a large amount of real-time social signals that can provide new insights into human behavior and emotions. People around the world are constantly engaged with social media (Al-Samarraie, Sarsam, & Alzahrani, 2023).
On the other hand, the amount of data is increasing day by day. Almost all institutions, organizations and business industries store their data electronically. A huge amount of text is circulating on the Internet in the form of digital libraries, repositories, and other textual information such as blogs, social media networks, and emails (Sagayam, Srinivasan, & Roshni, 2012).
Topic modeling is one of the most powerful techniques in text mining for data mining, discovering hidden data and finding relationships between data and textual documents (Jelodar et al., 2017).
The technological advances of the last century have confronted societies with new realities that have indisputably improved daily life, making it more convenient and interesting. In recent decades, technologies based on virtual reality and wearable devices have had a significant impact in fields such as education, tourism, health, sports, entertainment, architecture, and construction (Kosti et al., 2023).
Virtual reality is a technology that allows a user to interact with a computer-simulated environment, whether that environment simulates the real world or an imaginary one. With virtual reality, we can experience the most frightening and overwhelming situations safely, through play and with a learning perspective (Mandal, 2013). Most people are curious about the possibilities and future of new technologies, given the applications they are expected to offer, such as virtual meetings and learning environments. However, there are also concerns about potential negative effects, because real-world signals can be transmitted in the virtual world. People express these feelings on different social networks (Bhattacharyya et al., 2023).
Methodology
In line with the main goal of the research, which is to classify comments related to virtual reality technology using machine learning methods and a dictionary-based approach, about one million tweets on virtual reality technology were collected with a web crawler. After preprocessing, 480,432 samples remained, and latent Dirichlet allocation (LDA) topic modeling was applied to them. LDA separates topics by examining the distribution of words in tweets: tweets with similar word distributions are placed in the same topic. The number of topics with the highest coherence score was selected; nine topics gave the highest coherence, so LDA was run again with nine topics and the tweets were grouped accordingly. Because the model yields a probability distribution, the perplexity criterion was used to evaluate it. The lower the perplexity and the higher the coherence score, the better the model. Using the keyword weights obtained from LDA and an examination of at least five tweets from each topic, nine topics related to virtual reality technology were identified and named: "New Technologies", "Creation and Make", "Technological Business", "Education", "Virtual Games", "Progress", "Gadget", "Metaverse", and "Indiegame".
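The abstract reports a value of -9.44 for this evaluation criterion. A perplexity itself cannot be negative, so the figure is presumably on a logarithmic scale. Some toolkits report a per-word log-likelihood bound from which perplexity follows as 2 raised to the negative bound; gensim's `LdaModel.log_perplexity` is one example, although the source does not say which tool was used. The sketch below shows that relationship under this assumption; the function names and the toy probabilities are hypothetical.

```python
import math


def per_word_log_likelihood(token_probs, base=2.0):
    """Average log-probability per token for a held-out sample."""
    return sum(math.log(p, base) for p in token_probs) / len(token_probs)


def perplexity_from_bound(per_word_bound, base=2.0):
    """Convert a per-word log-likelihood bound into a perplexity value."""
    return base ** (-per_word_bound)


# Hypothetical probabilities that some topic model assigns to held-out tokens.
probs = [0.01, 0.002, 0.005, 0.001]
bound = per_word_log_likelihood(probs)
print(round(bound, 2), round(perplexity_from_bound(bound), 1))

# The reported value, read as a base-2 per-word bound (an assumption).
print(round(perplexity_from_bound(-9.44), 1))
```

A bound closer to zero gives a smaller perplexity, which matches the statement that lower perplexity indicates a better model.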
Discussion and Conclusion
In this research, by examining the topics across years, we observed that "Progress" was the most popular topic among users from 2017 to the end of 2021. In early 2022 it gave way to "Metaverse", which is currently one of the topics most discussed by users. Businesses in the field of virtual reality should therefore work on the appeal of "Metaverse" to attract users. Likewise, the "Virtual Games" and "Technological Business" graphs change at almost the same rate across years, which means the two are related. Businesses should keep in mind that the same factors affect both topics, but their effect is stronger for "Virtual Games"; if "Technological Business" creators focus specifically on virtual games, they will grow more because of users' greater attention. "Indiegame" went through a series of changes, but in recent years it declined and then levelled off, which the creators of these games should examine; in general, "Virtual Games" is a more interesting topic to users than "Indiegame". "Education" and "Gadget" have been decreasing since the beginning of 2017, which shows that users' attention to these topics in the field of virtual reality faded over time as they turned to other topics. It is therefore better for businesses active in these areas to advertise and attract users, or to change their target user segment if there is no growth.
Keywords: Data Mining, Text Mining, Virtual Reality Technology, Topic Modeling, Latent Dirichlet Allocation.
Data science, intelligence and future analysis
Mozhdeh Salari; Reza Radfar; Mahdi Faghihi
Abstract
The purpose of this research is to investigate the factors that are effective in predicting the academic performance of undergraduate students in a four-class classification. To achieve this goal, the study follows the CRISP-DM data mining methodology. The data set was extracted from the NAD educational system for bachelor's degree students who entered Shahed University between 2011 and 2021, and 1468 records were used in data mining. First, the features that affect students' academic performance were extracted. Modeling was done with RapidMiner 9.9. To improve classification performance and reach satisfactory prediction accuracy, we combine principal component analysis with machine learning algorithms, feature selection techniques, and optimization algorithms. The performance of the prediction models is verified using 10-fold cross-validation. The results showed that the decision tree algorithm is the best algorithm for predicting students' performance, with an accuracy of 84.71%. This algorithm correctly predicted the graduation of 77.88% of excellent students, 85.26% of good students, 84.69% of medium students, and 85.96% of weak students based on the final GPA.
Introduction
The main problem in this research is to identify the factors that are effective in predicting the academic performance of undergraduate students at Shahed University. Choosing the best machine learning algorithm for predicting academic performance among different modeling methods, based on validation and evaluation of the models, is another issue addressed in the present research.
The purpose of this research is to investigate the factors that are effective in predicting the academic performance of undergraduate students at Shahed University using educational data mining based on classification models.
Research questions
The main question in this research is: what factors affect the prediction of undergraduate students' performance and the improvement of that performance?
Sub questions
1- Which modeling algorithms have better results in predicting student performance?
2- What methods have been used to predict students' performance?
3- What is the validity of the developed model for Shahed University students?
2- Research background
2-1- Theoretical foundations
Educational data mining
The processing of educational data improves the prediction of student behavior and supports new approaches to educational policies (Capuano & Toti, 2019; Viberg et al., 2018).
Academic performance
Academic performance of students means the extent to which they achieve educational goals (Banik & Kumar, 2019).
2-2- Review of past studies
The highlighted cells in Table 1, based on past research, show the classification algorithms with the highest accuracy and effectiveness in predicting students' performance in each study. The decision tree algorithm has been used most often in previous research, followed by the NB algorithm. RF and ANN come next, and then SVM and KNN.
Table 1. The results of research literature based on the use of classification algorithms
Columns: study; data mining algorithm (DT, RF, NB, KNN, SVM, ANN, Line R, LLR); accuracy
(Batool et al., 2023) * *
(Marjan et al., 2023)******
(Abdelmagid & Qahmash, 2023) * ** *
(Manoharan et al., 2023)** * * *
(Alghamdi & Rahman, 2023)*** 99.34%
(Alboaneen et al., 2022) * ****
(Yağcı, 2022)* *** * 70-75%
(Dabhade et al., 2021)* * * 83.44%
(Najafi et al., 2021)* 95%
(Soltani et al., 2021)* **
(Cruz-Jesus et al., 2020) * ** * 50-81%
(Sokkhey & Okazaki, 2020)*** *
(Rebai et al., 2020)**
(Jayaprakash et al., 2020)***
(Zulfiker et al., 2020)** *
(Musso et al., 2020) *
(Waheed et al., 2020) * 85%
(Salal & Abdullaev, 2019)* ****
(Turabieh, 2019)* ** *
(Xu et al., 2019)* **
(Ghodoosi et al., 2019)* *
(Fadavi et al., 2019) * 95.84%
(Ajibade et al., 2019)* *** 91.5%
(Ahmad & Shahzadi, 2018) * 85%
(Hasani & Bazrafshan, 2018)* *
(Hussain et al., 2018)*** *
(Umer et al., 2017)**** *
(Khasanah, 2017)* *
(Asif et al., 2017)*
(Hoffait & Schyns, 2017) * * * 92.34%
(Khosravi et al., 2017)* *
(Mueen et al., 2016)* * * 86%
(Amrieh et al., 2015)* **
(Yehuala, 2015)* * 92.34%
(Zahedi et al., 2015)* * *
(Punlumjeak & Rachburee, 2015)*
(Osmanbegović et al., 2014)** 71%
(Shamloo et al., 2014)*
(Asadi et al., 2013)*
(Kabakchieva, 2013)* ** 60-75%
(Oskouei & Askari, 2014)*** * 96%
(Nghe et al., 2007)* *
Present research****** 94.17%
3- Method
This study follows the widely used CRISP-DM data mining methodology. The data were extracted from the NAD educational system for bachelor's degree students in non-medical fields at Shahed University, covering 2011 to 2021. We used the Label Encoder technique to encode the features. The C4.5 and ID3 decision tree classification algorithms, random forest, Naïve Bayes, k-nearest neighbor, artificial neural network, and gradient boosted trees were used to analyze and classify students and to predict the final GPA. Modeling was done using RapidMiner 9.9.
To improve the classification performance and solve the misclassification problem, we use a combination of principal component analysis, feature selection techniques, and optimization algorithms. Prediction accuracy was evaluated using 10-fold cross-validation for all algorithms. The algorithms were also compared using the analytical descriptive method and the evaluation criteria, and the best prediction model was introduced.
4- Data analysis
4-1 Introduction
The best model is the one that has the best values for the selected performance measurement criteria (Lever et al., 2016). Figure 1 compares the accuracy of the algorithms used in this research.
Figure 1. Comparative chart of the accuracy of the algorithms
According to Table 2, the DT C4.5 algorithm correctly predicts the class of 1235 objects out of 1458, which gives it an accuracy of 84.71%.
Table 2. Confusion matrix of the DT C4.5-GI&OSE research model
 | Students with excellent performance | Students with good performance | Students with average performance | Students with poor performance | Precision
Prediction 1 | 81 | 22 | 0 | 0 | 78.64%
Prediction 2 | 22 | 295 | 49 | 9 | 78.67%
Prediction 3 | 1 | 27 | 498 | 50 | 86.46%
Prediction 4 | 0 | 2 | 41 | 361 | 89.36%
Recall | 77.88% | 85.26% | 84.69% | 85.95% |
4-2 Important features
The prioritization of predictive variables based on their weight is as follows:
Diploma GPA: 0.262
Semester 1 GPA: 0.201
Semester 2 GPA: 0.197
Number of honors semesters: 0.122
Conditional number: 0.114
Year of entry: 0.104
4-3 The results of the implementation of the student performance prediction model
The results of the prediction model are shown in Table 3:
Table 3. The results of the DT C4.5-GI&OSE model implementation
5- Discussion
In the main method of the research, DT C4.5-GI&OSE, in the four-class classification mode, the diploma GPA has the greatest effect on the process of predicting student performance.
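The accuracy, precision, and recall values reported with Table 2 can be re-derived from the confusion-matrix counts. The sketch below does this in Python. It assumes that rows are predicted classes and columns are actual classes, and that "Prediction 1" to "Prediction 4" correspond to the excellent, good, average, and poor classes; this mapping is inferred from the diagonal of the table and is not stated explicitly in the source.

```python
# Counts read from Table 2. Rows: predicted class; columns: actual class.
# The class order (excellent, good, average, poor) is an inferred assumption.
CLASSES = ["excellent", "good", "average", "poor"]
MATRIX = [
    [81, 22, 0, 0],     # Prediction 1
    [22, 295, 49, 9],   # Prediction 2
    [1, 27, 498, 50],   # Prediction 3
    [0, 2, 41, 361],    # Prediction 4
]


def accuracy(matrix):
    """Share of all objects that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, k):
    """Correct predictions of class k divided by all predictions of class k."""
    return matrix[k][k] / sum(matrix[k])


def recall(matrix, k):
    """Correct predictions of class k divided by all actual members of class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


print(f"accuracy: {accuracy(MATRIX):.2%}")
for k, name in enumerate(CLASSES):
    print(f"{name}: precision {precision(MATRIX, k):.2%}, recall {recall(MATRIX, k):.2%}")
```

The diagonal sums to 1235 of 1458 objects, which is consistent with the 84.71% accuracy stated for the model.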
In response to the research sub-question, the best algorithm in the four-class mode is Decision Tree C4.5-GI&OSE, with a prediction accuracy of 84.71%. This model showed 84.17% accuracy, 83.42% sensitivity, and a kappa of 0.780. The DT C4.5-GI&OSE technique correctly predicted the graduation of 77.88% of excellent students, 85.26% of good students, 84.69% of average students, and 85.96% of poor students.
6- Conclusion
The results show that there is a relationship between students' social and academic characteristics and their academic performance. The DT C4.5-GI&OSE algorithm was the best algorithm for predicting students' final GPA at the end of their studies, with a prediction accuracy of 84.71%. In this model, the diploma GPA has the greatest effect on the prediction process. Using machine learning models as a decision support tool improves the academic level of students and reduces the number of students who are likely to fail or drop out. This study was carried out at the undergraduate level; future research can apply the approach at the master's and doctoral levels.
Keywords: student performance prediction, data mining, machine learning, modeling, improving the quality of education
Fateme Rahimi; Mohammad Vahid Sebt; Nasim Ghanbar Tehrani
Abstract
In today's competitive world, applying new techniques to business development has a great impact, and the restaurant industry is no exception. In this research, customer data from a chain restaurant are therefore investigated using new methods of knowledge discovery and data mining. The purpose of the study was to explore customer behavior patterns using data mining methods. One million five hundred thousand customer records from five branches of a chain restaurant were reviewed. Modeling was carried out in two stages, first clustering using the RFM method and then classification, and the behavior rules of the chain restaurant's customers were extracted. The results helped to identify the restaurant's loyal and profitable customers, which led to an improvement in its profitability. One of the innovations of this research is the link it establishes between the clustering and classification results.
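RFM describes each customer by recency (time since the last purchase), frequency (number of purchases), and monetary value (total spend). The sketch below derives these three values from a list of transactions, as a first step before any clustering. The record layout, the function name, and the sample data are hypothetical; the abstract does not describe the actual data schema.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((today - last_seen[customer]).days,
                   frequency[customer],
                   monetary[customer])
        for customer in last_seen
    }


# Hypothetical purchase records, for illustration only.
records = [
    ("A", date(2024, 1, 5), 20.0),
    ("A", date(2024, 1, 20), 25.0),
    ("A", date(2024, 1, 30), 30.0),
    ("B", date(2023, 11, 2), 12.5),
]
for customer, values in rfm_table(records, date(2024, 1, 31)).items():
    print(customer, values)
```

Customers with low recency and high frequency and monetary values are the natural candidates for a loyal and profitable segment.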
Sina Raeesi Vanani; Iman Raeesi Vanani; Mohammad Taghi Taghavifard
Abstract
Educational performance measurement through the identification and analysis of data extracted from learners' activities can effectively improve educational performance. In this article, data on international learners were analyzed using data mining methods within a design science methodology. Domestic and international research from the past decade was reviewed, and the academic and non-academic data of students were clustered in three categories: family, supportive, and academic behavior. After the algorithms' outputs were validated and the optimal number of clusters in each category was determined, the clusters were labeled and analyzed. Analysis of the labels reveals the students' experience of success or failure and the roots of effective performance in each cluster. The proposed labeling method is new and can be applied in most learning centers to segment and formulate educational performance.
Hassan Rangriz; Zahra Bayrami Shahrivar
Abstract
With the expansion of the Internet, organizations have adopted various tools to communicate with customers, and they use different E-CRM methods to create competitive advantages. Since customer loyalty is critical to achieving competitive advantage and profitability, one of the goals of organizations in using E-CRM is to maintain and increase customer loyalty. Considering the importance of the impact of various E-CRM services on customers' loyalty, the purpose of this study is to investigate the impact of E-CRM on the loyalty of Bank Mellat customers using data mining techniques. The data required for this research were extracted from Bank Mellat databases. The data were analyzed with clustering by the K-means algorithm, neural networks (trained with the error backpropagation algorithm), and the LRFM model, implemented in MATLAB and Excel. The results showed that customers' loyalty increases as their use of E-CRM services increases. The relationship between E-CRM, the components of the LRFM model, and loyalty is nonlinear, and the change in loyalty as E-CRM changes is not constant. The increase in loyalty is a function of the LRFM components, the amount of E-CRM use, and the weights obtained in the neural network.
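K-means groups customers by assigning each one to the nearest cluster center and then moving each center to the mean of its members. The sketch below applies a plain K-means loop to LRFM vectors (length, recency, frequency, monetary). It is written in Python for illustration, whereas the study used MATLAB and Excel; the sample vectors, the scaling to the range 0 to 1, and the choice of initial centers are all hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain K-means: alternate assignment and centroid update steps."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(column) / len(members) for column in zip(*members))
            if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels


# Hypothetical customers as (length, recency, frequency, monetary), scaled to [0, 1].
customers = [
    (0.9, 0.1, 0.8, 0.9),
    (0.8, 0.2, 0.9, 0.8),
    (0.2, 0.9, 0.1, 0.1),
    (0.1, 0.8, 0.2, 0.2),
]
centers, labels = kmeans(customers, [customers[0], customers[2]])
print(labels)
print(centers)
```

Fixing the initial centers keeps the example deterministic; practical implementations usually choose them randomly and rerun the algorithm several times.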
Maryam Shoar; Ali Asghar Salarnezhad
Abstract
Given the high volume of web information, automatic data extraction systems have received more attention. One of the most important methods of data extraction is clustering. Many clustering methods are available today, most of them based on vector models. In these models, each document is treated as a set of words, and the sequence of words in a sentence is ignored. Since meaning in natural language depends heavily on word order, these methods have many shortcomings. To overcome them, this paper presents a new method for clustering HTML documents in which the STC algorithm is used to cluster snippets. The method, called key-sentence-based clustering (KS_STC), builds a weighted vector for each document and uses this vector to extract the key sentences of the text. Finally, these key sentences are passed to the STC algorithm for clustering.
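The abstract does not specify how the weighted vector is built or how key sentences are selected from it. As one possible reading, the sketch below weights terms by their frequency in the document and ranks sentences by the average weight of their terms. The stop-word list, the scoring rule, and the function names are assumptions made for illustration and should not be taken as the KS_STC method itself.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption).
STOPWORDS = {"the", "a", "an", "of", "is", "and", "to", "in", "for", "on", "it", "are"}


def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def key_sentences(document, k=1):
    """Return the k sentences whose terms carry the highest average weight."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    weights = Counter(tokenize(document))  # term-frequency weight vector

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(weights[t] for t in tokens) / len(tokens) if tokens else 0.0

    return sorted(sentences, key=score, reverse=True)[:k]


# Hypothetical document, for illustration only.
doc = ("Clustering groups similar documents. "
       "Clustering of web documents uses key sentences. "
       "The weather is nice.")
print(key_sentences(doc, k=2))
```

Only the selected sentences would then be handed to a phrase-based clusterer such as STC, which keeps word order inside each sentence.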
Mojtaba Salehi; Alireza Korde Katooli
Abstract
Credit risk, defined as the probability that a customer fails to repay obligations by the due date, is considered one of the causes of bankruptcy in financial institutions. For this reason, data mining techniques such as neural networks, decision trees, Bayesian networks, and support vector machines are used to segment customers into high-risk and low-risk groups. In this paper, we present a hybrid of the Imperialist Competitive optimization algorithm and a neural network to increase classification accuracy in evaluating and measuring the credit risk of bank customers. The proposed method identifies the optimal features and eliminates unnecessary ones, which reduces the problem dimension and increases classification accuracy. To validate the method, it is applied to a UCI dataset and to a real dataset from a private Iranian bank. The experimental results show that this method performs better than other data mining techniques. The neural network error on the test set decreases when the Binary Imperialist Competitive Optimization Algorithm selects effective features and eliminates low-impact ones. In addition, the test error rate remains at an acceptable level for the other classification methods used. This article is the first use of the Imperialist Competitive algorithm for credit risk assessment of bank customers.
Masoud Abessi; Elahe Hajigol Yazdi; Hassan Hoseini Nasab; Mohammad Bagher Fakhrzad
Volume 3, Issue 12, September 2015, Pages 1-20
Abstract
Learning the logic of exceptions is a considerable challenge in data mining and knowledge discovery. Exceptions are rare phenomena with unusually positive behavior in a database. Creating an efficient framework that increases the reliability of exception detection and learning is therefore important. This paper presents a novel framework that raises the confidence assigned to a limited number of records (exceptions) so that they can be learned effectively. A new approach based on abnormality theory and computing theory is presented to detect exceptional phenomena and learn their behavior. First, the Rényi entropy function is used to detect exceptional data, which are distinguished from other data by their hidden knowledge. Then the novel E-RISE algorithm, which follows a bottom-up learning strategy, is introduced to learn the behavior of exceptional data. The efficiency of the proposed model is assessed by applying it to Iran stock market data: of 1334 stock data points mined, 2.6% showed exceptional behavior. The extracted rules represent the behavior of exceptional stocks. An expert system is then designed to use the extracted knowledge to recognize new exceptional stocks. Faced with a new stock, the expert system recognizes exceptions by comparing its characteristics with normal and exceptional behavior: exceptions either comply with exceptional rules or contradict every normal pattern. This acquired knowledge is the basis of exceptional portfolio selection, which aims to create exceptional wealth for investors. The findings of the proposed method are compared with the outcomes of traditional methods such as decision trees and support vector machines, and the difference is considerable. The results show the capability of the proposed method in detecting exceptional data and learning their behavior.
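For a discrete distribution p, the Rényi entropy of order α (α ≥ 0, α ≠ 1) is log(Σ p_i^α) / (1 − α), and it approaches the Shannon entropy as α approaches 1. The sketch below computes this quantity with natural logarithms. How the paper applies the entropy to flag exceptional records is not described in the abstract, so the example only illustrates the function itself; the sample distributions are hypothetical.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy (natural log) of a discrete probability distribution."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if abs(alpha - 1.0) < 1e-12:
        # Limit case alpha -> 1: Shannon entropy.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


uniform = [0.25, 0.25, 0.25, 0.25]   # hypothetical distribution
skewed = [0.97, 0.01, 0.01, 0.01]    # hypothetical distribution
print(renyi_entropy(uniform, 2), renyi_entropy(skewed, 2))
```

A uniform distribution gives the maximum value, log of the number of outcomes, for every order α, while a distribution dominated by one outcome gives a value close to zero.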
Jamshid Salehisadegheiani; Samaneh Sorournejad; Reza Ebrahimi Atani; Maryam Akhavan Kharazian; Mousa Rezvani Chamazamin
Volume 1, Issue 2, December 2013, Pages 147-162
Abstract
In recent years, a number of new payment solutions have been introduced in mobile commerce, although with limited success. The existence of standardized and widely accepted mobile payment (MP) procedures is crucial for successful business-to-customer mobile commerce. On the other hand, non-acceptance of innovations has generally been attributed to the failure of laggards to keep up with the times. While our previous survey investigated positive adoption decisions, this paper focuses on consumer resistance to innovation. We examine the conditions under which Iranian customers resist MP procedures and attempt to discover the causes that lead an Iranian customer in particular to resist an innovation such as mobile payment. Based on the theory of innovation resistance and the mobile payment literature, we designed a questionnaire and distributed it among our study population of ordinary Iranians. We employed data mining techniques to extract useful patterns from the data, which describe the reasons for this refusal across different social and cultural levels of society.