Data science, intelligence and future analysis
Abbas Bagherian Kasgari; Iman Raeesi Vanani; Maghsoud Amiri; Saeid Homayoun
Abstract
Most traditional fraud detection systems rely primarily on financial criteria to identify financial fraud, overlooking the potential for fraudulent companies to engage in various types of non-financial misconduct. Recent studies have predominantly treated financial data as the sole indicator of fraud, neglecting non-financial or Environmental, Social, and Governance (ESG) metrics as supplementary predictors. This research aims to enhance fraud prediction by integrating financial and ESG data through machine learning and deep learning models. It examines the effectiveness of supervised machine learning and deep learning algorithms in detecting financial fraud over a 10-year period ending in 1401 (Iranian calendar). The study demonstrates that a hybrid model combining financial and non-financial criteria yields higher predictive accuracy for financial fraud than models based solely on financial data. Addressing the first research question, the results indicate that among the various machine learning and deep learning algorithms tested, the bagging classification algorithm was the most efficient. In response to the second research question, the dataset encompassing all features, integrating both financial and non-financial data, outperformed datasets limited to either financial or non-financial data alone. In short, the bagging algorithm performed best on the combined feature set of financial and ESG metrics, and adopting the proposed model significantly improves the accuracy and effectiveness of fraud detection systems.
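The comparison the abstract reports can be sketched as follows: a bagging classifier trained on financial features alone versus on financial plus ESG features. The feature counts and the synthetic data are illustrative assumptions; the paper's real data set is not public.

```python
# Sketch, assuming synthetic stand-in data: compare a bagging classifier
# on financial-only vs. combined financial + ESG feature sets.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
X_fin = rng.normal(size=(n, 8))   # stand-in financial ratios (illustrative)
X_esg = rng.normal(size=(n, 4))   # stand-in ESG scores (illustrative)
# synthetic fraud label that depends on both feature groups
y = (X_fin[:, 0] + X_esg[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# BaggingClassifier defaults to bagged decision trees
clf = BaggingClassifier(n_estimators=50, random_state=0)
acc_fin = cross_val_score(clf, X_fin, y, cv=5).mean()
acc_all = cross_val_score(clf, np.hstack([X_fin, X_esg]), y, cv=5).mean()
print(f"financial only: {acc_fin:.3f}, financial + ESG: {acc_all:.3f}")
```

Because the synthetic label depends on both feature groups, the combined set should score higher here, mirroring the direction of the paper's finding without reproducing its numbers.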
Yaqub Ahmadlou; Alireza Pourebrahimi; Jafar Tanha; Ali Rajabzadeh Ghatari
Abstract
Fraud cases have increased in recent years, especially in the sensitive financial and insurance sectors, and dealing with such fraud requires measures beyond traditional inspection methods. Agricultural insurance is not exempt from this threat, owing to its nature and wide scope, and every year large sums are spent on paying fraudulent claims. This research aims to provide a model for discovering unrealistic damage claims in agricultural insurance using data mining and machine learning techniques; a deep learning model was built for this purpose. The data, obtained from the Agricultural Insurance Fund, relate to irrigated and rainfed wheat insurance policies of Khuzestan province for which compensation was paid in the 2018-2019 crop year. After preparing and preprocessing the data, deep learning was used to discover unusual cases, and the results were evaluated by experts of the Agricultural Insurance Fund. The analysis found that 1% of the compensation paid related to unrealistic claims, indicating that more care should be taken before payment. The model's accuracy in detecting unusual cases was 53.53 percent for irrigated wheat and 63.37 percent for rainfed wheat. Five categories of unusual behavior were found to have led to the payment of unrealistic claims, with failure to provide damage documentation being the most frequent.
Introduction
Insurance fraud refers to the unethical and criminal act of abusing an insurance policy to obtain illegal profit from an insurance company. In general, insurance exists to protect the assets and business of individuals or organizations against financial loss, and fraud may occur at any stage of the insurance process and be committed by anyone, such as customers or fraudulent agents (Al-Hashedi & Magalingam, 2021).
Insurance fraud not only reduces the insurance company's profit and leads to major losses, but also affects its pricing strategy and socio-economic benefits in the long term (Yaram, 2016). Every year, significant sums are defrauded from the insurance industry, and not all of it is discovered. According to statistics published by the Coalition Against Insurance Fraud, fraud adds about eighty billion dollars to customers' expenses in the United States, which they must compensate for by paying higher insurance premiums the following year (Fraud statistics, 2020). In Iran, there is no accurate estimate of the compensation paid on unrealistic damage claims or any other fraud, and one goal of this research is to estimate the extent of fraud in wheat crop insurance using deep learning.
Research Question(s)
This research seeks to answer the following questions: in rainfed and irrigated wheat crop insurance, what percentage of the paid compensation relates to unrealistic and fictitious damage claims, and how accurately can deep learning detect them?
Literature Review
Ghahari et al. (2019) investigated the use of deep learning to predict agricultural yield across time and space under unstable weather conditions. They compared the performance of machine learning alongside weather stations with conventional methods, and their findings showed that deep learning provides the highest prediction accuracy among the compared approaches. It can also be inferred that deep learning, by providing accurate estimates of crop yield, can play a role in reducing agricultural insurance costs (Newlands et al., 2019). Gomes et al. (2021) presented a new deep learning method to gain pragmatic insight into the behavior of insured individuals using unsupervised latent variables.
Their proposed method can be applied to pension insurance, investment, and other broader areas of the insurance industry, and it enables autoencoders and variational autoencoders to be used in semi-supervised/unsupervised latent-variable analysis to identify fraudulent agents (Gomes et al., 2021). Xia et al. (2022) proposed a deep learning model to detect car insurance fraud by combining a convolutional neural network, long short-term memory, and a deep neural network. Their method extracts more abstract features, assisting experts in the complex feature-extraction process that is critical in traditional machine learning algorithms. Their experiments showed that the method can effectively improve the accuracy of car insurance fraud detection.
Methodology
This research is applied in terms of its objective and data-driven in nature. For machine learning modeling, the standard CRISP-DM process was used, comprising data collection, data preparation and preprocessing, modeling, model evaluation, and obtaining results. Figure 1 shows the general process of anomaly detection and analysis.
Figure 1. Anomaly detection process framework
The data for one agricultural year of irrigated and rainfed wheat were obtained from the Agricultural Insurance Fund. The national ID codes of the insured were removed from the data set to maintain confidentiality. The extracted data relate to irrigated and rainfed wheat crop insurance policies of Khuzestan province for the 2018-2019 crop year. In that crop year, compensation was paid on these policies according to the damage claimed; in other words, the data set includes those irrigated and rainfed wheat insurance policies whose crops were damaged and for which compensation was paid.
The data were obtained from the insurance fund's comprehensive system as a CSV report and comprised 23 features.
Conclusion
The results show that in wheat insurance about 1% of the compensation paid goes to unrealistic claims, so such claims need further investigation by experts before payment. This figure is close to the estimate of the insurance fund's inspection experts, who stated that about 1.5% of claims are unrealistic. Five categories of behavior used by beneficiaries to receive compensation for unrealistic claims were identified:
1. Lack of sufficient documentation to prove the damage: the documents that should be uploaded in the system according to the implementation procedures are missing or incomplete. Payment of compensation without documents indicating the occurrence of damage can result from negligence, or from collusion between the appraiser or broker and the insured.
2. Documents inconsistent with the declared damage: the documents uploaded in the system do not show the occurrence of the registered type of damage. For example, the storm speed is recorded as 50 km/h in the claim, but as 15 km/h in the meteorological documents.
3. Untrue damage documentation: for example, the risk factor in the expert form is recorded as drought, but the submitted picture shows flood damage, probably due to negligence. It is also possible that an image of damaged agricultural land was submitted in place of the insured's healthy land.
4. Non-observance of the damage notification period: according to the fund's executive instructions, the time limit from declaration of damage to payment of compensation is one month, and anything outside this is against the instructions. In some cases the damage had even been declared before the accident occurred.
5. Mismatch between the date of damage and the time of its announcement: according to the executive instructions, in the case of agricultural damage the site visit must take place within one week of the damage occurring, and the type and amount of damage must be carefully checked before its effects are removed. In some cases the announcement date was recorded one month after the damage occurred; once the effects of damage have been removed, paying compensation is suspect, because there may never have been any damage.
Keywords: Anomaly Detection, Crop Insurance, Deep Learning, Autoencoder.
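The anomaly-detection idea described above can be sketched minimally: train an autoencoder on claim records and flag the claims it reconstructs worst. The synthetic claims and the linear single-bottleneck autoencoder (an MLPRegressor fit to reproduce its own input) are illustrative stand-ins for the paper's model and the fund's 23-feature data set.

```python
# Sketch, assuming synthetic data: flag claims with the highest
# autoencoder reconstruction error as candidates for expert review.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 6))
normal_claims = rng.normal(size=(300, 2)) @ W   # claims following a common pattern
odd_claims = rng.normal(0, 3, size=(3, 6))      # claims breaking the pattern
X = np.vstack([normal_claims, odd_claims])

# autoencoder: a network trained to reproduce its input through a bottleneck
ae = MLPRegressor(hidden_layer_sizes=(3,), activation="identity",
                  max_iter=3000, random_state=0)
ae.fit(X, X)

errors = np.mean((ae.predict(X) - X) ** 2, axis=1)  # reconstruction error per claim
threshold = np.quantile(errors, 0.99)               # flag roughly the worst 1%
flagged = np.where(errors > threshold)[0]
print("claims needing expert review:", flagged)
```

The flagged set plays the role of the ~1% of claims the study routes to expert review; in practice the threshold would be tuned with the fund's inspectors rather than fixed at a quantile.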
Monireh Hosseini; Elnaz Galavi
Abstract
Community detection is an important topic in social network analysis and essential to understanding the structure of complex networks. Its goal is to determine groups whose member nodes are densely connected to each other. In this research, deep learning techniques are used to handle high-dimensional graph data, while presenting a comprehensive and integrated architecture of deep-learning-based community detection methods. Classic community detection approaches suit low-dimensional networks, so reducing the dimensionality of complex networks is a significant topic in community detection. In this paper, to reveal the direct and indirect connections among nodes, a new similarity matrix of the network topology is first built. Then a stacked autoencoder is designed to reduce dimensionality through unsupervised learning. To detect communities, various clustering algorithms are then tested and applied. The proposed model is evaluated through various experiments on standard criteria and six real data sets: Karate, Dolphins, Football, Polbooks, Cora, and Citeseer. The evaluation shows higher accuracy in identifying communities on the Football data set than the twelve algorithms proposed in past research, and a significant improvement on the other data sets compared with thirteen algorithms.
Introduction
Today, with the increasing use of the Internet, social networks have come to play an important role in people's real lives. In social networks, some groups of nodes are more densely connected to each other than to the rest of the network; these groups are called communities (Sperli, 2019). Community detection is an important topic in social network analysis and essential to understanding complex network structure. Its goal is to determine the groups in which the nodes are densely connected.
Many methods exist for community detection, and deep learning has shown excellent performance in a wide range of related fields, such as social network analysis and graph embedding.
In this research, deep learning techniques are used to handle high-dimensional graph data, while presenting a comprehensive and integrated architecture of deep-learning-based community detection methods.
Research Questions
Is it possible to create a new similarity matrix from the graph of complex networks that fully reveals the similarity relationships between network nodes?
What is the appropriate method of deep learning to represent the features of complex networks in low dimensions?
Is it possible to provide a suitable framework with model flexibility for networks of different sizes for community detection using the deep learning method?
Can more accurate clustering results be achieved for community detection?
Literature Review
2.1. Classic community detection approaches are suitable for low-dimensional networks, so reducing the dimensionality of complex networks is a significant topic in community detection. The drawback of high-dimensional networks is the huge computational cost they impose on community detection methods. A method is therefore needed to transform high-dimensional graphs into a lower-dimensional space in which important information about network structure and node properties is still preserved. According to past research, autoencoders are the dominant method for mapping data points into lower-dimensional spaces (Souravlas et al., 2021).
2.2. To represent the network, the proximity matrix can be used as the similarity matrix describing the similarity relationships between nodes. But relationships between nodes in a social network are complex: in addition to the similarity between directly connected nodes, there are varying degrees of similarity between nodes that are not directly connected (Su et al., 2020).
2.3. Wu et al. (2020) and Geng et al. (2020) reconstructed the adjacency matrix to represent the network. Dhilber and Bhavani (2020) used a cubic matrix as input to stacked autoencoders, as did Yang et al. (2016). Xie et al. (2018) first proposed a new representation of network similarity and then fed it to a sparse filtering model to extract meaningful features of network nodes. However, based on Su et al.'s (2020) research, beyond the proximity matrix's lack of neighbor information, using only one function to measure the similarity between nodes cannot fully reveal the network's topological information. A similarity matrix that addresses these gaps is therefore needed.
Methodology
In this paper, to reveal the direct and indirect connections among nodes, a new similarity matrix of the network topology is first built. To construct it, two matrices are used: the proximity matrix and the Sørensen–Dice similarity matrix from Xie et al.'s (2018) research. Next, to extract low-dimensional graph features, the new similarity matrix is given as input to a stacked autoencoder network with several hidden layers for unsupervised training. Then, using the newly learned low-dimensional features, communities are detected with the K-means, DBSCAN, and SNNDPC clustering algorithms.
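The first two ingredients named above can be sketched on a toy graph: the proximity (adjacency) matrix and a Sørensen–Dice similarity of node neighbourhoods, combined into one similarity matrix and clustered. The toy graph and the simple sum used to combine the two matrices are illustrative assumptions; the paper's exact construction and its stacked-autoencoder step are omitted here.

```python
# Sketch, assuming a toy graph and a simple additive combination:
# direct links (adjacency) + indirect similarity (Sørensen–Dice overlap
# of neighbourhoods), then clustering the rows of the combined matrix.
import numpy as np
from sklearn.cluster import KMeans

# toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Sørensen–Dice similarity of neighbourhoods: 2|N(i) ∩ N(j)| / (|N(i)| + |N(j)|)
deg = A.sum(axis=1)
common = A @ A                      # common[i, j] = |N(i) ∩ N(j)|
dice = 2.0 * common / (deg[:, None] + deg[None, :])

S = A + dice                        # combined direct + indirect similarity

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(S)
print(labels)
```

On a graph this small, clustering the similarity rows directly recovers the two triangles; the paper's autoencoder step matters when the network is too large for the raw matrix to be clustered usefully.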
Conclusion
The proposed model is evaluated through various experiments on standard criteria and six real data sets: Karate, Dolphins, Football, Polbooks, Cora, and Citeseer. The evaluation shows higher accuracy in detecting communities on the Football data set than the twelve algorithms proposed in past research, and a significant improvement on the other data sets compared with thirteen algorithms. In addition, the superiority of the similarity matrix used in this research was demonstrated as a key prerequisite for community detection.
Keywords: Community Detection, Deep Learning, Autoencoder, Complex Networks.
Armina Mohseni; Ameneh Khadivar; Fatemeh Abbasi
Abstract
The growth of the Internet, social networks, and e-commerce websites provides a platform for users to express their opinions. In recent years, many users have expressed positive or negative opinions online about restaurants' food, service, value, and atmosphere. These comments matter both for other users' decisions and for restaurants seeking to maintain quality, develop products, and build their brand. Sentiment analysis is a natural language processing approach that allows systematic analysis of users' opinions. Given the importance of this issue, this study presents a model for analyzing the sentiment of TripAdvisor comments about Iranian restaurants. We propose an aspect-based sentiment analysis based on a deep learning algorithm, a standard long short-term memory (LSTM) neural network, to extract users' sentiments about restaurants. To train the model, 4000 comments were labeled according to four aspects in three classes (not related, positive, and negative), and the study followed the CRISP-DM methodology. Accuracy for food, service, value, and atmosphere was 82%, 86%, 87%, and 81%, respectively. These results indicate the model's efficiency and acceptable performance for aspect-based sentiment analysis of restaurants. Furthermore, food and atmosphere are, in that order, the most important aspects for customers of Iranian restaurants. Restaurant owners can use the developed model to gain a competitive advantage and identify their strengths and weaknesses.
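The study trains one three-class classifier per aspect (not related / positive / negative). As a self-contained sketch of that setup, a bag-of-words logistic regression stands in for the paper's LSTM, and the tiny labelled comments below are invented for illustration.

```python
# Sketch, assuming invented toy data: one three-class classifier per
# aspect, shown here only for the "food" aspect. A logistic regression
# over TF-IDF features stands in for the paper's LSTM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy training data for the "food" aspect: 0 = not related, 1 = positive, 2 = negative
comments = [
    "the kebab was delicious and fresh",
    "the food was cold and tasteless",
    "parking near the restaurant was easy to find",
    "amazing saffron rice with great flavours",
    "the stew tasted awful and stale",
    "the waiter brought the bill quickly",
]
food_labels = [1, 2, 0, 1, 2, 0]

# one such pipeline would be trained per aspect (food, service, value, atmosphere)
food_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
food_clf.fit(comments, food_labels)

print(food_clf.predict(["delicious fresh rice with great flavours"]))
```

Repeating this per aspect and aggregating the four predictions per comment mirrors the paper's aspect-based design, though the LSTM would learn word order that this bag-of-words stand-in ignores.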