Maryam Shoar; Ali Asghar Salarnezhad
Abstract
Given the high volume of information on the web, automatic data extraction systems have attracted increasing attention. Clustering is one of the most important data extraction methods. Many clustering methods have been proposed, most of which are based on vector models. In these models, each document is treated as a set of words, and the order of words in a sentence is ignored. Since meaning in natural language depends on word order, these methods show considerable shortcomings. To overcome them, this paper presents a new method for clustering HTML documents that uses the STC (Suffix Tree Clustering) algorithm, which is designed for clustering snippets. The method, called clustering based on key sentences (KS_STC), builds a weighted vector for each document and uses this vector to extract the key sentences of each text. Finally, these key sentences are passed to the STC algorithm for clustering.
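The pipeline in this abstract (a weighted vector per document, then key sentences, then STC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes TF-IDF as the document weighting and scores a sentence by the mean weight of its terms. It also replaces STC's generalized suffix tree with a direct enumeration of word n-grams (up to `max_len` words) per sentence and omits STC's base-cluster scoring and top-k selection. The merge rule (document overlap above 50% in both directions) follows STC. All names (`ks_stc`, `key_sentences`, the stopword list) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; not from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}

def tokenize(text):
    """Lower-cased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tfidf_vectors(docs):
    """One weighted term vector per document (smoothed TF-IDF, an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
            for toks in tokenized]

def key_sentences(doc, weights, k=2):
    """Keep the k sentences with the highest mean term weight, in original order."""
    sents = split_sentences(doc)
    def score(s):
        toks = tokenize(s)
        return sum(weights.get(t, 0.0) for t in toks) / len(toks) if toks else 0.0
    top = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)[:k]
    return [sents[i] for i in sorted(top)]

def base_clusters(key_texts, max_len=4, min_docs=2):
    """Phrase -> set of documents sharing it. Enumerates the word n-grams a
    generalized suffix tree would index, without building the tree."""
    phrase_docs = defaultdict(set)
    for doc_id, sentences in enumerate(key_texts):
        for sent in sentences:  # phrases never cross sentence boundaries
            words = tokenize(sent)
            for i in range(len(words)):
                for j in range(i + 1, min(i + max_len, len(words)) + 1):
                    phrase_docs[tuple(words[i:j])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}

def merge_clusters(bases, threshold=0.5):
    """STC-style merge: link base clusters whose document sets overlap by more
    than `threshold` in both directions, then return connected components."""
    sets = list(bases.values())
    parent = list(range(len(sets)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a in range(len(sets)):
        for b in range(a + 1, len(sets)):
            inter = len(sets[a] & sets[b])
            if inter / len(sets[a]) > threshold and inter / len(sets[b]) > threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(set)
    for idx, docs in enumerate(sets):
        groups[find(idx)] |= docs
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))

def ks_stc(docs, k=2):
    """Extract key sentences per document, then cluster on shared phrases."""
    vectors = tfidf_vectors(docs)
    key_texts = [key_sentences(d, v, k) for d, v in zip(docs, vectors)]
    return merge_clusters(base_clusters(key_texts))

docs = [
    "Suffix tree clustering groups snippets. The sky is blue.",
    "Fast suffix tree methods. Cats sleep.",
    "Credit risk models help banks. The sky is blue.",
    "Credit risk matters. Cats sleep.",
]
print(ks_stc(docs, k=1))  # → [[0, 1], [2, 3]]
```

With `k=2` every sentence is kept, and the filler sentences shared across documents add extra clusters, giving `[[0, 1], [0, 2], [1, 3], [2, 3]]`. This is the kind of word-level noise that key-sentence extraction is meant to filter out before STC runs.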
Mojtaba Salehi; Alireza Korde Katooli
Abstract
Credit risk, interpreted as the probability that a customer fails to repay obligations by the due date, is considered one of the causes of bankruptcy in financial institutions. To manage it, data mining techniques such as neural networks, decision trees, Bayesian networks, and support vector machines are used to segment customers into high-risk and low-risk groups. In this paper, we present a hybrid of the Imperialist Competitive optimization algorithm and a neural network to increase classification accuracy in evaluating and measuring the credit risk of bank customers. The proposed method identifies the optimal features and eliminates unnecessary ones, which reduces the problem dimension and increases classification accuracy. To validate the method, it was applied to a UCI dataset and to a real dataset from a private Iranian bank. The experimental results show that this method gives more satisfactory results than other data mining techniques. The neural network's error on the test set decreases when effective features are selected and low-impact features are eliminated by the Binary Imperialist Competitive Optimization Algorithm. In addition, the test-data error rate remains at an acceptable level for the other classification methods used. This article is the first use of the Imperialist Competitive Algorithm for credit risk assessment of bank customers.
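The abstract describes a wrapper approach: a Binary Imperialist Competitive Algorithm searches over feature subsets, and a classifier scores each subset. The sketch below illustrates that loop under explicit assumptions: countries are 0/1 feature masks, assimilation copies each imperialist bit with a fixed probability, and revolution flips bits at random. This is one simple binarisation, not necessarily the authors'. For brevity, a leave-one-out 1-nearest-neighbour error stands in for the paper's neural network. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random

def loo_1nn_error(X, y, mask):
    """Leave-one-out error of 1-nearest-neighbour on the selected features.
    Stand-in for the paper's neural network (assumption, for brevity)."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 1.0  # an empty feature set cannot classify anything
    errors = 0
    for i in range(len(X)):
        nearest, label = math.inf, None
        for k in range(len(X)):
            if k != i:
                d = sum((X[i][j] - X[k][j]) ** 2 for j in feats)
                if d < nearest:
                    nearest, label = d, y[k]
        errors += label != y[i]
    return errors / len(X)

def fitness(X, y, mask, alpha=0.01):
    """Cost to minimise: classification error plus a small feature-count penalty."""
    return loo_1nn_error(X, y, mask) + alpha * sum(mask) / len(mask)

def pick_weighted(rng, costs):
    """Roulette wheel on ICA 'power', using the normalised cost max(c) - c_i."""
    worst = max(costs)
    powers = [worst - c for c in costs]
    if sum(powers) == 0:
        return rng.randrange(len(costs))
    return rng.choices(range(len(costs)), weights=powers)[0]

def binary_ica(X, y, n_countries=30, n_imp=4, decades=40,
               assim_p=0.5, revol_p=0.1, xi=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}
    def cost(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]
    def total_cost(emp):
        cols = [cost(c) for c in emp["cols"]]
        return cost(emp["imp"]) + (xi * sum(cols) / len(cols) if cols else 0.0)
    # Random initial countries; the n_imp cheapest become imperialists.
    countries = sorted(([rng.randint(0, 1) for _ in range(n)] for _ in range(n_countries)), key=cost)
    empires = [{"imp": m, "cols": []} for m in countries[:n_imp]]
    imp_costs = [cost(e["imp"]) for e in empires]
    for col in countries[n_imp:]:
        empires[pick_weighted(rng, imp_costs)]["cols"].append(col)
    for _ in range(decades):
        for emp in empires:
            for idx, col in enumerate(emp["cols"]):
                # Assimilation: copy each imperialist bit with probability assim_p.
                new = [ib if rng.random() < assim_p else b for b, ib in zip(col, emp["imp"])]
                # Revolution: flip each bit with probability revol_p.
                new = [1 - b if rng.random() < revol_p else b for b in new]
                if cost(new) < cost(emp["imp"]):  # colony overtakes its imperialist
                    emp["cols"][idx], emp["imp"] = emp["imp"], new
                else:
                    emp["cols"][idx] = new
        if len(empires) > 1:  # imperialistic competition
            tcs = [total_cost(e) for e in empires]
            w = max(range(len(empires)), key=tcs.__getitem__)
            weak = empires.pop(w)
            tcs.pop(w)
            if weak["cols"]:  # weakest colony of the weakest empire changes hands
                weak["cols"].sort(key=cost)
                empires[pick_weighted(rng, tcs)]["cols"].append(weak["cols"].pop())
            if weak["cols"]:
                empires.insert(w, weak)
            else:  # an empire with no colonies collapses; its imperialist becomes a colony
                empires[pick_weighted(rng, tcs)]["cols"].append(weak["imp"])
    best = min(cache, key=cache.get)  # best mask evaluated at any point
    return list(best), cache[best]

# Hypothetical demo data: feature 0 separates the classes, features 1-5 are noise.
data_rng = random.Random(1)
X = [[data_rng.random() + 5 * (i % 2)] + [data_rng.random() for _ in range(5)] for i in range(20)]
y = [i % 2 for i in range(20)]
mask, best_cost = binary_ica(X, y)
print(mask, best_cost)
```

The small per-feature penalty `alpha` breaks ties between subsets with equal error in favour of fewer features. This mirrors the abstract's goal of removing low-impact features to reduce the problem dimension.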