Authors

1  Assistant Professor, Department of Industrial Engineering, Yazd University, Yazd, Iran

2 Ph.D. Candidate in Industrial Engineering, Department of Industrial Engineering, Yazd University, Yazd, Iran (Corresponding author)

3 Associate Professor, Department of Industrial Engineering, Yazd University, Yazd, Iran

4 Assistant Professor, Department of Industrial Engineering, Yazd University, Yazd, Iran

Abstract

Learning the logic of exceptions is a considerable challenge in data mining and knowledge discovery. Exceptions are rare phenomena that exhibit unusual, positive behavior in a database. An efficient framework that makes the detection and learning of exceptions more reliable is therefore important. This paper presents a novel framework that raises the confidence assigned to a limited number of records (the exceptions) so that their behavior can be learned effectively. The approach, based on abnormality theory and computing theory, detects exceptional phenomena and learns their behavior. First, the Rényi entropy function is used to detect exceptional data, that is, records distinguished from the rest by their hidden knowledge. Then the novel E-RISE algorithm, which follows a bottom-up learning strategy, is introduced to learn the behavior of the exceptional data. The efficiency of the proposed model is evaluated by applying it to Iran stock market data. Of the 1334 stock data points mined, 2.6% showed exceptional behavior. The extracted rules describe the behavior of exceptional stocks. An expert system is then designed to use the extracted knowledge to recognize new exceptional stocks. Given a new stock, the expert system recognizes an exception by comparing its characteristics with normal and exceptional behavior: exceptions either comply with the exceptional rules or stand in contradiction with the normal patterns. The acquired knowledge is the basis for exceptional portfolio selection, which aims to create exceptional wealth for investors. The findings of the proposed method are compared with those of traditional methods such as decision trees and support vector machines. The results show the capability of the proposed method in detecting exceptional data and learning their behavior.
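The abstract names the Rényi entropy function as the detection step but does not give its exact use. As a minimal sketch only, the following computes the Rényi entropy of order alpha for a discrete distribution and scores each record by how much the entropy changes when that record is left out. The leave-one-out scoring, the function names, and the default alpha = 2 are assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha, in bits, of a discrete distribution."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    probs = [p for p in probs if p > 0]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def value_distribution(values):
    """Empirical probabilities of the distinct values in a list."""
    counts = Counter(values)
    total = len(values)
    return [count / total for count in counts.values()]


def entropy_drop_scores(values, alpha=2.0):
    """Hypothetical leave-one-out score: base entropy minus the entropy
    of the data with one record removed. A larger score means the record
    contributes more disorder, i.e. it is more unusual."""
    base = renyi_entropy(value_distribution(values), alpha)
    scores = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        scores.append(base - renyi_entropy(value_distribution(rest), alpha))
    return scores


# Toy data: nine ordinary records and one rare record at index 9.
records = ["normal"] * 9 + ["rare"]
scores = entropy_drop_scores(records)
most_unusual = scores.index(max(scores))
```

On this toy data the rare record receives the highest score, because removing it leaves a single-valued data set with zero entropy. Real stock data would need discretized attributes and a threshold on the score, neither of which the abstract specifies.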

Keywords

 
 

 
