A New Approach to Discovering and Analyzing Knowledge of Exceptional Phenomena Using Data Mining

Authors

1 Assistant Professor, Department of Industrial Engineering, Faculty of Engineering, Yazd University, Yazd, Iran

2 Ph.D. Student in Industrial Engineering, Faculty of Engineering, Yazd University, Yazd, Iran

3 Associate Professor, Department of Industrial Engineering, Faculty of Engineering, Yazd University, Yazd, Iran

Abstract

 
The logic of learning from exceptions is a significant challenge in data mining. Exceptions are rare phenomena that exhibit positive behavior differing from the main, expected patterns in a database. It is therefore important to build an efficient framework that increases confidence in exceptional phenomena during knowledge discovery and supports effective learning from them. This study presents a model based on the theory of exceptions and information theory to address the challenges of mining exceptional data. First, the Rényi entropy function is used to identify exceptions; then, a bottom-up learning approach based on the proposed enhanced RISE algorithm is applied to extract the rules that govern exceptional behavior. To evaluate the efficiency of the proposed model, it is applied to detecting exceptional stocks and learning their behavior. Of the 1334 stocks examined, 36 exhibited exceptional behavior, and their behavior is characterized by three rules. The superiority of the proposed model's results over those obtained with conventional learning algorithms demonstrates its efficiency.
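The abstract does not specify how the Rényi entropy is turned into an exception score. For reference, the Rényi entropy of order α of a discrete distribution p is H_α = (1 / (1 - α)) · log Σ_i p_i^α, which tends to the Shannon entropy as α → 1. The Python sketch below assumes one common entropy-based outlier heuristic, not necessarily the authors' rule: each record is scored by how much the summed per-attribute Rényi entropy of the dataset drops when that record is left out, so records carrying rare attribute values score highest. The function names, the default α = 2, and the toy categorical records are hypothetical, and the sketch ranks rarity only; it does not check that the behavior is positive, which the paper's definition of an exception also requires.

```python
import math
from collections import Counter


def renyi_entropy(values, alpha=2.0):
    """Rényi entropy (natural log) of the empirical distribution of `values`."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(values)
    probs = [count / n for count in Counter(values).values()]
    if math.isclose(alpha, 1.0):
        # The alpha -> 1 limit is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def dataset_entropy(records, alpha=2.0):
    """Sum of per-attribute Rényi entropies over a list of equal-length tuples."""
    n_attrs = len(records[0])
    return sum(renyi_entropy([r[j] for r in records], alpha) for j in range(n_attrs))


def exception_scores(records, alpha=2.0):
    """Hypothetical leave-one-out score: entropy drop when a record is removed.

    A record with rare attribute values adds disorder, so removing it lowers
    the dataset entropy and yields a large positive score.
    """
    if len(records) < 2:
        return [0.0] * len(records)
    base = dataset_entropy(records, alpha)
    return [
        base - dataset_entropy(records[:i] + records[i + 1:], alpha)
        for i in range(len(records))
    ]


# Toy data (made up for illustration): nine ordinary records, one rare one.
records = [("low_pe", "steady")] * 9 + [("high_pe", "surge")]
scores = exception_scores(records)
ranking = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # prints 9, the index of the rare record
```

On this toy data the rare record gets the only positive score; removing one of the common records slightly raises the entropy, so those records score below zero.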

Article Title [English]

A New Approach for Exceptional Phenomena Knowledge Detection and Analysis by Data Mining

Authors [English]

  • Masoud Abessi 1
  • Elahe Hajigol Yazdi 2
  • Hassan Hoseini Nasab 3
  • Mohammad Bagher Fakhrzad 1
1  Assistant Professor, Department of Industrial Engineering, Yazd University, Yazd, Iran
2 Ph.D. Candidate in Industrial Engineering, Department of Industrial Engineering, Yazd University, Yazd, Iran (Corresponding author)
3 Associate Professor, Department of Industrial Engineering, Yazd University, Yazd, Iran
Abstract [English]

Learning the logic of exceptions is a considerable challenge in data mining and knowledge discovery. Exceptions are rare phenomena that exhibit positive, unusual behavior in a database. Creating an efficient framework that increases confidence in detected exceptions and supports learning from them is therefore important. This paper presents a novel framework that strengthens confidence in a limited number of records (the exceptions) so that their behavior can be learned effectively. The approach is based on abnormality theory and information theory and is designed to detect exceptional phenomena and learn their behavior. First, the Rényi entropy function is used to detect exceptional data, differentiating records according to the knowledge hidden in them. Then, the novel E-RISE algorithm, which follows a bottom-up learning strategy, is introduced to learn the behavior of the exceptional data. The efficiency of the proposed model is evaluated on Iranian stock market data: of the 1334 stocks mined, about 2.7% (36 stocks) showed exceptional behavior, and the extracted rules describe how these exceptional stocks behave. An expert system is then designed to use the extracted knowledge to recognize new exceptional stocks. Given a new stock, the expert system compares its characteristics with both normal and exceptional behavior. Exceptions either comply with the exceptional rules or contradict any normal pattern. This acquired knowledge is the basis of exceptional portfolio selection, which aims to generate exceptional wealth for investors. The findings of the proposed method are compared with those of traditional methods such as decision trees and support vector machines, and the improvement is considerable. The results show the capability of the proposed method in detecting exceptional data and learning their behavior.
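The abstract names the learning step (the E-RISE algorithm, following a bottom-up strategy) and the expert system's decision criterion, but not their details. The Python sketch below illustrates the general bottom-up idea behind RISE-style rule induction under stated assumptions: every training example starts as its own maximally specific rule, and each rule is repeatedly stretched to cover its nearest uncovered example of the same class. It should not be read as the authors' E-RISE. It assumes numeric attributes only, and it replaces the original RISE acceptance test (keep a generalization only if overall accuracy does not drop) with a simpler one: keep it only if the rule still covers no example of another class. The last function reads the abstract's criterion, complying with an exceptional rule or contradicting the normal patterns, as "covered by an exceptional rule or by no normal rule". All names (Rule, learn_rules, is_exceptional) and the toy two-attribute data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    bounds: list  # one (low, high) interval per numeric attribute
    label: str

    def covers(self, x):
        return all(lo <= v <= hi for (lo, hi), v in zip(self.bounds, x))


def seed_rule(x, label):
    # Bottom-up start: the most specific rule is the example itself.
    return Rule([(v, v) for v in x], label)


def generalize(rule, x):
    # Most specific generalization covering x: stretch each interval just enough.
    return Rule([(min(lo, v), max(hi, v)) for (lo, hi), v in zip(rule.bounds, x)], rule.label)


def distance(rule, x):
    # Total amount by which x falls outside the rule's intervals (0 if covered).
    return sum(max(lo - v, 0.0, v - hi) for (lo, hi), v in zip(rule.bounds, x))


def learn_rules(examples):
    """Simplified RISE-style loop over (features, label) pairs (hypothetical)."""
    rules = [seed_rule(x, y) for x, y in examples]
    changed = True
    while changed:
        changed = False
        for i, rule in enumerate(rules):
            uncovered = [x for x, y in examples if y == rule.label and not rule.covers(x)]
            if not uncovered:
                continue
            nearest = min(uncovered, key=lambda x: distance(rule, x))
            candidate = generalize(rule, nearest)
            # Simplified acceptance test (original RISE checks accuracy instead):
            # keep the generalization only if it covers no example of another class.
            if not any(candidate.covers(x) for x, y in examples if y != rule.label):
                rules[i] = candidate
                changed = True
    unique = []
    for r in rules:  # identical rules collapse once they converge
        if r not in unique:
            unique.append(r)
    return unique


def is_exceptional(x, rules, exceptional_label="exceptional"):
    # Flag x if an exceptional rule covers it, or if no normal rule covers it.
    exceptional = [r for r in rules if r.label == exceptional_label]
    normal = [r for r in rules if r.label != exceptional_label]
    return any(r.covers(x) for r in exceptional) or not any(r.covers(x) for r in normal)


examples = [
    ((1.0, 1.0), "normal"),
    ((1.5, 1.2), "normal"),
    ((2.0, 1.1), "normal"),
    ((8.0, 9.0), "exceptional"),
    ((8.5, 9.5), "exceptional"),
]
rules = learn_rules(examples)
for x in [(8.2, 9.2), (1.2, 1.1), (5.0, 5.0)]:
    print(x, is_exceptional(x, rules))
```

On the toy data the three normal examples collapse into one interval rule and the two exceptional examples into another. The point (5.0, 5.0) matches neither rule, so it is flagged through the second branch of the criterion: no normal rule covers it.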

Keywords [English]

  • Data Mining
  • Abnormality Theory
  • Information Theory
  • E-RISE Learning Algorithm
  • Exceptional Phenomena