DocumentCode :
1611935
Title :
Miner for OACCR: Case of medical data analysis in knowledge discovery
Author :
Ali, Sufian H.
Author_Institution :
Dept. of Software, Univ. of Babylon, Hilla, Iraq
fYear :
2012
Firstpage :
962
Lastpage :
975
Abstract :
Modern scientific data consist of huge datasets that are gathered by a very large number of techniques and stored in highly diverse and often incompatible data repositories, such as those of bioinformatics, geoinformatics, astroinformatics, and the Scientific World Wide Web. On the other hand, a lack of reference data is very often responsible for poor learning performance, and one of the key problems in supervised learning is an insufficiently large training dataset. Because developing a theoretically and practically valid tool for analyzing small data samples remains a critical and challenging issue for researchers, this work proposes such a tool. This paper presents a methodology for Obtaining Accurate and Comprehensible Classification Rules (OACCR) from both small and huge datasets using hybrid knowledge-discovery techniques. First, the search capability of a Genetic Programming Data Construction Method (GPDCM) is exploited to automatically create additional virtual samples from the original small dataset. Next, the Random Forest data mining algorithm is adapted to handle missing values. Principal Component Analysis (PCA) is then used to build a database described by its principal components, after which association rules are mined with the FP-Growth algorithm (FP-Tree). Finally, a TreeNet classifier determines the class to which each association rule belongs. The proposed methodology provides fast, accurate, and comprehensible classification rules. It can also be used to compress a dataset in two dimensions (number of features and number of records).
Keywords :
data mining; genetic algorithms; medical administrative data processing; OACCR; TreeNet classifier; astroinformatics; bioinformatics; data mining algorithm; datasets; genetic programming data construction method; geoinformatics; hybrid techniques; knowledge discovery; medical data analysis; obtaining accurate and comprehensible classification rules; principal component analysis; scientific World Wide Web; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Data mining; Databases; Training; Vegetation; AdaBoost; FP-Growth; GPDCM; PCA; Random Forest;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2012 6th International Conference on
Conference_Location :
Sousse
Print_ISBN :
978-1-4673-1657-6
Type :
conf
DOI :
10.1109/SETIT.2012.6482043
Filename :
6482043