DocumentCode :
1646026
Title :
Data mining the PIMA dataset using rough set theory with a special emphasis on rule reduction
Author :
Khan, Aurangzeb ; Revett, Kenneth
Author_Institution :
Dept. of CIS, Luton Univ., UK
fYear :
2004
Firstpage :
334
Lastpage :
339
Abstract :
This paper describes how rough set theory can be used to analyze relatively complex decision tables such as the Pima Indian Diabetes Database (PIDD). We applied Rosetta, a public domain implementation of rough sets, to the PIDD to determine how to generate a predictive rule set with the highest accuracy and the fewest rules. A reduced rule set is advantageous because it focuses attention on the salient attributes and makes application in clinical practice more efficient (and more likely). We report the use of a genetic algorithm based rough set approach to the classification of diabetes, which achieved a success rate of 83% on the test set. This classification accuracy compares favorably with other reported results, which ranged from 65% to 75%. In addition, we achieved this accuracy with fewer than 100 rules. The high accuracy and low rule count support the use of rough sets as a data mining tool for biological databases.
Keywords :
biology computing; data mining; database management systems; genetic algorithms; rough set theory; Pima Indian Diabetes Database; biological databases; genetic algorithm; predictive rule set; rule reduction; Computational Intelligence Society; Databases; Diseases; Genetics; Medical diagnostic imaging; Neural networks; Rough sets; Set theory; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multitopic Conference, 2004. Proceedings of INMIC 2004. 8th International
Print_ISBN :
0-7803-8680-9
Type :
conf
DOI :
10.1109/INMIC.2004.1492899
Filename :
1492899