Title :
Applying noise handling techniques to genomic data: a case study
Author_Institution :
Inst. for Human & Machine Cognition, Pensacola, FL, USA
Abstract :
Osteogenesis Imperfecta (OI) is a genetic collagenous disease associated with mutations in one or both of the genes COL1A1 and COL1A2. There are at least four known phenotypes of OI, of which type II is the most severe and often lethal. We identified three approaches to noise handling, namely robust algorithms, filtering, and polishing, and evaluated their effectiveness when applied to the problem of classifying the disease OI based on a data set of amino acid sequences and associated information on point mutations of COL1A1. Preliminary results suggest that each noise handling mechanism is useful under different circumstances. Filtering is stable across all cases. Pruning with robust C4.5 increased the classification accuracy in some cases, and polishing gave rise to some additional improvement in classifying the lethal OI phenotype.
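The abstract does not say how filtering was implemented. As a rough illustration only, one common reading of filtering is to drop every training instance whose label is not reproduced by a classifier trained on the remaining folds. The sketch below assumes that reading. The 1-nearest-neighbour classifier, the Hamming distance, the feature layout, and the toy records are all hypothetical stand-ins chosen to keep the example self-contained; they are not the paper's C4.5 setup or its data.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (categorical features, class label).
Instance = Tuple[Sequence[str], str]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train: List[Instance], features: Sequence[str]) -> str:
    """Label of the nearest training instance (ties go to the earliest one)."""
    return min(train, key=lambda inst: hamming(inst[0], features))[1]


def filter_noise(data: List[Instance], k: int = 3) -> List[Instance]:
    """Keep instances whose label is reproduced by a model trained on the other folds."""
    keep = [False] * len(data)
    for fold in range(k):
        # Instance i belongs to fold i % k; train on everything outside this fold.
        train = [inst for i, inst in enumerate(data) if i % k != fold]
        for i, (features, label) in enumerate(data):
            if i % k == fold and predict_1nn(train, features) == label:
                keep[i] = True
    return [inst for inst, flag in zip(data, keep) if flag]


# Toy records (invented for illustration): three categorical attributes per mutation.
toy: List[Instance] = [
    (("G", "S", "N"), "lethal"),
    (("G", "S", "N"), "lethal"),
    (("G", "S", "N"), "lethal"),
    (("G", "A", "C"), "nonlethal"),
    (("G", "A", "C"), "nonlethal"),
    (("G", "A", "C"), "nonlethal"),
    (("G", "S", "N"), "nonlethal"),  # deliberately inconsistent label
]

cleaned = filter_noise(toy)
```

In this toy run, only the last record, whose label contradicts otherwise identical records, is removed. Polishing, by contrast, would correct such a record rather than discard it.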
Keywords :
data mining; diseases; genetics; information filters; medical computing; noise; proteins; Osteogenesis Imperfecta phenotype; amino acid sequence; filtering; genetic collagenous disease; genomic data; noise handling technique; robust algorithm; Amino acids; Bioinformatics; Bone diseases; Computer aided software engineering; Genetic mutations; Genomics; Humans; Information filtering; Information filters; Noise robustness;
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1251022