Title :
The research of missing value estimation of gene sequence based on improved KNN
Author :
Qing, Cai ; Qingfeng, Wu ; Huailin, Dong ; Han, Liu
Author_Institution :
Software Sch., Xiamen Univ., Xiamen, China
Abstract :
Gene-based data mining has received increasing attention because genes carry the genetic information of living creatures. One task in mining gene information is to estimate missing values reasonably and effectively, so that the estimates reflect the original information of the gene sequence. Based on an analysis of the KNN (K nearest neighbor) algorithm, an improved KNN algorithm for gene sequences is proposed to resolve the problem of missing values in gene data mining. Experiments using data from GenBank show that the algorithm is feasible.
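The abstract does not describe what the authors changed in KNN, so the following is only a minimal sketch of plain KNN missing-value estimation for nucleotide sequences, not the paper's improved method. It assumes the sequences are already aligned to equal length, that unknown bases are written as 'N', that the distance between two sequences is the fraction of mismatches over positions known in both, and that each gap is filled by majority vote among the k nearest sequences that have a base at that position. The names `distance` and `knn_impute` are hypothetical.

```python
import math
from typing import List

MISSING = "N"  # assumption: unknown bases are written as 'N' (IUPAC "any base")


def distance(a: str, b: str) -> float:
    """Fraction of mismatching bases over positions known in both sequences."""
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x != MISSING and y != MISSING:
            compared += 1
            if x != y:
                mismatches += 1
    # No overlapping known positions: treat the pair as infinitely far apart.
    return mismatches / compared if compared else math.inf


def knn_impute(target: str, pool: List[str], k: int = 3) -> str:
    """Fill each missing base in `target` by majority vote of its k nearest sequences."""
    if any(len(s) != len(target) for s in pool):
        raise ValueError("sequences must be aligned to the same length")
    # Rank candidate sequences by distance to the target, nearest first
    # (sorted() is stable, so equal distances keep their pool order).
    ranked = sorted(pool, key=lambda s: distance(target, s))
    result = list(target)
    for i, base in enumerate(target):
        if base != MISSING:
            continue
        # Take the k nearest sequences that actually have a base at position i.
        donors = [s[i] for s in ranked if s[i] != MISSING][:k]
        if not donors:
            continue  # nothing to vote with; leave the gap as 'N'
        counts = {}
        for b in donors:
            counts[b] = counts.get(b, 0) + 1
        # max() returns the first maximal key, so ties go to the nearer sequence.
        result[i] = max(counts, key=counts.get)
    return "".join(result)


pool = ["ACGTAC", "ACGTTC", "ACCTAC", "TTTTTT"]
print(knn_impute("ACNTAN", pool, k=3))  # prints ACGTAC
```

Here position 2 receives votes G, C, G from the three nearest sequences and becomes G, while position 5 receives C from all three. Restricting donors per position, rather than fixing one neighbor set for the whole sequence, is one design choice among several and is not taken from the paper.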
Keywords :
DNA; biology computing; data mining; genetics; sequences; DNA; K nearest neighbor algorithm; gene sequence; gene-based data mining; improved KNN; missing value estimation; Computer science; Computer science education; Data mining; Diseases; Gene therapy; Immune system; Medical treatment; Pattern recognition; Predictive models; Sequences; Gene sequence; KNN; Missing values;
Conference_Title :
Computer Science & Education, 2009. ICCSE '09. 4th International Conference on
Conference_Location :
Nanning
Print_ISBN :
978-1-4244-3520-3
Electronic_ISBN :
978-1-4244-3521-0
DOI :
10.1109/ICCSE.2009.5228472