DocumentCode :
2675360
Title :
Validity of cluster technique for genome expression data
Author :
Zhang, Xiao ; Li, Aichen ; Zhang, You ; Xiao, Yongpeng
Author_Institution :
Sch. of Comput. Sci. & Inf. Technol., Northeast Normal Univ., Changchun, China
fYear :
2012
fDate :
23-25 May 2012
Firstpage :
3737
Lastpage :
3741
Abstract :
With the rapid development of database technology and the wide application of DBMSs, people hold more and more data containing a great amount of valuable information. They want to analyse these data more deeply in order to make better use of that information. Current database systems support data input, search, statistics and so on, but they cannot forecast future trends from the data stored in the database. The lack of means to mine the knowledge hidden behind the data results in a situation in which there is a large amount of data but little knowledge. In the era of computer networks, how to obtain knowledge from large data sets effectively and rapidly has become a focus of attention. The ability to acquire data has become increasingly mismatched with the ability to analyse it, so an automatic technology that can process data at a deeper level is needed. Data mining is such a technology. As an important branch of data mining, clustering analysis, which can serve as an independent data-mining tool or as a preprocessing step for other data-mining algorithms, is attracting wide attention. Clustering is a form of unsupervised classification, and it is an important method by which people come to understand society and nature. As one of the most important components of data mining, clustering has been widely used in biological science. Several clustering algorithms have been suggested for analysing genome expression data, but fewer solutions have been implemented to guide the design of clustering-based experiments and assess the quality of their outcomes. A cluster validity framework provides insights into the problem of predicting the correct number of clusters. This paper presents several validation techniques for gene expression data analysis. Normalization and validity aggregation strategies are proposed to improve the prediction of the number of relevant clusters. The results obtained indicate that this systematic evaluation approach may significantly support genome expression analyses for knowledge discovery applications.
Keywords :
biology computing; data analysis; data encapsulation; data mining; database management systems; genetics; genomics; pattern clustering; DBMS; biological science; cluster technique; clustering analysis; computer network; data information; data mining algorithm; data mining tool; database system; database technology; gene expression data analysis; genome expression data; knowledge discovery; knowledge hiding; normalization strategies; unsupervised classification; validity aggregation strategies; Bioinformatics; Cancer; Classification algorithms; Data mining; Gene expression; Genomics; Indexes; Cluster validation; Clustering; Genome expression; Genomic data mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control and Decision Conference (CCDC), 2012 24th Chinese
Conference_Location :
Taiyuan
Print_ISBN :
978-1-4577-2073-4
Type :
conf
DOI :
10.1109/CCDC.2012.6244599
Filename :
6244599