DocumentCode :
3394547
Title :
Improving feature representation of natural language gene functional annotations using automatic term expansion
Author :
He, Ji
Author_Institution :
Bioinf. Core Facility, Samuel Roberts Noble Found., Ardmore, OK
fYear :
2008
fDate :
15-17 Sept. 2008
Firstpage :
173
Lastpage :
179
Abstract :
Despite increasing efforts to describe gene functions using controlled vocabularies, natural language gene functional annotations remain the most readily available and the most widely used by biologists. Intelligent large-scale analysis of these data is of great importance in the post-genome era. While the vector space model (VSM) based TF*IDF feature representation is widely adopted for text document analysis, it has significant limitations when applied to these data, primarily because the functional annotations are highly concise and noisy. To improve the TF*IDF feature representation, this paper proposes two automatic term expansion (ATE) methods based on query expansion (QE) in information retrieval (IR) theory. The effectiveness of ATE was examined through its application to measuring the pattern proximity of gene functional annotations. Our comparative results show that, on this particular data type, ATE is effective in retrieving genes functionally correlated with a random query gene, and can produce a more accurate measurement of pattern similarity with reference to the genes' biological functions.
Keywords :
bioinformatics; genetics; information retrieval; natural languages; text analysis; automatic term expansion; biological function; controlled vocabulary; feature representation; information retrieval theory; intelligent analysis; natural language gene functional annotation; pattern proximity; pattern similarity; query expansion; random query gene; text document analysis; vector space model; Automatic control; Biological control systems; Biological system modeling; Data analysis; Functional analysis; Information retrieval; Large-scale systems; Natural languages; Text analysis; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location :
Sun Valley, ID
Print_ISBN :
978-1-4244-1778-0
Electronic_ISBN :
978-1-4244-1779-7
Type :
conf
DOI :
10.1109/CIBCB.2008.4675775
Filename :
4675775