DocumentCode
3394547
Title
Improving feature representation of natural language gene functional annotations using automatic term expansion
Author
He, Ji
Author_Institution
Bioinf. Core Facility, Samuel Roberts Noble Found., Ardmore, OK
fYear
2008
fDate
15-17 Sept. 2008
Firstpage
173
Lastpage
179
Abstract
Despite growing efforts to describe gene functions using controlled vocabularies, natural-language gene functional annotations remain the most readily available and the most widely used by biologists. Intelligent, large-scale analysis of these data is of great importance in the post-genome era. While the vector space model (VSM) based TF*IDF feature representation is widely adopted for text document analysis, it has significant limitations when applied to these data, primarily because the functional annotations are highly concise and noisy. To improve the TF*IDF feature representation, this paper proposes two automatic term expansion (ATE) methods based on query expansion (QE) from information retrieval (IR) theory. The effectiveness of ATE was examined by applying it to the measurement of pattern proximity between gene functional annotations. Our comparative results show that, on this particular data type, ATE is effective in retrieving genes functionally correlated with a random query gene, and can produce a more accurate measurement of pattern similarity with reference to the genes' biological functions.
Keywords
bioinformatics; genetics; information retrieval; natural languages; text analysis; automatic term expansion; biological function; controlled vocabulary; feature representation; information retrieval theory; intelligent analysis; natural language gene functional annotation; pattern proximity; pattern similarity; query expansion; random query gene; text document analysis; vector space model; Automatic control; Biological control systems; Biological system modeling; Data analysis; Functional analysis; Information retrieval; Large-scale systems; Natural languages; Text analysis; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location
Sun Valley, ID
Print_ISBN
978-1-4244-1778-0
Electronic_ISBN
978-1-4244-1779-7
Type
conf
DOI
10.1109/CIBCB.2008.4675775
Filename
4675775