• DocumentCode
    3394547
  • Title
    Improving feature representation of natural language gene functional annotations using automatic term expansion
  • Author
    He, Ji
  • Author_Institution
    Bioinf. Core Facility, Samuel Roberts Noble Found., Ardmore, OK
  • fYear
    2008
  • fDate
    15-17 Sept. 2008
  • Firstpage
    173
  • Lastpage
    179
  • Abstract
    Despite increasing efforts to describe gene functions using controlled vocabularies, natural language gene functional annotations remain the most readily available and the most widely used by biologists, and intelligent large-scale analysis of these data is of great importance in the post-genome era. While TF*IDF feature representation based on the vector space model (VSM) is widely adopted for text document analysis, it has significant limitations when applied to these data, primarily because the functional annotations are highly concise and highly noisy. To improve TF*IDF feature representation, this paper proposes two automatic term expansion (ATE) methods based on query expansion (QE) from information retrieval (IR) theory. The effectiveness of ATE was examined by applying it to the measurement of pattern proximity between gene functional annotations. Our comparative results show that, on this particular data type, ATE is effective in retrieving genes functionally correlated with a random query gene and can produce more accurate measurements of pattern similarity with respect to genes' biological functions.
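    The record does not describe the paper's two ATE methods, so the following is only a generic sketch of the ideas the abstract names: TF*IDF vectors in a vector space model, cosine similarity, and one common form of query expansion (Rocchio-style pseudo-relevance feedback). It is not the author's method. The function names, the parameters `k`, `n_terms` and `beta`, and the toy annotations are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF*IDF vector (dict term -> weight) per tokenized doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # idf = log(N / df); terms present in every doc get weight 0 and are dropped
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items() if df[t] < n})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def expand(vec, vectors, k=2, n_terms=3, beta=0.5):
    """Hypothetical pseudo-relevance feedback: add the top new terms from the
    centroid of the k most similar other vectors, down-weighted by beta."""
    others = [v for v in vectors if v is not vec]
    neighbours = sorted(others, key=lambda v: cosine(vec, v), reverse=True)[:k]
    centroid = Counter()
    for v in neighbours:
        for t, w in v.items():
            centroid[t] += w / len(neighbours)
    new_terms = [(t, w) for t, w in centroid.most_common() if t not in vec][:n_terms]
    expanded = dict(vec)
    for t, w in new_terms:
        expanded[t] = beta * w
    return expanded


# Toy, made-up "annotations" (already tokenized)
docs = [
    ["atp", "binding", "protein", "kinase"],
    ["protein", "kinase", "activity"],
    ["serine", "threonine", "kinase"],
    ["transcription", "factor", "dna", "binding"],
]
vectors = tfidf_vectors(docs)
query = vectors[2]
expanded = expand(query, vectors, k=1)
print(round(cosine(query, vectors[0]), 3), round(cosine(expanded, vectors[0]), 3))
```

    In this toy example, the short annotation "serine threonine kinase" shares only the low-weight term "kinase" with the first annotation. Borrowing terms from its nearest neighbour raises that similarity, which is the kind of effect that term expansion aims for on short, concise texts.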
  • Keywords
    bioinformatics; genetics; information retrieval; natural languages; text analysis; automatic term expansion; biological function; controlled vocabulary; feature representation; information retrieval theory; intelligent analysis; natural language gene functional annotation; pattern proximity; pattern similarity; query expansion; random query gene; text document analysis; vector space model; Automatic control; Biological control systems; Biological system modeling; Data analysis; Functional analysis; Information retrieval; Large-scale systems; Natural languages; Text analysis; Vocabulary;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
  • Conference_Location
    Sun Valley, ID
  • Print_ISBN
    978-1-4244-1778-0
  • Electronic_ISBN
    978-1-4244-1779-7
  • Type
    conf
  • DOI
    10.1109/CIBCB.2008.4675775
  • Filename
    4675775