• DocumentCode
    3519012
  • Title

    A Mixture Language Model for Class-Attribute Mining from Biomedical Literature Digital Library

  • Author

    Zhou, Xiaohua ; Hu, Xiaohua ; Zhang, Xiaodan ; Wu, Daniel D. ; He, Tingting ; Luo, Aijing

  • Author_Institution
    Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    17
  • Lastpage
    22
  • Abstract
    We define and study a novel text mining problem for biomedical literature digital libraries, referred to as class-attribute mining. Given a collection of biomedical literature from a digital library addressing a set of objects (e.g., proteins) and their descriptions (e.g., protein functions), the tasks of class-attribute mining are: (1) to identify and summarize latent classes in the space of objects, (2) to discover latent attribute themes in the space of object descriptions, and (3) to summarize the commonalities and differences among the identified classes along each attribute theme. We approach this mining problem through a mixture language model and estimate the parameters of the model using the EM algorithm. We demonstrate the effectiveness of the model with an application called protein community identification and annotation from Medline, the largest biomedical literature digital library, with more than 16 million abstracts.
  • Keywords
    bioinformatics; data mining; digital libraries; expectation-maximisation algorithm; full-text databases; proteins; text analysis; EM algorithm; Medline; biomedical literature digital library; class-attribute mining; latent attribute theme discovery; latent class identification; latent class summary; mixture language model; object description space; object space; protein community identification and annotation; protein function; text mining problem; Abstracts; Bioinformatics; Context modeling; Data mining; Educational institutions; Information science; Parameter estimation; Proteins; Software libraries; Text mining; class attribute; clustering; language model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-0-7695-3452-7
  • Type

    conf

  • DOI
    10.1109/BIBM.2008.40
  • Filename
    4684867