• DocumentCode
    2717974
  • Title
    Text classification based on limited bibliographic metadata
  • Author
    Denecke, Kerstin ; Risse, Thomas ; Baehr, Thomas
  • Author_Institution
    L3S Res. Center, Hannover, Germany
  • fYear
    2009
  • fDate
    1-4 Nov. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document based on identified core features. The system is evaluated on a real-world data set, and the influence of different feature combinations and settings is studied. Although the available information is limited, the results show that the approach is capable of efficiently classifying data items representing documents.
  • Keywords
    classification; learning (artificial intelligence); meta data; text analysis; author name; bibliographic metadata; conference names; digital item categorization; document category assignment; document metadata; item topic; journal titles; lexical resource; machine-learning classifier; text classification; title information; Automatic control; Data mining; Feature extraction; Information retrieval; Knowledge engineering; Machine learning; Pipelines; Software libraries; Space technology; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2009. ICDIM 2009. Fourth International Conference on
  • Conference_Location
    Ann Arbor, MI
  • Print_ISBN
    978-1-4244-4253-9
  • Electronic_ISBN
    978-1-4244-4254-6
  • Type
    conf
  • DOI
    10.1109/ICDIM.2009.5356767
  • Filename
    5356767