Title :
Text classification based on limited bibliographic metadata
Author :
Denecke, Kerstin ; Risse, Thomas ; Baehr, Thomas
Author_Institution :
L3S Res. Center, Hannover, Germany
Abstract :
In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document based on identified core features. The system is evaluated on a real-world data set, and the influence of different feature combinations and settings is studied. Although the available information is limited, the results show that the approach is capable of efficiently classifying data items representing documents.
Keywords :
classification; learning (artificial intelligence); meta data; text analysis; author name; bibliographic metadata; conference names; digital item categorization; document category assignment; document metadata; item topic; journal titles; lexical resource; machine-learning classifier; text classification; title information; Automatic control; Data mining; Feature extraction; Information retrieval; Knowledge engineering; Machine learning; Pipelines; Software libraries; Space technology; Text categorization
Conference_Title :
Digital Information Management, 2009. ICDIM 2009. Fourth International Conference on
Conference_Location :
Ann Arbor, MI
Print_ISBN :
978-1-4244-4253-9
Electronic_ISBN :
978-1-4244-4254-6
DOI :
10.1109/ICDIM.2009.5356767