DocumentCode
2717974
Title
Text classification based on limited bibliographic metadata
Author
Denecke, Kerstin ; Risse, Thomas ; Baehr, Thomas
Author_Institution
L3S Res. Center, Hannover, Germany
fYear
2009
fDate
1-4 Nov. 2009
Firstpage
1
Lastpage
6
Abstract
In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document based on identified core features. The system is evaluated on a real-world data set, and the influence of different feature combinations and settings is studied. Although the available information is limited, the results show that the approach is capable of efficiently classifying data items representing documents.
Keywords
classification; learning (artificial intelligence); meta data; text analysis; author name; bibliographic metadata; conference names; digital item categorization; document category assignment; document metadata; item topic; journal titles; lexical resource; machine-learning classifier; text classification; title information; Automatic control; Data mining; Feature extraction; Information retrieval; Knowledge engineering; Machine learning; Pipelines; Software libraries; Space technology; Text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management, 2009. ICDIM 2009. Fourth International Conference on
Conference_Location
Ann Arbor, MI
Print_ISBN
978-1-4244-4253-9
Electronic_ISBN
978-1-4244-4254-6
Type
conf
DOI
10.1109/ICDIM.2009.5356767
Filename
5356767