DocumentCode :
2717974
Title :
Text classification based on limited bibliographic metadata
Author :
Denecke, Kerstin ; Risse, Thomas ; Baehr, Thomas
Author_Institution :
L3S Res. Center, Hannover, Germany
fYear :
2009
fDate :
1-4 Nov. 2009
Firstpage :
1
Lastpage :
6
Abstract :
In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document based on identified core features. The system is evaluated on a real-world data set, and the influence of different feature combinations and settings is studied. Although the available information is limited, the results show that the approach is capable of efficiently classifying data items representing documents.
Keywords :
classification; learning (artificial intelligence); meta data; text analysis; author name; bibliographic metadata; conference names; digital item categorization; document category assignment; document metadata; item topic; journal titles; lexical resource; machine-learning classifier; text classification; title information; Automatic control; Data mining; Feature extraction; Information retrieval; Knowledge engineering; Machine learning; Pipelines; Software libraries; Space technology; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information Management, 2009. ICDIM 2009. Fourth International Conference on
Conference_Location :
Ann Arbor, MI
Print_ISBN :
978-1-4244-4253-9
Electronic_ISBN :
978-1-4244-4254-6
Type :
conf
DOI :
10.1109/ICDIM.2009.5356767
Filename :
5356767