Title :
Automatic Class Labeling for CiteSeerX
Author :
Kashireddy, Surya Dhairya ; Gauch, Susan ; Billah, Syed Masum
Author_Institution :
Comput. Sci. & Comput. Eng., Univ. of Arkansas, Fayetteville, AR, USA
Abstract :
The CiteSeerX project at the University of Arkansas uses a browsing interface based on the Association for Computing Machinery's Computing Classification System (ACM CCS). CCS contains just 369 categories, whereas the CiteSeerX database contains over 2 million documents. This results in more than 6500 documents per category, far too many to browse. To address this problem, we are exploring ways to automatically expand the CCS ontology. Previous work has focused on using clustering to automatically identify the new classes. This work focuses on how to label the subclasses in a semantically meaningful way so that they can support user browsing. We develop methods based on text mining from the subclass members to extract class labels. We evaluate three methods by comparing the suggested labels with human-assigned labels for existing categories.
Keywords :
data analysis; data mining; database management systems; online front-ends; ontologies (artificial intelligence); pattern classification; text analysis; ACM CCS; Association for Computing Machinery Computing Classification System; CCS ontology; CiteSeerX project; CiteSeerX database; University of Arkansas; automatic class labeling; browsing interface; human-assigned labels; subclass members; text mining; user browsing; Clustering algorithms; Encyclopedias; Labeling; Ontologies; Programming; Semantic Web; Text mining; labeling; ontologies; text mining
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4799-2902-3
DOI :
10.1109/WI-IAT.2013.35