DocumentCode :
1797443
Title :
Tagging documents using neural networks based on local word features
Author :
Azcarraga, Arnulfo P. ; Tensuan, Paolo ; Setiono, Rudy
Author_Institution :
Coll. of Comput. Studies, De La Salle Univ., Manila, Philippines
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
724
Lastpage :
731
Abstract :
Keywords and key-phrases that concisely represent text documents are integral to many knowledge management and text information retrieval systems, as well as digital libraries in general. Not all text documents, however, are annotated with good keywords, and the quality of these keywords is often dependent on a tedious, sometimes manual, extraction and tagging process. To automatically extract high-quality keywords without the need for a semantic analysis of the document, it is shown that artificial neural networks (ANN) can be trained to consider only in-document word features such as word frequency, word distribution in the document, use of the word in special parts of the document, and use of word formatting features (i.e., bold-faced, italicized, large font size). Results show that purely local features are adequate in determining whether a word in a document is a keyword or not. Classification performance yields a G-mean of at least 0.83, and a weighted f-measure of 0.96 for both keywords and non-keywords. Precision for keywords alone, however, is not as high. To understand the basis for classifying keywords, C4.5 is used to extract rules from the ANN. The extracted rules from C4.5, in the form of a decision tree, show the relative importance of the different document features that were extracted.
Keywords :
decision trees; feature extraction; neural nets; pattern classification; text analysis; ANN training; artificial neural networks; automatic high quality keyword extraction; bold-faced word; decision tree; digital libraries; document feature extraction; document semantic analysis; document tagging; document word distribution; in-document word features; italicized word; key-phrases; keyword classification; knowledge management system; large-font size; local word features; text documents; text information retrieval system; weighted f-measure; word formatting features; word frequency; Abstracts; Artificial neural networks; Feature extraction; Tagging; Training; Vectors; artificial neural network; document tagging; feature selection; keyword extraction; scientific documents;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889456
Filename :
6889456