Title :
Keyword extraction using backpropagation neural networks and rule extraction
Author :
Azcarraga, Arnulfo; Liu, Michael David; Setiono, Rudy
Author_Institution :
College of Computer Studies, De La Salle University, Manila, Philippines
Abstract :
Keyword extraction is vital for knowledge management systems, information retrieval systems, and digital libraries, as well as for general web browsing. Keywords are often the basis of document processing methods such as clustering and retrieval, since processing all the words in a document can be slow. Keyword extraction is commonly automated with statistics-based methods such as Bayesian, K-Nearest Neighbor, and Expectation-Maximization. These models are limited in the word-related features they can use, since adding more features makes the models more complex and harder to comprehend. In this research, a neural network, specifically a backpropagation network, will be used to generalize the relationship between the title and the content of articles in the archive, using word features other than TF-IDF, such as the position of a word in the sentence, the paragraph, or the entire document, formatting such as headings, and other attributes defined beforehand. To explain how the backpropagation network works, a rule extraction method will be used to extract symbolic data from the resulting network. The extracted rules can then be transformed into decision trees that perform almost as accurately as the network, with the added benefit of being in an easily comprehensible format.
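The abstract does not give the exact feature set, network size, or labelling scheme, so the following Python sketch only illustrates the general idea under stated assumptions. The document representation (a list of (is_heading, paragraph_text) pairs), the feature names tf, doc_pos, para_pos, sent_pos and heading, the helpers word_features and train_backprop, and the choice to label a word as a keyword when it appears in the title are all hypothetical, not the authors' implementation. A single-document term frequency stands in for TF-IDF (which needs a corpus), and a one-hidden-layer sigmoid network is trained by online backpropagation on a cross-entropy loss.

```python
import math
import random
import re

# Hypothetical feature names; the paper's exact feature set is not stated in the abstract.
FEATURES = ["tf", "doc_pos", "para_pos", "sent_pos", "heading"]


def tokenize(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_features(doc):
    """doc: list of (is_heading, paragraph_text) pairs (an assumed representation).
    Returns {word: feature dict}; positions are taken from the word's first occurrence."""
    total = sum(len(tokenize(text)) for _, text in doc)
    feats = {}
    seen = 0  # tokens seen so far in the whole document
    for is_heading, text in doc:
        para_len = len(tokenize(text))
        para_idx = 0  # tokens seen so far in this paragraph
        for sentence in re.split(r"[.!?]+", text):
            tokens = tokenize(sentence)
            for i, w in enumerate(tokens):
                if w not in feats:
                    # Relative positions in [0, 1] of the first occurrence.
                    feats[w] = {
                        "tf": 0.0,
                        "doc_pos": seen / max(total - 1, 1),
                        "para_pos": para_idx / max(para_len - 1, 1),
                        "sent_pos": i / max(len(tokens) - 1, 1),
                        "heading": 0.0,
                    }
                feats[w]["tf"] += 1
                if is_heading:
                    feats[w]["heading"] = 1.0
                seen += 1
                para_idx += 1
    for f in feats.values():
        f["tf"] /= total  # term frequency normalised by document length
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_backprop(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer sigmoid network trained by online backpropagation
    on a cross-entropy loss. Returns predict(x) -> probability of 'keyword'."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
        return h, sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, o = forward(x)
            d_o = o - t  # output delta for sigmoid output + cross-entropy loss
            # Hidden deltas are computed with the pre-update output weights.
            d_h = [d_o * W2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * d_o * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_o
    return lambda x: forward(x)[1]


# Toy usage: the heading plays the role of the article title.
doc = [
    (True, "Neural keyword extraction"),
    (False, "Keyword extraction finds salient words. Neural networks learn word features."),
    (False, "The results are compared with a baseline."),
]
title_words = {"neural", "keyword", "extraction"}  # assumed labels: title words are keywords
feats = word_features(doc)
words = sorted(feats)
X = [[feats[w][k] for k in FEATURES] for w in words]
y = [1.0 if w in title_words else 0.0 for w in words]
predict = train_backprop(X, y)
for w, x in zip(words, X):
    print(f"{w:12s} {predict(x):.2f}")
```

The rule extraction and decision-tree steps are not sketched, since the abstract does not describe the extraction algorithm itself.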
Keywords :
backpropagation; digital libraries; document handling; feature extraction; information retrieval; knowledge management; neural nets; statistical analysis; TF-IDF; Web browsing; backpropagation network; backpropagation networks; decision trees; document processing methods; information retrieval system; keyword extraction; knowledge management system; neural network; process automation; rule extraction method; statistics-based methods; symbolic data extraction; word-related features; Abstracts; Accuracy; Backpropagation; Data mining; Feature extraction; Neural networks; Training; document analysis; rule extraction
Conference_Title :
The 2012 International Joint Conference on Neural Networks (IJCNN)
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1488-6
ISSN :
2161-4393
DOI :
10.1109/IJCNN.2012.6252618