Title :
Keyword Extraction from Documents Using a Neural Network Model
Author :
Jo, Taeho ; Lee, Malrey ; Gatton, Thomas M.
Author_Institution :
University of Ottawa, 800 King Edward
Abstract :
A document surrogate is usually represented as a list of words. Because not all words in a document reflect its content, the important words that do relate to its content must be selected. Such words are called keywords and are typically selected with an equation based on Term Frequency (TF) and Inverse Document Frequency (IDF). In addition, the position of each word in the document and whether the word appears in the title should be considered when selecting keywords from the text. An equation that incorporates all of these factors becomes too complicated to apply to keyword selection. This paper proposes a neural network backpropagation model in which these factors serve as features, and feature vectors are generated to select keywords. The paper shows that the proposed neural network backpropagation approach outperforms the equation in distinguishing keywords.
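The idea in the abstract can be illustrated with a minimal sketch: each word is turned into a feature vector (TF, IDF, title inclusion, position), and a small network trained with backpropagation scores it as keyword or non-keyword. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function `word_features`, the class `TinyMLP`, the exact feature definitions, the hidden-layer size, the learning rate, and the toy training samples are all hypothetical.

```python
import math
import random

def word_features(word, doc_tokens, title_tokens, corpus):
    """Return [tf, idf, in_title, first_pos] for a word that occurs in doc_tokens.

    The feature definitions are illustrative assumptions, not the paper's.
    """
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for d in corpus if word in d)           # documents containing the word
    idf = math.log(len(corpus) / df) if df else 0.0
    in_title = 1.0 if word in title_tokens else 0.0
    # 1.0 when the word first appears at the very start, smaller when it appears later.
    first_pos = 1.0 - doc_tokens.index(word) / len(doc_tokens)
    return [tf, idf, in_title, first_pos]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer, one sigmoid output, trained by plain backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient step on squared error; returns the loss before the update."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        delta_out = (y - target) * y * (1.0 - y)
        # Update hidden weights first so they use the pre-update output weights.
        for j, row in enumerate(self.w1):
            delta_h = delta_out * self.w2[j] * h[j] * (1.0 - h[j])
            for i in range(len(row)):
                row[i] -= lr * delta_h * xb[i]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * delta_out * hb[j]
        return 0.5 * (y - target) ** 2

# Hypothetical toy samples: (feature vector, 1.0 = keyword, 0.0 = not a keyword).
samples = [
    ([0.30, 1.1, 1.0, 1.0], 1.0),
    ([0.10, 0.0, 0.0, 0.2], 0.0),
    ([0.20, 0.4, 1.0, 0.8], 1.0),
    ([0.05, 0.4, 0.0, 0.1], 0.0),
]

net = TinyMLP(n_in=4, n_hidden=3)
epoch_losses = []
for _ in range(300):
    epoch_losses.append(sum(net.train_step(x, t) for x, t in samples))
```

After training, `net.forward(features)[1]` gives a score between 0 and 1 that could be thresholded to decide whether a word is a keyword. The appeal of this design is that the way TF, IDF, title inclusion, and position are combined is learned from labeled examples rather than fixed by hand in a single equation.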
Keywords :
Data mining; Equations; Frequency; Indexing; Information retrieval; Information technology; Natural languages; Neural networks; Text categorization; Text mining; Keyword extraction
Conference_Title :
Hybrid Information Technology, 2006. ICHIT '06. International Conference on
Conference_Location :
Cheju Island
Print_ISBN :
0-7695-2674-8
DOI :
10.1109/ICHIT.2006.253612