Title :
Constructing features for document classification by using temporal patterns of term usages
Author :
Abe, Hidenao ; Tsumoto, Shusaku
Author_Institution :
Dept. of Med. Inf., Shimane Univ., Izumo, Japan
Abstract :
In document classification methods that use the words appearing in documents as features, it is important to determine which keywords characterize each document. However, conventional methods select keywords based on their frequency and/or a particular importance index such as tf-idf, and cut off the remaining words by using a threshold value. This discards residual information such as rare combinations of words and time-dependent differences in their usage. In this paper, we demonstrate the usefulness of features based on the temporal patterns of all words and phrases in temporally published documents in one domain. The documents are thus characterized by the temporal patterns of one or more importance indices, which accounts for temporal differences in overall term usage. In the experiment, we compare document classification results on two sets of bibliographical documents, with respect to time dependency, using the two types of feature set. For exploratory class labels, we show that it is possible to obtain classification rules that describe the relationship between the class and the temporal patterns that are important for prediction.
Keywords :
classification; document handling; appeared words; bibliographical documents; classification rules; document classification; temporal patterns; temporally published documents; term usages; threshold value; time dependency; Availability; Cognition; Frequency conversion; Indexes; Learning systems; Machine learning; Vectors; Document Classification; Feature Construction; Temporal Clustering; Text Mining
Conference_Title :
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4577-0372-0
DOI :
10.1109/GRC.2011.6122563