Title :
Text feature extraction based on joint conditional entropy
Author :
Yanmin Chen ; Xinwei Wang
Author_Institution :
Dept. of Comput. Sci. & Technol., East China Normal Univ., Shanghai, China
Abstract :
Extracting features from data is an important task in data mining and summarization. Text feature extraction aims to obtain useful information from texts by identifying and exploring patterns of interest. We propose a feature extraction strategy based on joint conditional entropy and a genetic algorithm. Joint conditional entropy measures the uncertainty of a set of variables given conditions; it is used to obtain the feature words that represent texts. Genetic algorithms have been applied successfully in many fields and are useful for solving optimization and search problems. In this paper, we first preprocess the texts to obtain the words, then present the joint conditional entropy, which is used to define the fitness function of the genetic algorithm for discovering words that properly represent the texts. Finally, experimental results show that this approach is suitable for extracting ideal features of text.
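As an illustration of the entropy measure named in the abstract, the following is a minimal sketch of the standard joint conditional entropy H(X_1, ..., X_k | Y) = H(X_1, ..., X_k, Y) - H(Y), estimated from empirical counts. The function names and the input layout are assumptions made for this example; the abstract does not give the authors' exact definition or how it enters their fitness function.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def joint_conditional_entropy(xs, ys):
    """Estimate H(X_1..X_k | Y) = H(X_1..X_k, Y) - H(Y) from paired samples.

    xs: list of tuples, each one joint outcome (x_1, ..., x_k)
    ys: list of condition outcomes, same length as xs
    (Hypothetical layout chosen for this sketch, not taken from the paper.)
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = [(tuple(x), y) for x, y in zip(xs, ys)]
    return entropy(joint) - entropy(ys)


# Two binary variables observed under a binary condition.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [0, 0, 1, 1]
h = joint_conditional_entropy(xs, ys)
```

A genetic algorithm could, for instance, score a candidate word set by such a value, with lower remaining uncertainty indicating a more representative set; this reading is an assumption, since the abstract only states that the entropy is used to define the fitness function.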
Keywords :
data mining; entropy; feature extraction; genetic algorithms; text analysis; fitness function; genetic algorithm; joint conditional entropy; text feature extraction; useful information; text mining
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
Conference_Location :
Changchun
Print_ISBN :
978-1-4673-2963-7
DOI :
10.1109/ICCSNT.2012.6526323