Title :
A Feature Selection Algorithm Based on Poisson Estimates
Author :
Gao, Yingfan; Wang, Hui-lin
Author_Institution :
Inst. of Sci. & Tech. Inf. of China, Beijing, China
Abstract :
Feature selection is one of the key technologies for text categorization. Current approaches fall mainly into two groups: statistics-based techniques, drawn primarily from information theory, and semantics-based techniques, which draw on natural language processing, the Semantic Web and related fields. Based on the Poisson hypothesis, this article presents a new method that combines both approaches and seeks to identify features that carry more semantic information in documents. Comparative experiments on the Reuters-21578 corpus against the information gain (IG), chi-square (Chi2) and WN algorithms show that this method has advantages over these algorithms.
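The record does not describe the authors' algorithm itself, so the following is only a minimal sketch of the general Poisson-hypothesis idea, not the paper's method. It assumes a common heuristic: if a term's occurrences were spread over documents as a Poisson process with rate lambda = (total occurrences) / (number of documents), about N(1 - e^(-lambda)) documents would contain it. Terms that appear in noticeably fewer documents than that are "bursty" and tend to carry content. The function names `poisson_deviation_scores` and `select_features` and the token-list input format are hypothetical. Unlike IG and Chi2, this score uses no category labels.

```python
import math
from collections import Counter


def poisson_deviation_scores(docs):
    """Hypothetical scorer: how far each term's document frequency falls
    below what a Poisson model of its occurrences would predict.

    docs: list of token lists (assumed input format).
    Returns {term: score}; positive scores mark terms concentrated in
    fewer documents than the Poisson model expects.
    """
    n_docs = len(docs)
    cf = Counter()  # collection frequency: total occurrences of each term
    df = Counter()  # document frequency: number of documents containing it
    for doc in docs:
        cf.update(doc)
        df.update(set(doc))

    scores = {}
    for term, total in cf.items():
        lam = total / n_docs                            # Poisson rate per document
        expected_df = n_docs * (1.0 - math.exp(-lam))  # E[#docs with >= 1 occurrence]
        scores[term] = (expected_df - df[term]) / n_docs
    return scores


def select_features(docs, k):
    """Keep the k terms with the largest Poisson deviation (hypothetical helper)."""
    scores = poisson_deviation_scores(docs)
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:k]


docs = [
    ["oil", "price", "oil", "oil"],
    ["the", "price", "rose"],
    ["the", "market", "the"],
    ["wheat", "the", "crop"],
]
print(select_features(docs, 1))  # ['oil']
```

In this toy corpus, "oil" occurs three times but only in one document, well below the roughly 2.1 documents a Poisson model predicts, so it ranks first. "the" is spread over three documents and scores negatively. A supervised method such as the one the record describes would additionally relate terms to categories.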
Keywords :
natural language processing; stochastic processes; text analysis; Poisson estimates; Poisson hypothesis; Reuters-21578 corpus; feature selection algorithm; information theory; semantic Web; text categorization; Frequency shift keying; Fuzzy systems; H infinity control; Mutual information; Random variables; Statistics; Feature selection from categories; Semantic Feature
Conference_Title :
Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009 (FSKD '09)
Conference_Location :
Tianjin
Print_ISBN :
978-0-7695-3735-1
DOI :
10.1109/FSKD.2009.712