Title :
TCBPLK: A New Method of Text Categorization
Author_Institution :
Beijing Univ., Beijing
Abstract :
This paper presents a new text categorization method based on P-L theory and the Kohonen network, called the TCBPLK method. When a Kohonen network is applied to text categorization, it suffers from slow training. The defect is more pronounced for high-dimensional text vectors, to the point that a categorization result may not be obtained at all. The new method builds a vector space model of term weights using P-L theory, which strengthens the contribution of words from the viewpoint of categorization effect and reduces the vector dimension by eliminating redundant features. Experimental results confirm that the TCBPLK method reduces the number of vectors and improves the generalization and precision of text categorization.
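As a rough illustration of the pipeline the abstract describes (reduce the term dimension, then train a Kohonen network on the reduced vectors), the following is a minimal sketch and not the authors' implementation. The P-L term weighting is not specified in this record and is not reproduced here; dropping terms whose weight is constant across all documents is used as an assumed stand-in for "eliminating redundant features". The function names `drop_constant_terms`, `best_unit`, and `train_som`, the one-dimensional map layout, and all parameter values are hypothetical.

```python
import math
import random


def drop_constant_terms(vectors):
    """Remove term columns whose weight is identical in every document.

    Assumed stand-in for redundant-feature elimination; NOT the P-L weighting.
    Returns the reduced vectors and the indices of the columns that were kept.
    """
    keep = [k for k in range(len(vectors[0]))
            if any(v[k] != vectors[0][k] for v in vectors)]
    return [[v[k] for k in keep] for v in vectors], keep


def best_unit(weights, x):
    """Index of the map unit closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - a) ** 2 for w, a in zip(weights[j], x)))


def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks over time
        for x in vectors:
            bmu = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood around the best-matching unit
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

The cost of each training pass grows with the vector dimension, which is why removing uninformative terms before training matters for high-dimensional text vectors. After training, a document would be assigned to the unit returned by `best_unit`.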
Keywords :
text analysis; Kohonen network; P-L theory; text categorization; vector space model; Cybernetics; Functional analysis; Information analysis; Information retrieval; Learning systems; Machine learning; Matrix decomposition; Pattern analysis; Vocabulary
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370825