Title :
A discriminant based document analysis for text classification
Author :
Lin, Yi-Xian ; Chien, Been-Chian
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Univ. of Tainan, Tainan, Taiwan
Abstract :
Text classification techniques rely heavily on the distribution of features and on the selection of features that discriminate among the classes as the main basis for classification. In this paper, we propose the discriminant coefficient to represent the features of a document. Based on the discriminant coefficient, a classification coefficient is defined and computed for each document class. A correlation measure approach is then designed for text classification. The experimental results show that the proposed approach to document analysis is effective in comparison with TF-IDF with cosine similarity for single-class text classification. In particular, for a document set with a nearly equal number of documents in each class, the proposed approach achieves better results than traditional vector-based methods.
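For orientation, the comparison baseline named in the abstract (TF-IDF with cosine similarity) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy documents, the function names (`tfidf_vectors`, `cosine`, `classify`), the unsmoothed `log(N/df)` weighting, and the nearest-neighbor decision rule are all assumptions made here. The discriminant coefficient and classification coefficient are not defined in this record, so they are not sketched.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dict: term -> weight) for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Unsmoothed inverse document frequency (an assumed variant).
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(tokens, train_vectors, labels, idf):
    """Assign the label of the most similar training document (assumed rule)."""
    # Terms never seen in training carry no IDF weight and are ignored.
    query = {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}
    scores = [cosine(query, vec) for vec in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Hypothetical toy corpus with an equal number of documents per class.
train_docs = [
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["match", "team", "goal"],
    ["team", "player", "goal"],
]
train_labels = ["finance", "finance", "sports", "sports"]

train_vectors, idf = tfidf_vectors(train_docs)
print(classify(["team", "goal", "score"], train_vectors, train_labels, idf))
```

Each document is assigned exactly one label, which matches the single-class setting described in the abstract. Any comparison against the proposed approach would need the definitions given in the full paper.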
Keywords :
document handling; pattern classification; TF-IDF method; correlation measure approach; cosine similarity; discriminant based document analysis; discriminant coefficient; term frequency-inverse document frequency; text classification; Classification algorithms; Correlation; Frequency measurement; Support vector machine classification; Text categorization; Training; classification coefficient; correlation measure; document analysis
Conference_Title :
Computer Symposium (ICS), 2010 International
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-7639-8
DOI :
10.1109/COMPSYM.2010.5685442