DocumentCode :
2029550
Title :
A discriminant based document analysis for text classification
Author :
Lin, Yi-Xian ; Chien, Been-Chian
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Univ. of Tainan, Tainan, Taiwan
fYear :
2010
fDate :
16-18 Dec. 2010
Firstpage :
594
Lastpage :
599
Abstract :
Text classification technologies rely heavily on the distribution of features and on the selection of features that discriminate among the classes as the main basis for classification. In this paper, we propose the discriminant coefficient to represent the features of a document. Based on the discriminant coefficient, a classification coefficient is defined and computed for each document class. A correlation measure approach is then designed for text classification. The experimental results show that the proposed document analysis approach compares favorably with TF-IDF with cosine similarity for single-class text classification. In particular, for a document set with a nearly equal number of documents in each class, the proposed approach achieves better results than traditional vector-based methods.
Keywords :
document handling; pattern classification; TF-IDF method; correlation measure approach; cosine similarity; discriminant based document analysis; discriminant coefficient; term frequency-inverse document frequency; text classification; Classification algorithms; Correlation; Frequency measurement; Support vector machine classification; Text categorization; Training; classification coefficient; correlation measure; document analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Symposium (ICS), 2010 International
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-7639-8
Type :
conf
DOI :
10.1109/COMPSYM.2010.5685442
Filename :
5685442