DocumentCode :
2828281
Title :
Documents Clustering Based on Optimized Compressibility Vector Space
Author :
Zhang, Nuo ; Watanabe, Toshinori
Author_Institution :
Grad. Sch. of Inf. Syst., Univ. of Electro-Commun., Chofu, Japan
fYear :
2009
fDate :
11-13 Dec. 2009
Firstpage :
1
Lastpage :
5
Abstract :
Accessing and storing large-scale collections of electronic documents has become possible thanks to high-performance computer hardware and broadband network access. To handle the growing number of documents properly, an efficient document representation model is as important as the classification algorithm. Several text representation methods, such as the bag-of-words and N-gram models, have been widely used. Another representation approach, the pattern representation scheme using data compression (PRDC), has been proposed recently. It not only processes linguistic text independently of the language but also processes multimedia data effectively. In this study, we propose a method to improve the PRDC approach and compare it with the two aforementioned methods in terms of clustering ability. Experimental results show that the proposed method performs better than the other two methods and also better than the original PRDC.
Keywords :
computational linguistics; data compression; multimedia systems; pattern classification; pattern clustering; text analysis; broadband accessible network; classification algorithms; computer hardware; data compression; documents clustering; electrical documents; linguistic text; multimedia data; optimized compressibility vector space; pattern representation; text representation; Classification algorithms; Computer networks; Data compression; Hardware; High performance computing; Information management; Information retrieval; Information systems; Large-scale systems; Text categorization;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
Type :
conf
DOI :
10.1109/CISE.2009.5363976
Filename :
5363976