DocumentCode :
1797707
Title :
Classifying web documents using term spectral transforms and Multi-Dimensional Latent Semantic representation
Author :
Haijun Zhang ; Shifu Bie ; Bin Luo
Author_Institution :
Dept. of Comput. Sci., Harbin Inst. of Technol., Shenzhen, China
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
1320
Lastpage :
1327
Abstract :
This research investigates the potential of document semantic representations that consider both term frequencies and term associations. In particular, we propose a general framework that uses term spectra to represent the spatial distributions and associations of terms throughout a document. We explore three typical transforms for computing term spectra: the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). A term affinity graph is built to represent each document. We then employ a document analysis method recently developed by the authors, Multi-Dimensional Latent Semantic Analysis (MDLSA), which yields an efficient semantic representation of a document from its term affinity graph. The algorithm is evaluated on Web document classification. Experimental results show that the proposed technique is considerably more computationally efficient than Direct Graph Matching (DGM) and also outperforms state-of-the-art algorithms such as VSM, PCA, RAP, and MLM.
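The abstract does not give the construction of a term spectrum, so the following is only a minimal sketch under stated assumptions: a document is split into a fixed number of equal-width position bins, each term's per-bin occurrence counts form a spatial signal, and a transform of that signal serves as the term's spectrum. The function names, the binning scheme, and the `n_bins` parameter are hypothetical and are not taken from the paper. Only the DFT and an unnormalized DCT-II are shown; the DWT is omitted.

```python
import numpy as np

def term_signal(tokens, term, n_bins=8):
    """Count occurrences of `term` in each of `n_bins` equal-width position bins.

    Assumption: this binning is an illustrative choice, not the paper's scheme.
    """
    signal = np.zeros(n_bins)
    n = len(tokens)
    for pos, tok in enumerate(tokens):
        if tok == term:
            # Map the token position to its bin index.
            signal[pos * n_bins // n] += 1
    return signal

def dft_spectrum(signal):
    """Magnitudes of the DFT coefficients of a term signal."""
    return np.abs(np.fft.fft(signal))

def dct2_spectrum(signal):
    """Unnormalized DCT-II, computed directly from its definition."""
    n = len(signal)
    k = np.arange(n)[:, None]   # frequency index
    m = np.arange(n)[None, :]   # position (bin) index
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    return basis @ signal

# Toy document, used only for illustration.
tokens = "web page about web search and web document ranking on the web".split()
sig = term_signal(tokens, "web", n_bins=4)
dft = dft_spectrum(sig)
dct = dct2_spectrum(sig)
```

In this sketch the zeroth DFT and DCT coefficients both equal the sum of the signal, that is, the plain term frequency, while the higher-order coefficients depend on where in the document the term occurs. This is one way a single spectrum can carry both frequency and positional information.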
Keywords :
Internet; discrete Fourier transforms; discrete cosine transforms; discrete wavelet transforms; document handling; graph theory; natural language processing; pattern classification; DCT; DFT; DGM; DWT; Web document classification; direct graph matching; discrete Fourier transform; discrete cosine transform; discrete wavelet transform; document analysis method; document semantic representation; multidimensional latent semantic analysis; multidimensional latent semantic representation; term affinity graph; term associations; term frequencies; term spatial distributions; term spectra; term spectral transforms; Accuracy; Discrete Fourier transforms; Discrete wavelet transforms; Principal component analysis; Semantics; Vectors; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889582
Filename :
6889582