DocumentCode :
1896376
Title :
A Principle Component Analysis Based Method to Normalize Term Weights
Author :
Xia, Tian ; Chai, Yanmei
Author_Institution :
Dept. of Comput. & Inf. Sci., Shanghai Second Polytech. Univ., Shanghai, China
fYear :
2010
fDate :
25-26 Dec. 2010
Firstpage :
1
Lastpage :
4
Abstract :
Term weighting is a significant step in document formalization in natural language processing, and it strongly affects the accuracy of natural language processing systems. A term weight consists of three parts: a global term weight, a local term weight, and a standardization factor. Many term weighting algorithms have been proposed to address each part, and the final term weight is currently computed as the product of the results of several such algorithms. However, the results of different term weighting algorithms are correlated with each other, which indicates that they carry redundant, overlapping information. Simply multiplying the results therefore leads to an inaccurate final term weight. This paper puts forward a Principal Component Analysis (PCA) based method for normalizing term weights, which is able to remove the redundant, overlapping information and produce a more accurate final term weight.
Keywords :
document handling; natural language processing; principal component analysis; document formalization; global term weight; local term weight; principle component analysis; standardization factor; term weights normalizing method; Algorithm design and analysis; Correlation; Covariance matrix; Equations; Mathematical model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on
Conference_Location :
Wuhan
ISSN :
2156-7379
Print_ISBN :
978-1-4244-7939-9
Electronic_ISBN :
2156-7379
Type :
conf
DOI :
10.1109/ICIECS.2010.5678139
Filename :
5678139