DocumentCode
466015
Title
Index Words Selection with ICA
Author
Yokoi, Takeru ; Yanagimoto, Hidekazu ; Omatu, Sigeru
Author_Institution
Osaka Prefecture Univ., Osaka
Volume
4
fYear
2006
fDate
8-11 Oct. 2006
Firstpage
3348
Lastpage
3353
Abstract
We propose a method that uses independent component analysis (ICA) to select index words for constructing document vectors from a corpus. Selecting index words is useful because the dimension of a document vector is large. ICA is one method for analyzing the latent semantics of documents, and it has been reported that the independent components obtained by ICA represent the topics in the documents. The words in an independent component are considered to be the key words of its topic. The proposed method selects the key words that have a high weight in each independent component and adds them to a set of index words. In addition, we select other words related to the key words, using a chi-squared measure computed from the co-occurrence of each word with the key words and the appearance of the key words, and add them to the set of index words as well. Finally, we evaluate the index words obtained.
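The two selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ICA step itself is omitted, the component weights below are made-up values, the function names are hypothetical, and the chi-squared statistic is assumed to be the standard Pearson statistic on a 2x2 table of document-level word presence, which may differ from the measure used in the paper.

```python
def top_weight_words(components, vocab, k):
    """Collect the k words with the largest absolute weight in each component."""
    selected = set()
    for weights in components:
        ranked = sorted(range(len(vocab)), key=lambda i: abs(weights[i]), reverse=True)
        selected.update(vocab[i] for i in ranked[:k])
    return selected


def contingency(docs, key_word, word):
    """2x2 counts over documents: (both, key only, word only, neither)."""
    a = sum(1 for d in docs if key_word in d and word in d)
    b = sum(1 for d in docs if key_word in d and word not in d)
    c = sum(1 for d in docs if key_word not in d and word in d)
    return a, b, c, len(docs) - a - b - c


def chi_squared(docs, key_word, word):
    """Pearson chi-squared statistic for the 2x2 presence table (assumed form)."""
    a, b, c, d = contingency(docs, key_word, word)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def related_words(docs, key_words, threshold):
    """Words that co-occur with some key word more often than chance predicts."""
    vocab = set().union(*docs)
    related = set()
    for key in key_words:
        for word in vocab - set(key_words):
            a, b, c, d = contingency(docs, key, word)
            # Chi-squared ignores the sign of the association, so keep
            # only pairs that are positively associated.
            if a * d > b * c and chi_squared(docs, key, word) >= threshold:
                related.add(word)
    return related


# Toy corpus: each document is the set of words it contains.
docs = [{"ica", "signal"}, {"ica", "signal"}, {"index", "word"}, {"index", "word"}]
vocab = ["ica", "signal", "index", "word"]
# Hypothetical component weights standing in for the output of ICA.
components = [[0.9, 0.2, 0.0, 0.1], [0.0, 0.1, -0.8, 0.3]]

key_words = top_weight_words(components, vocab, k=1)
index_words = key_words | related_words(docs, key_words, threshold=3.0)
```

In this sketch the threshold and `k` are free parameters chosen only for the toy data; the abstract does not state how the cut-offs were set.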
Keywords
independent component analysis; indexing; information retrieval; ICA; chi-squared measure; document vector; index word selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location
Taipei
Print_ISBN
1-4244-0099-6
Electronic_ISBN
1-4244-0100-3
Type
conf
DOI
10.1109/ICSMC.2006.384635
Filename
4274399