DocumentCode
1772663
Title
A hybrid feature selection algorithm for web document clustering
Author
Benghabrit, Asmaa ; Ouhbi, Brahim ; Zemmouri, El Moukhtar ; Frikh, Bouchra ; Behja, Hicham
Author_Institution
LM2I Lab., Moulay Ismail Univ., Meknès, Morocco
fYear
2014
fDate
28-30 May 2014
Firstpage
216
Lastpage
222
Abstract
Because not all features in a dataset are important, and some are redundant or irrelevant, feature selection, an effective dimensionality reduction technique, is essential for web document clustering. In clustering, feature selection is the task of selecting the features that are important for the underlying clusters. To guide the web document clustering process, we propose a hybrid feature selection algorithm that simultaneously selects the most statistically and semantically informative features through a weighting model. The clustering process selects relevant features and performs document clustering iteratively until it reaches stability. The experimental results demonstrate the practical aspects of our algorithm and show that it yields more efficient clustering than that obtained by other existing algorithms.
Keywords
Internet; feature selection; stability; statistical analysis; Web document clustering process; hybrid feature selection algorithm; semantic informative features; statistical informative features; weighting model; Algorithm design and analysis; Clustering algorithms; Convergence; Feature extraction; Mutual information; Semantics; Vectors; Clustering; Feature selection methods; Performance analysis; Statistical and semantic analysis; Web documents
fLanguage
English
Publisher
IEEE
Conference_Titel
Next Generation Networks and Services (NGNS), 2014 Fifth International Conference on
Conference_Location
Casablanca
Print_ISBN
978-1-4799-6608-0
Type
conf
DOI
10.1109/NGNS.2014.6990255
Filename
6990255