• DocumentCode
    1772663
  • Title
    A hybrid feature selection algorithm for web document clustering
  • Author
    Benghabrit, Asmaa ; Ouhbi, Brahim ; Zemmouri, El Moukhtar ; Frikh, Bouchra ; Behja, Hicham
  • Author_Institution
    LM2I Lab., Moulay Ismail Univ., Meknès, Morocco
  • fYear
    2014
  • fDate
    28-30 May 2014
  • Firstpage
    216
  • Lastpage
    222
  • Abstract
    Not all features in a dataset are important, since some are redundant or irrelevant; feature selection, an effective dimensionality reduction technique, is therefore essential for web document clustering. In the clustering setting, it is the task of selecting the features that are important for the underlying clusters. To guide the web document clustering process, we propose a hybrid feature selection algorithm that simultaneously selects the most statistically and semantically informative features through a weighting model. The clustering process selects relevant features and performs document clustering iteratively until stability is reached. The experimental results demonstrate the practical aspects of our algorithm and show that it produces more efficient clustering than that obtained by other existing algorithms.
  • Keywords
    Internet; feature selection; stability; statistical analysis; Web document clustering process; hybrid feature selection algorithm; semantic informative features; statistical informative features; weighting model; Algorithm design and analysis; Clustering algorithms; Convergence; Feature extraction; Mutual information; Semantics; Vectors; Clustering; Feature selection methods; Performance analysis; Statistical and semantic analysis; Web documents
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Next Generation Networks and Services (NGNS), 2014 Fifth International Conference on
  • Conference_Location
    Casablanca
  • Print_ISBN
    978-1-4799-6608-0
  • Type
    conf
  • DOI
    10.1109/NGNS.2014.6990255
  • Filename
    6990255