• DocumentCode
    2775695
  • Title

    SISC: A Text Classification Approach Using Semi Supervised Subspace Clustering

  • Author

    Ahmed, Mohammad Salim ; Khan, Latifur

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX, USA
  • fYear
    2009
  • fDate
    6-6 Dec. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Text classification poses some specific challenges. One such challenge is its high dimensionality, where each document (data point) contains only a small subset of the dimensions. In this paper, we propose semi-supervised impurity based subspace clustering (SISC) in conjunction with the k-nearest neighbor approach, based on semi-supervised subspace clustering that considers the high dimensionality as well as the sparse nature of the dimensions in text data. SISC finds clusters in the subspaces of the high dimensional text data, where each text document has fuzzy cluster membership. This fuzzy clustering exploits two factors: the chi square statistic of the dimensions and the impurity measure within each cluster. Empirical evaluation on real world data sets reveals the effectiveness of our approach, as it significantly outperforms other state-of-the-art text classification and subspace clustering algorithms.
  • Keywords
    learning (artificial intelligence); text analysis; factors chi square statistic; fuzzy cluster membership; high dimensional text data; k-nearest neighbor approach; semisupervised impurity based subspace clustering; text classification approach; Availability; Clustering algorithms; Computer science; Conferences; Data mining; Impurities; Labeling; Statistics; Testing; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4244-5384-9
  • Electronic_ISBN
    978-0-7695-3902-7
  • Type

    conf

  • DOI
    10.1109/ICDMW.2009.61
  • Filename
    5360537